Quick Answer

Whisper (OpenAI) API pricing ranges from $0.003 to $0.006 per minute of audio in 2026. Your actual cost depends on the model you choose, the volume of audio you process, and, for Enterprise customers, any negotiated volume discounts.

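  • Free tier: $5 in free credits for new accounts (about 833 minutes), with no monthly refresh
  • Billing: Usage-based, calculated per second of audio with no minimum charge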
  • Hidden costs: Add ~35% for implementation, support, and training

Whisper (OpenAI) offers 3 pricing tiers: GPT-4o Mini Transcribe, Whisper / GPT-4o Transcribe, and Enterprise (via ChatGPT Enterprise / API). The standard paid plans are GPT-4o Mini Transcribe at $0.003/minute and Whisper / GPT-4o Transcribe at $0.006/minute. The Whisper / GPT-4o Transcribe plan is best for developers needing accurate, affordable transcription with the simplest possible integration and no add-on fees.

Compared to other AI transcription APIs, Whisper (OpenAI) sits at the budget-friendly end of the market.

OpenAI Whisper is a general-purpose speech recognition system available both as an open-source model for self-hosting and as a hosted API. Trained on 680,000 hours of multilingual audio, Whisper supports 99+ languages and delivers high-accuracy transcription at one of the lowest per-minute rates among commercial transcription APIs. OpenAI also offers GPT-4o Transcribe and GPT-4o Mini Transcribe as newer alternatives with improved accuracy and built-in speaker diarization.

Pricing is straightforward at $0.006/min ($0.36/hour) for both the legacy Whisper model and GPT-4o Transcribe, with GPT-4o Mini Transcribe available at $0.003/min ($0.18/hour) for cost-sensitive applications. Unlike competitors that charge separately for features like speaker diarization, OpenAI includes diarization in the GPT-4o Transcribe model at no extra cost. New accounts receive $5 in free credits covering approximately 833 minutes of transcription. Billing is calculated per second with no minimum charge.
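The per-second billing described above reduces to simple arithmetic. The following is a minimal sketch using the rates quoted in this guide; the `RATES_PER_MINUTE` table, its keys, and the function names are illustrative and not part of any OpenAI SDK.

```python
# Per-minute rates (USD) as quoted in this guide. The keys are illustrative
# labels chosen for this sketch, not guaranteed API model identifiers.
RATES_PER_MINUTE = {
    "gpt-4o-mini-transcribe": 0.003,
    "gpt-4o-transcribe": 0.006,
    "whisper": 0.006,
}


def transcription_cost(audio_seconds: float, model: str) -> float:
    """Cost in USD for one file, billed per second with no minimum charge."""
    return audio_seconds / 60 * RATES_PER_MINUTE[model]


def minutes_covered(credit_usd: float, model: str) -> float:
    """Minutes of audio that a credit balance pays for at the model's rate."""
    return credit_usd / RATES_PER_MINUTE[model]


if __name__ == "__main__":
    # One hour of audio on the legacy Whisper model.
    print(f"${transcription_cost(3600, 'whisper'):.2f}")
    # How far the $5 new-account credit stretches on Whisper.
    print(f"{int(minutes_covered(5.00, 'whisper'))} minutes")
```

Swapping the model label to `gpt-4o-mini-transcribe` halves the per-hour figure, which is the whole difference between the two paid tiers.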

The key trade-off with Whisper is simplicity vs features. At $0.006/min, Whisper is 75% cheaper than AWS Transcribe ($0.024/min) and 62% cheaper than Google Cloud Speech-to-Text ($0.016/min) at base rates. However, Whisper lacks built-in audio intelligence features like entity detection, topic classification, and sentiment analysis that competitors like AssemblyAI and Deepgram offer. For teams already in the OpenAI ecosystem using GPT-4 and embeddings, Whisper provides the simplest possible integration with a single API call and flat-rate pricing.
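The savings percentages above can be reproduced from the base rates alone. A small sketch, assuming the competitor rates quoted in this guide; `percent_cheaper` is a hypothetical helper written for this example.

```python
def percent_cheaper(our_rate: float, their_rate: float) -> float:
    """How much cheaper our_rate is than their_rate, as a percentage."""
    return (1 - our_rate / their_rate) * 100


WHISPER_RATE = 0.006  # USD per minute, as quoted in this guide

# Competitor base rates (USD per minute) as quoted in this guide.
competitors = {
    "AWS Transcribe": 0.024,
    "Google Cloud Speech-to-Text": 0.016,
}

for name, rate in competitors.items():
    print(f"{name}: {percent_cheaper(WHISPER_RATE, rate):.1f}% cheaper")
```

These figures compare base rates only; volume tiers on the competitor side narrow the gap.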

In this 2026 pricing guide, we break down Whisper's per-minute costs across all model variants, calculate real-world costs for common transcription workloads, expose hidden costs around file size limits, rate throttling, and compliance gaps, and compare Whisper to alternatives like Deepgram, AssemblyAI, AWS Transcribe, and Google Cloud Speech-to-Text.

All Whisper (OpenAI) Plans & Pricing

Plan | Price | Limits | Best For
GPT-4o Mini Transcribe | $0.003/min | Max file size: 25 MB per request; rate limits: tier-based (50 RPM default) | Cost-sensitive applications needing basic transcription at the lowest per-minute rate in OpenAI's lineup
Whisper / GPT-4o Transcribe | $0.006/min | Max file size: 25 MB per request; rate limits: tier-based (scales with usage) | Developers needing accurate, affordable transcription with the simplest possible integration and no add-on fees
Enterprise (via ChatGPT Enterprise / API) | Contact sales | Minimum commitment: custom (contact sales); rate limits: custom (significantly higher) | Large organizations processing high volumes needing custom rate limits, enterprise security, and volume discounts

GPT-4o Mini Transcribe

  • Speech-to-text at $0.003/min ($0.18/hour)
  • 99+ language support
  • Multiple audio format support (mp3, mp4, wav, webm, etc.)
  • Billed per second with no minimum charge
  • Near real-time processing (5-10x faster than real-time)
  • Also available as token-based pricing at $1.25/1M input tokens

Whisper / GPT-4o Transcribe

  • Speech-to-text at $0.006/min ($0.36/hour)
  • Whisper (legacy) and GPT-4o Transcribe at same rate
  • GPT-4o Transcribe with speaker diarization at $0.006/min
  • 99+ language support with improved accuracy
  • Billed per second with no minimum charge
  • $5 free credits for new accounts (~833 minutes)
  • Also available as token-based pricing at $2.50/1M input tokens

Enterprise (via ChatGPT Enterprise / API)

  • All Whisper and GPT-4o Transcribe models
  • Higher rate limits and concurrency
  • Dedicated account management
  • Custom usage-based volume discounts
  • Admin controls and SSO
  • Data processing addendum (DPA) available
  • Priority access to new models and features

Hidden Costs to Budget For

Watch for 6 hidden costs
  • Actual costs may exceed the $0.006/min headline rate: Developer reports indicate real-world costs averaging $0.010/min due to billing rounding, retries on failed requests, and processing overhead -- across 648 hours one developer reported spending $397 vs an estimated $233 (70% over budget)
  • No built-in speaker diarization on legacy Whisper model: While GPT-4o Transcribe now includes diarization at $0.006/min, the legacy Whisper model requires a separate post-processing step using GPT-4o or a third-party service, adding $0.002-$0.01/min in additional costs
  • 25 MB file size limit forces chunking overhead: Audio files over 25 MB must be split into smaller segments before upload, requiring engineering effort for chunk management, overlap handling, and transcript reassembly -- budget $500-$1,500 for initial chunking pipeline development
  • No HIPAA BAA available: OpenAI does not offer a Business Associate Agreement, making the Whisper API unusable for Protected Health Information (PHI) -- organizations with healthcare data must self-host Whisper on HIPAA-compliant infrastructure at $1,400+/month
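  • Self-hosting only breaks even at roughly 770 hours/month: At $0.006/min ($0.36/hour), 500 hours costs $180/month via the API vs ~$276/month for self-hosted GPU infrastructure, so the API is still cheaper at that volume. The two costs cross at about 767 hours/month ($276 divided by $0.36/hour); above that, self-hosting becomes cheaper but requires DevOps expertise and GPU management overhead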
  • Rate limits throttle high-volume processing: Default tier allows only 50 requests per minute -- processing 10,000+ files requires careful queue management, retry logic, and potentially upgrading to higher API tiers which require spending history with OpenAI
Tip

Ask your Whisper (OpenAI) sales rep about these costs upfront. Getting them in writing before signing can save you from surprise charges later.
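Two of the hidden costs above are easy to quantify before committing to a budget: the gap between headline and effective rate, and the minimum wall-clock time a request cap imposes. The sketch below uses the figures reported in this guide (648 hours, $397 spent, 50 requests per minute); all function names are hypothetical helpers, and the queue estimate assumes one request per file with no retries.

```python
import math


def effective_rate_per_minute(total_spend_usd: float, audio_hours: float) -> float:
    """Observed cost per minute of audio, including any overhead."""
    return total_spend_usd / (audio_hours * 60)


def overrun_percent(actual_usd: float, estimated_usd: float) -> float:
    """How far actual spend exceeded the estimate, as a percentage."""
    return (actual_usd / estimated_usd - 1) * 100


def min_queue_minutes(num_requests: int, requests_per_minute: int) -> int:
    """Lower bound on wall-clock minutes needed under a request-rate cap."""
    return math.ceil(num_requests / requests_per_minute)


if __name__ == "__main__":
    estimated = 648 * 60 * 0.006  # headline-rate estimate for 648 hours
    print(f"estimated: ${estimated:.2f}")
    print(f"effective rate: ${effective_rate_per_minute(397, 648):.4f}/min")
    print(f"overrun: {overrun_percent(397, estimated):.0f}%")
    print(f"10,000 files at 50 RPM: at least {min_queue_minutes(10_000, 50)} minutes")
```

Retries count against the same request cap, so real queue times will be longer than this lower bound.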

Frequently Asked Questions

01 How much does OpenAI Whisper API cost?

OpenAI Whisper API costs $0.006 per minute ($0.36/hour) for both the legacy Whisper model and the newer GPT-4o Transcribe model. GPT-4o Mini Transcribe is available at $0.003/min ($0.18/hour) for cost-sensitive workloads. GPT-4o Transcribe with speaker diarization is also $0.006/min. New accounts receive $5 in free credits covering approximately 833 minutes of Whisper transcription.

02 Is OpenAI Whisper free?

OpenAI Whisper is available as both a free open-source model and a paid API. The open-source model can be self-hosted for free (but requires GPU infrastructure costing $276+/month). The API gives new accounts $5 in free credits covering approximately 833 minutes of transcription. After credits are exhausted, you pay $0.006/min for Whisper or $0.003/min for GPT-4o Mini Transcribe with no free monthly refresh.

03 What is OpenAI Whisper?

OpenAI Whisper is a general-purpose speech recognition model trained on 680,000 hours of multilingual audio data. It is available both as an open-source model (for self-hosting) and as a hosted API through OpenAI's platform. Whisper supports 99+ languages for transcription and translation, handles various audio formats, and processes audio at 5-10x real-time speed. OpenAI has also released GPT-4o Transcribe and GPT-4o Mini Transcribe as successor models offering improved accuracy and features like built-in speaker diarization.

04 OpenAI Whisper vs Deepgram: which is better?

OpenAI Whisper costs $0.006/min vs Deepgram Nova-3 at $0.0043-$0.0077/min depending on plan. Deepgram is cheaper at scale with its Growth plan ($0.0043/min) and offers real-time streaming with sub-300ms latency, while Whisper processes at 5-10x real-time but is not designed for live streaming. Deepgram also provides $200 in free credits vs Whisper's $5. Choose Whisper for simple integration within the OpenAI ecosystem and batch transcription; choose Deepgram for real-time applications, lower per-minute costs at volume, and richer audio intelligence features.

05 OpenAI Whisper vs AWS Transcribe: which is cheaper?

OpenAI Whisper at $0.006/min is 75% cheaper than AWS Transcribe's base rate of $0.024/min. However, AWS Transcribe offers volume-based tiered discounts dropping to $0.0078/min at 5M+ minutes, narrowing the gap for very high-volume users. AWS also includes built-in features like speaker diarization, custom vocabularies, and PII redaction that Whisper lacks natively. Choose Whisper for straightforward, affordable transcription; choose AWS Transcribe if you need deep AWS ecosystem integration, custom vocabularies, or HIPAA-compliant medical transcription.

06 Can I self-host OpenAI Whisper for free?

Yes, Whisper is open-source under the MIT license and can be self-hosted on your own infrastructure at no software cost. However, self-hosting requires GPU infrastructure costing approximately $276/month minimum for a dedicated GPU instance, plus DevOps overhead of $50-$200/month. At the $0.006/min API rate ($0.36/hour), the GPU cost alone breaks even at roughly 770 hours of transcription per month, and at roughly 900-1,300 hours once DevOps overhead is included. Self-hosting makes sense for organizations needing data sovereignty, HIPAA compliance, or processing extremely high volumes where the $0.006/min API rate exceeds fixed infrastructure costs.
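The break-even volume follows directly from dividing fixed monthly cost by the hourly API rate. A minimal sketch using only the figures quoted in this guide; `break_even_hours` is a hypothetical helper, and it ignores engineering time and GPU utilization limits.

```python
def break_even_hours(fixed_monthly_usd: float,
                     api_rate_per_minute: float = 0.006) -> float:
    """Monthly audio hours at which API spend equals a fixed self-hosting cost."""
    api_rate_per_hour = api_rate_per_minute * 60
    return fixed_monthly_usd / api_rate_per_hour


if __name__ == "__main__":
    gpu_only = 276.0  # USD/month, dedicated GPU instance as quoted here
    for devops in (0.0, 50.0, 200.0):  # DevOps overhead range quoted here
        hours = break_even_hours(gpu_only + devops)
        print(f"GPU + ${devops:.0f} DevOps: break-even at about {hours:.0f} hours/month")
```

Below the computed volume the API is the cheaper option on cost alone; compliance or data-sovereignty requirements can still justify self-hosting earlier.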

07 What audio formats does OpenAI Whisper support?

The Whisper API supports mp3, mp4, mpeg, mpga, m4a, wav, and webm audio formats with a maximum file size of 25 MB per request. Files larger than 25 MB must be split into smaller chunks before upload. The API processes audio at 5-10x real-time speed, meaning a 60-minute file typically completes in 6-12 minutes. Billing is calculated per second of audio processed with no minimum charge per request.
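Splitting an oversized file comes down to choosing time offsets so that every chunk stays under the upload limit. The sketch below is one possible planner, not an official recipe: it assumes a roughly constant bitrate, treats "25 MB" as 25 MiB, and uses a hypothetical 2-second overlap and 5% safety margin. It only computes offsets; the actual cutting would be done with an audio tool.

```python
# Assumption: the 25 MB limit is interpreted as 25 MiB for this sketch.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def plan_chunks(file_bytes: int, duration_s: float,
                overlap_s: float = 2.0, safety: float = 0.95):
    """Return (start, end) offsets in seconds, each chunk under the limit.

    Assumes constant bitrate, so size scales linearly with duration.
    Consecutive chunks overlap by overlap_s to avoid cutting words in half.
    """
    if file_bytes <= MAX_UPLOAD_BYTES:
        return [(0.0, duration_s)]

    bytes_per_second = file_bytes / duration_s
    max_chunk_s = MAX_UPLOAD_BYTES * safety / bytes_per_second
    step_s = max_chunk_s - overlap_s
    if step_s <= 0:
        raise ValueError("overlap is too large for this bitrate")

    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        start += step_s
    return chunks


if __name__ == "__main__":
    # A 60 MiB, one-hour recording needs three overlapping chunks.
    for start, end in plan_chunks(60 * 1024 * 1024, 3600):
        print(f"{start:8.1f}s -> {end:8.1f}s")
```

The overlapping seconds are billed twice and produce duplicate text at the seams, so the reassembly step needs to deduplicate around each boundary.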

08 Does OpenAI Whisper charge for silence in audio?

Yes, OpenAI Whisper charges for the full duration of submitted audio including silence, music, and non-speech segments. A 60-minute file with 30 minutes of silence costs the same $0.36 as a 60-minute file of continuous speech. To reduce costs, preprocess audio with tools like FFmpeg to strip silence or use voice activity detection (VAD) before sending to the API. This can reduce costs by 20-40% for recordings with significant dead air, such as meeting recordings or surveillance audio.
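The effect of trimming silence on the bill can be estimated before building a preprocessing step. This is a rough sketch under two stated assumptions: all identified silence is actually removed, and the preprocessing itself costs nothing. `savings_from_trimming` is a hypothetical helper.

```python
def savings_from_trimming(total_minutes: float, silent_minutes: float,
                          rate_per_minute: float = 0.006):
    """Return (cost_before, cost_after, percent_saved) for silence removal."""
    cost_before = total_minutes * rate_per_minute
    cost_after = (total_minutes - silent_minutes) * rate_per_minute
    percent_saved = (cost_before - cost_after) / cost_before * 100
    return cost_before, cost_after, percent_saved


if __name__ == "__main__":
    # The 60-minute file with 30 minutes of silence from the answer above.
    before, after, saved = savings_from_trimming(60, 30)
    print(f"before: ${before:.2f}, after: ${after:.2f}, saved: {saved:.0f}%")
```

Because billing is linear in duration, the percentage saved equals the fraction of audio removed, so the 20-40% range corresponds to recordings that are 20-40% dead air.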