Whisper (OpenAI) Pricing 2026
Complete pricing guide with plans, hidden costs, and negotiation tips
Whisper (OpenAI) pricing is usage-based, ranging from $0.003 to $0.006 per minute of audio in 2026. Your actual cost depends on the model you choose and the volume of audio you process; enterprise customers can negotiate custom volume discounts.
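- Free tier: No recurring free tier; new accounts receive $5 in free credits (~833 minutes)
- Billing: Usage-based, calculated per second of audio with no minimum charge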
- Hidden costs: Add ~35% for implementation, support, and training
Whisper (OpenAI) offers 3 pricing tiers: GPT-4o Mini Transcribe, Whisper / GPT-4o Transcribe, and Enterprise (via ChatGPT Enterprise / API). Standard paid plans include GPT-4o Mini Transcribe at $0.003/minute and Whisper / GPT-4o Transcribe at $0.006/minute. The Whisper / GPT-4o Transcribe plan is best for developers needing accurate, affordable transcription with the simplest possible integration and no add-on fees.
Compared to other AI transcription APIs, Whisper (OpenAI) sits at the budget-friendly end of the price range.
OpenAI Whisper is a general-purpose speech recognition system available both as an open-source model for self-hosting and as a hosted API. Trained on 680,000 hours of multilingual audio, Whisper supports 99+ languages and delivers high-accuracy transcription at one of the lowest per-minute rates among commercial transcription APIs. OpenAI also offers GPT-4o Transcribe and GPT-4o Mini Transcribe as newer alternatives with improved accuracy and built-in speaker diarization.
Pricing is straightforward at $0.006/min ($0.36/hour) for both the legacy Whisper model and GPT-4o Transcribe, with GPT-4o Mini Transcribe available at $0.003/min ($0.18/hour) for cost-sensitive applications. Unlike competitors that charge separately for features like speaker diarization, OpenAI includes diarization in the GPT-4o Transcribe model at no extra cost. New accounts receive $5 in free credits covering approximately 833 minutes of transcription. Billing is calculated per second with no minimum charge.
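The per-second billing described above can be sketched as a small calculator. This is an illustrative sketch using only the rates quoted in this guide; the dictionary keys and function names are hypothetical labels, not official API model identifiers.

```python
# Per-minute rates (USD) as quoted in this guide.
# Keys are illustrative labels, not official API model IDs.
RATES_PER_MINUTE = {
    "whisper": 0.006,
    "gpt-4o-transcribe": 0.006,
    "gpt-4o-mini-transcribe": 0.003,
}


def transcription_cost(seconds: float, model: str = "whisper") -> float:
    """Cost in USD for `seconds` of audio, billed per second with no minimum."""
    return seconds * RATES_PER_MINUTE[model] / 60


def minutes_covered(credit_usd: float, model: str = "whisper") -> float:
    """Minutes of audio a given credit balance pays for."""
    return credit_usd / RATES_PER_MINUTE[model]


# One hour of audio on the Whisper rate, and the reach of the $5 credit.
print(f"1 hour of Whisper audio: ${transcription_cost(3600):.2f}")
print(f"$5 credit covers about {minutes_covered(5):.0f} minutes")
```

Because billing is per second, a 90-second clip is charged for exactly 1.5 minutes rather than being rounded up to 2.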
The key trade-off with Whisper is simplicity versus features. At $0.006/min, Whisper is 75% cheaper than AWS Transcribe ($0.024/min) and about 62% cheaper than Google Cloud Speech-to-Text ($0.016/min) at base rates. However, Whisper lacks built-in audio intelligence features like entity detection, topic classification, and sentiment analysis that competitors like AssemblyAI and Deepgram offer. For teams already in the OpenAI ecosystem using GPT-4 and embeddings, Whisper provides the simplest possible integration with a single API call and flat-rate pricing.
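The savings percentages above follow from a simple ratio of base rates. A minimal sketch, assuming the competitor rates quoted in this guide (the function name is hypothetical):

```python
def percent_cheaper(rate: float, competitor_rate: float) -> float:
    """How much cheaper `rate` is than `competitor_rate`, as a percentage."""
    return (1 - rate / competitor_rate) * 100


WHISPER = 0.006  # USD per minute, as quoted in this guide

# Base rates quoted in this guide; volume tiers are not modeled here.
for name, competitor in [("AWS Transcribe", 0.024), ("Google Cloud Speech-to-Text", 0.016)]:
    print(f"vs {name}: {percent_cheaper(WHISPER, competitor):.1f}% cheaper")
```

These figures compare base list prices only; volume discounts on either side would change the result.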
In this 2026 pricing guide, we break down Whisper's per-minute costs across all model variants, calculate real-world costs for common transcription workloads, expose hidden costs around file size limits, rate throttling, and compliance gaps, and compare Whisper to alternatives like Deepgram, AssemblyAI, AWS Transcribe, and Google Cloud Speech-to-Text.
All Whisper (OpenAI) Plans & Pricing
| Plan | Price | Limits | Best For |
|---|---|---|---|
| GPT-4o Mini Transcribe | $0.003/min | Max file size: 25 MB per request. Rate limits: tier-based (50 RPM default) | Cost-sensitive applications needing basic transcription at the lowest per-minute rate in OpenAI's lineup |
| Whisper / GPT-4o Transcribe | $0.006/min | Max file size: 25 MB per request. Rate limits: tier-based (scales with usage) | Developers needing accurate, affordable transcription with the simplest possible integration and no add-on fees |
| Enterprise (via ChatGPT Enterprise / API) | Contact sales | Minimum commitment: custom (contact sales). Rate limits: custom (significantly higher) | Large organizations processing high volumes needing custom rate limits, enterprise security, and volume discounts |
Features by Plan
GPT-4o Mini Transcribe
- Speech-to-text at $0.003/min ($0.18/hour)
- 99+ language support
- Multiple audio format support (mp3, mp4, wav, webm, etc.)
- Billed per second with no minimum charge
- Fast batch processing (5-10x faster than real time)
- Also available as token-based pricing at $1.25/1M input tokens
Whisper / GPT-4o Transcribe
- Speech-to-text at $0.006/min ($0.36/hour)
- Whisper (legacy) and GPT-4o Transcribe at same rate
- GPT-4o Transcribe with speaker diarization at $0.006/min
- 99+ language support with improved accuracy
- Billed per second with no minimum charge
- $5 free credits for new accounts (~833 minutes)
- Also available as token-based pricing at $2.50/1M input tokens
Enterprise (via ChatGPT Enterprise / API)
- All Whisper and GPT-4o Transcribe models
- Higher rate limits and concurrency
- Dedicated account management
- Custom usage-based volume discounts
- Admin controls and SSO
- Data processing addendum (DPA) available
- Priority access to new models and features
Frequently Asked Questions
01 How much does OpenAI Whisper API cost?
OpenAI Whisper API costs $0.006 per minute ($0.36/hour) for both the legacy Whisper model and the newer GPT-4o Transcribe model. GPT-4o Mini Transcribe is available at $0.003/min ($0.18/hour) for cost-sensitive workloads. GPT-4o Transcribe with speaker diarization is also $0.006/min. New accounts receive $5 in free credits covering approximately 833 minutes of Whisper transcription.
02 Is OpenAI Whisper free?
OpenAI Whisper is available as both a free open-source model and a paid API. The open-source model can be self-hosted for free (but requires GPU infrastructure costing $276+/month). The API gives new accounts $5 in free credits covering approximately 833 minutes of transcription. After credits are exhausted, you pay $0.006/min for Whisper or $0.003/min for GPT-4o Mini Transcribe with no free monthly refresh.
03 What is OpenAI Whisper?
OpenAI Whisper is a general-purpose speech recognition model trained on 680,000 hours of multilingual audio data. It is available both as an open-source model (for self-hosting) and as a hosted API through OpenAI's platform. Whisper supports 99+ languages for transcription and translation, handles various audio formats, and processes audio at 5-10x real-time speed. OpenAI has also released GPT-4o Transcribe and GPT-4o Mini Transcribe as successor models offering improved accuracy and features like built-in speaker diarization.
04 OpenAI Whisper vs Deepgram: which is better?
OpenAI Whisper costs $0.006/min vs Deepgram Nova-3 at $0.0043-$0.0077/min depending on plan. Deepgram is cheaper at scale with its Growth plan ($0.0043/min) and offers real-time streaming with sub-300ms latency, while Whisper processes at 5-10x real-time but is not designed for live streaming. Deepgram also provides $200 in free credits vs Whisper's $5. Choose Whisper for simple integration within the OpenAI ecosystem and batch transcription; choose Deepgram for real-time applications, lower per-minute costs at volume, and richer audio intelligence features.
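To see how the per-minute gap plays out at volume, the rates above can be multiplied by a monthly workload. This is a rough sketch using only the rates quoted in this answer; the 100,000-minute workload and the labels are illustrative assumptions.

```python
# Per-minute rates (USD) as quoted in this guide; labels are illustrative.
RATES = {
    "OpenAI Whisper": 0.006,
    "Deepgram Nova-3 (Growth)": 0.0043,
    "Deepgram Nova-3 (upper quoted rate)": 0.0077,
}


def monthly_cost(minutes: float, rate_per_minute: float) -> float:
    """Monthly bill in USD for a flat per-minute rate."""
    return minutes * rate_per_minute


workload_minutes = 100_000  # hypothetical monthly volume
for provider, rate in RATES.items():
    print(f"{provider}: ${monthly_cost(workload_minutes, rate):,.2f}/month")
```

The sketch ignores free credits and any feature add-ons, so it is a starting point for comparison rather than a full quote.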
05 OpenAI Whisper vs AWS Transcribe: which is cheaper?
OpenAI Whisper at $0.006/min is 75% cheaper than AWS Transcribe's base rate of $0.024/min. However, AWS Transcribe offers volume-based tiered discounts dropping to $0.0078/min at 5M+ minutes, narrowing the gap for very high-volume users. AWS also includes built-in features like speaker diarization, custom vocabularies, and PII redaction that Whisper lacks natively. Choose Whisper for straightforward, affordable transcription; choose AWS Transcribe if you need deep AWS ecosystem integration, custom vocabularies, or HIPAA-compliant medical transcription.
06 Can I self-host OpenAI Whisper for free?
Yes, Whisper is open-source under the MIT license and can be self-hosted on your own infrastructure at no software cost. However, self-hosting requires GPU infrastructure costing approximately $276/month minimum for a dedicated GPU instance, plus DevOps overhead of $50-$200/month. At the API rate of $0.36/hour, the break-even point is roughly 770 hours of transcription per month for the GPU instance alone, and roughly 900-1,300 hours once DevOps overhead is included. Self-hosting makes sense for organizations needing data sovereignty, HIPAA compliance, or processing extremely high volumes where the $0.006/min API rate exceeds fixed infrastructure costs.
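The break-even point depends heavily on which fixed costs are counted. A minimal sketch, assuming the GPU and DevOps figures quoted in this answer and treating self-hosted capacity as unlimited (a simplification; a single GPU instance has finite throughput):

```python
API_RATE_PER_HOUR = 0.006 * 60  # $0.36/hour, from the $0.006/min rate quoted here


def break_even_hours(fixed_monthly_usd: float) -> float:
    """Hours of audio per month at which API spend equals fixed self-hosting cost."""
    return fixed_monthly_usd / API_RATE_PER_HOUR


gpu = 276                           # USD/month, minimum dedicated GPU instance
devops_low, devops_high = 50, 200   # USD/month, quoted DevOps overhead range

print(f"GPU only:            {break_even_hours(gpu):.0f} hours/month")
print(f"GPU + low overhead:  {break_even_hours(gpu + devops_low):.0f} hours/month")
print(f"GPU + high overhead: {break_even_hours(gpu + devops_high):.0f} hours/month")
```

Below the computed volume the hosted API is cheaper on these assumptions; above it, self-hosting starts to pay off, provided one instance can actually handle the load.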
07 What audio formats does OpenAI Whisper support?
The Whisper API supports mp3, mp4, mpeg, mpga, m4a, wav, and webm audio formats with a maximum file size of 25 MB per request. Files larger than 25 MB must be split into smaller chunks before upload. The API processes audio at 5-10x real-time speed, meaning a 60-minute file typically completes in 6-12 minutes. Billing is calculated per second of audio processed with no minimum charge per request.
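The 25 MB limit and the 5-10x processing speed can be turned into a quick planning estimate. This sketch assumes constant-bitrate audio and interprets 25 MB conservatively as 25,000,000 bytes; the function names and the 5% safety margin are illustrative choices, not part of the API.

```python
import math

MAX_UPLOAD_BYTES = 25 * 1000 * 1000  # assumption: 25 MB read conservatively as decimal megabytes


def max_chunk_seconds(bitrate_kbps: float, safety: float = 0.95) -> float:
    """Longest chunk (in seconds) that stays under the upload limit at a constant bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return MAX_UPLOAD_BYTES * safety / bytes_per_second


def chunks_needed(duration_seconds: float, bitrate_kbps: float) -> int:
    """Number of chunks a recording must be split into before upload."""
    return math.ceil(duration_seconds / max_chunk_seconds(bitrate_kbps))


def processing_minutes(audio_minutes: float) -> tuple[float, float]:
    """(best, worst) processing time at the quoted 5-10x real-time speed."""
    return audio_minutes / 10, audio_minutes / 5


# A 60-minute recording encoded at 128 kbps.
print(chunks_needed(3600, 128), "chunks")
print(processing_minutes(60), "minutes (best, worst)")
```

Lowering the bitrate before upload reduces the number of chunks, since the limit applies to file size rather than duration.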
08 Does OpenAI Whisper charge for silence in audio?
Yes, OpenAI Whisper charges for the full duration of submitted audio including silence, music, and non-speech segments. A 60-minute file with 30 minutes of silence costs the same $0.36 as a 60-minute file of continuous speech. To reduce costs, preprocess audio with tools like FFmpeg to strip silence or use voice activity detection (VAD) before sending to the API. This can reduce costs by 20-40% for recordings with significant dead air, such as meeting recordings or surveillance audio.
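One way to strip silence before upload is FFmpeg's `silenceremove` audio filter. The sketch below only builds the command and estimates the saving; the threshold and minimum-silence values are assumptions that need tuning per recording, and the cost estimate assumes all silent audio is actually removed.

```python
def strip_silence_command(src: str, dst: str,
                          threshold_db: int = -50,
                          min_silence_s: float = 1.0) -> list[str]:
    """Build an ffmpeg command that removes silent stretches from `src`.

    threshold_db and min_silence_s are illustrative defaults, not recommendations.
    """
    audio_filter = (
        "silenceremove="
        f"stop_periods=-1:stop_duration={min_silence_s}:stop_threshold={threshold_db}dB"
    )
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


def cost_after_trimming(total_seconds: float, silent_seconds: float,
                        rate_per_minute: float = 0.006) -> float:
    """API cost in USD if the silent portion is removed before upload."""
    return (total_seconds - silent_seconds) * rate_per_minute / 60


# Pass the list to subprocess.run(...) to execute it; printed here for inspection.
print(" ".join(strip_silence_command("meeting.mp3", "meeting_trimmed.mp3")))
print(f"${cost_after_trimming(3600, 1800):.2f}")  # 60-minute file with 30 minutes of silence
```

Aggressive thresholds can clip quiet speech, so it is worth spot-checking trimmed files before transcribing in bulk.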