Quick Answer

Google Cloud Speech-to-Text uses usage-based pricing as of March 2026 with 3 plans available: Standard (Real-Time) at $0.016/min, Dynamic Batch at $0.004/min, and Enterprise (Custom Pricing), which is quoted on request by Google Cloud sales. The first 60 minutes of transcription each month are free. Enterprise pricing depends on volume, contract length, and negotiated discounts.

  • Free tier: Yes

Google Cloud Speech-to-Text offers 3 pricing tiers: Standard (Real-Time), Dynamic Batch, and Enterprise (Custom Pricing). The Dynamic Batch plan is best for organizations processing large volumes of pre-recorded audio where 24-hour turnaround is acceptable and cost is the primary concern.

Compared to other AI transcription APIs, Google Cloud Speech-to-Text is positioned at the budget-friendly end of the price range.

  • 6 documented hidden costs beyond list price

How much does Google Cloud Speech-to-Text cost?

Google Cloud Speech-to-Text uses usage-based pricing across 3 plans: Standard (Real-Time) at $0.016/min, Dynamic Batch at $0.004/min, and Enterprise (Custom Pricing), which requires a quote from Google Cloud sales. The first 60 minutes per month are free.

Google Cloud Speech-to-Text Pricing Overview

Google Cloud Speech-to-Text charges per minute of audio, with custom quotes reserved for the Enterprise tier. The Standard (Real-Time) plan costs $0.016/min and is best for teams needing real-time transcription with Google's latest Chirp model at standard rates with an ongoing free monthly allowance. The Dynamic Batch plan costs $0.004/min and is best for organizations processing large volumes of pre-recorded audio where 24-hour turnaround is acceptable and cost is the primary concern. The Enterprise (Custom Pricing) plan is priced through Google Cloud sales and is best for large enterprises on GCP needing custom pricing, data residency, on-device transcription, or integration with Google's AI and analytics ecosystem.

There are at least 6 documented hidden costs beyond Google Cloud Speech-to-Text's list price, including GCP ecosystem overhead, per-channel billing for multi-channel audio, 15-second billing increments, and egress fees.

This pricing was last verified on February 4, 2026 from 2 independent sources.

Google Cloud Speech-to-Text pricing is usage-based, starting free for 60 minutes/month as of March 2026. Standard real-time recognition costs $0.016 per minute, billed in 15-second increments, with the Chirp model included. Dynamic Batch processing runs at $0.004 per minute with up to 24-hour turnaround. Enterprise custom pricing is available for high-volume GCP customers. This pricing is verified from 2 independent sources by Costbench, the software pricing database tracking 1,000+ products.

Google Cloud Speech-to-Text is Google's automatic speech recognition (ASR) API powered by the Chirp model family, offering state-of-the-art transcription accuracy across 125+ languages. The V2 API delivers significant improvements over V1 with the Chirp 3 model, data residency options, and both real-time streaming and batch processing. Speech-to-Text integrates deeply with the GCP ecosystem including BigQuery for analytics, Vertex AI for custom model adaptation, and Cloud Storage for audio management.

Standard pricing is $0.016 per minute ($0.96/hour) for real-time transcription, with Dynamic Batch processing available at $0.004/min ($0.24/hour) -- a 75% discount for workloads that can tolerate up to 24-hour turnaround. Google includes the advanced Chirp model at the standard rate with no premium charge. New GCP customers receive $300 in free credits (90-day expiration) plus 60 minutes of free transcription per month on an ongoing basis -- the only major provider with a perpetual monthly free tier.

The primary trade-off with Google Cloud Speech-to-Text is the GCP ecosystem dependency. While the headline rate of $0.016/min is competitive (33% cheaper than AWS Transcribe at $0.024/min), building a production pipeline requires Cloud Storage, Cloud Functions, Pub/Sub, and potentially BigQuery, which can double or triple the effective per-minute cost. Multi-channel audio is billed per channel, so stereo recordings effectively cost $0.032/min. The Dynamic Batch tier at $0.004/min is exceptionally competitive for non-time-sensitive workloads, making Google the cheapest option for overnight batch processing.
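The per-minute rates, free allowance, and per-channel billing described above can be combined into a rough monthly estimate. The sketch below is illustrative only: the function name and parameters are hypothetical, the rates are the ones quoted in this guide, and it assumes the 60 free minutes are simply subtracted from billed minutes. Supporting GCP services are not included.

```python
STANDARD_RATE = 0.016  # USD per minute, real-time (rate quoted in this guide)
BATCH_RATE = 0.004     # USD per minute, Dynamic Batch (rate quoted in this guide)
FREE_MINUTES = 60      # monthly free allowance quoted in this guide


def monthly_transcription_cost(audio_minutes, channels=1, rate=STANDARD_RATE,
                               free_minutes=FREE_MINUTES):
    """Estimate one month's transcription fee in USD.

    Assumes each channel is billed as separate audio and that the free
    allowance is subtracted from billed minutes (assumptions of this sketch).
    """
    billed_minutes = max(audio_minutes * channels - free_minutes, 0)
    return round(billed_minutes * rate, 2)


# 1,000 minutes of audio per month, mono versus stereo
mono = monthly_transcription_cost(1_000)
stereo = monthly_transcription_cost(1_000, channels=2)
print(f"mono: ${mono:.2f}, stereo: ${stereo:.2f}")
```

Passing `rate=BATCH_RATE` gives the Dynamic Batch equivalent. Whether the free allowance applies to every tier is not confirmed here, so treat the result as an upper-level estimate rather than an invoice figure.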

In this 2026 pricing guide, we break down Google Cloud Speech-to-Text pricing across standard and batch tiers, calculate real-world costs including GCP ecosystem overhead, expose hidden costs from multi-channel billing, API migration, and egress fees, and compare Google to alternatives like OpenAI Whisper, AWS Transcribe, Deepgram, and AssemblyAI.

How Google Cloud Speech-to-Text Pricing Compares

Google Cloud Speech-to-Text starts free for the first 60 minutes per month, then costs $0.004–$0.016/minute. Compare: AssemblyAI ($0.15–$0.37/minute), Deepgram ($0.00–$0.02/minute), Rev AI ($0.00–$0.02/minute).

All Google Cloud Speech-to-Text Plans & Pricing

Plan | Key Details | Monthly | Annual | Best For
Standard (Real-Time) | Free tier: 60 min/month (ongoing, no expiration). Free credits: $300 for new customers (90 days) | Free up to 60 min, then $0.016/min | Same usage-based rate | Teams needing real-time transcription with Google's latest Chirp model at standard rates with an ongoing free monthly allowance
Dynamic Batch | Turnaround time: Up to 24 hours. Not suitable for: Real-time or near-real-time applications | $0.004/min | Same usage-based rate | Organizations processing large volumes of pre-recorded audio where 24-hour turnaround is acceptable and cost is the primary concern
Enterprise (Custom Pricing) | Minimum commitment: Custom (contact Google Cloud sales). Support: Premium or Enhanced support | Contact Sales | Contact Sales | Large enterprises on GCP needing custom pricing, data residency, on-device transcription, or integration with Google's AI and analytics ecosystem

Standard (Real-Time)

  • Speech-to-text at $0.016/min ($0.96/hour) for standard models
  • Chirp model (V2 API) included at same rate -- no premium charge
  • 125+ languages and variants supported
  • Real-time streaming and batch processing
  • Speaker diarization and automatic punctuation
  • 60 minutes/month free ongoing (not time-limited)
  • $300 free credits for new GCP customers (90-day expiration)

Dynamic Batch

  • Batch transcription at $0.004/min ($0.24/hour) -- 75% cheaper than standard
  • Results delivered within 24 hours
  • Same accuracy as standard tier using Chirp models
  • Ideal for non-time-sensitive workloads
  • All standard features available (diarization, punctuation, etc.)
  • Automatic language detection supported
  • Best cost-per-minute rate among major cloud providers

Enterprise (Custom Pricing)

  • Custom volume-based pricing (as low as $0.004/min at scale)
  • Dedicated account management and technical support
  • Custom SLA with guaranteed uptime
  • Data residency options across GCP regions
  • Integration with Vertex AI and BigQuery for analytics
  • On-device transcription with Speech-on-Device SDK
  • Custom model adaptation for domain-specific vocabulary

Compare Google Cloud Speech-to-Text vs Alternatives

Before committing to Google Cloud Speech-to-Text, compare pricing with these 3 alternatives in the same category: AssemblyAI, Deepgram, and Rev AI.

Google Cloud Speech-to-Text Year 1 Total Cost by Company Size

Estimated transcription fees at list rates for three example workloads. Supporting GCP services, implementation, and engineering time are not included in these totals.

Media Company Archive Processing (500 hours/month, batch)
$120/month
Year 1 total: $1,440

A media company batch-processing 500 hours (30,000 minutes) of archived audio content monthly using Dynamic Batch at $0.004/min with Chirp model for high accuracy.

Real-Time Captioning Service (200 hours/month)
$192/month
Year 1 total: $2,304

A live event platform providing real-time captions for 200 hours (12,000 minutes) of streaming content monthly using standard real-time transcription with speaker diarization.

Enterprise Analytics Platform (5,000 hours/month)
$4,800/month
Year 1 total: $57,600

An enterprise processing 5,000 hours (300,000 minutes) of customer calls monthly using standard transcription with speaker diarization, feeding results to BigQuery for analytics.
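The three scenario totals can be reproduced with simple arithmetic: hours per month times 60 minutes times the per-minute rate, then times 12 for the year. The following is a minimal sketch using the rates quoted in this guide; the scenario labels and the `year_one_cost` helper are illustrative names, and the monthly free allowance and supporting GCP services are ignored.

```python
SCENARIOS = [
    # (name, hours per month, USD per minute)
    ("Media archive (batch)", 500, 0.004),
    ("Real-time captioning", 200, 0.016),
    ("Enterprise analytics", 5_000, 0.016),
]


def year_one_cost(hours_per_month, rate_per_minute):
    """Return (monthly, annual) transcription fees in USD at list price."""
    monthly = hours_per_month * 60 * rate_per_minute
    return round(monthly, 2), round(monthly * 12, 2)


results = {name: year_one_cost(hours, rate) for name, hours, rate in SCENARIOS}
for name, (monthly, annual) in results.items():
    print(f"{name}: ${monthly:,.2f}/month, ${annual:,.2f}/year")
```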

How Google Cloud Speech-to-Text Pricing Compares

Software | Starting Price | Top Price
Google Cloud Speech-to-Text | Free (60 min/month) | $0.016/minute
AssemblyAI | Free | $75/hour
AWS Transcribe | Free | $6.75/minute
Deepgram | Custom | Custom
Whisper (OpenAI) | $0.003/minute | $0.006/minute
Rev AI | $0.00167/minute | $0.033/minute

6 Google Cloud Speech-to-Text Hidden Costs Beyond the List Price

Beyond the listed price, Google Cloud Speech-to-Text has at least 6 documented hidden costs that can significantly increase total cost of ownership.

Watch for 6 hidden costs
  • GCP ecosystem overhead doubles or triples effective costs: A production transcription pipeline requires Cloud Storage ($0.020/GB/month), Cloud Functions ($0.40/million invocations), Pub/Sub messaging ($0.40/million messages), and egress fees ($0.08-$0.23/GB) -- budget an additional $50-$300/month in supporting GCP service costs beyond the $0.016/min transcription fee
  • Multi-channel audio is billed per channel: A stereo (2-channel) audio file is billed as 2x the audio duration -- a 60-minute stereo recording costs $1.92 (2 x 60 x $0.016) instead of $0.96, effectively doubling the per-minute rate for multi-channel content like phone calls
  • Enhanced model pricing applies per 15-second increment: Audio is billed in 15-second increments rounded up, so a 61-second file costs the same as a 75-second file -- for short audio clips this rounding can increase effective costs by 10-25%
  • Data logging opt-in affects pricing: Google may offer lower pricing if you allow your audio data to be used for model improvement (data logging) -- opting out of data logging to maintain privacy may result in higher per-minute rates or reduced access to volume discounts
  • V1 to V2 API migration costs: The V2 API offers Chirp and newer models not available in V1, but migrating requires code changes, testing, and potentially re-architecting audio pipelines -- budget $2,000-$5,000 in engineering time for V1 to V2 migration
  • Egress and cross-region transfer fees apply to transcription results: Downloading transcription results outside of GCP or across regions incurs network egress charges of $0.08-$0.23/GB -- for large-scale operations generating gigabytes of transcript data monthly, this adds $20-$100/month in hidden network costs
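To see how the 15-second rounding described in the list above affects short clips, the billed duration can be computed with a ceiling division. This is a sketch under the assumptions stated in this guide (15-second increments, rounded up, at $0.016/min); the helper names are hypothetical and not part of any Google API.

```python
import math

INCREMENT_SECONDS = 15   # billing increment described in this guide
RATE_PER_MINUTE = 0.016  # USD, standard rate quoted in this guide


def billed_seconds(duration_seconds):
    """Round a clip's duration up to the next 15-second increment."""
    return math.ceil(duration_seconds / INCREMENT_SECONDS) * INCREMENT_SECONDS


def clip_cost(duration_seconds):
    """Cost of a single-channel clip in USD after rounding up."""
    return billed_seconds(duration_seconds) * RATE_PER_MINUTE / 60


def rounding_overhead(duration_seconds):
    """Fraction by which rounding inflates the billed duration."""
    return billed_seconds(duration_seconds) / duration_seconds - 1


for seconds in (10, 61, 75, 600):
    print(seconds, billed_seconds(seconds),
          f"{rounding_overhead(seconds):.1%}", f"${clip_cost(seconds):.4f}")
```

The overhead shrinks as clips get longer, which is why the rounding matters mainly for workloads made up of many short utterances.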
Tip

Ask your Google Cloud Speech-to-Text sales rep about these costs upfront. Getting them in writing before signing can save you from surprise charges later.

Google Cloud Speech-to-Text Pricing FAQ

01 How much does Google Cloud Speech-to-Text cost?

Google Cloud Speech-to-Text costs $0.016 per minute ($0.96/hour) for standard real-time transcription, including the advanced Chirp model at no extra charge. Dynamic Batch processing is available at $0.004/min ($0.24/hour) with results within 24 hours -- 75% cheaper than standard. New GCP customers receive $300 in free credits (90-day expiration) plus 60 minutes of free transcription per month ongoing. Enterprise volume pricing is available by contacting Google Cloud sales.

02 Is Google Cloud Speech-to-Text free?

Google Cloud Speech-to-Text offers 60 minutes of free transcription per month on an ongoing basis with no expiration -- this is more generous than AWS Transcribe's 12-month limited free tier. New customers also receive $300 in GCP credits usable within 90 days. After exhausting free allowances, standard pricing starts at $0.016/min or $0.004/min for batch processing. The ongoing free tier makes Google a good option for low-volume or experimental transcription workloads.

03 What is Google Cloud Speech-to-Text?

Google Cloud Speech-to-Text is Google's automatic speech recognition (ASR) API that converts audio to text using Google's AI models. The V2 API features the Chirp model family, Google's latest multilingual ASR system with significantly improved accuracy. It supports 125+ languages and variants, real-time streaming and batch processing, speaker diarization, automatic punctuation, and word-level timestamps. Speech-to-Text integrates with the broader GCP ecosystem including BigQuery, Vertex AI, and Cloud Storage.

04 Google Cloud Speech-to-Text vs AWS Transcribe: which is cheaper?

Google Cloud Speech-to-Text is cheaper at standard rates: $0.016/min vs AWS Transcribe's $0.024/min (33% savings). Google's Dynamic Batch mode at $0.004/min is 83% cheaper than AWS's base rate. However, AWS offers aggressive volume discounts dropping to $0.0078/min at 5M+ minutes, while Google's volume pricing requires sales engagement. At very high volumes (5M+ minutes), AWS can become competitive. Choose Google for standard and batch workloads; choose AWS if you process 5M+ minutes monthly or need HIPAA medical transcription.

05 Google Cloud Speech-to-Text vs OpenAI Whisper: which should I use?

OpenAI Whisper at $0.006/min is 62.5% cheaper than Google's standard rate of $0.016/min for real-time processing. However, Google's Dynamic Batch mode at $0.004/min is 33% cheaper than Whisper for non-time-sensitive workloads. Google offers a more generous ongoing free tier (60 min/month perpetual vs Whisper's one-time $5 credit). Choose Whisper for simple, affordable real-time transcription; choose Google for batch processing at $0.004/min, the ongoing free tier, or deep GCP ecosystem integration.
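The percentage comparisons in the AWS Transcribe and Whisper answers all follow the same formula: (baseline rate - cheaper rate) / baseline rate. A small sketch, using only the list rates quoted in this guide (the dictionary keys and function name are illustrative):

```python
RATES = {  # USD per minute, as quoted in this guide
    "google_standard": 0.016,
    "google_batch": 0.004,
    "aws_base": 0.024,
    "whisper": 0.006,
}


def percent_cheaper(rate, baseline):
    """Percentage by which `rate` undercuts `baseline`."""
    return (baseline - rate) / baseline * 100


comparisons = {
    "Google standard vs AWS": percent_cheaper(RATES["google_standard"], RATES["aws_base"]),
    "Google batch vs AWS": percent_cheaper(RATES["google_batch"], RATES["aws_base"]),
    "Whisper vs Google standard": percent_cheaper(RATES["whisper"], RATES["google_standard"]),
    "Google batch vs Whisper": percent_cheaper(RATES["google_batch"], RATES["whisper"]),
}
for label, pct in comparisons.items():
    print(f"{label}: {pct:.1f}% cheaper")
```

These figures compare list prices only; volume discounts and supporting infrastructure can change the ranking.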

06 What is the Google Cloud Speech-to-Text Chirp model?

Chirp is Google's latest multilingual ASR model available exclusively through the Speech-to-Text V2 API. Chirp 3 (the current generation) delivers significant improvements in transcription accuracy over previous models and supports 125+ languages. Unlike competitors that charge premium rates for their best models, Google includes Chirp in the standard $0.016/min pricing with no surcharge. Chirp supports both real-time and batch processing, speaker diarization, and automatic punctuation.

07 What is Google Cloud Speech-to-Text Dynamic Batch processing?

Dynamic Batch is a discounted processing tier that delivers transcription results within 24 hours at $0.004/min -- 75% cheaper than the standard $0.016/min real-time rate. It uses the same Chirp models and accuracy as standard processing but with longer turnaround times. Dynamic Batch is ideal for processing large archives of pre-recorded audio, overnight batch jobs, or any workload where real-time results are not required. At $0.004/min, it is the cheapest per-minute transcription rate among major cloud providers.

08 Does Google Cloud Speech-to-Text support real-time streaming?

Yes, Google Cloud Speech-to-Text supports real-time streaming transcription via gRPC with results delivered as audio is processed. Streaming supports interim results (partial transcriptions updated in real-time), speaker diarization, automatic punctuation, and word-level confidence scores. Streaming is billed at the standard $0.016/min rate. The V2 API with Chirp models provides improved streaming accuracy. For applications needing ultra-low latency below 300ms, Deepgram may be a better choice.
