Quick Answer
Last verified: May 6, 2026 (high confidence)

As of May 2026, Replicate offers 3 plans: Free, Pay-as-you-go, and Enterprise. The Free and Pay-as-you-go plans have no subscription fee. Pay-as-you-go is billed per use, and Enterprise pricing is custom-quoted on request. Your actual cost depends on the models and hardware you run, how much compute you use, and, for Enterprise, any negotiated discounts.


  • Free tier: Yes

Replicate offers 3 pricing tiers: Free, Pay-as-you-go, and Enterprise. The Pay-as-you-go plan is designed for developers and teams running AI predictions at any scale.

Compared with other AI productivity software, Replicate is positioned at the budget-friendly end of the market.

  • 4 documented hidden costs beyond list price

How much does Replicate cost?

Replicate has 3 plans: Free (free credits to get started), Pay-as-you-go (no subscription fee; billed per use), and Enterprise (custom pricing; contact Replicate sales for a quote).

Replicate Pricing Overview

Replicate's Free and Pay-as-you-go plans have no subscription fee; only the Enterprise plan requires a custom quote from sales. The Free plan is best for trying out AI models and small experiments. The Pay-as-you-go plan is best for developers and teams running AI predictions at any scale, and you pay only for the predictions and compute time you use. The Enterprise plan is designed for organizations with complex requirements or high-volume usage.

Replicate has no minimum commitment on the Pay-as-you-go plan.

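There are at least 4 documented hidden costs beyond Replicate's list price: a serverless pricing premium, GPU rental markup over raw compute providers, a managed-service premium over raw GPU compute, and unpredictable cost growth at scale.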

This pricing was last verified on May 6, 2026 against 2 independent sources.

Replicate is a cloud platform for running AI models via API, with the Pay-as-you-go plan charging per second of compute time and no monthly subscription fee. LLM text models start at $0.03/1M input tokens, while GPU-intensive workloads such as A100-backed inference run approximately $5/hour. Enterprise pricing is custom-quoted for teams with high or predictable compute volumes who need dedicated capacity and custom rate agreements.

How Replicate Pricing Compares

Compare Replicate pricing against top alternatives in AI Productivity.

All Replicate Plans & Pricing

Plan | Monthly | Annual | Best for
Free | Free | Custom | Trying out AI models and small experiments
Pay-as-you-go | Free | Custom | Developers and teams running AI predictions at any scale
Enterprise | Contact sales | Contact sales | Organizations with complex requirements or high-volume usage

Free

  • Free credits to get started
  • Access to thousands of public models
  • Pay-per-use after free credits

Pay-as-you-go

  • No monthly subscription fee
  • Billed per prediction (per token, per image, or per second)
  • Public models: from $0.003/image to $0.25/sec video
  • Private model hardware: $0.09/hr (CPU Small) to $43.92/hr (8x H100 GPU)
  • GPU options: T4 ($0.81/hr), L40S ($3.51/hr), A100 ($5.04/hr), H100 ($5.49/hr)
  • Auto-scaling for private models
  • Deploy custom models via Cog
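Because private-model hardware is billed by the second, an hourly list rate converts directly into a per-second rate and a per-run cost. The sketch below assumes billing is linear in both seconds and GPU count, which matches the listed 8x H100 rate being exactly 8 × $5.49. The dictionary keys are hypothetical labels, not Replicate hardware identifiers.

```python
# Hourly list rates (USD) for private-model hardware, taken from the plan list above.
# The dictionary keys are illustrative labels, not official Replicate hardware identifiers.
HOURLY_RATES_USD = {
    "cpu-small": 0.09,
    "t4": 0.81,
    "l40s": 3.51,
    "a100": 5.04,
    "h100": 5.49,
}

SECONDS_PER_HOUR = 3600


def per_second_rate(hourly_usd: float) -> float:
    """Convert an hourly list rate into the equivalent per-second rate."""
    return hourly_usd / SECONDS_PER_HOUR


def run_cost(hardware: str, seconds: float, gpu_count: int = 1) -> float:
    """Estimate the cost of `gpu_count` units of `hardware` running for `seconds`.

    Assumes cost scales linearly with both time and unit count.
    """
    return per_second_rate(HOURLY_RATES_USD[hardware]) * seconds * gpu_count


if __name__ == "__main__":
    print(f"A100 per second: ${per_second_rate(HOURLY_RATES_USD['a100']):.5f}")
    print(f"90 s on one A100: ${run_cost('a100', 90):.3f}")
    print(f"1 hour on 8x H100: ${run_cost('h100', 3600, gpu_count=8):.2f}")
```

The A100 conversion reproduces the $0.00140/sec figure in the usage-based rates, and one hour on 8x H100 reproduces the $43.92/hr list price.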

Enterprise

  • Dedicated account manager
  • Priority support
  • Higher GPU limits
  • Performance SLAs
  • Help with onboarding, custom models, and optimizations
  • Volume discounts for large spend

Usage-Based Rates

Per-unit pricing for Replicate API usage.

Pay-as-you-go

Model | Unit | Rate
Claude 3.7 Sonnet | 1M input tokens | $3.00
Claude 3.7 Sonnet | 1K output tokens | $0.015
DeepSeek R1 | 1M input tokens | $3.75
DeepSeek R1 | 1K output tokens | $0.010
FLUX 1.1 Pro (image) | image | $0.040
FLUX.1 [schnell] (image) | image | $0.00300
FLUX.1 [dev] (image) | image | $0.025
Ideogram v3 Quality (image) | image | $0.090
Recraft V3 (image) | image | $0.040
Wan 2.1 (480p video) | second | $0.090
Wan 2.1 (720p video) | second | $0.250
  • Public models billed per prediction (token, image, or second)
  • Custom/private models billed per second of hardware time
  • A100 GPU: $0.00140/sec; H100: $0.001525/sec
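The table mixes units (per 1M input tokens, per 1K output tokens, per image, per second of video), so a cost estimate has to normalize each rate by the number of units its price covers. A minimal sketch using a few rows from the table; the model keys and the usage quantities are hypothetical, and real token counts would come from your own requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    price_usd: float  # list price from the table
    per_units: int    # how many units that price covers


# A subset of the Pay-as-you-go rows above; keys are illustrative labels.
RATES = {
    ("claude-3.7-sonnet", "input_tokens"): Rate(3.00, 1_000_000),
    ("claude-3.7-sonnet", "output_tokens"): Rate(0.015, 1_000),
    ("flux-1.1-pro", "images"): Rate(0.040, 1),
    ("wan-2.1-720p", "video_seconds"): Rate(0.250, 1),
}


def per_million(rate: Rate) -> float:
    """Express any per-unit rate as a price per 1M units, for side-by-side comparison."""
    return rate.price_usd * 1_000_000 / rate.per_units


def usage_cost(usage: dict[tuple[str, str], float]) -> float:
    """Sum the cost of (model, unit) -> quantity usage against RATES."""
    return sum(
        RATES[key].price_usd * quantity / RATES[key].per_units
        for key, quantity in usage.items()
    )


if __name__ == "__main__":
    monthly_usage = {  # hypothetical monthly volumes
        ("claude-3.7-sonnet", "input_tokens"): 2_000_000,
        ("claude-3.7-sonnet", "output_tokens"): 500_000,
        ("flux-1.1-pro", "images"): 100,
        ("wan-2.1-720p", "video_seconds"): 10,
    }
    print(f"Estimated cost: ${usage_cost(monthly_usage):.2f}")
    out = RATES[("claude-3.7-sonnet", "output_tokens")]
    print(f"Claude 3.7 Sonnet output per 1M tokens: ${per_million(out):.2f}")
```

Normalizing shows that the $0.015/1K output rate for Claude 3.7 Sonnet is $15 per 1M tokens, five times its $3.00/1M input rate.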


What Companies Actually Pay for Replicate

Median per-1M-token pricing across 3 models
Input $0.060/1M
Output $0.250/1M
Flagship models in this provider's catalog
Model | Input /1M | Output /1M | Blended /1M
replicate_deepseek-v3-0324 | $1.45 | $1.45 | $1.45
replicate_granite-4-0-h-small | $0.060 | $0.250 | $0.107
replicate_granite-3-3-8b-instruct | $0.030 | $0.250 | $0.085
Top pricing complaints
  • GPU compute pricing significantly higher than raw providers like Runpod for equivalent hardware
  • Costs scale unexpectedly at high usage volumes with per-second billing
Source: Artificial Analysis — medians aggregated from 3 models in this provider's catalog. Per-1M-token pricing reflects list rates.
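The Blended column is consistent with a 3:1 input-to-output weighting (three parts input price to one part output price). That weighting is inferred here because it reproduces the table's figures; it is not stated in the source. The sketch recomputes the blended prices and the provider medians from the three tracked models.

```python
from statistics import median

# (input $/1M, output $/1M) for the tracked models in the table above.
MODELS = {
    "replicate_deepseek-v3-0324": (1.45, 1.45),
    "replicate_granite-4-0-h-small": (0.060, 0.250),
    "replicate_granite-3-3-8b-instruct": (0.030, 0.250),
}


def blended(input_price: float, output_price: float,
            input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens (3:1 input:output by default)."""
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


median_input = median(inp for inp, _ in MODELS.values())
median_output = median(out for _, out in MODELS.values())

if __name__ == "__main__":
    for name, (inp, out) in MODELS.items():
        print(f"{name}: blended ${blended(inp, out):.4f}/1M")
    print(f"median input ${median_input:.3f}/1M, median output ${median_output:.3f}/1M")
```

The Granite 4.0 H Small row works out to about $0.1075, which the table shows as $0.107. If your traffic is output-heavy, pass different weights, for example `blended(inp, out, 1, 1)`.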

Replicate Cost Examples by Workload

Costs reported for specific workloads in user reports and community discussions, rather than list prices alone.

Image Generation Finetuning Cost Comparison
Total: $60/hr
For a task that requires 1 H100, Replicate charges $1/minute ($60/hr), while 8xH100s on Runpod cost just $2.88/hr, making Replicate about 20x more expensive.

Running image-generator finetuning on Replicate's serverless offering, rather than on rented GPUs, illustrates the cost difference: Replicate charges $1/minute for workloads that could run on a single H100.
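The 20x figure is a ratio of hourly rates, and the same ratio carries over to the total cost of any single job. The sketch below uses the two quoted figures ($1/minute on Replicate, $2.88/hr for the Runpod configuration in this comparison); the 45-minute job length is a hypothetical example.

```python
REPLICATE_PER_MINUTE = 1.00   # quoted serverless price for the finetuning task
RUNPOD_PER_HOUR = 2.88        # quoted Runpod price for the comparison hardware


def job_cost(per_hour: float, minutes: float) -> float:
    """Cost of a job that runs for `minutes` at an hourly rate."""
    return per_hour * minutes / 60


replicate_per_hour = REPLICATE_PER_MINUTE * 60
multiple = replicate_per_hour / RUNPOD_PER_HOUR

if __name__ == "__main__":
    minutes = 45  # hypothetical finetuning run length
    print(f"Replicate: ${job_cost(replicate_per_hour, minutes):.2f}")
    print(f"Runpod:    ${job_cost(RUNPOD_PER_HOUR, minutes):.2f}")
    print(f"Ratio:     {multiple:.1f}x")
```

The ratio comes out to about 20.8x, which is where the "20x more expensive" figure comes from.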

Audio Transcription at Scale (400 hours via Whisper)
Total: ~$70 for 400 hours of audio (~$0.0029 per run, one run per 1-minute chunk)

Transcribing 400 hours of audio using Whisper Large v2 via Replicate's Pay-as-you-go inference API, processing approximately 1-minute audio chunks as individual runs.
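The ~$70 total follows from the run count: 400 hours is 24,000 one-minute chunks, each billed as a separate run. A minimal sketch, assuming a flat ~$0.0029 per run; since per-run cost can vary with processing time, treat that figure as an average.

```python
import math


def transcription_cost(audio_hours: float, chunk_minutes: float, cost_per_run: float) -> float:
    """Estimate total cost when audio is split into fixed-length chunks, one run per chunk."""
    runs = math.ceil(audio_hours * 60 / chunk_minutes)  # a partial final chunk still costs a run
    return runs * cost_per_run


if __name__ == "__main__":
    total = transcription_cost(audio_hours=400, chunk_minutes=1, cost_per_run=0.0029)
    print(f"400 hours in 1-minute chunks: ${total:.2f}")
```

Longer chunks mean fewer runs, so if the average per-run cost rises less than proportionally with chunk length, batching audio into longer chunks lowers the total.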

A100 GPU Compute Per Hour
Total: ~$5/hr per A100

Running a single A100 80GB GPU instance on Replicate for model inference or fine-tuning via the Pay-as-you-go plan.

Source: HN discussion on finetuning costs

How Replicate Pricing Compares

Software | Starting price | Top price
Replicate | Free | Custom (Enterprise)
Clockwise | Free | $7.75/user/month
Grammarly Business | Free | $30/user/month
Motion | $29/user/month | $446/user/month
Notion AI | Free | $18/user/month
OpenAI | Free | $200/month

4 Replicate Hidden Costs Beyond the List Price

Beyond the listed price, Replicate has at least 4 documented hidden costs that can significantly increase total cost of ownership.

Watch for 4 hidden costs
  • Serverless Pricing Premium $1/minute
    Severity: high (1 source)
    Hacker News "the pricing becomes even more astronomical; as you note, $1/minute is unreasonably expensive: that's over 20x the cost of renting 8xH100s on Runpod"
  • GPU Rental Markup 200-300% markup over alternatives
    Severity: critical (1 source)
    Hacker News "Similar deal with Replicate: an A100 there is over $5/hr, whereas on Runpod it's $1.64/hr"
  • Managed Service Premium Over Raw GPU Compute $3-$4/hr
    Severity: high (2 sources)
    Hacker News "renting raw compute via Runpod and friends will generally be much cheaper than renting a higher level service that uses that compute e.g. fal.ai or Replicate. For example, an A6000 on fal."
    Hacker News "on Replicate today a one can get an A100 for ~$5/hr which is ... about a month."
  • Unpredictable Cost Growth at Scale 10-30% of license costs
    Severity: medium (1 source)
    Hacker News "I'm a replicate user. I have experimented with LLAMA2 on the replicate and I have similar experience But you are totally correct about the pricing part it can get expensive I'm running this photo service..."
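The markup figures are easier to compare as explicit percentages. This sketch computes how far Replicate's listed A100 rate ($5.04/hr) exceeds the Runpod A100 rates quoted on this page ($1.64/hr Secure Cloud, $0.49/hr Community Cloud). It compares list prices only and assumes equal throughput per GPU-hour.

```python
# A100 hourly rates (USD) quoted on this page.
REPLICATE_A100 = 5.04
RUNPOD_A100 = {"secure_cloud": 1.64, "community_cloud": 0.49}


def markup_pct(price: float, baseline: float) -> float:
    """Percentage by which `price` exceeds `baseline`."""
    return (price - baseline) / baseline * 100


if __name__ == "__main__":
    for tier, rate in RUNPOD_A100.items():
        print(f"vs Runpod {tier}: {markup_pct(REPLICATE_A100, rate):.0f}% markup "
              f"({REPLICATE_A100 / rate:.1f}x)")
```

Against Secure Cloud the markup is about 207% (roughly 3.1x), inside the 200-300% band listed above; against Community Cloud it is far larger.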
Tip

Ask your Replicate sales rep about these costs upfront. Getting them in writing before signing can save you from surprise charges later.


Intelligence sourced from 1 independent source: Hacker News (tech community)
Key claims include inline source attribution. Data verified against multiple independent sources. 10 source citations total.

Replicate Contract Terms

Replicate contracts do not auto-renew. Changes require advance notice. These terms are sourced from verified buyer experiences.

Contract Terms
Auto-Renewal: No
Minimum Commitment: None for Pay-as-you-go
Mid-Term Downgrade: Allowed
Payment Terms: Usage-based billing per second of compute time; no monthly subscription fee for standard tiers
Price Escalation: No published price escalation schedule; costs track with usage volume
Note

Pay-as-you-go has no minimum commitment; charges stop when you stop running models.

How to Negotiate Replicate Pricing

Replicate contracts are negotiable. These 4 tactics are sourced from real buyer experiences and procurement specialists.

Negotiation Playbook 4 tactics
Consider Raw Compute Alternatives (success likelihood: high)

Instead of using Replicate's serverless offering, rent raw compute via Runpod or similar providers. An A100 on Runpod is $1.64/hr in Secure Cloud or $0.49/hr in Community Cloud, versus over $5/hr on Replicate.

Source: HN discussion comparing GPU providers
Benchmark Against Raw GPU Providers Before Committing (success likelihood: high)

Before scaling production workloads on Replicate, run the same inference workload on Runpod or Lambda Cloud to quantify the managed-service premium. Replicate's A100 pricing (~$5/hr) is approximately 3x Runpod's Secure Cloud rate ($1.64/hr). Use this delta to build a business case for either negotiating an Enterprise contract or justifying the migration cost of self-hosting.
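To turn the per-hour delta into a business case, multiply it by the GPU-hours your workload actually uses each month. A minimal sketch with the A100 rates quoted on this page; the 200 GPU-hours per month is a hypothetical workload, and the comparison assumes the same job takes the same GPU time on each provider.

```python
# A100 hourly rates (USD) quoted on this page.
A100_HOURLY = {
    "replicate": 5.04,
    "runpod_secure": 1.64,
    "runpod_community": 0.49,
}


def monthly_cost(provider: str, gpu_hours: float) -> float:
    """Monthly spend for `gpu_hours` of A100 time on `provider`."""
    return A100_HOURLY[provider] * gpu_hours


def monthly_premium(gpu_hours: float, baseline: str = "runpod_secure") -> float:
    """Extra monthly spend on Replicate versus `baseline` for the same GPU-hours."""
    return monthly_cost("replicate", gpu_hours) - monthly_cost(baseline, gpu_hours)


if __name__ == "__main__":
    hours = 200  # hypothetical monthly A100 usage
    for provider in A100_HOURLY:
        print(f"{provider:>17}: ${monthly_cost(provider, hours):,.2f}/month")
    print(f"Premium vs Secure Cloud: ${monthly_premium(hours):,.2f}/month "
          f"(${monthly_premium(hours) * 12:,.2f}/year)")
```

Weigh the premium against the engineering time self-hosting would take. Rented instances are typically billed for as long as they run, busy or idle, so count idle hours on the raw-compute side too.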

Source: HN community comparison (2024-10-04)
Use Community Cloud for Lower-Risk Workloads (success likelihood: medium)

If you can accept some risk of instances disappearing and don't need strong security guarantees, Runpod's Community Cloud offers significantly cheaper rates than Replicate's managed service.

Source: HN user comparing pricing models
Request Enterprise Pricing for Predictable High-Volume Workloads (success likelihood: medium)

If your usage is high and predictable, contact Replicate's Enterprise team for custom pricing. Enterprise contracts on managed inference platforms typically include volume-based rate reductions that can partially close the gap with raw GPU providers while retaining the convenience of managed infrastructure.

Source: current Enterprise tier data


Replicate Pricing FAQ

01 How does Replicate's pricing compare to alternatives like Runpod?

Replicate is significantly more expensive than raw compute providers. An A100 GPU costs over $5/hr on Replicate versus $1.64/hr on Runpod's Secure Cloud (about 3x more). Replicate's serverless pricing can reach $1/minute, which is over 20x the cost of equivalent compute on Runpod. The premium pays for convenience and managed infrastructure, but costs add up quickly for sustained workloads.

02 Is Replicate's serverless pricing worth the cost?

Replicate's serverless model charges a significant premium over renting GPUs directly. At $1/minute for some workloads, users report this is 'unreasonably expensive' and over 20x the cost of running equivalent compute on platforms like Runpod. The convenience may be worth it for occasional use or prototyping, but actual users confirm 'the pricing part it can get expensive' for regular production workloads.

03 Is Replicate more expensive than other GPU cloud providers?

Yes, Replicate charges a managed-service premium over raw GPU providers. An A100 on Replicate costs over $5/hr, while the same GPU on Runpod's Secure Cloud runs $1.64/hr — roughly a 3x premium. The markup covers Replicate's serverless model deployment, managed infrastructure, and API abstraction, which eliminates container management overhead. If your team can manage GPU infrastructure directly, raw compute providers will be significantly cheaper at scale.

04 How does Replicate's Pay-as-you-go pricing work?

Replicate's Pay-as-you-go plan charges per second of compute time with no monthly subscription fee. The Free plan provides initial credits to get started. Once credits are exhausted, usage is billed against a credit card at per-second rates that vary by model and hardware tier. Enterprise pricing is available for teams with high or predictable compute volumes who want custom rates.
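The credit-then-billing flow amounts to this: total up metered usage, subtract whatever free credit remains, and bill the rest. A sketch under stated assumptions: this page does not give the free-credit amount, so it is a parameter, and the $2.00 credit and the run durations used below are purely hypothetical.

```python
def metered_usage(run_seconds: list[float], per_second_usd: float) -> float:
    """Total usage cost for a batch of runs billed per second of compute."""
    return sum(run_seconds) * per_second_usd


def billable_amount(usage_usd: float, remaining_credit_usd: float) -> float:
    """Amount charged to the card after any remaining free credit is applied."""
    return max(0.0, usage_usd - remaining_credit_usd)


if __name__ == "__main__":
    A100_PER_SECOND = 0.0014               # A100 rate from the usage-based rates above
    runs = [600.0, 1200.0, 1800.0]         # hypothetical run durations, in seconds
    credit = 2.00                          # hypothetical remaining free credit
    usage = metered_usage(runs, A100_PER_SECOND)
    print(f"Usage: ${usage:.2f}, billed: ${billable_amount(usage, credit):.2f}")
```

Clamping at zero means usage smaller than the remaining credit produces no charge.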

05 What are the cheapest LLM models available on Replicate?

Based on Artificial Analysis data from April 2026, the cheapest tracked LLM on Replicate (granite-3-3-8b-instruct) is priced at $0.03/1M input tokens and $0.25/1M output tokens. The provider median across all tracked models is $0.06 input / $0.25 output per 1M tokens. The most expensive tracked model (DeepSeek V3-0324) runs $1.45/1M for both input and output.
