DeepInfra Pricing 2026
Complete pricing guide with plans, hidden costs, and cost analysis
DeepInfra costs $0.02 to $82.50 per million tokens as of April 2026. Pricing depends primarily on which model you run, your token volume, and any negotiated discounts.
- Free tier: none, but $5 in free credits on sign-up
DeepInfra offers a single pricing tier, Pay-as-you-go, designed for developers needing affordable inference for open-source and commercial models in production.
Compared to other LLM API providers, DeepInfra is positioned at the budget-friendly end of the market.
- 4 documented hidden costs beyond list price
How much does DeepInfra cost?
DeepInfra Pricing Overview
DeepInfra has a single pricing plan, Pay-as-you-go, with published per-token rates ranging from $0.02 to $82.50 per million tokens, designed for developers needing affordable inference for open-source and commercial models in production.
DeepInfra has no minimum commitment and no contract: billing is pay-as-you-go, and you can stop usage at any time to cancel.
There are at least 4 documented hidden costs beyond DeepInfra's list price, including implementation, training, and add-on fees.
This pricing was last verified on April 15, 2026 against 2 independent sources.
DeepInfra is a serverless AI inference platform specializing in open-source model hosting. It offers OpenAI-compatible APIs for 50+ models including Llama 3.x, Mistral, DeepSeek, and Qwen at highly competitive per-token rates. Pricing starts at $0.06 per million tokens for small models, with no minimum commitments or setup fees. DeepInfra is popular among developers looking for the cheapest way to run large open-source models in production without managing GPU infrastructure.
All DeepInfra Plans & Pricing
| Plan | Monthly | Annual | Best For |
|---|---|---|---|
| Pay-as-you-go (200 concurrent requests per account) | Usage-based (per token) | Usage-based (per token) | Developers needing affordable inference for open-source and commercial models in production |
Features by plan:
Pay-as-you-go
- 70+ models including open-source and commercial
- OpenAI-compatible API
- No minimum spend
- DeepSeek, Llama 4, Qwen3, Gemini, Claude, Gemma, Mistral support
- Image generation (Flux family)
- Speech-to-text (Voxtral)
- Embedding models
- Dedicated GPU deployment option (A100, H100, H200, B200, B300)
- Usage-based billing tiers with automatic tier progression
Usage-Based Rates
Per-unit pricing for DeepInfra API usage.
Pay-as-you-go
| Model | Unit | Rate |
|---|---|---|
| DeepSeek-V3.2 | 1M input tokens | $0.260 |
| DeepSeek-V3.2 | 1M output tokens | $0.380 |
| DeepSeek-R1-0528 | 1M input tokens | $0.500 |
| DeepSeek-R1-0528 | 1M output tokens | $2.15 |
| Llama 4 Maverick 17B-128E | 1M input tokens | $0.150 |
| Llama 4 Maverick 17B-128E | 1M output tokens | $0.600 |
| Llama 3.3 70B Instruct Turbo | 1M input tokens | $0.100 |
| Llama 3.3 70B Instruct Turbo | 1M output tokens | $0.320 |
| Meta-Llama-3.1-8B-Instruct | 1M input tokens | $0.020 |
| Meta-Llama-3.1-8B-Instruct | 1M output tokens | $0.050 |
| Qwen3-235B-A22B-Instruct-2507 | 1M input tokens | $0.071 |
| Qwen3-235B-A22B-Instruct-2507 | 1M output tokens | $0.100 |
| Qwen2.5-72B-Instruct | 1M input tokens | $0.120 |
| Qwen2.5-72B-Instruct | 1M output tokens | $0.390 |
| Gemma 3 27B | 1M input tokens | $0.080 |
| Gemma 3 27B | 1M output tokens | $0.160 |
| Mistral Small 3.2 24B | 1M input tokens | $0.075 |
| Mistral Small 3.2 24B | 1M output tokens | $0.200 |
| Gemini 2.5 Flash | 1M input tokens | $0.300 |
| Gemini 2.5 Flash | 1M output tokens | $2.50 |
| Voxtral Small 24B | minute | $0.00300 |
| FLUX-2-pro | image | $0.015 |
- Cached input tokens available at reduced rates on select models (DeepSeek, Qwen3)
- Input and output tokens are often priced the same on small models
- Verify current rates at deepinfra.com/pricing — model catalog updated frequently
What Companies Actually Pay for DeepInfra
| Model | Input /1M | Output /1M | Blended /1M |
|---|---|---|---|
| deepseek-r1 | $0.700 | $2.40 | $1.13 |
| deepseek-v3-2 | $0.210 | $0.320 | $0.237 |
| deepseek-r1-05-28 | $0.500 | $2.15 | $0.912 |
| kimi-k2 | $0.500 | $2.00 | $0.875 |
| llama-4-maverick-instruct_fp8 | $0.150 | $0.600 | $0.263 |
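The blended figures in the table are consistent with a 3:1 input-to-output token weighting, a common convention for comparing LLM API rates. A minimal sketch of that calculation, assuming the 3:1 ratio (the function name is illustrative):

```python
def blended_rate(input_per_1m: float, output_per_1m: float, ratio: float = 3.0) -> float:
    """Blend per-1M input/output rates at a given input:output token ratio."""
    return (ratio * input_per_1m + output_per_1m) / (ratio + 1.0)

# Rates in USD per 1M tokens, taken from the table above.
print(f"{blended_rate(0.70, 2.40):.4f}")  # deepseek-r1      -> 1.1250, shown as $1.13
print(f"{blended_rate(0.21, 0.32):.4f}")  # deepseek-v3-2    -> 0.2375, shown as $0.237
print(f"{blended_rate(0.50, 2.15):.4f}")  # deepseek-r1-0528 -> 0.9125, shown as $0.912
```

If your workload is output-heavy (e.g., long generations from short prompts), recompute with a lower ratio before comparing providers.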
DeepInfra Year 1 Total Cost by Company Size
Real deployment costs including licenses, implementation, training, and admin — not just the sticker price.
- Individual developer running experiments and low-volume tests using small models (e.g., Llama 3.1 8B or Gemma 3 4B). A Reddit user calculated that at DeepInfra's pricing for Mixtral-class models, 18.5 million queries could be served for $500.
- Production app using Llama 3.1 8B at $0.02/$0.05 per 1M input/output tokens (per Artificial Analysis). At 50M output tokens per month, estimated monthly cost is ~$2.50, or ~$30/year.
- High-volume production workload running millions of inference jobs using 70B-class models. At the provider median blended rate of $0.30/1M tokens and 10 billion tokens consumed per month, estimated monthly cost is ~$3,000.
- App using DeepSeek R1 for complex multi-step reasoning tasks. At $0.70/$2.40 per 1M input/output tokens, a workload generating 100M output tokens per month costs ~$240/month.
Reddit (r/LocalLLaMA, 2023-12-26)
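The scenario arithmetic above reduces to one formula: tokens consumed times the per-1M-token rate. A small helper that reproduces the estimates (rates taken from the scenarios; the function name is illustrative):

```python
def monthly_cost(tokens_per_month: float, rate_per_1m: float) -> float:
    """USD cost for one month of usage at a per-1M-token rate."""
    return tokens_per_month / 1_000_000 * rate_per_1m

# Llama 3.1 8B output tokens at $0.05/1M
print(f"${monthly_cost(50_000_000, 0.05):.2f}")      # $2.50
# 70B-class workload at the $0.30/1M median blended rate
print(f"${monthly_cost(10_000_000_000, 0.30):.2f}")  # $3000.00
# DeepSeek R1 output tokens at $2.40/1M
print(f"${monthly_cost(100_000_000, 2.40):.2f}")     # $240.00
```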
How DeepInfra Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| DeepInfra | $0.02/1M tokens | $82.50/1M tokens |
| Amazon Bedrock | $0.07/1M tokens | $75.00/1M tokens |
| Anyscale | $0.15/1M tokens | $5.00/1M tokens |
| Baidu ERNIE API | $0.10/1M tokens | $10.00/1M tokens |
| Cerebras Inference API | $0.10/1M tokens | $6.00/1M tokens |
| Claude API | $0.03/1M tokens | $75.00/1M tokens |
DeepInfra Contract Terms
DeepInfra has no contracts to auto-renew and no cancellation notice period: billing is pay-as-you-go, and you simply stop usage at any time. These terms are sourced from verified buyer experiences.
No downgrade process needed — usage scales down automatically by reducing API call volume
How to Negotiate DeepInfra Pricing
DeepInfra has no contracts to negotiate, but your effective per-token spend is still controllable. These 6 tactics are sourced from real buyer experiences and procurement specialists.
Use DeepInfra's native API at deepinfra.com rather than through OpenRouter or similar aggregators. OpenRouter adds its own margin on top of the provider's base token prices. For cost-sensitive production workloads, going direct is the single easiest way to reduce per-token spend.
Reddit (r/DeepSeek, 2025-02-01)
DeepInfra's cheapest models start at $0.01-$0.06/1M blended tokens (e.g., Qwen3 0.8B, Gemma 3 4B), while 70B reasoning models can reach $1-$4/1M. Test your use case with smaller models first — many production tasks perform acceptably at a fraction of the cost of a flagship model.
Artificial Analysis (artificialanalysis.ai), 2026-04-23
Before choosing a model, review artificialanalysis.ai/providers/deepinfra which tracks all 93+ models by price, quality score, and throughput. Models with similar quality benchmarks can vary 10-50x in price — identifying the best value model for your task before integrating can yield large savings at scale.
HN (2025-07-17)
Design AI systems to run models on a schedule and serve static or periodically updated datasets rather than real-time inference. This pattern fully utilizes DeepInfra's low per-token rates without paying latency premiums, and is an order of magnitude cheaper than real-time API patterns at low volumes.
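A minimal sketch of that scheduled-inference pattern: call the model only when a cached result is stale, and serve the cache otherwise. All names here are illustrative; `run_inference` stands in for a real DeepInfra call (e.g., via the OpenAI SDK pointed at DeepInfra's base URL).

```python
import json
import time
from pathlib import Path

CACHE = Path("daily_summaries.json")
REFRESH_SECONDS = 24 * 60 * 60  # regenerate once a day instead of per request

def run_inference(prompt: str) -> str:
    """Placeholder for a real DeepInfra API call."""
    return f"summary of: {prompt}"

def get_summary(prompt: str) -> str:
    """Serve a cached result; only hit the model when the cache is stale."""
    cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
    entry = cache.get(prompt)
    if entry and time.time() - entry["ts"] < REFRESH_SECONDS:
        return entry["text"]          # cache hit: zero API spend
    text = run_inference(prompt)      # cache miss: one scheduled call
    cache[prompt] = {"ts": time.time(), "text": text}
    CACHE.write_text(json.dumps(cache))
    return text
```

In production the refresh would typically run from a cron job or task queue rather than lazily on request, but the cost profile is the same: one inference per period instead of one per user.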
Reddit (r/LocalLLaMA, 2025-03-01)
DeepInfra users report using DeepSeek R1 at $0.75/$2.40 per 1M in/out tokens vs. OpenAI at $1.10/$4.40 — roughly 8x cheaper. For workloads where open-source model quality is sufficient, switching from closed-source APIs to DeepInfra's open-source equivalents yields dramatic cost reductions.
Reddit (r/singularity, 2025-02-01)
Explicitly select FP8-quantized model variants (e.g., model_name_fp8) rather than relying on defaults. Source data suggests non-FP8 variants may produce unreliable outputs, which wastes compute on failed or low-quality generations that need to be re-run.
Reddit (r/Chub_AI, 2025-04-05)
DeepInfra Pricing FAQ
01 How much does DeepInfra cost?
DeepInfra charges per token, starting from $0.06 per million tokens for small models like Llama 3.1 8B and Mistral 7B. Larger models like DeepSeek-R1 cost $0.55/M input and $2.19/M output tokens. A $5 free credit is provided on sign-up.
02 Does DeepInfra have a free tier?
DeepInfra gives $5 in free credits upon sign-up with no credit card required. After the credits are used, you pay standard per-token rates.
03 What models are available on DeepInfra?
DeepInfra hosts 50+ models including Llama 3.x (8B to 405B), Mistral, DeepSeek R1/V3, Qwen 2.5, Gemma 3, Flux image generation, and Whisper speech-to-text.
04 Is DeepInfra OpenAI-compatible?
Yes, DeepInfra provides an OpenAI-compatible REST API. You can use it as a drop-in replacement by changing the base URL to api.deepinfra.com/v1/openai while keeping the same OpenAI SDK.
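A minimal sketch of that drop-in swap using only the Python standard library. The base URL comes from the answer above; the model ID and API key are placeholders, so check DeepInfra's model catalog for exact IDs before use.

```python
import json
import urllib.request

DEEPINFRA_BASE = "https://api.deepinfra.com/v1/openai"  # from the FAQ above

def build_chat_request(api_key: str, model: str, user_msg: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against DeepInfra."""
    payload = {"model": model, "messages": [{"role": "user", "content": user_msg}]}
    return urllib.request.Request(
        f"{DEEPINFRA_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = build_chat_request("YOUR_KEY", "meta-llama/Meta-Llama-3.1-8B-Instruct", "Hello")
print(req.full_url)  # https://api.deepinfra.com/v1/openai/chat/completions
# urllib.request.urlopen(req) would send it; the same endpoint also works
# with the official OpenAI SDK by setting base_url=DEEPINFRA_BASE.
```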
05 DeepInfra vs Together AI: which is cheaper?
DeepInfra and Together AI are closely priced for most models. DeepInfra tends to be slightly cheaper for small models (Llama 8B at $0.06/M vs Together AI at $0.10-0.18/M). Both are significantly cheaper than OpenAI or Anthropic for equivalent open-source model quality.
06 Is DeepInfra cheaper than OpenAI?
For open-source models, yes — significantly. One user compared using DeepSeek via DeepInfra at $0.75/$2.40 per 1M input/output tokens against OpenAI at $1.10/$4.40, concluding OpenAI was roughly 8x more expensive. DeepInfra does not offer closed-source models like GPT-4o or Claude, so direct comparison depends on whether an open-source equivalent meets your quality bar.
07 Should I use DeepInfra directly or through OpenRouter?
Direct access is cheaper. OpenRouter adds its own fees on top of DeepInfra's native token prices. For cost-sensitive production workloads, integrating with DeepInfra's API directly avoids the intermediary markup. OpenRouter remains useful for comparing providers before committing.
08 Which DeepInfra models offer the best price-to-performance ratio?
Based on Artificial Analysis data, DeepSeek V3 (2025) offers strong capability at $0.21/$0.32 per 1M input/output tokens. For lightweight tasks, Gemma 3 12B ($0.04/$0.13) and Qwen3 0.8B ($0.01/$0.05) are extremely cost-effective. The provider median blended rate across all 93 models is $0.30/1M tokens. Use Artificial Analysis to compare quality benchmarks against price before selecting a model.
Is this pricing incorrect? Let us know and we'll verify and update it.