DeepInfra Pricing 2026
Complete pricing guide with plans, hidden costs, and cost analysis
DeepInfra costs $0.02 to $82.50 per million tokens as of April 2026. Pricing depends primarily on which model you run, your token volume, and any negotiated discounts.
- Free tier: none, but $5 in free credits on sign-up
DeepInfra offers a single pricing tier, Pay-as-you-go, designed for developers needing affordable inference for open-source and commercial models in production.
Compared to other LLM API providers, DeepInfra is positioned at the budget-friendly end of the market.
- 4 documented hidden costs beyond list price
How much does DeepInfra cost?
DeepInfra Pricing Overview
DeepInfra has a single pricing plan, Pay-as-you-go, with published per-token rates ranging from $0.02 to $82.50 per million tokens, designed for developers needing affordable inference for open-source and commercial models in production.
DeepInfra has no minimum commitment and no contract: billing is pay-as-you-go, and you can stop usage at any time to cancel.
There are at least 4 documented hidden costs beyond DeepInfra's list price, including implementation, training, and add-on fees.
This pricing was last verified on April 15, 2026 against 2 independent sources.
DeepInfra is a serverless AI inference platform specializing in open-source model hosting. It offers OpenAI-compatible APIs for 50+ models including Llama 3.x, Mistral, DeepSeek, and Qwen at highly competitive per-token rates. Pricing starts at $0.06 per million tokens for small models, with no minimum commitments or setup fees. DeepInfra is popular among developers looking for the cheapest way to run large open-source models in production without managing GPU infrastructure.
All DeepInfra Plans & Pricing
| Plan | Monthly | Annual | Best For |
|---|---|---|---|
| Pay-as-you-go (200 concurrent requests per account) | Usage-based (per token) | Usage-based (per token) | Developers needing affordable inference for open-source and commercial models in production |
Features by plan:
Pay-as-you-go
- 70+ models including open-source and commercial
- OpenAI-compatible API
- No minimum spend
- DeepSeek, Llama 4, Qwen3, Gemini, Claude, Gemma, Mistral support
- Image generation (Flux family)
- Speech-to-text (Voxtral)
- Embedding models
- Dedicated GPU deployment option (A100, H100, H200, B200, B300)
- Usage-based billing tiers with automatic tier progression
Usage-Based Rates
Per-unit pricing for DeepInfra API usage.
Pay-as-you-go
| Model | Unit | Rate |
|---|---|---|
| DeepSeek-V3.2 | 1M input tokens | $0.260 |
| DeepSeek-V3.2 | 1M output tokens | $0.380 |
| DeepSeek-R1-0528 | 1M input tokens | $0.500 |
| DeepSeek-R1-0528 | 1M output tokens | $2.15 |
| Llama 4 Maverick 17B-128E | 1M input tokens | $0.150 |
| Llama 4 Maverick 17B-128E | 1M output tokens | $0.600 |
| Llama 3.3 70B Instruct Turbo | 1M input tokens | $0.100 |
| Llama 3.3 70B Instruct Turbo | 1M output tokens | $0.320 |
| Meta-Llama-3.1-8B-Instruct | 1M input tokens | $0.020 |
| Meta-Llama-3.1-8B-Instruct | 1M output tokens | $0.050 |
| Qwen3-235B-A22B-Instruct-2507 | 1M input tokens | $0.071 |
| Qwen3-235B-A22B-Instruct-2507 | 1M output tokens | $0.100 |
| Qwen2.5-72B-Instruct | 1M input tokens | $0.120 |
| Qwen2.5-72B-Instruct | 1M output tokens | $0.390 |
| Gemma 3 27B | 1M input tokens | $0.080 |
| Gemma 3 27B | 1M output tokens | $0.160 |
| Mistral Small 3.2 24B | 1M input tokens | $0.075 |
| Mistral Small 3.2 24B | 1M output tokens | $0.200 |
| Gemini 2.5 Flash | 1M input tokens | $0.300 |
| Gemini 2.5 Flash | 1M output tokens | $2.50 |
| Voxtral Small 24B | minute | $0.00300 |
| FLUX-2-pro | image | $0.015 |
- Cached input tokens available at reduced rates on select models (DeepSeek, Qwen3)
- Input and output tokens are often priced the same on small models
- Verify current rates at deepinfra.com/pricing — model catalog updated frequently
What Companies Actually Pay for DeepInfra
| Model | Input /1M | Output /1M | Blended /1M |
|---|---|---|---|
| deepseek-r1 | $0.700 | $2.40 | $1.13 |
| deepseek-v3-2 | $0.210 | $0.320 | $0.237 |
| deepseek-r1-05-28 | $0.500 | $2.15 | $0.912 |
| kimi-k2 | $0.500 | $2.00 | $0.875 |
| llama-4-maverick-instruct_fp8 | $0.150 | $0.600 | $0.263 |
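The blended figures in the table are consistent with a 3:1 input-to-output token weighting, a common convention for comparing LLM API rates. A minimal sketch of that calculation, assuming the 3:1 ratio (the function name is illustrative):

```python
def blended_rate(input_per_1m: float, output_per_1m: float, ratio: float = 3.0) -> float:
    """Blend per-1M input/output rates at a given input:output token ratio."""
    return (ratio * input_per_1m + output_per_1m) / (ratio + 1.0)

# Rates in USD per 1M tokens, taken from the table above.
print(f"{blended_rate(0.70, 2.40):.4f}")  # deepseek-r1      -> 1.1250, shown as $1.13
print(f"{blended_rate(0.21, 0.32):.4f}")  # deepseek-v3-2    -> 0.2375, shown as $0.237
print(f"{blended_rate(0.50, 2.15):.4f}")  # deepseek-r1-0528 -> 0.9125, shown as $0.912
```

If your workload is output-heavy (e.g., long generations from short prompts), recompute with a lower ratio before comparing providers.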
DeepInfra Year 1 Total Cost by Company Size
Real deployment costs including licenses, implementation, training, and admin — not just the sticker price.
- Individual developer running experiments and low-volume tests using small models (e.g., Llama 3.1 8B or Gemma 3 4B). A Reddit user calculated that at DeepInfra's pricing for Mixtral-class models, 18.5 million queries could be served for $500.
- Production app using Llama 3.1 8B at $0.02/$0.05 per 1M input/output tokens (per Artificial Analysis). At 50M output tokens per month, estimated monthly cost is ~$2.50, or ~$30/year.
- High-volume production workload running millions of inference jobs using 70B-class models. At the provider median blended rate of $0.30/1M tokens and 10 billion tokens consumed per month, estimated monthly cost is ~$3,000.
- App using DeepSeek R1 for complex multi-step reasoning tasks. At $0.70/$2.40 per 1M input/output tokens, a workload generating 100M output tokens per month costs ~$240/month.
Reddit (r/LocalLLaMA, 2023-12-26)
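The scenario arithmetic above reduces to one formula: tokens consumed times the per-1M-token rate. A small helper that reproduces the estimates (rates taken from the scenarios; the function name is illustrative):

```python
def monthly_cost(tokens_per_month: float, rate_per_1m: float) -> float:
    """USD cost for one month of usage at a per-1M-token rate."""
    return tokens_per_month / 1_000_000 * rate_per_1m

# Llama 3.1 8B output tokens at $0.05/1M
print(f"${monthly_cost(50_000_000, 0.05):.2f}")      # $2.50
# 70B-class workload at the $0.30/1M median blended rate
print(f"${monthly_cost(10_000_000_000, 0.30):.2f}")  # $3000.00
# DeepSeek R1 output tokens at $2.40/1M
print(f"${monthly_cost(100_000_000, 2.40):.2f}")     # $240.00
```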
How DeepInfra Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| DeepInfra | $0.02/1M tokens | $82.50/1M tokens |
| Amazon Bedrock | $0.07/1M tokens | $75.00/1M tokens |
| Anyscale | $0.15/1M tokens | $5.00/1M tokens |
| Baidu ERNIE API | $0.10/1M tokens | $10.00/1M tokens |
| Cerebras Inference API | $0.10/1M tokens | $6.00/1M tokens |
| Claude API | $0.03/1M tokens | $75.00/1M tokens |
DeepInfra Contract Terms
DeepInfra has no contracts to auto-renew and no cancellation notice period: billing is pay-as-you-go, and you simply stop usage at any time. These terms are sourced from verified buyer experiences.
No downgrade process needed — usage scales down automatically by reducing API call volume
How to Negotiate DeepInfra Pricing
DeepInfra has no contracts to negotiate, but your effective per-token spend is still controllable. These 6 tactics are sourced from real buyer experiences and procurement specialists.
Use DeepInfra's native API at deepinfra.com rather than through OpenRouter or similar aggregators. OpenRouter adds its own margin on top of the provider's base token prices. For cost-sensitive production workloads, going direct is the single easiest way to reduce per-token spend.
Reddit (r/DeepSeek, 2025-02-01)
DeepInfra's cheapest models start at $0.01-$0.06/1M blended tokens (e.g., Qwen3 0.8B, Gemma 3 4B), while 70B reasoning models can reach $1-$4/1M. Test your use case with smaller models first — many production tasks perform acceptably at a fraction of the cost of a flagship model.
Artificial Analysis (artificialanalysis.ai), 2026-04-23
Before choosing a model, review artificialanalysis.ai/providers/deepinfra which tracks all 93+ models by price, quality score, and throughput. Models with similar quality benchmarks can vary 10-50x in price — identifying the best value model for your task before integrating can yield large savings at scale.
HN (2025-07-17)
Design AI systems to run models on a schedule and serve static or periodically updated datasets rather than real-time inference. This pattern fully utilizes DeepInfra's low per-token rates without paying latency premiums, and is an order of magnitude cheaper than real-time API patterns at low volumes.
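A minimal sketch of that scheduled-inference pattern: call the model only when a cached result is stale, and serve the cache otherwise. All names here are illustrative; `run_inference` stands in for a real DeepInfra call (e.g., via the OpenAI SDK pointed at DeepInfra's base URL).

```python
import json
import time
from pathlib import Path

CACHE = Path("daily_summaries.json")
REFRESH_SECONDS = 24 * 60 * 60  # regenerate once a day instead of per request

def run_inference(prompt: str) -> str:
    """Placeholder for a real DeepInfra API call."""
    return f"summary of: {prompt}"

def get_summary(prompt: str) -> str:
    """Serve a cached result; only hit the model when the cache is stale."""
    cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
    entry = cache.get(prompt)
    if entry and time.time() - entry["ts"] < REFRESH_SECONDS:
        return entry["text"]          # cache hit: zero API spend
    text = run_inference(prompt)      # cache miss: one scheduled call
    cache[prompt] = {"ts": time.time(), "text": text}
    CACHE.write_text(json.dumps(cache))
    return text
```

In production the refresh would typically run from a cron job or task queue rather than lazily on request, but the cost profile is the same: one inference per period instead of one per user.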
Reddit (r/LocalLLaMA, 2025-03-01)
DeepInfra users report using DeepSeek R1 at $0.75/$2.40 per 1M in/out tokens vs. OpenAI at $1.10/$4.40 — roughly 8x cheaper. For workloads where open-source model quality is sufficient, switching from closed-source APIs to DeepInfra's open-source equivalents yields dramatic cost reductions.
Reddit (r/singularity, 2025-02-01)
Explicitly select FP8-quantized model variants (e.g., model_name_fp8) rather than relying on defaults. Source data suggests non-FP8 variants may produce unreliable outputs, which wastes compute on failed or low-quality generations that need to be re-run.
Reddit (r/Chub_AI, 2025-04-05)
DeepInfra Pricing FAQ
01 How much does DeepInfra cost?
DeepInfra charges per token, starting from $0.06 per million tokens for small models like Llama 3.1 8B and Mistral 7B. Larger models like DeepSeek-R1 cost $0.55/M input and $2.19/M output tokens. A $5 free credit is provided on sign-up.
02 Does DeepInfra have a free tier?
DeepInfra gives $5 in free credits upon sign-up with no credit card required. After the credits are used, you pay standard per-token rates.
03 What models are available on DeepInfra?
DeepInfra hosts 50+ models including Llama 3.x (8B to 405B), Mistral, DeepSeek R1/V3, Qwen 2.5, Gemma 3, Flux image generation, and Whisper speech-to-text.
04 Is DeepInfra OpenAI-compatible?
Yes, DeepInfra provides an OpenAI-compatible REST API. You can use it as a drop-in replacement by changing the base URL to api.deepinfra.com/v1/openai while keeping the same OpenAI SDK.
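A minimal sketch of that drop-in swap using only the Python standard library. The base URL comes from the answer above; the model ID and API key are placeholders, so check DeepInfra's model catalog for exact IDs before use.

```python
import json
import urllib.request

DEEPINFRA_BASE = "https://api.deepinfra.com/v1/openai"  # from the FAQ above

def build_chat_request(api_key: str, model: str, user_msg: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against DeepInfra."""
    payload = {"model": model, "messages": [{"role": "user", "content": user_msg}]}
    return urllib.request.Request(
        f"{DEEPINFRA_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = build_chat_request("YOUR_KEY", "meta-llama/Meta-Llama-3.1-8B-Instruct", "Hello")
print(req.full_url)  # https://api.deepinfra.com/v1/openai/chat/completions
# urllib.request.urlopen(req) would send it; the same endpoint also works
# with the official OpenAI SDK by setting base_url=DEEPINFRA_BASE.
```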
05 DeepInfra vs Together AI: which is cheaper?
DeepInfra and Together AI are closely priced for most models. DeepInfra tends to be slightly cheaper for small models (Llama 8B at $0.06/M vs Together AI at $0.10-0.18/M). Both are significantly cheaper than OpenAI or Anthropic for equivalent open-source model quality.
06 Is DeepInfra cheaper than OpenAI?
For open-source models, yes — significantly. One user compared using DeepSeek via DeepInfra at $0.75/$2.40 per 1M input/output tokens against OpenAI at $1.10/$4.40, concluding OpenAI was roughly 8x more expensive. DeepInfra does not offer closed-source models like GPT-4o or Claude, so direct comparison depends on whether an open-source equivalent meets your quality bar.
07 Should I use DeepInfra directly or through OpenRouter?
Direct access is cheaper. OpenRouter adds its own fees on top of DeepInfra's native token prices. For cost-sensitive production workloads, integrating with DeepInfra's API directly avoids the intermediary markup. OpenRouter remains useful for comparing providers before committing.
08 Which DeepInfra models offer the best price-to-performance ratio?
Based on Artificial Analysis data, DeepSeek V3 (2025) offers strong capability at $0.21/$0.32 per 1M input/output tokens. For lightweight tasks, Gemma 3 12B ($0.04/$0.13) and Qwen3 0.8B ($0.01/$0.05) are extremely cost-effective. The provider median blended rate across all 93 models is $0.30/1M tokens. Use Artificial Analysis to compare quality benchmarks against price before selecting a model.
Is this pricing incorrect? Let us know and we'll verify and update it.