Fireworks AI Pricing 2026
A complete pricing guide with plans and cost analysis
Fireworks AI pricing ranges from free to $9, billed per million tokens (serverless) or per GPU-hour (on-demand).
As of April 2026, Fireworks AI offers 5 plans: serverless inference starts at $0.10 per million tokens, and dedicated GPU deployments run from $2.90 to $9.00 per hour. Pricing depends on your chosen tier, contract length, and negotiated discounts.
- Free tier: $1 in free credits for new accounts; no ongoing free tier
Fireworks AI offers 5 pricing tiers: Serverless, On-Demand (A100), On-Demand (H100/H200), On-Demand (B200), and Enterprise. The On-Demand (A100) plan is designed for consistent inference workloads.
Compared to other LLM API providers, Fireworks AI is positioned at the budget-friendly end of the market.
How much does Fireworks AI cost?
Fireworks AI Pricing Overview
Fireworks AI has 5 pricing plans. Serverless is pay-as-you-go, billed per token, and designed for variable-volume API usage. The three On-Demand plans are billed per GPU-hour: A100 for consistent inference workloads, H100/H200 for large model hosting, and B200 for cutting-edge performance. The Enterprise plan requires contacting sales for a custom quote and is designed for large-scale enterprise deployments.
This pricing was last verified on April 1, 2026.
All Fireworks AI Plans & Pricing
| Plan | Price | Billing | Best For |
|---|---|---|---|
| Serverless | From $0.10/M tokens | Pay-as-you-go | Variable-volume API usage |
| On-Demand (A100) | $2.90/hr | Per-second GPU billing | Consistent inference workloads |
| On-Demand (H100/H200) | $6.00/hr | Per-second GPU billing | Large model hosting |
| On-Demand (B200) | $9.00/hr | Per-second GPU billing | Cutting-edge performance |
| Enterprise | Contact Sales | Custom contract | Large-scale enterprise deployments |
View all features by plan
Serverless
- $1 free credits to start
- Models <4B at $0.10/M tokens
- Models 4B-16B at $0.20/M tokens
- Models >16B at $0.90/M tokens
- MoE models at $0.50-$1.20/M tokens
- Cached input tokens at 50% price
- Batch inference at 50% discount
- Embeddings from $0.008/M tokens
On-Demand (A100)
- A100 80GB at $2.90/hr
- Dedicated model hosting
- Custom fine-tuned models
On-Demand (H100/H200)
- H100/H200 at $6.00/hr
- High-performance inference
- Dedicated resources
On-Demand (B200)
- B200 at $9.00/hr
- Latest generation hardware
- Maximum throughput
Enterprise
- Volume discounts
- Dedicated support
- Custom SLAs
Usage-Based Rates
Per-unit pricing for Fireworks AI API usage.
Serverless
| Model | Unit | Rate |
|---|---|---|
| Models <4B params | 1M input tokens | $0.10 |
| Models <4B params | 1M output tokens | $0.10 |
| Models 4B–16B params | 1M input tokens | $0.20 |
| Models 4B–16B params | 1M output tokens | $0.20 |
| Models 16B–80B params | 1M input tokens | $0.90 |
| Models 16B–80B params | 1M output tokens | $0.90 |
| Models >80B params (MoE) | 1M input tokens | $0.90 |
| Models >80B params (MoE) | 1M output tokens | $0.90 |
| DeepSeek R1 (671B) | 1M input tokens | $3.00 |
| DeepSeek R1 (671B) | 1M output tokens | $8.00 |
| FLUX.1 [schnell] | image | $0.004 |
| FLUX.1 [dev] | image | $0.025 |
- Pricing by model parameter size tier
- $1 in free credits on signup
- Image generation billed per image
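As a rough illustration, the serverless rates above can be turned into a small cost estimator. The rates are copied from the table (input and output are billed at the same per-tier rate), and the function name and tier keys are invented for this sketch; treat it as a snapshot of listed prices, not an authoritative billing tool.

```python
# Hypothetical serverless cost estimator using the per-token rates listed above.
# Rates are in dollars per 1M tokens.
RATES_PER_M = {
    "lt_4b": 0.10,     # models <4B params
    "4b_16b": 0.20,    # models 4B-16B params
    "16b_80b": 0.90,   # models 16B-80B params
}

def serverless_cost(tier: str, input_tokens: int, output_tokens: int,
                    cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Estimate cost in USD. Cached input tokens bill at 50% of the normal
    rate; batch inference applies a 50% discount on top."""
    rate = RATES_PER_M[tier]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost = (fresh + output_tokens) * rate / 1e6 + cached * rate * 0.5 / 1e6
    return cost * 0.5 if batch else cost

# 10M input + 2M output tokens on a 7B model, with half the input cached:
print(round(serverless_cost("4b_16b", 10_000_000, 2_000_000, cached_fraction=0.5), 2))
# prints 1.9
```

Halving the cached portion of the input saves $0.50 here versus paying the full $0.20/M on all 12M tokens.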
On-Demand (A100)
| Model | Unit | Rate |
|---|---|---|
| A100 80GB | second | $0.000806 |
- $2.90/hour per A100 GPU
On-Demand (H100/H200)
| Model | Unit | Rate |
|---|---|---|
| H100/H200 | second | $0.00167 |
- $6.00/hour per H100/H200 GPU
On-Demand (B200)
| Model | Unit | Rate |
|---|---|---|
| B200 | second | $0.0025 |
- $9.00/hour per B200 GPU
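The per-second rates in the tables above are just the hourly prices divided by 3,600. A quick conversion sketch (values as listed; actual billing granularity may differ):

```python
# Convert the listed per-second On-Demand rates back to hourly prices.
PER_SECOND = {"A100 80GB": 0.000806, "H100/H200": 0.00167, "B200": 0.0025}

for gpu, rate in PER_SECOND.items():
    print(f"{gpu}: ${rate * 3600:.2f}/hr")
```

Note that the published per-second figures are rounded, so H100/H200 comes out to $6.01/hr rather than exactly $6.00.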
How Fireworks AI Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| Fireworks AI | Free ($1 credits) | $9.00/hr (B200 GPU) |
| Groq | Free | $3/M tokens |
| Together AI | $0.03/M tokens | $9.95/hr |
| Google Gemini API | Free | $18/M tokens |
| Mistral AI API | $0.10/M tokens | $6/M tokens |
| Perplexity API | $1/M tokens + per-request fee | $15/M tokens + per-request fee |
Fireworks AI Pricing FAQ
01 How much does Fireworks AI cost?
Fireworks AI serverless pricing starts at $0.10 per million tokens for small models (<4B parameters) and goes up to $0.90/M for models over 16B. On-demand GPU deployments range from $2.90/hr (A100) to $9.00/hr (B200). New accounts get $1 in free credits.
02 Does Fireworks AI have a free tier?
Fireworks AI offers $1 in free credits for new accounts. After that, pricing is pay-as-you-go with no minimum commitment. Batch inference and cached input tokens each offer 50% discounts, reducing ongoing costs.
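To put the $1 signup credit in perspective, it can be divided by each serverless tier's rate (rates from the table above; this is simple arithmetic, not an official calculator):

```python
# How far the $1 signup credit stretches at each serverless tier.
CREDIT = 1.00  # USD

for tier, rate_per_m in [("<4B", 0.10), ("4B-16B", 0.20), (">16B", 0.90)]:
    print(f"{tier}: ~{CREDIT / rate_per_m:.1f}M tokens")
```

That works out to roughly 10M tokens on small models, 5M on mid-size models, and about 1.1M on large models.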
03 How does Fireworks AI fine-tuning work?
Fireworks AI supports fine-tuning with SFT and DPO methods. Pricing ranges from $0.50/M training tokens for models under 16B to $10–20/M tokens for models over 300B. Fine-tuned models can be deployed on Serverless or dedicated infrastructure.
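Fine-tuning cost scales with dataset size, epochs, and the per-token training rate quoted above. A minimal sketch, assuming total training tokens = dataset tokens × epochs (the function name and epoch handling are illustrative, not part of the Fireworks API):

```python
# Rough fine-tuning cost: training tokens x epochs x rate per 1M tokens.
# Default rate is the $0.50/M quoted above for models under 16B.
def finetune_cost(dataset_tokens: int, epochs: int, rate_per_m: float = 0.50) -> float:
    return dataset_tokens * epochs * rate_per_m / 1e6

# 20M-token dataset, 3 epochs, at the <16B rate:
print(f"${finetune_cost(20_000_000, 3):.2f}")
# prints $30.00
```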
04 Fireworks AI vs Together AI: which should I choose?
Both offer serverless inference starting at $0.10/M tokens. Fireworks AI provides $1 free credits upfront and offers A100 On-Demand at $2.90/hr, while Together AI's comparable H100 dedicated is $3.99/hr. Fireworks AI is generally slightly cheaper for dedicated GPU hosting and offers batch discounts of 50%.
05 What is Fireworks AI On-Demand pricing?
Fireworks AI On-Demand GPU deployments are priced at $2.90/hr for A100 80GB, $6.00/hr for H100/H200, and $9.00/hr for B200. These are dedicated single-tenant deployments ideal for hosting custom fine-tuned models or maintaining consistent inference capacity.
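A useful rule of thumb for choosing between the two models is the break-even throughput: the sustained token volume at which a dedicated GPU becomes cheaper than serverless billing. A sketch using the rates above ($2.90/hr A100 vs. $0.90/M serverless for >16B models; this ignores real-world factors like GPU utilization and model fit):

```python
# Break-even: tokens/hour at which a dedicated A100 beats serverless billing.
A100_PER_HR = 2.90        # $/hr, from the On-Demand table
SERVERLESS_PER_M = 0.90   # $/1M tokens, >16B serverless tier

breakeven_tokens_per_hr = A100_PER_HR / SERVERLESS_PER_M * 1e6
print(f"~{breakeven_tokens_per_hr / 1e6:.1f}M tokens/hour")
# prints ~3.2M tokens/hour
```

Below roughly 3.2M tokens per sustained hour, serverless is cheaper; above it, the dedicated A100 wins (assuming the model fits on one GPU).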