Fireworks AI Pricing 2026
A complete pricing guide with plans and cost analysis
Fireworks AI pricing ranges from free to $9, billed per million tokens (serverless) or per GPU-hour (on-demand).
As of April 2026, Fireworks AI offers 5 plans: serverless inference starts at $0.10 per million tokens, and dedicated GPU deployments run from $2.90 to $9.00 per hour. Pricing depends on your chosen tier, contract length, and negotiated discounts.
- Free tier: $1 in free credits for new accounts; no ongoing free tier
Fireworks AI offers 5 pricing tiers: Serverless, On-Demand (A100), On-Demand (H100/H200), On-Demand (B200), and Enterprise. The On-Demand (A100) plan is designed for consistent inference workloads.
Compared to other LLM API providers, Fireworks AI is positioned at the budget-friendly end of the market.
How much does Fireworks AI cost?
Fireworks AI Pricing Overview
Fireworks AI has 5 pricing plans. Serverless is pay-as-you-go, billed per token, and designed for variable-volume API usage. The three On-Demand plans are billed per GPU-hour: A100 for consistent inference workloads, H100/H200 for large model hosting, and B200 for cutting-edge performance. The Enterprise plan requires contacting sales for a custom quote and is designed for large-scale enterprise deployments.
This pricing was last verified on April 1, 2026.
All Fireworks AI Plans & Pricing
| Plan | Price | Billing | Best For |
|---|---|---|---|
| Serverless | From $0.10/M tokens | Pay-as-you-go | Variable-volume API usage |
| On-Demand (A100) | $2.90/hr | Per-second GPU billing | Consistent inference workloads |
| On-Demand (H100/H200) | $6.00/hr | Per-second GPU billing | Large model hosting |
| On-Demand (B200) | $9.00/hr | Per-second GPU billing | Cutting-edge performance |
| Enterprise | Contact Sales | Custom contract | Large-scale enterprise deployments |
View all features by plan
Serverless
- $1 free credits to start
- Models <4B at $0.10/M tokens
- Models 4B-16B at $0.20/M tokens
- Models >16B at $0.90/M tokens
- MoE models at $0.50-$1.20/M tokens
- Cached input tokens at 50% price
- Batch inference at 50% discount
- Embeddings from $0.008/M tokens
On-Demand (A100)
- A100 80GB at $2.90/hr
- Dedicated model hosting
- Custom fine-tuned models
On-Demand (H100/H200)
- H100/H200 at $6.00/hr
- High-performance inference
- Dedicated resources
On-Demand (B200)
- B200 at $9.00/hr
- Latest generation hardware
- Maximum throughput
Enterprise
- Volume discounts
- Dedicated support
- Custom SLAs
Usage-Based Rates
Per-unit pricing for Fireworks AI API usage.
Serverless
| Model | Unit | Rate |
|---|---|---|
| Models <4B params | 1M input tokens | $0.10 |
| Models <4B params | 1M output tokens | $0.10 |
| Models 4B–16B params | 1M input tokens | $0.20 |
| Models 4B–16B params | 1M output tokens | $0.20 |
| Models 16B–80B params | 1M input tokens | $0.90 |
| Models 16B–80B params | 1M output tokens | $0.90 |
| Models >80B params (MoE) | 1M input tokens | $0.90 |
| Models >80B params (MoE) | 1M output tokens | $0.90 |
| DeepSeek R1 (671B) | 1M input tokens | $3.00 |
| DeepSeek R1 (671B) | 1M output tokens | $8.00 |
| FLUX.1 [schnell] | image | $0.004 |
| FLUX.1 [dev] | image | $0.025 |
- Pricing by model parameter size tier
- $1 in free credits on signup
- Image generation billed per image
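As a rough illustration, the serverless rates above can be turned into a small cost estimator. The rates are copied from the table (input and output are billed at the same per-tier rate), and the function name and tier keys are invented for this sketch; treat it as a snapshot of listed prices, not an authoritative billing tool.

```python
# Hypothetical serverless cost estimator using the per-token rates listed above.
# Rates are in dollars per 1M tokens.
RATES_PER_M = {
    "lt_4b": 0.10,     # models <4B params
    "4b_16b": 0.20,    # models 4B-16B params
    "16b_80b": 0.90,   # models 16B-80B params
}

def serverless_cost(tier: str, input_tokens: int, output_tokens: int,
                    cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Estimate cost in USD. Cached input tokens bill at 50% of the normal
    rate; batch inference applies a 50% discount on top."""
    rate = RATES_PER_M[tier]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost = (fresh + output_tokens) * rate / 1e6 + cached * rate * 0.5 / 1e6
    return cost * 0.5 if batch else cost

# 10M input + 2M output tokens on a 7B model, with half the input cached:
print(round(serverless_cost("4b_16b", 10_000_000, 2_000_000, cached_fraction=0.5), 2))
# prints 1.9
```

Halving the cached portion of the input saves $0.50 here versus paying the full $0.20/M on all 12M tokens.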
On-Demand (A100)
| Model | Unit | Rate |
|---|---|---|
| A100 80GB | second | $0.000806 |
- $2.90/hour per A100 GPU
On-Demand (H100/H200)
| Model | Unit | Rate |
|---|---|---|
| H100/H200 | second | $0.00167 |
- $6.00/hour per H100/H200 GPU
On-Demand (B200)
| Model | Unit | Rate |
|---|---|---|
| B200 | second | $0.0025 |
- $9.00/hour per B200 GPU
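The per-second rates in the tables above are just the hourly prices divided by 3,600. A quick conversion sketch (values as listed; actual billing granularity may differ):

```python
# Convert the listed per-second On-Demand rates back to hourly prices.
PER_SECOND = {"A100 80GB": 0.000806, "H100/H200": 0.00167, "B200": 0.0025}

for gpu, rate in PER_SECOND.items():
    print(f"{gpu}: ${rate * 3600:.2f}/hr")
```

Note that the published per-second figures are rounded, so H100/H200 comes out to $6.01/hr rather than exactly $6.00.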
How Fireworks AI Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| Fireworks AI | Free ($1 credits) | $9.00/hr (B200 GPU) |
| Groq | Free | $3/M tokens |
| Together AI | $0.03/M tokens | $9.95/hr |
| Google Gemini API | Free | $18/M tokens |
| Mistral AI API | $0.10/M tokens | $6/M tokens |
| Perplexity API | $1/M tokens + per-request fee | $15/M tokens + per-request fee |
Fireworks AI Pricing FAQ
01 How much does Fireworks AI cost?
Fireworks AI serverless pricing starts at $0.10 per million tokens for small models (<4B parameters) and goes up to $0.90/M for models over 16B. On-demand GPU deployments range from $2.90/hr (A100) to $9.00/hr (B200). New accounts get $1 in free credits.
02 Does Fireworks AI have a free tier?
Fireworks AI offers $1 in free credits for new accounts. After that, pricing is pay-as-you-go with no minimum commitment. Batch inference and cached input tokens each offer 50% discounts, reducing ongoing costs.
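To put the $1 signup credit in perspective, it can be divided by each serverless tier's rate (rates from the table above; this is simple arithmetic, not an official calculator):

```python
# How far the $1 signup credit stretches at each serverless tier.
CREDIT = 1.00  # USD

for tier, rate_per_m in [("<4B", 0.10), ("4B-16B", 0.20), (">16B", 0.90)]:
    print(f"{tier}: ~{CREDIT / rate_per_m:.1f}M tokens")
```

That works out to roughly 10M tokens on small models, 5M on mid-size models, and about 1.1M on large models.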
03 How does Fireworks AI fine-tuning work?
Fireworks AI supports fine-tuning with SFT and DPO methods. Pricing ranges from $0.50/M training tokens for models under 16B to $10–20/M tokens for models over 300B. Fine-tuned models can be deployed on Serverless or dedicated infrastructure.
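Fine-tuning cost scales with dataset size, epochs, and the per-token training rate quoted above. A minimal sketch, assuming total training tokens = dataset tokens × epochs (the function name and epoch handling are illustrative, not part of the Fireworks API):

```python
# Rough fine-tuning cost: training tokens x epochs x rate per 1M tokens.
# Default rate is the $0.50/M quoted above for models under 16B.
def finetune_cost(dataset_tokens: int, epochs: int, rate_per_m: float = 0.50) -> float:
    return dataset_tokens * epochs * rate_per_m / 1e6

# 20M-token dataset, 3 epochs, at the <16B rate:
print(f"${finetune_cost(20_000_000, 3):.2f}")
# prints $30.00
```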
04 Fireworks AI vs Together AI: which should I choose?
Both offer serverless inference starting at $0.10/M tokens. Fireworks AI provides $1 free credits upfront and offers A100 On-Demand at $2.90/hr, while Together AI's comparable H100 dedicated is $3.99/hr. Fireworks AI is generally slightly cheaper for dedicated GPU hosting and offers batch discounts of 50%.
05 What is Fireworks AI On-Demand pricing?
Fireworks AI On-Demand GPU deployments are priced at $2.90/hr for A100 80GB, $6.00/hr for H100/H200, and $9.00/hr for B200. These are dedicated single-tenant deployments ideal for hosting custom fine-tuned models or maintaining consistent inference capacity.
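A useful rule of thumb for choosing between the two models is the break-even throughput: the sustained token volume at which a dedicated GPU becomes cheaper than serverless billing. A sketch using the rates above ($2.90/hr A100 vs. $0.90/M serverless for >16B models; this ignores real-world factors like GPU utilization and model fit):

```python
# Break-even: tokens/hour at which a dedicated A100 beats serverless billing.
A100_PER_HR = 2.90        # $/hr, from the On-Demand table
SERVERLESS_PER_M = 0.90   # $/1M tokens, >16B serverless tier

breakeven_tokens_per_hr = A100_PER_HR / SERVERLESS_PER_M * 1e6
print(f"~{breakeven_tokens_per_hr / 1e6:.1f}M tokens/hour")
# prints ~3.2M tokens/hour
```

Below roughly 3.2M tokens per sustained hour, serverless is cheaper; above it, the dedicated A100 wins (assuming the model fits on one GPU).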