Quick Answer
Last verified: April 1, 2026
High confidence

Fireworks AI pricing ranges from free starter credits to $9 per GPU-hour as of April 2026, with 5 plans available. Serverless inference is billed per million tokens and dedicated deployments per GPU-hour; your cost depends on the chosen tier, contract length, and negotiated discounts.


  • Free tier: $1 in free credits for new accounts; no ongoing free tier

Fireworks AI offers 5 pricing tiers: Serverless, On-Demand (A100), On-Demand (H100/H200), On-Demand (B200), Enterprise. The On-Demand (A100) plan is designed for consistent inference workloads.

Compared to other LLM API providers, Fireworks AI is positioned at the budget-friendly end of the market.

How much does Fireworks AI cost?

Fireworks AI pricing starts at $0.10 per million tokens for serverless inference, with enterprise pricing available on request. All 5 plans (Serverless, On-Demand (A100), On-Demand (H100/H200), On-Demand (B200), and Enterprise) are quoted as custom pricing, with published pay-as-you-go rates for serverless and GPU usage.

Fireworks AI Pricing Overview

Fireworks AI has 5 pricing plans, ranging from free starter credits to $9 per GPU-hour. Each plan requires contacting sales for a custom quote:

  • Serverless: variable-volume API usage
  • On-Demand (A100): consistent inference workloads
  • On-Demand (H100/H200): large model hosting
  • On-Demand (B200): cutting-edge performance
  • Enterprise: large-scale enterprise deployments

This pricing was last verified on April 1, 2026.

Fireworks AI offers pay-as-you-go serverless LLM inference and dedicated GPU hosting, with serverless pricing starting at $0.10 per million tokens for models under 4B parameters. New accounts receive $1 in free credits to start. On-demand GPU deployments range from $2.90/hour for an A100 80GB up to $9.00/hour for a B200, with batch inference and cached input tokens both available at 50% discounts.
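
The serverless rates above can be turned into a quick cost estimate. The sketch below is illustrative only: the per-million-token rates come from this page, the tier labels and the `serverless_cost` helper are my own, and it assumes cached input tokens are billed at half the normal rate as described above.

```python
# Illustrative cost estimator using the serverless rates quoted above.
# Rates are USD per million tokens; the tier keys are my own labels.
RATES_PER_M = {
    "lt_4b": 0.10,      # models under 4B parameters
    "4b_to_16b": 0.20,  # models 4B-16B
    "gt_16b": 0.90,     # models over 16B
}

def serverless_cost(tokens: int, tier: str, cached_fraction: float = 0.0) -> float:
    """Estimate serverless cost in USD; cached input tokens billed at 50%."""
    rate = RATES_PER_M[tier]
    full = tokens * (1 - cached_fraction) * rate / 1_000_000
    cached = tokens * cached_fraction * (rate * 0.5) / 1_000_000
    return full + cached

# 10M tokens on a >16B model with 40% of input served from cache:
print(round(serverless_cost(10_000_000, "gt_16b", 0.4), 2))  # 7.2
```

At 10M tokens, the cache discount alone saves $1.80 versus the full $9.00 bill, which is why cache-friendly prompt design matters at this tier.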

All Fireworks AI Plans & Pricing

Plan Monthly Annual Best For
Serverless Contact Sales Contact Sales Variable-volume API usage
On-Demand (A100) Contact Sales Contact Sales Consistent inference workloads
On-Demand (H100/H200) Contact Sales Contact Sales Large model hosting
On-Demand (B200) Contact Sales Contact Sales Cutting-edge performance
Enterprise Contact Sales Contact Sales Large-scale enterprise deployments

Serverless

  • $1 free credits to start
  • Models <4B at $0.10/M tokens
  • Models 4B-16B at $0.20/M tokens
  • Models >16B at $0.90/M tokens
  • MoE models at $0.50-$1.20/M tokens
  • Cached input tokens at 50% price
  • Batch inference at 50% discount
  • Embeddings from $0.008/M tokens

On-Demand (A100)

  • A100 80GB at $2.90/hr
  • Dedicated model hosting
  • Custom fine-tuned models

On-Demand (H100/H200)

  • H100/H200 at $6.00/hr
  • High-performance inference
  • Dedicated resources

On-Demand (B200)

  • B200 at $9.00/hr
  • Latest generation hardware
  • Maximum throughput

Enterprise

  • Volume discounts
  • Dedicated support
  • Custom SLAs
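
A common question across these plans is when a dedicated GPU beats serverless billing. The sketch below is a rough break-even estimate under assumptions of my own choosing: sustained utilization, the A100 rate of $2.90/hr from the plan above, and the $0.90/M serverless rate for a >16B model.

```python
# Rough break-even: tokens/hour at which a dedicated A100 ($2.90/hr)
# matches serverless billing for a >16B model ($0.90/M tokens).
A100_PER_HOUR = 2.90
SERVERLESS_PER_M = 0.90

breakeven_tokens_per_hour = A100_PER_HOUR / SERVERLESS_PER_M * 1_000_000
print(f"{breakeven_tokens_per_hour:,.0f} tokens/hour")  # 3,222,222 tokens/hour
```

In other words, a steadily loaded deployment pushing more than roughly 3.2M tokens per hour through a large model is cheaper on a dedicated A100 than on serverless; below that, pay-as-you-go wins.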

How Fireworks AI Pricing Compares

Software Starting Price Top Price
Fireworks AI Free ($1 credits) $9.00/GPU-hour (B200)
Groq Free $0.79/M tokens
Together AI $0.10/M tokens $9.95/GPU-hour


Fireworks AI Pricing FAQ

01 How much does Fireworks AI cost?

Fireworks AI serverless pricing starts at $0.10 per million tokens for small models (<4B parameters) and goes up to $0.90/M for models over 16B. On-demand GPU deployments range from $2.90/hr (A100) to $9.00/hr (B200). New accounts get $1 in free credits.

02 Does Fireworks AI have a free tier?

Fireworks AI offers $1 in free credits for new accounts. After that, pricing is pay-as-you-go with no minimum commitment. Batch inference and cached input tokens each offer 50% discounts, reducing ongoing costs.
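
To see how those two 50% discounts shape a bill, here is a small sketch. The rates are from this page; the split between batch and online tokens is a made-up workload, and it assumes the batch and cached-input discounts apply to separate token buckets rather than stacking.

```python
# How the 50% discounts reduce a bill (illustrative workload).
rate = 0.90               # $/M tokens for a >16B serverless model
batch_tokens = 20_000_000   # processed via batch inference (50% off)
online_tokens = 5_000_000   # regular real-time requests (full price)

cost = batch_tokens / 1e6 * rate * 0.5 + online_tokens / 1e6 * rate
print(cost)  # 13.5
```

Without the batch discount the same 25M tokens would cost $22.50, so routing offline work through batch inference cuts this hypothetical bill by 40%.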

03 How does Fireworks AI fine-tuning work?

Fireworks AI supports fine-tuning with SFT and DPO methods. Pricing ranges from $0.50/M training tokens for models under 16B to $10–20/M tokens for models over 300B. Fine-tuned models can be deployed on Serverless or dedicated infrastructure.
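
Fine-tuning spend is simple per-token arithmetic on training tokens. The helper below is a sketch of my own; only the $0.50/M (<16B) and $10–20/M (>300B) endpoint rates are quoted above, so rates for intermediate model sizes would need to be looked up.

```python
# Sketch: fine-tuning cost at a given $/M-training-token rate.
def finetune_cost(training_tokens: int, rate_per_m: float) -> float:
    """USD cost for SFT/DPO training at a given $/M-token rate."""
    return training_tokens / 1_000_000 * rate_per_m

# 50M training tokens on a <16B model at the quoted $0.50/M rate:
print(finetune_cost(50_000_000, 0.50))  # 25.0
```

Note that training tokens are counted per epoch, so multi-epoch runs multiply this figure accordingly.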

04 Fireworks AI vs Together AI: which should I choose?

Both offer serverless inference starting at $0.10/M tokens. Fireworks AI provides $1 free credits upfront and offers A100 On-Demand at $2.90/hr, while Together AI's comparable H100 dedicated is $3.99/hr. Fireworks AI is generally slightly cheaper for dedicated GPU hosting and offers batch discounts of 50%.

05 What is Fireworks AI On-Demand pricing?

Fireworks AI On-Demand GPU deployments are priced at $2.90/hr for A100 80GB, $6.00/hr for H100/H200, and $9.00/hr for B200. These are dedicated single-tenant deployments ideal for hosting custom fine-tuned models or maintaining consistent inference capacity.
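
The hourly rates above translate into monthly figures as follows. This sketch assumes 24/7 uptime at an average of 730 hours per month; real bills depend on actual deployment hours.

```python
# Monthly cost of a dedicated deployment at the quoted hourly rates,
# assuming round-the-clock uptime (~730 hours/month on average).
GPU_HOURLY = {"A100": 2.90, "H100/H200": 6.00, "B200": 9.00}
HOURS_PER_MONTH = 730

for gpu, rate in GPU_HOURLY.items():
    print(f"{gpu}: ${rate * HOURS_PER_MONTH:,.2f}/month")
# A100: $2,117.00/month
# H100/H200: $4,380.00/month
# B200: $6,570.00/month
```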

Is this pricing incorrect? Let us know and we verify and update within 24 hours.