Quick Answer
Last verified: April 1, 2026
High confidence

Fireworks AI pricing ranges from free starter credits to $9 per GPU-hour as of April 2026, with 5 plans available. Serverless inference is billed per million tokens and dedicated deployments per GPU-hour; your cost depends on the chosen tier, contract length, and negotiated discounts.


  • Free tier: $1 in free credits for new accounts; no ongoing free tier

Fireworks AI offers 5 pricing tiers: Serverless, On-Demand (A100), On-Demand (H100/H200), On-Demand (B200), Enterprise. The On-Demand (A100) plan is designed for consistent inference workloads.

Compared to other LLM API providers, Fireworks AI is positioned at the budget-friendly end of the market.

How much does Fireworks AI cost?

Fireworks AI pricing starts at $0.10 per million tokens for serverless inference, with enterprise pricing available on request. All 5 plans (Serverless, On-Demand (A100), On-Demand (H100/H200), On-Demand (B200), and Enterprise) are quoted as custom pricing, with published pay-as-you-go rates for serverless and GPU usage.

Fireworks AI Pricing Overview

Fireworks AI has 5 pricing plans, ranging from free starter credits to $9 per GPU-hour. Each plan requires contacting sales for a custom quote:

  • Serverless: variable-volume API usage
  • On-Demand (A100): consistent inference workloads
  • On-Demand (H100/H200): large model hosting
  • On-Demand (B200): cutting-edge performance
  • Enterprise: large-scale enterprise deployments

This pricing was last verified on April 1, 2026.

Fireworks AI offers pay-as-you-go serverless LLM inference and dedicated GPU hosting, with serverless pricing starting at $0.10 per million tokens for models under 4B parameters. New accounts receive $1 in free credits to start. On-demand GPU deployments range from $2.90/hour for an A100 80GB up to $9.00/hour for a B200, with batch inference and cached input tokens both available at 50% discounts.
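
The serverless rates above can be turned into a quick cost estimate. The sketch below is illustrative only: the per-million-token rates come from this page, the tier labels and the `serverless_cost` helper are my own, and it assumes cached input tokens are billed at half the normal rate as described above.

```python
# Illustrative cost estimator using the serverless rates quoted above.
# Rates are USD per million tokens; the tier keys are my own labels.
RATES_PER_M = {
    "lt_4b": 0.10,      # models under 4B parameters
    "4b_to_16b": 0.20,  # models 4B-16B
    "gt_16b": 0.90,     # models over 16B
}

def serverless_cost(tokens: int, tier: str, cached_fraction: float = 0.0) -> float:
    """Estimate serverless cost in USD; cached input tokens billed at 50%."""
    rate = RATES_PER_M[tier]
    full = tokens * (1 - cached_fraction) * rate / 1_000_000
    cached = tokens * cached_fraction * (rate * 0.5) / 1_000_000
    return full + cached

# 10M tokens on a >16B model with 40% of input served from cache:
print(round(serverless_cost(10_000_000, "gt_16b", 0.4), 2))  # 7.2
```

At 10M tokens, the cache discount alone saves $1.80 versus the full $9.00 bill, which is why cache-friendly prompt design matters at this tier.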

All Fireworks AI Plans & Pricing

Plan Monthly Annual Best For
Serverless Contact Sales Contact Sales Variable-volume API usage
On-Demand (A100) Contact Sales Contact Sales Consistent inference workloads
On-Demand (H100/H200) Contact Sales Contact Sales Large model hosting
On-Demand (B200) Contact Sales Contact Sales Cutting-edge performance
Enterprise Contact Sales Contact Sales Large-scale enterprise deployments

Serverless

  • $1 free credits to start
  • Models <4B at $0.10/M tokens
  • Models 4B-16B at $0.20/M tokens
  • Models >16B at $0.90/M tokens
  • MoE models at $0.50-$1.20/M tokens
  • Cached input tokens at 50% price
  • Batch inference at 50% discount
  • Embeddings from $0.008/M tokens

On-Demand (A100)

  • A100 80GB at $2.90/hr
  • Dedicated model hosting
  • Custom fine-tuned models

On-Demand (H100/H200)

  • H100/H200 at $6.00/hr
  • High-performance inference
  • Dedicated resources

On-Demand (B200)

  • B200 at $9.00/hr
  • Latest generation hardware
  • Maximum throughput

Enterprise

  • Volume discounts
  • Dedicated support
  • Custom SLAs
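
A common question across these plans is when a dedicated GPU beats serverless billing. The sketch below is a rough break-even estimate under assumptions of my own choosing: sustained utilization, the A100 rate of $2.90/hr from the plan above, and the $0.90/M serverless rate for a >16B model.

```python
# Rough break-even: tokens/hour at which a dedicated A100 ($2.90/hr)
# matches serverless billing for a >16B model ($0.90/M tokens).
A100_PER_HOUR = 2.90
SERVERLESS_PER_M = 0.90

breakeven_tokens_per_hour = A100_PER_HOUR / SERVERLESS_PER_M * 1_000_000
print(f"{breakeven_tokens_per_hour:,.0f} tokens/hour")  # 3,222,222 tokens/hour
```

In other words, a steadily loaded deployment pushing more than roughly 3.2M tokens per hour through a large model is cheaper on a dedicated A100 than on serverless; below that, pay-as-you-go wins.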

How Fireworks AI Pricing Compares

Software Starting Price Top Price
Fireworks AI Free ($1 credits) $9.00/GPU-hour (B200)
Groq Free $0.79/M tokens
Together AI $0.10/M tokens $9.95/GPU-hour


Fireworks AI Pricing FAQ

01 How much does Fireworks AI cost?

Fireworks AI serverless pricing starts at $0.10 per million tokens for small models (<4B parameters) and goes up to $0.90/M for models over 16B. On-demand GPU deployments range from $2.90/hr (A100) to $9.00/hr (B200). New accounts get $1 in free credits.

02 Does Fireworks AI have a free tier?

Fireworks AI offers $1 in free credits for new accounts. After that, pricing is pay-as-you-go with no minimum commitment. Batch inference and cached input tokens each offer 50% discounts, reducing ongoing costs.
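
To see how those two 50% discounts shape a bill, here is a small sketch. The rates are from this page; the split between batch and online tokens is a made-up workload, and it assumes the batch and cached-input discounts apply to separate token buckets rather than stacking.

```python
# How the 50% discounts reduce a bill (illustrative workload).
rate = 0.90               # $/M tokens for a >16B serverless model
batch_tokens = 20_000_000   # processed via batch inference (50% off)
online_tokens = 5_000_000   # regular real-time requests (full price)

cost = batch_tokens / 1e6 * rate * 0.5 + online_tokens / 1e6 * rate
print(cost)  # 13.5
```

Without the batch discount the same 25M tokens would cost $22.50, so routing offline work through batch inference cuts this hypothetical bill by 40%.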

03 How does Fireworks AI fine-tuning work?

Fireworks AI supports fine-tuning with SFT and DPO methods. Pricing ranges from $0.50/M training tokens for models under 16B to $10–20/M tokens for models over 300B. Fine-tuned models can be deployed on Serverless or dedicated infrastructure.
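
Fine-tuning spend is simple per-token arithmetic on training tokens. The helper below is a sketch of my own; only the $0.50/M (<16B) and $10–20/M (>300B) endpoint rates are quoted above, so rates for intermediate model sizes would need to be looked up.

```python
# Sketch: fine-tuning cost at a given $/M-training-token rate.
def finetune_cost(training_tokens: int, rate_per_m: float) -> float:
    """USD cost for SFT/DPO training at a given $/M-token rate."""
    return training_tokens / 1_000_000 * rate_per_m

# 50M training tokens on a <16B model at the quoted $0.50/M rate:
print(finetune_cost(50_000_000, 0.50))  # 25.0
```

Note that training tokens are counted per epoch, so multi-epoch runs multiply this figure accordingly.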

04 Fireworks AI vs Together AI: which should I choose?

Both offer serverless inference starting at $0.10/M tokens. Fireworks AI provides $1 free credits upfront and offers A100 On-Demand at $2.90/hr, while Together AI's comparable H100 dedicated is $3.99/hr. Fireworks AI is generally slightly cheaper for dedicated GPU hosting and offers batch discounts of 50%.

05 What is Fireworks AI On-Demand pricing?

Fireworks AI On-Demand GPU deployments are priced at $2.90/hr for A100 80GB, $6.00/hr for H100/H200, and $9.00/hr for B200. These are dedicated single-tenant deployments ideal for hosting custom fine-tuned models or maintaining consistent inference capacity.
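
The hourly rates above translate into monthly figures as follows. This sketch assumes 24/7 uptime at an average of 730 hours per month; real bills depend on actual deployment hours.

```python
# Monthly cost of a dedicated deployment at the quoted hourly rates,
# assuming round-the-clock uptime (~730 hours/month on average).
GPU_HOURLY = {"A100": 2.90, "H100/H200": 6.00, "B200": 9.00}
HOURS_PER_MONTH = 730

for gpu, rate in GPU_HOURLY.items():
    print(f"{gpu}: ${rate * HOURS_PER_MONTH:,.2f}/month")
# A100: $2,117.00/month
# H100/H200: $4,380.00/month
# B200: $6,570.00/month
```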

Is this pricing incorrect? Let us know and we verify and update within 24 hours.