Together AI Pricing 2026
Complete pricing guide with plans and cost analysis
Together AI pricing ranges from $0.03 per million tokens (serverless) to $9.95 per GPU-hour (dedicated) as of April 2026, with 5 plans available. Pricing depends on your chosen tier, contract length, and negotiated discounts.
- Free tier: none available
Together AI offers 5 pricing tiers: Serverless, Dedicated (1x H100), Dedicated (1x H200), Dedicated (1x B200), and Enterprise. The Dedicated (1x H100) plan is designed for consistent high-volume inference.
Compared with other LLM API providers, Together AI is positioned at the budget-friendly end of the market.
How much does Together AI cost?
Together AI Pricing Overview
Together AI has 5 pricing plans ranging from $0.03 per million tokens to $9.95 per GPU-hour. All five plans require contacting sales for a custom quote: Serverless is designed for variable-volume API usage, Dedicated (1x H100) for consistent high-volume inference, Dedicated (1x H200) for high-throughput dedicated inference, Dedicated (1x B200) for high-performance dedicated inference, and Enterprise for large-scale enterprise deployments.
This pricing was last verified on April 15, 2026 from 2 independent sources.
All Together AI Plans & Pricing
| Plan | Monthly | Annual | Best For |
|---|---|---|---|
| Serverless | Contact Sales | Contact Sales | Variable-volume API usage |
| Dedicated (1x H100) | Contact Sales | Contact Sales | Consistent high-volume inference |
| Dedicated (1x H200) | Contact Sales | Contact Sales | High-throughput dedicated inference |
| Dedicated (1x B200) | Contact Sales | Contact Sales | High-performance dedicated inference |
| Enterprise | Contact Sales | Contact Sales | Large-scale enterprise deployments |
Serverless
- Pay-as-you-go per-token pricing
- Budget models from $0.03/M tokens
- Mid-range models from $0.50/M tokens
- Large models from $1.00/M tokens
- Batch API with 50% discount for most models
- Cached input pricing for select models
- Vision, image, audio, video, and transcription models available
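As a rough illustration of how per-token serverless pricing translates into a monthly bill, here is a minimal sketch. The 500M/100M token volumes are a hypothetical workload; the rates are the LFM2 24B A2B prices from the usage-based rates table further down:

```python
# Hypothetical monthly volume on a budget serverless model
# (LFM2 24B A2B: $0.03/M input tokens, $0.12/M output tokens).
input_millions = 500   # 500M input tokens/month (assumed workload)
output_millions = 100  # 100M output tokens/month (assumed workload)

monthly_cost = input_millions * 0.03 + output_millions * 0.12
print(f"Estimated serverless spend: ${monthly_cost:.2f}/month")  # $27.00/month
```

Swapping in a larger model's rates (e.g. $0.88/M both ways for Llama 3.3 70B) scales the same arithmetic.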
Dedicated (1x H100)
- Single-tenant GPU deployment
- 1x H100 80GB at $3.99/hr
- Custom model hosting
- Autoscaling and traffic spike handling
- Guaranteed performance
Dedicated (1x H200)
- Single-tenant GPU deployment
- 1x H200 141GB at $5.49/hr
- Custom model hosting
- Autoscaling and traffic spike handling
- Guaranteed performance
Dedicated (1x B200)
- Single-tenant GPU deployment
- 1x B200 180GB at $9.95/hr
- Latest generation hardware
- Autoscaling and traffic spike handling
- Guaranteed performance
Enterprise
- Volume discounts
- Dedicated support
- Custom SLAs
- Private deployments
Usage-Based Rates
Per-unit pricing for Together AI API usage.
Serverless
| Model | Unit | Rate |
|---|---|---|
| LFM2 24B A2B | 1M input tokens | $0.03 |
| LFM2 24B A2B | 1M output tokens | $0.12 |
| gpt-oss-20B | 1M input tokens | $0.05 |
| gpt-oss-20B | 1M output tokens | $0.20 |
| Gemma 3n E4B Instruct | 1M input tokens | $0.06 |
| Gemma 3n E4B Instruct | 1M output tokens | $0.12 |
| Llama 3 8B Instruct Lite | 1M input tokens | $0.10 |
| Llama 3 8B Instruct Lite | 1M output tokens | $0.10 |
| Qwen3.5 9B | 1M input tokens | $0.10 |
| Qwen3.5 9B | 1M output tokens | $0.15 |
| gpt-oss-120B | 1M input tokens | $0.15 |
| gpt-oss-120B | 1M output tokens | $0.60 |
| Rnj-1 Instruct | 1M input tokens | $0.15 |
| Rnj-1 Instruct | 1M output tokens | $0.15 |
| Gemma 4 31B | 1M input tokens | $0.20 |
| Gemma 4 31B | 1M output tokens | $0.50 |
| Mistral (7B) Instruct v0.2 | 1M input tokens | $0.20 |
| Mistral (7B) Instruct v0.2 | 1M output tokens | $0.20 |
| MiniMax M2.5 | 1M input tokens | $0.30 |
| MiniMax M2.5 | 1M output tokens | $1.20 |
| MiniMax M2.5 | 1M cached input tokens | $0.06 |
| MiniMax M2.7 | 1M input tokens | $0.30 |
| MiniMax M2.7 | 1M output tokens | $1.20 |
| MiniMax M2.7 | 1M cached input tokens | $0.06 |
| Qwen2.5 7B Instruct Turbo | 1M input tokens | $0.30 |
| Qwen2.5 7B Instruct Turbo | 1M output tokens | $0.30 |
| Kimi K2.5 | 1M input tokens | $0.50 |
| Kimi K2.5 | 1M output tokens | $2.80 |
| Qwen3-Coder-Next | 1M input tokens | $0.50 |
| Qwen3-Coder-Next | 1M output tokens | $1.20 |
| DeepSeek-V3.1 | 1M input tokens | $0.60 |
| DeepSeek-V3.1 | 1M output tokens | $1.70 |
| Qwen3.5-397B-A17B | 1M input tokens | $0.60 |
| Qwen3.5-397B-A17B | 1M output tokens | $3.60 |
| Llama 3.3 70B | 1M input tokens | $0.88 |
| Llama 3.3 70B | 1M output tokens | $0.88 |
| Kimi K2 Instruct | 1M input tokens | $1.00 |
| Kimi K2 Instruct | 1M output tokens | $3.00 |
| GLM-5 | 1M input tokens | $1.00 |
| GLM-5 | 1M output tokens | $3.20 |
| Cogito v2.1 671B | 1M input tokens | $1.25 |
| Cogito v2.1 671B | 1M output tokens | $1.25 |
| GLM-5.1 | 1M input tokens | $1.40 |
| GLM-5.1 | 1M output tokens | $4.40 |
| Qwen3-Coder 480B A35B Instruct | 1M input tokens | $2.00 |
| Qwen3-Coder 480B A35B Instruct | 1M output tokens | $2.00 |
| DeepSeek-R1-0528 | 1M input tokens | $3.00 |
| DeepSeek-R1-0528 | 1M output tokens | $7.00 |
- Top models listed; many more available on platform
- Cached input pricing available for select models
- Batch inference available at ~50% discount for most models
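Putting the table's rates together, the cost of a single request can be sketched as below. The `request_cost` helper, the token counts, and the flat 50% batch multiplier are illustrative assumptions; the rates come from the MiniMax M2.5 rows above:

```python
def request_cost(input_tokens, output_tokens, in_rate, out_rate,
                 cached_tokens=0, cached_rate=0.0, batch=False):
    """Estimate one request's cost (USD) from per-million-token rates."""
    uncached = input_tokens - cached_tokens
    cost = (uncached * in_rate
            + cached_tokens * cached_rate
            + output_tokens * out_rate) / 1e6
    if batch:
        cost *= 0.5  # ~50% batch discount on most models
    return cost

# MiniMax M2.5: $0.30/M input, $1.20/M output, $0.06/M cached input.
# Hypothetical request: 100K input (60K of it cached), 20K output.
cost = request_cost(100_000, 20_000, 0.30, 1.20,
                    cached_tokens=60_000, cached_rate=0.06)
print(f"${cost:.4f}")  # $0.0396
```

Cache hits cut the input side sharply here: the 60K cached tokens cost $0.0036 instead of $0.0180 at the full input rate.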
Dedicated (1x H100)
| Resource | Unit | Rate |
|---|---|---|
| 1x H100 80GB | second | $0.00111 |
- $3.99/hour per H100 GPU
Dedicated (1x H200)
| Resource | Unit | Rate |
|---|---|---|
| 1x H200 141GB | second | $0.001525 |
- $5.49/hour per H200 GPU
Dedicated (1x B200)
| Resource | Unit | Rate |
|---|---|---|
| 1x B200 180GB | second | $0.00276 |
- $9.95/hour per B200 GPU
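Dedicated endpoints are metered per second, so the hourly figures are just the per-second rates scaled up. A quick sanity check, assuming a plain 3,600-second hour:

```python
# Per-second dedicated GPU rates from the tables above
per_second = {
    "1x H100 80GB": 0.00111,    # listed as $3.99/hr
    "1x H200 141GB": 0.001525,  # listed as $5.49/hr
    "1x B200 180GB": 0.00276,   # listed as $9.95/hr
}

for gpu, rate in per_second.items():
    print(f"{gpu}: ${rate * 3600:.3f}/hr")
```

The products land within a cent or two of the listed hourly prices, which suggests the hourly figures are rounded headline numbers while actual billing happens at the per-second rate.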
How Together AI Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| Together AI | $0.03/M tokens (serverless) | $9.95/GPU-hour (dedicated) |
| Groq | Free | $3/M tokens |
| Fireworks AI | Free | $9/GPU-hour (dedicated) |
| Google Gemini API | Free | $18/M tokens |
| Mistral AI API | $0.10/M tokens | $6/M tokens |
| Perplexity API | $1/M tokens + per-request fee | $15/M tokens + per-request fee |
Together AI Pricing FAQ
01 How much does Together AI cost?
Together AI offers serverless inference starting at $0.03 per million tokens for the smallest models. Mid-range models cost $0.50–1.00/M tokens, and large models like DeepSeek-R1 cost $3.00/M input tokens ($7.00/M output). Dedicated GPU deployments start at $3.99/hr (1x H100) and go up to $9.95/hr (1x B200). Batch processing saves 40–50%.
02 Does Together AI have a free tier?
Together AI does not advertise a permanent free tier or free credits on their pricing page. They offer pay-as-you-go Serverless pricing with no minimum commitment, so you only pay for what you use.
03 What models does Together AI support?
Together AI supports a wide range of open-source models including Llama, DeepSeek, Qwen, Mistral, and Kimi. They also offer image generation (FLUX, Stable Diffusion), video (Google Veo 2.0), audio transcription, text-to-speech, and embedding models.
04 Together AI vs Fireworks AI: which is cheaper?
Both offer similar serverless per-token pricing starting around $0.10/M tokens for small models. Fireworks AI gives new users $1 in free credits. For dedicated GPU hosting, Together AI's H100 is $3.99/hr versus Fireworks AI's A100 at $2.90/hr, making Fireworks slightly cheaper for dedicated compute at equivalent GPU tiers.
05 What is Together AI's Dedicated GPU pricing?
Together AI's Dedicated GPU hosting starts at $3.99/hr for a 1x H100 (single-tenant) and $9.95/hr for a 1x B200 (latest generation). Dedicated deployments are best for consistent high-volume inference where you need guaranteed resources and custom model hosting.
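To make the serverless-vs-dedicated decision concrete, here is a rough break-even sketch using the Llama 3.3 70B serverless rate ($0.88/M tokens, input and output priced the same) against a 1x H100 at $3.99/hr. Whether a single H100 can actually sustain that throughput for a given model is a separate question this arithmetic ignores:

```python
h100_per_hour = 3.99     # dedicated 1x H100, USD/hr
serverless_per_m = 0.88  # Llama 3.3 70B, USD per million tokens

# Sustained tokens/hour above which dedicated beats serverless
breakeven_m = h100_per_hour / serverless_per_m
print(f"Break-even: {breakeven_m:.2f}M tokens/hour")  # ~4.53M tokens/hour
```

Below roughly 4.5M tokens per hour of sustained traffic, pay-as-you-go serverless is cheaper; above it, the flat hourly GPU rate wins.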