Quick Answer
Last verified: April 15, 2026

Together AI costs from $0.03 per million tokens (serverless) to $9.95 per hour (dedicated GPU) as of April 2026, with 5 plans available. Pricing depends on your chosen tier, contract length, and negotiated discounts.


  • Free tier: none

Together AI offers 5 pricing tiers: Serverless, Dedicated (1x H100), Dedicated (1x H200), Dedicated (1x B200), and Enterprise. The Dedicated (1x H100) plan is designed for consistent high-volume inference.

Compared to other LLM API providers, Together AI sits at the budget-friendly end of the market.

How much does Together AI cost?

Together AI's serverless pricing starts at $0.03 per million tokens, with five plans in total and enterprise pricing available on request. The plans are Serverless, Dedicated (1x H100), Dedicated (1x H200), Dedicated (1x B200), and Enterprise, all with custom pricing.

Together AI Pricing Overview

Together AI has 5 pricing plans, spanning $0.03 per million tokens (serverless) to $9.95 per hour (dedicated GPU). All five require contacting sales for a custom quote: Serverless is designed for variable-volume API usage, Dedicated (1x H100) for consistent high-volume inference, Dedicated (1x H200) for high-throughput dedicated inference, Dedicated (1x B200) for high-performance dedicated inference, and Enterprise for large-scale enterprise deployments.

This pricing was last verified on April 15, 2026 from 2 independent sources.

Together AI provides serverless LLM inference and dedicated GPU hosting, with serverless pricing starting at $0.03 per million tokens for budget models and scaling to $3.00+ per million tokens for large models. Dedicated GPU deployments are available starting at $3.99/hour for a 1x H100 and $9.95/hour for a 1x B200. Batch API processing offers 40–50% discounts over standard serverless rates.
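The per-token arithmetic above is easy to sketch. A minimal estimator, using illustrative rates of the kind listed in the tables below; the 50% `batch_discount` reflects the ~50% Batch API saving described for most models:

```python
def serverless_cost(input_tokens, output_tokens, in_rate, out_rate, batch_discount=0.0):
    """Estimate serverless cost in USD.

    Rates are USD per million tokens; batch_discount is the fractional
    saving (e.g. 0.5 for the ~50% Batch API discount).
    """
    cost = (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate
    return cost * (1.0 - batch_discount)

# Example: 100M input + 20M output tokens on a $0.60/$1.70 per-M model
standard = serverless_cost(100e6, 20e6, 0.60, 1.70)       # 60 + 34 = $94.00
batched = serverless_cost(100e6, 20e6, 0.60, 1.70, 0.5)   # half of that: $47.00
```

Because output tokens are often several times more expensive than input tokens, the input/output split of your workload matters as much as the headline rate.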

All Together AI Plans & Pricing

Plan | Monthly | Annual | Best For
Serverless | Contact Sales | Contact Sales | Variable-volume API usage
Dedicated (1x H100) | Contact Sales | Contact Sales | Consistent high-volume inference
Dedicated (1x H200) | Contact Sales | Contact Sales | High-throughput dedicated inference
Dedicated (1x B200) | Contact Sales | Contact Sales | High-performance dedicated inference
Enterprise | Contact Sales | Contact Sales | Large-scale enterprise deployments

Serverless

  • Pay-as-you-go per-token pricing
  • Budget models from $0.03/M tokens
  • Mid-range models from $0.50/M tokens
  • Large models from $1.00/M tokens
  • Batch API with 50% discount for most models
  • Cached input pricing for select models
  • Vision, image, audio, video, and transcription models available

Dedicated (1x H100)

  • Single-tenant GPU deployment
  • 1x H100 80GB at $3.99/hr
  • Custom model hosting
  • Autoscaling and traffic spike handling
  • Guaranteed performance

Dedicated (1x H200)

  • Single-tenant GPU deployment
  • 1x H200 141GB at $5.49/hr
  • Custom model hosting
  • Autoscaling and traffic spike handling
  • Guaranteed performance

Dedicated (1x B200)

  • Single-tenant GPU deployment
  • 1x B200 180GB at $9.95/hr
  • Latest generation hardware
  • Autoscaling and traffic spike handling
  • Guaranteed performance

Enterprise

  • Volume discounts
  • Dedicated support
  • Custom SLAs
  • Private deployments

Usage-Based Rates

Per-unit pricing for Together AI API usage.

Serverless

Model | Unit | Rate
LFM2 24B A2B | 1M input tokens | $0.03
LFM2 24B A2B | 1M output tokens | $0.12
gpt-oss-20B | 1M input tokens | $0.05
gpt-oss-20B | 1M output tokens | $0.20
Gemma 3n E4B Instruct | 1M input tokens | $0.06
Gemma 3n E4B Instruct | 1M output tokens | $0.12
Llama 3 8B Instruct Lite | 1M input tokens | $0.10
Llama 3 8B Instruct Lite | 1M output tokens | $0.10
Qwen3.5 9B | 1M input tokens | $0.10
Qwen3.5 9B | 1M output tokens | $0.15
gpt-oss-120B | 1M input tokens | $0.15
gpt-oss-120B | 1M output tokens | $0.60
Rnj-1 Instruct | 1M input tokens | $0.15
Rnj-1 Instruct | 1M output tokens | $0.15
Gemma 4 31B | 1M input tokens | $0.20
Gemma 4 31B | 1M output tokens | $0.50
Mistral (7B) Instruct v0.2 | 1M input tokens | $0.20
Mistral (7B) Instruct v0.2 | 1M output tokens | $0.20
MiniMax M2.5 | 1M input tokens | $0.30
MiniMax M2.5 | 1M output tokens | $1.20
MiniMax M2.5 | 1M cached input tokens | $0.06
MiniMax M2.7 | 1M input tokens | $0.30
MiniMax M2.7 | 1M output tokens | $1.20
MiniMax M2.7 | 1M cached input tokens | $0.06
Qwen2.5 7B Instruct Turbo | 1M input tokens | $0.30
Qwen2.5 7B Instruct Turbo | 1M output tokens | $0.30
Kimi K2.5 | 1M input tokens | $0.50
Kimi K2.5 | 1M output tokens | $2.80
Qwen3-Coder-Next | 1M input tokens | $0.50
Qwen3-Coder-Next | 1M output tokens | $1.20
DeepSeek-V3.1 | 1M input tokens | $0.60
DeepSeek-V3.1 | 1M output tokens | $1.70
Qwen3.5-397B-A17B | 1M input tokens | $0.60
Qwen3.5-397B-A17B | 1M output tokens | $3.60
Llama 3.3 70B | 1M input tokens | $0.88
Llama 3.3 70B | 1M output tokens | $0.88
Kimi K2 Instruct | 1M input tokens | $1.00
Kimi K2 Instruct | 1M output tokens | $3.00
GLM-5 | 1M input tokens | $1.00
GLM-5 | 1M output tokens | $3.20
Cogito v2.1 671B | 1M input tokens | $1.25
Cogito v2.1 671B | 1M output tokens | $1.25
GLM-5.1 | 1M input tokens | $1.40
GLM-5.1 | 1M output tokens | $4.40
Qwen3-Coder 480B A35B Instruct | 1M input tokens | $2.00
Qwen3-Coder 480B A35B Instruct | 1M output tokens | $2.00
DeepSeek-R1-0528 | 1M input tokens | $3.00
DeepSeek-R1-0528 | 1M output tokens | $7.00
  • Top models listed; many more available on platform
  • Cached input pricing available for select models
  • Batch inference available at ~50% discount for most models
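Cached input pricing can cut input costs substantially for prompt-heavy workloads. A minimal sketch of the arithmetic, using MiniMax M2.5's listed rates ($0.30 input, $1.20 output, $0.06 cached input per million tokens); the 70% cache-hit ratio is an illustrative assumption, not a platform figure:

```python
def cost_with_cache(input_toks, output_toks, cached_frac,
                    in_rate, out_rate, cached_rate):
    """Cost in USD when a fraction of input tokens hits the prompt cache.

    Rates are USD per million tokens (e.g. MiniMax M2.5 from the table:
    in_rate=0.30, out_rate=1.20, cached_rate=0.06).
    """
    fresh = input_toks * (1 - cached_frac)    # tokens billed at full input rate
    cached = input_toks * cached_frac         # tokens billed at the cached rate
    return (fresh * in_rate + cached * cached_rate + output_toks * out_rate) / 1e6

# 10M input tokens with 70% cache hits, plus 2M output tokens:
# 3M * $0.30 + 7M * $0.06 + 2M * $1.20 = 0.90 + 0.42 + 2.40 = $3.72
```

Without caching the same workload would cost 10 × $0.30 + 2 × $1.20 = $5.40, so a high cache-hit ratio saves roughly a third here.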

Dedicated (1x H100)

Model | Unit | Rate
1x H100 80GB | second | $0.00111
  • $3.99/hour per H100 GPU

Dedicated (1x H200)

Model | Unit | Rate
1x H200 141GB | second | $0.001525
  • $5.49/hour per H200 GPU

Dedicated (1x B200)

Model | Unit | Rate
1x B200 180GB | second | $0.00276
  • $9.95/hour per B200 GPU
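The per-second rates in the dedicated tables are simply the hourly price divided by 3600. A quick sketch verifies that and projects an always-on monthly cost; the 30-day, 24/7 month is an assumption for illustration:

```python
# Hourly prices from the dedicated plan tables.
hourly = {"H100": 3.99, "H200": 5.49, "B200": 9.95}

for gpu, rate in hourly.items():
    per_second = rate / 3600          # matches the tables' per-second rates
    monthly = rate * 24 * 30          # ~30-day month, GPU running 24/7
    print(f"{gpu}: ${per_second:.6f}/s, ~${monthly:,.2f}/month")
```

An always-on 1x H100 therefore runs roughly $2,873/month and a 1x B200 about $7,164/month before any negotiated discount.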

How Together AI Pricing Compares

Software | Starting Price | Top Price
Together AI | $0.03/M tokens (serverless) | $9.95/hour (dedicated)
Groq | Free | $3/M tokens
Fireworks AI | Free | $9/M tokens / hour
Google Gemini API | Free | $18/M tokens
Mistral AI API | $0.10/M tokens | $6/M tokens
Perplexity API | $1/M tokens + per-request fee | $15/M tokens + per-request fee


Together AI Pricing FAQ

01 How much does Together AI cost?

Together AI offers serverless inference starting at $0.03 per million tokens for budget models. Mid-range models cost $0.50–1.00/M tokens, and large models like DeepSeek-R1 cost $3.00/M input tokens ($7.00/M output). Dedicated GPU deployments start at $3.99/hr (1x H100) or $9.95/hr (1x B200). Batch processing saves 40–50%.

02 Does Together AI have a free tier?

Together AI does not advertise a permanent free tier or free credits on their pricing page. They offer pay-as-you-go Serverless pricing with no minimum commitment, so you only pay for what you use.

03 What models does Together AI support?

Together AI supports a wide range of open-source models including Llama, DeepSeek, Qwen, Mistral, and Kimi. They also offer image generation (FLUX, Stable Diffusion), video (Google Veo 2.0), audio transcription, text-to-speech, and embedding models.

04 Together AI vs Fireworks AI: which is cheaper?

Both offer similar serverless per-token pricing starting around $0.10/M tokens for small models. Fireworks AI gives new users $1 in free credits. For dedicated GPU hosting, Together AI's H100 is $3.99/hr versus Fireworks AI's A100 at $2.90/hr, making Fireworks slightly cheaper for dedicated compute at equivalent GPU tiers.

05 What is Together AI's Dedicated GPU pricing?

Together AI's Dedicated GPU hosting starts at $3.99/hr for a 1x H100 (single-tenant) and $9.95/hr for a 1x B200 (latest generation). Dedicated deployments are best for consistent high-volume inference where you need guaranteed resources and custom model hosting.
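A rough break-even estimate helps decide between serverless and dedicated. A sketch under stated assumptions: the $0.88/M blended rate borrows Llama 3.3 70B's serverless price, the GPU runs 24/7 for a 720-hour month, and the single GPU can actually sustain the volume:

```python
def breakeven_tokens_per_month(gpu_hourly, blended_rate_per_m, hours=720):
    """Monthly token volume above which a 24/7 dedicated GPU is cheaper
    than serverless at a blended USD-per-million-token rate.

    gpu_hourly and blended_rate_per_m are assumptions you supply; this
    ignores throughput limits, autoscaling, and negotiated discounts.
    """
    monthly_gpu_cost = gpu_hourly * hours
    return monthly_gpu_cost / blended_rate_per_m * 1e6

# H100 at $3.99/hr vs a blended $0.88/M rate (Llama 3.3 70B serverless):
# 3.99 * 720 / 0.88 ≈ 3.26B tokens/month before dedicated wins on price
```

Below that volume, pay-as-you-go serverless is cheaper; well above it, dedicated hosting also buys guaranteed performance, which serverless does not.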
