LLM API Pricing Benchmarks (2026): GPT vs Claude vs Gemini vs Grok
Quick Answer

Across 14 major LLM API providers, the market median is $0.545/1M input tokens and $1.94/1M output tokens. The cheapest provider is Groq at $0.13/1M input; the most premium is Claude API at $25.00/1M output. The figures aggregate three sources: Artificial Analysis (provider model catalogs), OpenRouter (router-surface pricing), and community cost mentions.

Last updated: April 24, 2026

Market Median (14 LLM API providers)

As of April 2026 the median LLM API provider charges $0.545/1M input tokens and $1.94/1M output tokens, measured across the median model in each provider's catalog. The median is not a good proxy for any single model — flagship models (GPT-5.4 Pro, Claude Opus 4.6) sit 4-20× above the market median, while fast/mini variants (GPT-5.4 nano, Claude Haiku 4.5, Gemini Flash-Lite) sit 3-10× below.

Per-Provider Median Pricing

Provider Median Input /1M Median Output /1M Models Tracked Source
Groq $0.13 $0.465 8 AA
Mistral AI API $0.20 $0.60 23 AA
DeepInfra $0.20 $0.54 93 AA
DeepSeek $0.29 $0.79 11 OR
OpenRouter $0.40 $1.25 320 OR
Together AI $0.50 $1.20 23 AA
Fireworks AI $0.53 $1.68 16 AA
Google Gemini API $0.56 $2.20 51 AA
Moonshot Kimi API $0.57 $2.30 5 OR
xAI Grok API $1.15 $3.75 10 OR
OpenAI API $1.25 $9.00 50 AA
Perplexity API $2.00 $8.00 5 OR
Cohere API $2.50 $10.00 1 AA
Claude API $5.00 $25.00 16 AA

How to read this table

Each row shows the median input and output price across the provider's full model catalog. Providers with larger catalogs (DeepInfra 93 models, OpenRouter 320 models) aggregate across many low-cost small models, so their medians skew lower. Providers with small catalogs (Claude 16 models, Groq 8 models) show a true flagship-weighted median. For apples-to-apples model-level comparisons, use the side-by-side pages under /compare/.
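
The catalog-median logic can be sketched in a few lines; the price list below is illustrative, not the tracked dataset:

```python
from statistics import median

# Hypothetical per-model prices (USD per 1M input tokens) for one
# provider's catalog -- illustrative values only.
catalog_input_prices = [0.05, 0.10, 0.13, 0.20, 0.59, 0.79, 1.00, 2.00]

# The catalog median: cheap small models pull it down, which is why
# large-catalog providers (DeepInfra, OpenRouter) skew low.
provider_median = median(catalog_input_prices)
print(f"median input price: ${provider_median:.3f}/1M")
```

Adding a handful of sub-$0.10 models to this list lowers the median without any flagship price changing, which is the skew described above.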

Source attribution: AA = Artificial Analysis (authoritative model catalog). OR = OpenRouter API (router-surface pricing — may include modest markup). Community = user-reported costs from Reddit/HN discussions.

Price Compression 2024-2026

The per-1M input median has dropped ~60% since 2024 when GPT-4 Turbo launched at $10/1M input. Open-weight providers (DeepInfra, Together AI, Fireworks AI) serving Llama, Mixtral, and Qwen models have pulled the market median below $1/1M input. Frontier-model providers (OpenAI, Anthropic, xAI) continue to command premium pricing for their flagship reasoning models, but have introduced mini/nano variants to compete at the low end.

Frequently Asked Questions

01 Which LLM API is cheapest in 2026?

Groq has the lowest median input price at $0.13/1M tokens across 8 tracked models. Mistral AI API and DeepInfra tie for second at $0.20/1M input. These providers optimize for inference throughput on open-weight models rather than frontier proprietary models.

02 What's the median LLM API price in 2026?

Across 14 major LLM API providers, the market median is $0.545/1M input tokens and $1.94/1M output tokens. Output typically costs 2–4× input. Cached input (when supported) runs 10–50% of the cache-miss input price, depending on the provider.

03 Is OpenAI more expensive than the market median?

Yes. OpenAI's median input is $1.25/1M and output is $9/1M, roughly 2.3× the market input median and 4.6× the output median. OpenAI has both mid-range models (GPT-5.4 at $2.50 input) and flagship reasoning models (o3-pro at $20 input, GPT-5.4 Pro at $30 input), which pulls the median up.

04 How does OpenRouter pricing compare to direct provider pricing?

OpenRouter adds a small markup (typically 0–5%) over direct provider pricing, but aggregates 320+ models behind a single API key. For most teams the markup is worth the simplified billing and automatic failover, and it makes OpenRouter the easiest way to benchmark multiple models without signing up for each provider separately.

05 What is the cheapest LLM API per million tokens for batch workloads in 2026?

Gemini 2.5 Flash is the cheapest capable API for batch workloads at $0.075/1M input tokens, with output at $0.30/1M. For open-source models, Groq offers Llama and Gemma models starting at $0.05/1M input. Cerebras runs Llama 3.3 70B at $0.10/1M input with faster throughput than GPU clusters. For asynchronous batch jobs where latency does not matter, these three providers consistently undercut OpenAI and Anthropic by 10–25×.

06 Why are open-source LLM APIs 5–10× cheaper than GPT-5?

Open-source models like Llama 3.3 70B eliminate licensing costs, allowing inference providers to compete purely on hardware margin. GPT-5 at $1.25/1M input includes R&D amortization, RLHF infrastructure, and safety evaluation costs that OpenAI passes on to customers. On Together AI, Llama 3.3 70B costs ~$0.88/1M input, about 30% below GPT-5's input price, before counting any output-token savings. The gap widens on output: GPT-5 charges $10/1M versus sub-$1 for most open-weight providers, which is where the headline 5–10× difference comes from.
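
A blended-cost sketch makes the output-driven gap concrete. The function name and workload split are ours, and the Llama output price of $0.88/1M is an assumption (the text only says "sub-$1"):

```python
def blended_cost(input_tokens_m, output_tokens_m, in_price, out_price):
    """Total USD for a workload, given per-1M-token prices."""
    return input_tokens_m * in_price + output_tokens_m * out_price

# Prices quoted above: GPT-5 ($1.25 in / $10 out); Llama 3.3 70B on
# Together AI at ~$0.88/1M input, output assumed ~$0.88/1M as well.
workload = dict(input_tokens_m=10, output_tokens_m=3)  # 10M in, 3M out

gpt5 = blended_cost(**workload, in_price=1.25, out_price=10.0)
llama = blended_cost(**workload, in_price=0.88, out_price=0.88)
print(f"GPT-5: ${gpt5:.2f}, Llama on Together: ${llama:.2f}, "
      f"ratio: {gpt5 / llama:.1f}x")
```

Even though the input prices differ by only ~30%, the $10-vs-$0.88 output spread pushes the blended ratio well past 3× on this split, and higher for output-heavy workloads.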

07 How much does it cost to run an LLM-powered chatbot for 1,000 daily active users?

A typical chatbot exchange is 500 input tokens and 300 output tokens per turn, with 3 turns per session, for roughly 2,400 tokens per user per day. At 1,000 DAU that is about 2.4 million tokens per day, or ~72 million tokens per month (45M input, 27M output). Using GPT-5 mini ($0.20/1M input, $1.20/1M output), this costs roughly $40/month. Gemini 2.5 Flash ($0.075/1M input, $0.30/1M output) cuts that to about $12/month, and Groq's $0.05–$0.10/1M input rates bring it lower still. Costs scale linearly with usage: the same per-user workload at 100,000 DAU runs roughly $4,000/month on GPT-5 mini.
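
The per-turn assumptions can be turned into a small estimator (function name is ours; the result is recomputed directly from the stated token counts and list prices):

```python
def monthly_chatbot_cost(dau, turns_per_day, in_tok, out_tok,
                         in_price_per_m, out_price_per_m, days=30):
    """Estimate monthly USD spend for a chat workload.

    Prices are USD per 1M tokens; token counts are per turn.
    """
    input_m = dau * turns_per_day * in_tok * days / 1_000_000
    output_m = dau * turns_per_day * out_tok * days / 1_000_000
    return input_m * in_price_per_m + output_m * out_price_per_m

# 1,000 DAU, 3 turns/day, 500-in / 300-out per turn.
gpt5_mini = monthly_chatbot_cost(1000, 3, 500, 300, 0.20, 1.20)
flash = monthly_chatbot_cost(1000, 3, 500, 300, 0.075, 0.30)
print(f"GPT-5 mini: ${gpt5_mini:.2f}/mo, Gemini 2.5 Flash: ${flash:.2f}/mo")
```

Note that output tokens dominate the GPT-5 mini bill (27M × $1.20 versus 45M × $0.20), so trimming response length often saves more than switching providers.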

08 How does prompt caching reduce LLM API costs?

Prompt caching stores the KV-cache state of a repeated prefix (system prompt, retrieved documents, few-shot examples) so subsequent calls skip recomputing those tokens. Anthropic charges $0.30/1M for Claude Sonnet 4.6 cache reads versus $3/1M for cache-miss input — a 10× reduction. OpenAI applies automatic caching on prompts over 1,024 tokens at a 50% discount. For RAG applications where 80%+ of the prompt is static context, caching typically cuts total input costs by 60–80%, making Claude and GPT-5 competitive with cheaper uncached alternatives.
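The effective input price under caching is a weighted average of the cache-read and cache-miss rates. A sketch using the Claude Sonnet 4.6 figures above, with an assumed 80% cache-hit share for a RAG prompt:

```python
def effective_input_price(miss_price, cache_read_price, hit_rate):
    """Blended per-1M input price when a fraction of tokens hit the cache."""
    return hit_rate * cache_read_price + (1 - hit_rate) * miss_price

# Claude Sonnet 4.6 figures from the text: $3/1M cache-miss, $0.30/1M
# cache-read. Assume 80% of each prompt is static context that hits.
price = effective_input_price(3.00, 0.30, 0.80)
savings = 1 - price / 3.00
print(f"effective input price: ${price:.2f}/1M ({savings:.0%} saved)")
```

At an 80% hit rate the blended price lands at $0.84/1M, a 72% saving, squarely in the 60–80% range cited above.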

09 Are LLM API prices dropping faster than Moore's Law?

Yes. Between 2023 and 2026, flagship LLM API prices fell roughly 10–20× — faster than Moore's Law's historical 2× per 18–24 months. GPT-4 launched at $30/1M input in 2023; comparable capability in 2026 (GPT-5 mini) costs $0.20/1M. Inference hardware improvements (H100→H200→B200 Blackwell) and model efficiency gains (distillation, speculative decoding) both compound. Analysts tracking the cost-per-MMLU-point benchmark see consistent 60–70% annual price declines for equivalent accuracy tiers, outpacing semiconductor roadmaps.
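
The halving-time comparison can be made explicit. Note this pairs GPT-4 with a capability-matched but non-flagship successor, so it overstates the same-tier decline (the 60–70% annual figure cited above):

```python
import math

# GPT-4 launch input price (2023) vs GPT-5 mini (2026), from the text.
start_price, end_price, years = 30.0, 0.20, 3

total_drop = start_price / end_price          # 150x over 3 years
halvings = math.log2(total_drop)              # number of 2x price halvings
months_per_halving = years * 12 / halvings

annual_decline = 1 - (end_price / start_price) ** (1 / years)
print(f"price halves every ~{months_per_halving:.1f} months "
      f"(Moore's Law: 18-24); ~{annual_decline:.0%} decline per year")
```

On these two endpoints the price halves roughly every 5 months, several times faster than the 18–24-month Moore's Law cadence.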

10 Do LLM API providers offer volume discounts?

Most major providers offer committed-use discounts at $10,000+/month spend. Anthropic's enterprise agreements typically provide 20–40% off list pricing for annual commits. OpenAI offers usage-tiered pricing above certain monthly thresholds, and enterprise contracts unlock further negotiated rates. Google Cloud's Gemini 2.5 Pro ($1.25/1M input, $5/1M output) is available at reduced rates through GCP committed-use contracts. Below $5,000/month, negotiated discounts are rare — providers at that tier compete on list price, and switching to Groq ($0.05–$0.79/1M) or Gemini Flash is typically more effective than negotiating.

11 Why is Claude Sonnet 4.6 more expensive than GPT-5?

Claude Sonnet 4.6 costs $3/1M input and $15/1M output, versus GPT-5's $1.25 input and $10 output: 2.4× more expensive on input and 1.5× on output. Anthropic positions Claude for long-context agentic tasks where the model handles multi-step reasoning with large tool outputs, justifying higher per-token pricing. Sonnet 4.6 also has a 200K-token context window versus GPT-5's 128K, and Anthropic's Constitutional AI training pipeline adds cost that is passed through to API pricing. For short, single-turn completions, GPT-5 mini at $0.20/$1.20 is roughly 12–15× cheaper than Claude Sonnet 4.6 for equivalent quality on most benchmarks.

12 How much should a startup budget for LLM API costs in 2026?

A pre-revenue startup in prototype phase typically spends $50–$300/month using a mix of GPT-5 mini ($0.20/1M input) and Gemini 2.5 Flash ($0.075/1M input). At early-revenue stage (1,000–10,000 users), budget $500–$3,000/month depending on task complexity and context length. Production-scale SaaS products processing millions of requests should plan for $5,000–$50,000/month unless they route intelligently — using cheap models (Groq at $0.05/1M) for classification and routing, reserving Claude Sonnet 4.6 ($3/1M) or GPT-5 ($1.25/1M) for tasks that require frontier quality. Model routing typically reduces blended cost by 60–80% versus using a single provider.
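
The routing saving falls out of a simple weighted average. The 80/20 split below is an assumed traffic mix, not measured data:

```python
def routed_cost_per_m(cheap_share, cheap_price, frontier_price):
    """Blended per-1M-token price when a router sends most traffic to a
    cheap model and escalates the rest to a frontier model."""
    return cheap_share * cheap_price + (1 - cheap_share) * frontier_price

# Assumed split: 80% of requests on Groq-class pricing ($0.05/1M input),
# 20% escalated to Claude Sonnet 4.6 input pricing ($3/1M).
blended = routed_cost_per_m(0.80, 0.05, 3.00)
single = 3.00  # everything on the frontier model
print(f"blended: ${blended:.2f}/1M vs ${single:.2f}/1M "
      f"({1 - blended / single:.0%} cheaper)")
```

An 80/20 split already lands near 79% savings, the top of the 60–80% range; the escalation rate, not the cheap model's exact price, dominates the blended cost.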