Lepton AI Pricing 2026
Complete pricing guide with plans, hidden costs, and cost analysis
Lepton AI pricing ranges from $0.07 to $4 per million tokens.
Lepton AI costs $0.07 to $4 per million tokens as of April 2026, with 2 plans available. Pricing depends on your chosen tier, contract length, and negotiated discounts.
- Free tier: Limited free credits for new accounts
Lepton AI offers 2 pricing tiers: Serverless Inference and GPU Cloud. The GPU Cloud plan is designed for teams deploying custom models with full control over GPU configuration.
Compared with other LLM API providers, Lepton AI sits at the budget-friendly end of the price range.
- 2 documented hidden costs beyond list price
Lepton AI Pricing Overview
Lepton AI has 2 pricing plans ranging from $0.07 to $4 per million tokens. The Serverless Inference plan requires contacting sales for a custom quote and is designed for developers needing fast serverless inference for open-source models. The GPU Cloud plan also requires contacting sales for a custom quote and is designed for teams deploying custom models with full control over GPU configuration.
There are at least 2 documented hidden costs beyond Lepton AI's list price, such as implementation, training, and add-on fees.
This pricing was last verified on April 15, 2026 from one independent source.
Lepton AI is a cloud platform for AI workloads offering two main services: serverless LLM inference endpoints for popular open-source models, and GPU cloud instances for custom deployments. The inference API is OpenAI-compatible and supports Llama 3.x, Mistral, and other models with per-token pricing from $0.07/M tokens. GPU instances start at $0.39/hr. Lepton AI is designed for ML engineers who want a single platform for both quick API inference and custom model deployments.
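Because Lepton AI spans both billing models, the practical question is when a dedicated GPU beats per-token serverless pricing. The sketch below estimates a break-even volume using this guide's figures for a 70B-class model and an A100 instance; the 730-hour month and the choice of model/GPU pairing are assumptions, not Lepton guidance.

```python
# Break-even sketch: monthly token volume above which an always-on GPU
# instance is cheaper than serverless per-token billing.
# Rates are taken from this guide; verify at lepton.ai/pricing.
SERVERLESS_RATE = 0.60 / 1_000_000  # $/token, Llama 3.x 70B class
GPU_HOURLY = 2.00                   # $/hr, A100 80GB
HOURS_PER_MONTH = 730               # assumed always-on instance

def breakeven_tokens_per_month(serverless_rate: float = SERVERLESS_RATE,
                               gpu_hourly: float = GPU_HOURLY) -> float:
    """Tokens/month where serverless spend equals a dedicated GPU's cost."""
    monthly_gpu_cost = gpu_hourly * HOURS_PER_MONTH
    return monthly_gpu_cost / serverless_rate

print(f"{breakeven_tokens_per_month():,.0f} tokens/month")
```

Under these assumptions the crossover lands in the low billions of tokens per month; below that, serverless is the cheaper path, ignoring GPU utilization and autoscaling-to-zero effects.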
All Lepton AI Plans & Pricing
| Plan | Monthly | Annual | Best For |
|---|---|---|---|
| Serverless Inference | Custom | Custom | Developers needing fast serverless inference for open-source models |
| GPU Cloud | Custom | Custom | Teams deploying custom models with full control over GPU configuration |
Features by plan
Serverless Inference
- OpenAI-compatible REST API
- Llama 3.x (8B to 70B)
- Mistral models
- DeepSeek models
- No GPU management required
- Pay per token
GPU Cloud
- A10G: from $0.75/hr
- A100 80GB: from $2.00/hr
- H100 80GB: from $4.00/hr
- Kubernetes-native deployment
- Custom model serving with BentoML/vLLM
- Autoscaling to zero
Usage-Based Rates
Per-unit pricing for Lepton AI API usage.
Serverless Inference
| Model | Unit | Rate |
|---|---|---|
| Llama 3.1 8B Instruct | 1M input tokens | $0.070 |
| Llama 3.1 8B Instruct | 1M output tokens | $0.070 |
| Llama 3.3 70B Instruct | 1M input tokens | $0.600 |
| Llama 3.3 70B Instruct | 1M output tokens | $0.600 |
| Mistral 7B Instruct | 1M input tokens | $0.070 |
| Mistral 7B Instruct | 1M output tokens | $0.070 |
| DeepSeek-R1 (671B) | 1M input tokens | $3.00 (full model) |
| DeepSeek-R1 (671B) | 1M output tokens | $7.00 (full model) |
| Llama 3.1 70B Instruct | 1M input tokens | $0.600 |
| Llama 3.1 70B Instruct | 1M output tokens | $0.600 |
- Rates approximate — verify at lepton.ai/pricing
- OpenAI-compatible endpoint: api.lepton.ai/api/v1
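The per-token rates above translate directly into a bill estimate. The sketch below hard-codes the table's rates under short illustrative keys (the keys are not Lepton model IDs); verify current numbers at lepton.ai/pricing.

```python
# Serverless bill estimate from the rate table above.
# Rates are $ per 1M tokens, copied from this guide.
RATES = {
    "llama-3.1-8b":  {"input": 0.07, "output": 0.07},
    "llama-3.3-70b": {"input": 0.60, "output": 0.60},
    "mistral-7b":    {"input": 0.07, "output": 0.07},
    "deepseek-r1":   {"input": 3.00, "output": 7.00},
}

def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for a given volume of input and output tokens."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# e.g. 10M input / 2M output tokens on the 70B model
print(f"${token_cost('llama-3.3-70b', 10_000_000, 2_000_000):.2f}")  # → $7.20
```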
GPU Cloud
| GPU | Unit | Rate | Hourly equivalent |
|---|---|---|---|
| A10G (24GB) | second | $0.00021 | ~$0.75/hr |
| A100 80GB | second | $0.00056 | ~$2.00/hr |
| H100 80GB | second | $0.00111 | ~$4.00/hr |
- GPU instance pricing approximate — verify at lepton.ai/pricing
- Price expressed per second; multiply by 3600 for hourly equivalent
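Per-second billing makes short jobs cheap to reason about: multiply the rate by the job's duration. A minimal sketch, using the per-second rates from the table above (verify current numbers at lepton.ai/pricing):

```python
# Per-second GPU billing from the table above.
PER_SECOND = {
    "a10g": 0.00021,
    "a100-80gb": 0.00056,
    "h100-80gb": 0.00111,
}

def gpu_job_cost(gpu: str, seconds: float) -> float:
    """Cost of a job billed per second on the given instance type."""
    return PER_SECOND[gpu] * seconds

def hourly_equivalent(gpu: str) -> float:
    """Per-second rate multiplied by 3600, as the note above describes."""
    return gpu_job_cost(gpu, 3600)

# A 20-minute smoke test on an H100:
print(f"${gpu_job_cost('h100-80gb', 20 * 60):.2f}")  # → $1.33
```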
Compare Lepton AI vs Alternatives
Before committing to Lepton AI, compare pricing with these 3 alternatives in the same category.
How Lepton AI Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| Lepton AI | $0.07 / 1M tokens | $4 / 1M tokens |
| Amazon Bedrock | $0.07 / 1M tokens | $75 / 1M tokens |
| Anyscale | $0.15 / 1M tokens | $5 / 1M tokens |
| Baidu ERNIE API | $0.10 / 1M tokens | $10 / 1M tokens |
| Cerebras Inference API | $0.10 / 1M tokens | $6 / 1M tokens |
| Claude API | $0.03 / 1M tokens | $75 / 1M tokens |
Lepton AI Pricing FAQ
01 How much does Lepton AI cost?
Lepton AI serverless inference starts at $0.07 per million tokens for small models like Llama 3.1 8B. GPU cloud instances start at approximately $0.75/hr for A10G. Pricing is pay-as-you-go with no minimum commitments.
02 What models does Lepton AI support?
Lepton AI supports Llama 3.x (8B to 70B), Mistral models, DeepSeek R1, and other popular open-source models through its serverless inference API. Custom model deployment is available on GPU cloud instances.
03 Is Lepton AI OpenAI-compatible?
Yes, Lepton AI's inference API is OpenAI-compatible. Point your OpenAI SDK to api.lepton.ai/api/v1 to use it as a drop-in replacement for compatible models.
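The same endpoint can also be hit with nothing but the Python standard library. In the sketch below, the model ID is a placeholder (look up real IDs in Lepton's model catalog); the request shape follows the standard OpenAI chat-completions format.

```python
import json
import os
import urllib.request

API_URL = "https://api.lepton.ai/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str,
                       api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request (does not send it)."""
    payload = {
        "model": model,  # placeholder ID -- check Lepton's model catalog
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("llama3-1-8b", "Hello!",
                         os.environ.get("LEPTON_API_KEY", ""))
# with urllib.request.urlopen(req) as resp:   # uncomment with a real key set
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The official OpenAI Python SDK works the same way once its `base_url` is pointed at api.lepton.ai/api/v1.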
04 Does Lepton AI have a free tier?
Lepton AI offers a free tier with limited credits for new accounts. Check lepton.ai for current free tier details and credit amounts.
Is this pricing incorrect? Let us know and we'll verify and update it.