Lepton AI Pricing 2026
Complete pricing guide with plans, hidden costs, and cost analysis
Lepton AI pricing ranges from $0.07 to $4 per million tokens.
Lepton AI costs $0.07 to $4 per million tokens as of April 2026, with 2 plans available. Pricing depends on your chosen tier, contract length, and negotiated discounts.
- Free tier: Limited free credits for new accounts
Lepton AI offers 2 pricing tiers: Serverless Inference and GPU Cloud. The GPU Cloud plan is designed for teams deploying custom models with full control over GPU configuration.
Compared with other LLM API providers, Lepton AI sits at the budget-friendly end of the price range.
- 2 documented hidden costs beyond list price
Lepton AI Pricing Overview
Lepton AI has 2 pricing plans ranging from $0.07 to $4 per million tokens. The Serverless Inference plan requires contacting sales for a custom quote and is designed for developers needing fast serverless inference for open-source models. The GPU Cloud plan also requires contacting sales for a custom quote and is designed for teams deploying custom models with full control over GPU configuration.
There are at least 2 documented hidden costs beyond Lepton AI's list price, such as implementation, training, and add-on fees.
This pricing was last verified on April 15, 2026 from one independent source.
Lepton AI is a cloud platform for AI workloads offering two main services: serverless LLM inference endpoints for popular open-source models, and GPU cloud instances for custom deployments. The inference API is OpenAI-compatible and supports Llama 3.x, Mistral, and other models with per-token pricing from $0.07/M tokens. GPU instances start at $0.39/hr. Lepton AI is designed for ML engineers who want a single platform for both quick API inference and custom model deployments.
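Because Lepton AI spans both billing models, the practical question is when a dedicated GPU beats per-token serverless pricing. The sketch below estimates a break-even volume using this guide's figures for a 70B-class model and an A100 instance; the 730-hour month and the choice of model/GPU pairing are assumptions, not Lepton guidance.

```python
# Break-even sketch: monthly token volume above which an always-on GPU
# instance is cheaper than serverless per-token billing.
# Rates are taken from this guide; verify at lepton.ai/pricing.
SERVERLESS_RATE = 0.60 / 1_000_000  # $/token, Llama 3.x 70B class
GPU_HOURLY = 2.00                   # $/hr, A100 80GB
HOURS_PER_MONTH = 730               # assumed always-on instance

def breakeven_tokens_per_month(serverless_rate: float = SERVERLESS_RATE,
                               gpu_hourly: float = GPU_HOURLY) -> float:
    """Tokens/month where serverless spend equals a dedicated GPU's cost."""
    monthly_gpu_cost = gpu_hourly * HOURS_PER_MONTH
    return monthly_gpu_cost / serverless_rate

print(f"{breakeven_tokens_per_month():,.0f} tokens/month")
```

Under these assumptions the crossover lands in the low billions of tokens per month; below that, serverless is the cheaper path, ignoring GPU utilization and autoscaling-to-zero effects.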
All Lepton AI Plans & Pricing
| Plan | Monthly | Annual | Best For |
|---|---|---|---|
| Serverless Inference | Custom | Custom | Developers needing fast serverless inference for open-source models |
| GPU Cloud | Custom | Custom | Teams deploying custom models with full control over GPU configuration |
Features by plan
Serverless Inference
- OpenAI-compatible REST API
- Llama 3.x (8B to 70B)
- Mistral models
- DeepSeek models
- No GPU management required
- Pay per token
GPU Cloud
- A10G: from $0.75/hr
- A100 80GB: from $2.00/hr
- H100 80GB: from $4.00/hr
- Kubernetes-native deployment
- Custom model serving with BentoML/vLLM
- Autoscaling to zero
Usage-Based Rates
Per-unit pricing for Lepton AI API usage.
Serverless Inference
| Model | Unit | Rate |
|---|---|---|
| Llama 3.1 8B Instruct | 1M input tokens | $0.070 |
| Llama 3.1 8B Instruct | 1M output tokens | $0.070 |
| Llama 3.3 70B Instruct | 1M input tokens | $0.600 |
| Llama 3.3 70B Instruct | 1M output tokens | $0.600 |
| Mistral 7B Instruct | 1M input tokens | $0.070 |
| Mistral 7B Instruct | 1M output tokens | $0.070 |
| DeepSeek-R1 (671B) | 1M input tokens | $3.00 (full model) |
| DeepSeek-R1 (671B) | 1M output tokens | $7.00 (full model) |
| Llama 3.1 70B Instruct | 1M input tokens | $0.600 |
| Llama 3.1 70B Instruct | 1M output tokens | $0.600 |
- Rates approximate — verify at lepton.ai/pricing
- OpenAI-compatible endpoint: api.lepton.ai/api/v1
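The per-token rates above translate directly into a bill estimate. The sketch below hard-codes the table's rates under short illustrative keys (the keys are not Lepton model IDs); verify current numbers at lepton.ai/pricing.

```python
# Serverless bill estimate from the rate table above.
# Rates are $ per 1M tokens, copied from this guide.
RATES = {
    "llama-3.1-8b":  {"input": 0.07, "output": 0.07},
    "llama-3.3-70b": {"input": 0.60, "output": 0.60},
    "mistral-7b":    {"input": 0.07, "output": 0.07},
    "deepseek-r1":   {"input": 3.00, "output": 7.00},
}

def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for a given volume of input and output tokens."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# e.g. 10M input / 2M output tokens on the 70B model
print(f"${token_cost('llama-3.3-70b', 10_000_000, 2_000_000):.2f}")  # → $7.20
```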
GPU Cloud
| GPU | Unit | Rate | Hourly equivalent |
|---|---|---|---|
| A10G (24GB) | second | $0.00021 | ~$0.75/hr |
| A100 80GB | second | $0.00056 | ~$2.00/hr |
| H100 80GB | second | $0.00111 | ~$4.00/hr |
- GPU instance pricing approximate — verify at lepton.ai/pricing
- Price expressed per second; multiply by 3600 for hourly equivalent
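Per-second billing makes short jobs cheap to reason about: multiply the rate by the job's duration. A minimal sketch, using the per-second rates from the table above (verify current numbers at lepton.ai/pricing):

```python
# Per-second GPU billing from the table above.
PER_SECOND = {
    "a10g": 0.00021,
    "a100-80gb": 0.00056,
    "h100-80gb": 0.00111,
}

def gpu_job_cost(gpu: str, seconds: float) -> float:
    """Cost of a job billed per second on the given instance type."""
    return PER_SECOND[gpu] * seconds

def hourly_equivalent(gpu: str) -> float:
    """Per-second rate multiplied by 3600, as the note above describes."""
    return gpu_job_cost(gpu, 3600)

# A 20-minute smoke test on an H100:
print(f"${gpu_job_cost('h100-80gb', 20 * 60):.2f}")  # → $1.33
```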
Compare Lepton AI vs Alternatives
Before committing to Lepton AI, compare pricing with these 3 alternatives in the same category.
How Lepton AI Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| Lepton AI | $0.07 / 1M tokens | $4 / 1M tokens |
| Amazon Bedrock | $0.07 / 1M tokens | $75 / 1M tokens |
| Anyscale | $0.15 / 1M tokens | $5 / 1M tokens |
| Baidu ERNIE API | $0.10 / 1M tokens | $10 / 1M tokens |
| Cerebras Inference API | $0.10 / 1M tokens | $6 / 1M tokens |
| Claude API | $0.03 / 1M tokens | $75 / 1M tokens |
Lepton AI Pricing FAQ
01 How much does Lepton AI cost?
Lepton AI serverless inference starts at $0.07 per million tokens for small models like Llama 3.1 8B. GPU cloud instances start at approximately $0.75/hr for A10G. Pricing is pay-as-you-go with no minimum commitments.
02 What models does Lepton AI support?
Lepton AI supports Llama 3.x (8B to 70B), Mistral models, DeepSeek R1, and other popular open-source models through its serverless inference API. Custom model deployment is available on GPU cloud instances.
03 Is Lepton AI OpenAI-compatible?
Yes, Lepton AI's inference API is OpenAI-compatible. Point your OpenAI SDK to api.lepton.ai/api/v1 to use it as a drop-in replacement for compatible models.
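The same endpoint can also be hit with nothing but the Python standard library. In the sketch below, the model ID is a placeholder (look up real IDs in Lepton's model catalog); the request shape follows the standard OpenAI chat-completions format.

```python
import json
import os
import urllib.request

API_URL = "https://api.lepton.ai/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str,
                       api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request (does not send it)."""
    payload = {
        "model": model,  # placeholder ID -- check Lepton's model catalog
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("llama3-1-8b", "Hello!",
                         os.environ.get("LEPTON_API_KEY", ""))
# with urllib.request.urlopen(req) as resp:   # uncomment with a real key set
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The official OpenAI Python SDK works the same way once its `base_url` is pointed at api.lepton.ai/api/v1.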
04 Does Lepton AI have a free tier?
Lepton AI offers a free tier with limited credits for new accounts. Check lepton.ai for current free tier details and credit amounts.
Is this pricing incorrect? Let us know and we'll verify and update it.