Anyscale Pricing 2026: LLM Endpoints from $0.15/M Tokens

Price checkPer per million tokens

Anyscale EndpointsCustom Managed Ray ClustersCustom

All Anyscale Plans & Pricing

Plan	Monthly	Annual	Best For
Anyscale Endpoints freeCredit: $10 on sign-up	Custom	Not published	Teams needing scalable open-source LLM inference with Ray reliability
Verified pricing · last checked May 2026 · 1 source Get this price at Anyscale →
What's included at Anyscale Endpoints Best for: Teams needing scalable open-source LLM inference with Ray reliability OpenAI-compatible API Llama 3.x models (8B, 70B, 405B) Mistral and Mixtral models Gemma models No setup fees $10 free credits on sign-up Pay-as-you-go per token Limits freeCredit$10 on sign-up
Managed Ray Clusters	Contact Sales	Contact Sales	ML teams with complex distributed training and custom Ray workloads
Verified pricing · last checked May 2026 · 1 source Get this price at Anyscale →
What's included at Managed Ray Clusters Best for: ML teams with complex distributed training and custom Ray workloads Managed Ray clusters on AWS, GCP, Azure Distributed training and fine-tuning Ray Tune for hyperparameter search Ray Serve for model serving Enterprise SLAs Custom GPU configurations

View all features by plan (compare side-by-side)

Anyscale Endpoints

OpenAI-compatible API
Llama 3.x models (8B, 70B, 405B)
Mistral and Mixtral models
Gemma models
No setup fees
$10 free credits on sign-up
Pay-as-you-go per token

Managed Ray Clusters

Managed Ray clusters on AWS, GCP, Azure
Distributed training and fine-tuning
Ray Tune for hyperparameter search
Ray Serve for model serving
Enterprise SLAs
Custom GPU configurations

Pricing Alerts

Track Anyscale pricing

Get an email when Anyscale's pricing changes — plus the weekly SaaS Price Watch: verified price changes and deals across 3,000+ products. One-click unsubscribe.

See Anyscale Plans

Compare Anyscale with alternativesAdjust seats, lock a tier, add up to 2 more products side-by-side. Shareable URL.

Quick Answer

Last verified: May 5, 2026

Medium confidence

Anyscale costs $0.15 to $5 per per million tokens as of July 2026, with 2 plans available. Pricing depends on your chosen tier, contract length, and negotiated discounts.

Use the interactive pricing calculator to estimate your exact cost based on team size and requirements.

Free tier: No free tier available

Anyscale offers 2 pricing tiers: Anyscale Endpoints, Managed Ray Clusters. The Managed Ray Clusters plan is ml teams with complex distributed training and custom ray workloads.

Compared to other llm api providers software, Anyscale is positioned at the budget-friendly price point.

7 documented hidden costs beyond list price

How much does Anyscale cost?

Anyscale pricing starts at $0.15/per million tokens across 2 plans, with enterprise pricing available on request. Plans include Anyscale Endpoints (custom pricing), Managed Ray Clusters (custom pricing).

Anyscale Pricing Overview

Anyscale has 2 pricing plans ranging from $0.15 to $5/per million tokens. The Anyscale Endpoints plan requires contacting sales for a custom quote and is designed for teams needing scalable open-source llm inference with ray reliability. The Managed Ray Clusters plan requires contacting sales for a custom quote and is designed for ml teams with complex distributed training and custom ray workloads.

There are at least 7 documented hidden costs beyond Anyscale's list price, including implementation, training, and add-on fees.

This pricing was last verified in May 5, 2026 from 1 independent source.

See Anyscale Plans

Anyscale is the commercial platform built on Ray, the distributed computing framework. It offers Anyscale Endpoints — a serverless LLM inference API with per-token pricing for open-source models — and managed Ray clusters for training and fine-tuning. Anyscale Endpoints are OpenAI-compatible, supporting Llama, Mistral, Mixtral, and other popular models. The platform is designed for teams that need both production-ready LLM APIs and the ability to run custom distributed workloads on Ray.

How Anyscale Pricing Compares

Compare Anyscale pricing against top alternatives in LLM API Providers.

Baidu ERNIE API $0.1-$10/per million tokens Compare → Cerebras Inference API $0.1-$6/per million tokens Compare → MiniMax API $0.2-$3/per million tokens Compare →

Usage-Based Rates

Per-unit pricing for Anyscale API usage.

Anyscale Endpoints

Model	Unit	Rate
Llama 3.1 8B Instruct	1M input tokens	$0.150
Llama 3.1 8B Instruct	1M output tokens	$0.150
Llama 3.1 70B Instruct	1M input tokens	$1.00
Llama 3.1 70B Instruct	1M output tokens	$1.00
Llama 3.1 405B Instruct	1M input tokens	$5.00
Llama 3.1 405B Instruct	1M output tokens	$5.00
Mixtral 8x7B Instruct	1M input tokens	$0.500
Mixtral 8x7B Instruct	1M output tokens	$0.500
Mistral 7B Instruct	1M input tokens	$0.150
Mistral 7B Instruct	1M output tokens	$0.150
Gemma 7B Instruct	1M input tokens	$0.150
Gemma 7B Instruct	1M output tokens	$0.150

$10 free credits on sign-up
Anyscale Endpoints may have been rebranded or updated — verify at anyscale.com/endpoints
Rates reflect 2024-era published pricing; confirm current rates before relying on these figures
Note: Anyscale pivoted more toward enterprise in 2025; public endpoint pricing may vary

Compare Anyscale vs Alternatives

Before committing to Anyscale, compare pricing with these 3 alternatives in the same category.

VSBaidu ERNIE API

From $0.1/per million tokens

China-market apps and Chinese-first workloads

Full comparison

VSCerebras Inference API

From $0.1/per million tokens

Testing Cerebras's unique speed advantage

Full comparison

VSMiniMax API

From $0.2/per million tokens

Long-context (1M tokens) and Chinese-language apps

Full comparison

All Anyscale alternatives & migration guides

What Companies Actually Pay for Anyscale

Review scores

Third-party review aggregates, as of Jul 2026

Top pricing complaints

a noticeable learning curve, particularly for teams unfamiliar with Ray conceptsa lack of transparency in pricing, which makes cost planning challengingdifficulty during debugging

How Anyscale Pricing Compares

Software	Starting Price	Top Price
Anyscale	$0.15/per million tokens	$5/per million tokens
Amazon Bedrock	$0.07/per million tokens	$75/per million tokens
Baidu ERNIE API	$0.1/per million tokens	$10/per million tokens
Cerebras Inference API	$0.1/per million tokens	$6/per million tokens
Claude API	$0.03/per million tokens	$75/per million tokens
Cloudflare Workers AI	Free	$4.881/per million tokens

Detailed pricing comparisons:

Browse all LLM API Providers pricing →

7 Anyscale Hidden Costs Beyond the List Price

Beyond the listed price, Anyscale has at least 7 documented hidden costs that can significantly increase total cost of ownership.

Watch for 7 hidden costs

Compute Costs (GPUs) AC 0.0135/hr to AC 4.9591/hr
high 1 source

industry "Anyscale claims to offer up to 10x more cost-effective solutions for open-source LLMs compared to proprietary solutions for general workloads, and up to 6x cost savings for batch LLM inference in shared prefix scenarios compared to AWS Bedrock."
Developer Time / Learning Curve
high 1 source

industry "Anyscale claims to offer up to 10x more cost-effective solutions for open-source LLMs compared to proprietary solutions for general workloads, and up to 6x cost savings for batch LLM inference in shared prefix scenarios compared to AWS Bedrock."
Idle Resources
medium 1 source

industry "Idle Resources: While Anyscale offers features like auto-suspension for clusters to prevent paying for idle resources, managing this effectively is crucial for cost control."
Data Egress and Storage
medium 1 source

industry "Data Egress and Storage: Although not explicitly detailed for Anyscale, general LLM API pricing issues suggest that published rates might not capture the total cost of ownership, potentially excluding data egress, storage, or premium features that..."
Unpredictable Usage
medium 1 source

industry "Unpredictable Usage: For usage-based models, fluctuating workloads can make precise budgeting challenging, as costs depend heavily on real-time resource consumption."
Fine-tuning Fee $5 per run
low 1 source

industry "Fine-tuning and Inference: Anyscale charges a static fee of $5 per run for fine-tuning, regardless of the training data size."
LLM Inference Costs $1 per million tokens
low 1 source

industry "For LLM inference, Anyscale Endpoints are offered at $1 per million tokens for models like Llama-2 70B, and even less for other models."

Tip

Ask your Anyscale sales rep about these costs upfront. Getting them in writing before signing can save you from surprise charges later.

Full hidden costs breakdown →

Intelligence sourced from 1 independent sources

industry

Key claims include inline source attribution. Data verified against multiple independent sources. 10 source citations total.

How to Negotiate Anyscale Pricing

Anyscale contracts are negotiable. These 3 tactics are sourced from real buyer experiences and procurement specialists.

Negotiation Playbook 3 tactics

Negotiate Volume Discounts high success

For committed contracts, buyers can negotiate for volume discounts by committing to a certain level of spending.

https://g2.com

Optimize Resource Utilization high success

Anyscale's platform allows for efficient GPU utilization, auto-scaling, and the use of spot instances which can be 50-80% cheaper.

https://lawinsider.com

Push for Pricing Transparency medium success

Buyers should push for detailed breakdowns of all potential costs, including those beyond direct compute, as Anyscale's pricing structure can feel unclear.

https://g2.com

Full negotiation guide →

Anyscale Pricing FAQ

01 How much do Anyscale Endpoints cost?

Anyscale Endpoints charge per token. Small models like Llama 3.1 8B and Mistral 7B cost $0.15 per million tokens (input and output same rate). Llama 3.1 70B costs $1.00/M tokens. Llama 3.1 405B costs $5.00/M tokens. New accounts get $10 in free credits.

02 What is Anyscale built on?

Anyscale is built on Ray, an open-source distributed computing framework developed at UC Berkeley. Anyscale provides the managed, enterprise version of Ray with production-grade SLAs, managed clusters, and hosted inference endpoints.

03 Does Anyscale have a free tier?

Anyscale gives new accounts $10 in free credits for Endpoints usage. There is no permanently free tier — after credits are used, standard per-token rates apply.

04 Anyscale vs Together AI: which should I use?

Together AI and Anyscale both offer OpenAI-compatible inference for open-source models. Together AI has broader model selection and slightly more competitive pricing for commodity models. Anyscale is better if you're already using Ray for training or need the Ray ecosystem integration.

Is this pricing incorrect? — we'll verify and update it.