
Inference APIs Pricing Comparison

49 providers compared by pricing model, free tiers, hosting options, and headquarters. Last updated March 2026.

37 with free tiers · 7 open source · 9 self-hostable · 14 European

| Provider | Pricing Model | Starting Price | Hosting | HQ |
| --- | --- | --- | --- | --- |
| | Pay-per-use | Pay-per-token | Cloud | 🇺🇸 United States |
| | Pay-per-use | $1/1M tokens | Cloud | 🇺🇸 United States |
| | | | | 🇺🇸 United States |
| | Pay-per-use | ~$0.63/hr (T4 GPU) | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | $0.15/hr (T4 GPU) | Cloud + Self-hosted | 🇺🇸 United States |
| | | | | 🇺🇸 United States |
| | Freemium | €25/mo | Cloud | 🇸🇪 Sweden |
| | Freemium | Free tier available | Cloud | 🇺🇸 United States |
| | Pay-per-use | ~$1.10/hr (A10 GPU) | Cloud | 🇺🇸 United States |
| | Freemium | $0.011/1K neurons | Cloud | 🇺🇸 United States |
| | Freemium | $0.04/1M tokens | Cloud + Self-hosted | 🇨🇦 Canada |
| | Pay-per-use | $6.50/hr (GH200 GPU) | Cloud | 🇺🇸 United States |
| | Pay-per-use | Pay-per-use + 5% gateway fee | Cloud | 🇦🇹 Austria |
| | Pay-per-use | $0.02/1M tokens | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.028/1M tokens (cache hit) | Cloud | 🇨🇳 China |
| fal | Pay-per-use | $0.02/megapixel | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.10/1M tokens | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | $0.08/hr | Cloud | 🇩🇪 Germany |
| | Freemium | Free | Cloud | 🇺🇸 United States |
| | Freemium | $0.05/1M tokens | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.033/hr (CPU) | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.15/hr | Cloud | 🇬🇧 United Kingdom |
| | Pay-per-use | Free | | 🇱🇺 Luxembourg |
| | Pay-per-use | $0.02/1M tokens | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.58/GPU/hr (V100) | Cloud | 🇺🇸 United States |
| | Freemium | $3/300 credits | Cloud | 🇺🇸 United States |
| | Freemium | $0.10/1M tokens | Cloud + Self-hosted | 🇫🇷 France |
| | Pay-per-use | $30/mo free credits | Cloud | 🇺🇸 United States |
| | Freemium | Free | | 🇺🇸 United States |
| | Pay-per-use | | Cloud | 🇺🇸 United States |
| | Pay-per-use | $2.00/hr (H100) | Cloud | 🇳🇱 Netherlands |
| | Pay-per-use | $0.03/1M tokens | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.01/1M tokens | Cloud | 🇬🇧 United Kingdom |
| | | | | 🇺🇸 United States |
| | Freemium | Free (open-source) | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | $0.05/1M tokens | Cloud | 🇺🇸 United States |
| | Freemium | Free (25+ free models) | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.91/hr (L4 GPU) | Cloud | 🇫🇷 France |
| | Freemium | Free | | 🇨🇭 Switzerland |
| | Pay-per-use | Per-second GPU billing | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.06/hr | Cloud | 🇺🇸 United States |
| | Freemium | $5 free credit | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | €0.20/1M tokens | Cloud | 🇫🇷 France |
| | Pay-per-use | $0.0015/image | Cloud | 🇺🇸 United States |
| | Pay-per-use | ~€2.70/GPU-hr | Cloud | 🇩🇪 Germany |
| | Pay-per-use | Pay-per-token | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | ~$0.06/GPU/hr | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.14/hr | Cloud | 🇫🇮 Finland |
| | Free | Free (open-source) | Self-hosted | 🇺🇸 United States |
ℹ️ Pricing units vary by provider type: per-token for LLM APIs, per-GPU-hour for compute platforms, per-request for media generation. Verify current rates on each provider's website.
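One rough way to put per-GPU-hour compute pricing and per-token API pricing on the same scale is to amortize the hourly rate over an assumed sustained throughput. The sketch below is illustrative only: the throughput figure is an assumption, not a measured number for any provider listed here.

```python
def gpu_hour_to_price_per_million_tokens(price_per_hour: float,
                                         tokens_per_second: float) -> float:
    """Convert an hourly GPU rate to an effective price per 1M generated
    tokens, assuming the GPU sustains the given throughput all hour."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / (tokens_per_hour / 1_000_000)

# Example: a $2.00/hr GPU sustaining an assumed 1,000 tokens/sec
# works out to roughly $0.56 per 1M generated tokens.
print(round(gpu_hour_to_price_per_million_tokens(2.00, 1000), 2))  # -> 0.56
```

In practice real throughput depends on the model, batch size, and sequence lengths, so treat any such conversion as an order-of-magnitude check, not a quote.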

Providers with free tiers

These inference API providers offer free credits, free tiers, or open-source self-hosting options to get started without upfront costs.

Managed API access to foundation models on AWS with built-in fine-tuning and ...

From: Pay-per-token

Claude API for building AI applications with Opus, Sonnet, and Haiku models

From: $1/1M tokens

AI inference platform for deploying and serving ML models with autoscaling an...

From: ~$0.63/hr (T4 GPU)

Open-source serverless GPU cloud with sub-second cold starts and auto-scaling

From: $0.15/hr (T4 GPU)

BentoML is the platform for software engineers to build AI products.

EU-sovereign AI inference platform with OpenAI-compatible API

From: €25/mo


Ultra-fast inference on custom wafer-scale hardware with OpenAI-compatible API

From: Free tier available

Serverless GPU infrastructure for deploying AI models with sub-5 second cold ...

From: ~$1.10/hr (A10 GPU)

Run AI models at the edge on Cloudflare's global network with serverless infe...

From: $0.011/1K neurons

Cohere's world-class LLMs help enterprises build powerful, secure application...

From: $0.04/1M tokens

European AI inference gateway with smart routing across EU providers

From: Pay-per-use + 5% gateway fee

Run the top AI models using a simple API, pay per use. Low cost, scalable and...

From: $0.02/M tokens

Cost-effective inference API with OpenAI-compatible endpoints and open-weight...

From: $0.028/1M tokens (cache hit)

fal

Build the next generation of creativity with fal. Lightning fast inference.

From: $0.02/megapixel

European GPU cloud for AI training and inference powered by 100% green energy

From: $0.08/hr

Google's API for Gemini models with text, image, video, and audio capabilities

From: Free

Groq is on a mission to set the standard for GenAI inference speed, helping r...

From: $0.05/1M tokens

The open-source AI platform with 500K+ models, inference endpoints, and fine-...

From: $0.033/hr (CPU)

European sovereign AI inference with OpenAI-compatible APIs hosted in EU data...

From: Free

High-throughput inference API with OpenAI-compatible access to open-source mo...

From: $0.02/M tokens

GPU cloud for AI training and inference with on-demand and cluster options

From: $0.58/GPU/hr (V100)

Multi-LLM API orchestration platform for comparing and blending AI models

From: $3/300 credits

Run generative AI models, large-scale batch jobs, job queues, and much more.

From: $30/mo free credits

We rebuilt the modern AI software stack, from the ground up, to boost any AI ...

From: Free

Access, finetune, deploy LLMs using our affordable and scalable APIs.

Full-stack AI cloud with GPU infrastructure for training and inference

From: $2.00/hr (H100)

APIs, Serverless and GPU Instance In One AI Cloud

From: $0.03/M tokens

European AI hyperscaler with serverless inference and GPU cloud

From: $0.01/M tokens

OctoAI delivers production-grade GenAI solutions running on the most efficien...

Run large language models locally with a single command

From: Free (open-source)

API access to GPT, o-series reasoning, DALL-E, and Whisper models

From: $0.05/1M tokens

Unified API gateway for 300+ AI models across 60+ providers with automatic fa...

From: Free (25+ free models)

European cloud provider with AI inference, training, and deployment services

From: $0.91/hr (L4 GPU)

Fine-tune and deploy LLMs on your own infrastructure with full data sovereignty

From: Free

Custom AI chip inference platform with purpose-built hardware for high-throug...

From: $5 free credit

European serverless AI inference APIs, 100% hosted in Europe

From: €0.20/M tokens

High-throughput LLM inference engine with PagedAttention for efficient GPU me...

From: Free (open-source)

How to choose an inference API provider

The right provider depends on workload type, latency requirements, and budget. Most providers use pay-per-token pricing for LLMs and per-second GPU billing for custom models. Token-based pricing varies by model, so the cheapest provider for one model may not be cheapest for another.
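Because many providers price input and output tokens differently, the most reliable comparison is an estimate against your own traffic mix. A minimal sketch, using hypothetical prices rather than any provider's actual rates:

```python
def monthly_token_cost(input_tokens: float, output_tokens: float,
                       input_price_per_m: float,
                       output_price_per_m: float) -> float:
    """Monthly cost in dollars, given per-1M-token input/output prices."""
    return ((input_tokens / 1e6) * input_price_per_m
            + (output_tokens / 1e6) * output_price_per_m)

# Example: 50M input + 10M output tokens/month at a hypothetical
# $0.10 per 1M input and $0.40 per 1M output tokens.
print(monthly_token_cost(50e6, 10e6, 0.10, 0.40))  # -> 9.0
```

Running this estimate per candidate provider, with that provider's actual rates for the specific model you plan to serve, makes the "cheapest for one model is not cheapest for another" effect concrete.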

Free tiers are useful for prototyping but often come with rate limits. For production, compare per-token costs for your specific model, cold start latency, rate limits, and whether the provider supports the models you need.

Teams with data residency requirements should check hosting options and provider headquarters. European providers such as Berget AI and Cortecs AI keep data within EU jurisdiction. See the full European AI Infrastructure directory. Self-hostable options like Baseten and Beam give full control over data location.

For a deeper analysis, read AI Inference API Providers Compared on the blog. Pricing changes frequently, so verify current rates on each provider's website. Submit a correction.

Browse all Inference APIs tools or explore the full AI Infrastructure Landscape.

Is your product missing? 👀 Add it here →