
Inference APIs Pricing Comparison

49 providers compared by pricing model, free tiers, hosting options, and headquarters. Last updated March 2026.

37 with free tiers · 7 open source · 9 self-hostable · 14 European

| Provider | Pricing Model | Starting Price | Hosting | HQ |
| --- | --- | --- | --- | --- |
| | Pay-per-use | Pay-per-token | Cloud | 🇺🇸 United States |
| | Pay-per-use | $1/1M tokens | Cloud | 🇺🇸 United States |
| | | | | 🇺🇸 United States |
| | Pay-per-use | ~$0.63/hr (T4 GPU) | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | $0.15/hr (T4 GPU) | Cloud + Self-hosted | 🇺🇸 United States |
| | | | | 🇺🇸 United States |
| | Freemium | €25/mo | Cloud | 🇸🇪 Sweden |
| | Freemium | Free tier available | Cloud | 🇺🇸 United States |
| | Pay-per-use | ~$1.10/hr (A10 GPU) | Cloud | 🇺🇸 United States |
| | Freemium | $0.011/1K neurons | Cloud | 🇺🇸 United States |
| | Freemium | $0.04/1M tokens | Cloud + Self-hosted | 🇨🇦 Canada |
| | Pay-per-use | $6.50/hr (GH200 GPU) | Cloud | 🇺🇸 United States |
| | Pay-per-use | Pay-per-use + 5% gateway fee | Cloud | 🇦🇹 Austria |
| | Pay-per-use | $0.02/1M tokens | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.028/1M tokens (cache hit) | Cloud | 🇨🇳 China |
| fal | Pay-per-use | $0.02/megapixel | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.10/1M tokens | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | $0.08/hr | Cloud | 🇩🇪 Germany |
| | Freemium | Free | Cloud | 🇺🇸 United States |
| | Freemium | $0.05/1M tokens | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.033/hr (CPU) | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.15/hr | Cloud | 🇬🇧 United Kingdom |
| | Pay-per-use | Free | | 🇱🇺 Luxembourg |
| | Pay-per-use | $0.02/1M tokens | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.58/GPU/hr (V100) | Cloud | 🇺🇸 United States |
| | Freemium | $3/300 credits | Cloud | 🇺🇸 United States |
| | Freemium | $0.10/1M tokens | Cloud + Self-hosted | 🇫🇷 France |
| | Pay-per-use | $30/mo free credits | Cloud | 🇺🇸 United States |
| | Freemium | Free | | 🇺🇸 United States |
| | Pay-per-use | | Cloud | 🇺🇸 United States |
| | Pay-per-use | $2.00/hr (H100) | Cloud | 🇳🇱 Netherlands |
| | Pay-per-use | $0.03/1M tokens | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.01/1M tokens | Cloud | 🇬🇧 United Kingdom |
| | | | | 🇺🇸 United States |
| | Freemium | Free (open-source) | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | $0.05/1M tokens | Cloud | 🇺🇸 United States |
| | Freemium | Free (25+ free models) | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.91/hr (L4 GPU) | Cloud | 🇫🇷 France |
| | Freemium | Free | | 🇨🇭 Switzerland |
| | Pay-per-use | Per-second GPU billing | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.06/hr | Cloud | 🇺🇸 United States |
| | Freemium | $5 free credit | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | €0.20/1M tokens | Cloud | 🇫🇷 France |
| | Pay-per-use | $0.0015/image | Cloud | 🇺🇸 United States |
| | Pay-per-use | ~€2.70/GPU-hr | Cloud | 🇩🇪 Germany |
| | Pay-per-use | Pay-per-token | Cloud + Self-hosted | 🇺🇸 United States |
| | Pay-per-use | ~$0.06/GPU/hr | Cloud | 🇺🇸 United States |
| | Pay-per-use | $0.14/hr | Cloud | 🇫🇮 Finland |
| | Free | Free (open-source) | Self-hosted | 🇺🇸 United States |
ℹ️ Pricing units vary by provider type: per-token for LLM APIs, per-GPU-hour for compute platforms, per-request for media generation. Verify current rates on each provider's website.
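One rough way to put per-GPU-hour compute pricing and per-token API pricing on the same scale is to amortize the hourly rate over an assumed sustained throughput. The sketch below is illustrative only: the throughput figure is an assumption, not a measured number for any provider listed here.

```python
def gpu_hour_to_price_per_million_tokens(price_per_hour: float,
                                         tokens_per_second: float) -> float:
    """Convert an hourly GPU rate to an effective price per 1M generated
    tokens, assuming the GPU sustains the given throughput all hour."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / (tokens_per_hour / 1_000_000)

# Example: a $2.00/hr GPU sustaining an assumed 1,000 tokens/sec
# works out to roughly $0.56 per 1M generated tokens.
print(round(gpu_hour_to_price_per_million_tokens(2.00, 1000), 2))  # -> 0.56
```

In practice real throughput depends on the model, batch size, and sequence lengths, so treat any such conversion as an order-of-magnitude check, not a quote.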

Providers with free tiers

These inference API providers offer free credits, free tiers, or open-source self-hosting options to get started without upfront costs.

Managed API access to foundation models on AWS with built-in fine-tuning and ...

From: Pay-per-token

Claude API for building AI applications with Opus, Sonnet, and Haiku models

From: $1/1M tokens

AI inference platform for deploying and serving ML models with autoscaling an...

From: ~$0.63/hr (T4 GPU)

Open-source serverless GPU cloud with sub-second cold starts and auto-scaling

From: $0.15/hr (T4 GPU)

BentoML is the platform for software engineers to build AI products.

EU-sovereign AI inference platform with OpenAI-compatible API

From: €25/mo


Ultra-fast inference on custom wafer-scale hardware with OpenAI-compatible API

From: Free tier available

Serverless GPU infrastructure for deploying AI models with sub-5 second cold ...

From: ~$1.10/hr (A10 GPU)

Run AI models at the edge on Cloudflare's global network with serverless infe...

From: $0.011/1K neurons

Cohere's world-class LLMs help enterprises build powerful, secure application...

From: $0.04/1M tokens

European AI inference gateway with smart routing across EU providers

From: Pay-per-use + 5% gateway fee

Run the top AI models using a simple API, pay per use. Low cost, scalable and...

From: $0.02/M tokens

Cost-effective inference API with OpenAI-compatible endpoints and open-weight...

From: $0.028/1M tokens (cache hit)

fal

Build the next generation of creativity with fal. Lightning fast inference.

From: $0.02/megapixel

European GPU cloud for AI training and inference powered by 100% green energy

From: $0.08/hr

Google's API for Gemini models with text, image, video, and audio capabilities

From: Free

Groq is on a mission to set the standard for GenAI inference speed, helping r...

From: $0.05/1M tokens

The open-source AI platform with 500K+ models, inference endpoints, and fine-...

From: $0.033/hr (CPU)

European sovereign AI inference with OpenAI-compatible APIs hosted in EU data...

From: Free

High-throughput inference API with OpenAI-compatible access to open-source mo...

From: $0.02/M tokens

GPU cloud for AI training and inference with on-demand and cluster options

From: $0.58/GPU/hr (V100)

Multi-LLM API orchestration platform for comparing and blending AI models

From: $3/300 credits

Run generative AI models, large-scale batch jobs, job queues, and much more.

From: $30/mo free credits

We rebuilt the modern AI software stack, from the ground up, to boost any AI ...

From: Free

Access, finetune, deploy LLMs using our affordable and scalable APIs.

Full-stack AI cloud with GPU infrastructure for training and inference

From: $2.00/hr (H100)

APIs, Serverless and GPU Instance In One AI Cloud

From: $0.03/M tokens

European AI hyperscaler with serverless inference and GPU cloud

From: $0.01/M tokens

OctoAI delivers production-grade GenAI solutions running on the most efficien...

Run large language models locally with a single command

From: Free (open-source)

API access to GPT, o-series reasoning, DALL-E, and Whisper models

From: $0.05/1M tokens

Unified API gateway for 300+ AI models across 60+ providers with automatic fa...

From: Free (25+ free models)

European cloud provider with AI inference, training, and deployment services

From: $0.91/hr (L4 GPU)

Fine-tune and deploy LLMs on your own infrastructure with full data sovereignty

From: Free

Custom AI chip inference platform with purpose-built hardware for high-throug...

From: $5 free credit

European serverless AI inference APIs, 100% hosted in Europe

From: €0.20/M tokens

High-throughput LLM inference engine with PagedAttention for efficient GPU me...

From: Free (open-source)

How to choose an inference API provider

The right provider depends on workload type, latency requirements, and budget. Most providers use pay-per-token pricing for LLMs and per-second GPU billing for custom models. Token-based pricing varies by model, so the cheapest provider for one model may not be cheapest for another.
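Because many providers price input and output tokens differently, the most reliable comparison is an estimate against your own traffic mix. A minimal sketch, using hypothetical prices rather than any provider's actual rates:

```python
def monthly_token_cost(input_tokens: float, output_tokens: float,
                       input_price_per_m: float,
                       output_price_per_m: float) -> float:
    """Monthly cost in dollars, given per-1M-token input/output prices."""
    return ((input_tokens / 1e6) * input_price_per_m
            + (output_tokens / 1e6) * output_price_per_m)

# Example: 50M input + 10M output tokens/month at a hypothetical
# $0.10 per 1M input and $0.40 per 1M output tokens.
print(monthly_token_cost(50e6, 10e6, 0.10, 0.40))  # -> 9.0
```

Running this estimate per candidate provider, with that provider's actual rates for the specific model you plan to serve, makes the "cheapest for one model is not cheapest for another" effect concrete.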

Free tiers are useful for prototyping but often come with rate limits. For production, compare per-token costs for your specific model, cold start latency, rate limits, and whether the provider supports the models you need.

Teams with data residency requirements should check hosting options and provider headquarters. European providers such as Berget AI and Cortecs AI keep data within EU jurisdiction. See the full European AI Infrastructure directory. Self-hostable options like Baseten and Beam give full control over data location.

For a deeper analysis, read AI Inference API Providers Compared on the blog. Pricing changes frequently, so verify current rates on each provider's website. Submit a correction.

Browse all Inference APIs tools or explore the full AI Infrastructure Landscape.

Is your product missing? 👀 Add it here →