DeepInfra
Run the top AI models through a simple API and pay per use. Low-cost, scalable, production-ready infrastructure.
DeepInfra hosts 50+ open-source models (Llama 3, Mistral, Mixtral, Gemma, and others) as serverless API endpoints. You send requests to an OpenAI-compatible API and pay per token, with no setup or infrastructure management. The platform auto-scales based on demand.
DeepInfra competes primarily on price and latency, often offering lower per-token costs than other inference providers for the same models. The platform supports text generation, embeddings, image generation, and text-to-speech. Because the API is OpenAI-compatible, switching from OpenAI is usually as simple as changing the base URL.
Pricing: per-token usage
What is DeepInfra?
DeepInfra is a serverless inference platform that hosts open-source AI models as API endpoints. Requests go to an OpenAI-compatible API and billing is per token, with no infrastructure to set up or manage.
Supported Models
DeepInfra hosts a wide catalog of open-source models including Meta's Llama 3 and Llama 3.1 (8B, 70B, 405B), Mistral and Mixtral models, Google's Gemma, Qwen, and others. The model list is updated regularly as new open-source models are released. Beyond text generation, DeepInfra supports embedding models, image generation (Stable Diffusion, FLUX), and text-to-speech.
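Beyond chat, the other endpoints follow the same OpenAI-style request shape. As a minimal sketch of an embeddings call, using the standard library only; the endpoint path and the model ID below are assumptions to verify against DeepInfra's live model catalog:

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint path; confirm against DeepInfra's API docs.
EMBEDDINGS_URL = "https://api.deepinfra.com/v1/openai/embeddings"

def build_embedding_payload(texts: list[str],
                            model: str = "BAAI/bge-base-en-v1.5") -> dict:
    """OpenAI-style embeddings payload; the model ID is a placeholder example."""
    return {"model": model, "input": texts}

def embed(texts: list[str], api_key: str) -> list[list[float]]:
    """POST the payload and return one embedding vector per input text."""
    req = urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(build_embedding_payload(texts)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return [item["embedding"] for item in json.load(resp)["data"]]
```

The response format mirrors OpenAI's embeddings API, so existing parsing code should carry over unchanged.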
API and Integration
The API is compatible with the OpenAI client libraries. If you're already using the OpenAI SDK, switching to DeepInfra requires changing the base URL and API key. The platform supports streaming responses, function calling, and JSON mode. There's also a dedicated Python client for more advanced use cases.
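A minimal sketch of the switch described above, assuming the base URL shown (verify the exact path in DeepInfra's docs) and treating the model ID as an illustrative placeholder:

```python
import os

# Assumed OpenAI-compatible base URL -- check DeepInfra's API docs for the
# current path before relying on it.
DEEPINFRA_BASE_URL = "https://api.deepinfra.com/v1/openai"

def build_chat_request(prompt: str,
                       model: str = "meta-llama/Meta-Llama-3-8B-Instruct",
                       stream: bool = False) -> dict:
    """Chat-completion payload, the same shape the OpenAI API expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def make_client():
    """Reuse the standard OpenAI SDK; only base_url and api_key change."""
    from openai import OpenAI  # pip install openai
    return OpenAI(
        api_key=os.environ["DEEPINFRA_API_KEY"],
        base_url=DEEPINFRA_BASE_URL,
    )

if __name__ == "__main__" and "DEEPINFRA_API_KEY" in os.environ:
    client = make_client()
    resp = client.chat.completions.create(**build_chat_request("Say hi."))
    print(resp.choices[0].message.content)
```

Setting stream=True yields incremental chunks through the same SDK iteration pattern used with OpenAI, so streaming code also ports without changes.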
Pricing
DeepInfra uses per-token pricing, with rates varying by model. Smaller models like Llama 3 8B cost fractions of a cent per thousand tokens, while larger models like Llama 3.1 405B cost more but are still typically cheaper than comparable proprietary alternatives. There are no minimum commitments, and you pay only for what you use.
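The per-token arithmetic is simple to sketch. The rates below are placeholders, not DeepInfra's actual prices; check each model's page for current rates:

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      input_price_per_m: float,
                      output_price_per_m: float) -> float:
    """Per-token billing: input and output are each priced per million tokens."""
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

# Illustrative rates only: an 800-token prompt with a 200-token reply,
# at a hypothetical $0.06 per million tokens in each direction.
cost = estimate_cost_usd(800, 200,
                         input_price_per_m=0.06,
                         output_price_per_m=0.06)
```

At rates this low, a request costing a measurable fraction of a cent requires thousands of tokens, which is why small open-source models suit high-volume workloads.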
Who should use DeepInfra?
DeepInfra is a good fit for teams that want to use open-source models without managing GPU infrastructure. The OpenAI-compatible API makes it easy to switch from OpenAI or test multiple models. It's particularly useful for cost-sensitive workloads where open-source models perform well enough for the task.
DeepInfra Alternatives
Explore 50 products in the Inference APIs category.