
vLLM Alternatives

High-throughput LLM inference engine with PagedAttention for efficient GPU memory usage

vLLM is an open-source inference and serving engine for Large Language Models, originally developed at UC Berkeley.

Explore 50 alternatives to vLLM in one category. Each tool listed below shares at least one category with vLLM.

🤖 Inference APIs
