Does vLLM have a free tier?

Yes, vLLM offers a free tier or free trial.

Yes, vLLM is open source (apache license).

vLLM

Open Source Free Trial

High-throughput LLM inference engine with PagedAttention for efficient GPU memory usage

vLLM is an open-source inference and serving engine for Large Language Models, originally developed at UC Berkeley. It uses PagedAttention to manage GPU memory efficiently, achieving up to 24x higher throughput compared to Hugging Face Transformers. It supports most popular open-source models including Llama, Mixtral, DeepSeek, and multimodal models like LLaVA. vLLM includes both a fast inference engine and a production-ready OpenAI-compatible serving server, making it a popular choice for self-hosted LLM deployments.

Pricing: Free

Hosting Self-hosted

Pricing Free, from Free (open-source)

HQ 🇺🇸 United States

Founded 2023

License APACHE

GitHub 80,987 stars

Visit website →

GitHub

Posts