Groq
LPU-powered inference API for LLMs, speech, and vision models with usage-based pricing
Groq runs inference on custom LPU (Language Processing Unit) chips designed from scratch for token generation. The hardware trades general-purpose flexibility for deterministic, low-latency performance on transformer workloads. GroqCloud exposes this through an OpenAI-compatible API supporting Llama 3.3 70B, Qwen3 32B, GPT-OSS 20B, Whisper, and several TTS models. Pricing is per-token with no subscriptions. Prompt caching and a batch API each offer 50% discounts. A free API key is available to get started. Enterprise customers can deploy on-premises via GroqRack. SOC 2, GDPR, and HIPAA compliant.
Pricing: Per token usage
Resources
Groq builds its own silicon, the LPU (Language Processing Unit), specifically for running inference on large language models. Unlike GPUs, which are general-purpose, the LPU is a fixed-function chip optimized for the sequential nature of autoregressive token generation. This gives Groq consistently high throughput, around 800-1,000 tokens per second on models like Llama 3.3 70B and GPT-OSS 20B.
The cloud API (GroqCloud) is OpenAI SDK-compatible. Supported model families include Llama 3.1/3.3, Qwen3, GPT-OSS, plus Whisper for speech-to-text and several TTS voices. Pricing is straightforward per-token with no subscriptions. Llama 3.1 8B runs at $0.05/M input tokens, while Llama 3.3 70B is $0.59/M input. Prompt caching and the batch API each cut costs by 50%.
Groq offers a free API key with rate limits for getting started. For enterprise, GroqRack provides on-premises deployment of the same LPU hardware. The platform is SOC 2, GDPR, and HIPAA compliant.
The main trade-off is model selection. Because Groq runs on custom hardware, only models that have been ported to the LPU are available. The catalog is smaller than GPU-based providers like Together AI or Fireworks, though it covers the most popular open-weight models. If you need a niche or fine-tuned model, Groq may not support it.
Groq Alternatives
Explore 69 products in the Inference APIs category. View all Groq alternatives.
Genesis Cloud
European GPU cloud for AI training and inference powered by 100% green energy
Nebius
Full-stack AI cloud with GPU infrastructure for training and inference
Compare
Is your product missing?