Groq is known for its hardware-accelerated inferencing, offering blazing-fast response times ideal for latency-sensitive applications. The models served through Groq in Lyzr come pre-integrated and require no additional setup.
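Within Lyzr these models are pre-integrated, so an agent only needs the model selected in its configuration. For readers curious what the underlying call looks like, below is a minimal sketch using Groq's own Python SDK directly (not Lyzr-specific code); the model ID and prompt are illustrative assumptions.

```python
# Minimal chat completion against a Groq-hosted model.
# Assumes GROQ_API_KEY is set in the environment; the model ID is
# an illustrative assumption.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize hardware-accelerated inference in one sentence."},
    ],
)
print(response.choices[0].message.content)
```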
A state-of-the-art LLaMA model served on Groq's hardware, optimized for high-speed, general-purpose use (see the streaming sketch after the highlights below).
Use Cases:
  • High-speed chat agents
  • Real-time customer interaction bots
  • Lightweight RAG-based knowledge assistants
Highlights:
  • Ultra-fast response time (token streaming in milliseconds)
  • Balanced accuracy and generation speed
  • Great for user-facing experiences
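The token-streaming highlight above can be exercised directly with the SDK's streaming mode. A minimal sketch, again assuming the Groq Python SDK and an illustrative model ID:

```python
# Stream tokens as they are generated instead of waiting for the
# full completion. Model ID is an illustrative assumption.
from groq import Groq

client = Groq()

stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Greet the user in one sentence."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry only role/metadata
        print(delta, end="", flush=True)
print()
```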
A smaller LLaMA model optimized for extremely lightweight, instant responses (a multi-turn sketch follows the highlights).
Use Cases:
  • Instant query resolution
  • Embedded LLM features in apps
  • Low-cost multi-turn agents
Highlights:
  • Minimal latency (ideal for mobile/web)
  • Lightweight for cost-effective scaling
  • Suitable for basic inferencing needs
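For the low-cost multi-turn use case above, conversation state is simply the running message list resent on each call. A minimal sketch, assuming a small "instant" LLaMA variant; the model ID and dialogue are illustrative assumptions:

```python
# Low-cost multi-turn loop: append each exchange to the history
# and resend it. Model ID is an illustrative assumption.
from groq import Groq

client = Groq()
history = [{"role": "system", "content": "You are a terse helpdesk bot."}]

for user_turn in ["My invoice is missing.", "Where do I download it?"]:
    history.append({"role": "user", "content": user_turn})
    reply = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=history,
    )
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print(f"> {user_turn}\n{answer}\n")
```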
A mixture-of-experts Mistral model capable of handling large contexts with top-tier throughput on Groq (a long-context sketch follows the highlights).
Use Cases:
  • Long-context RAG pipelines
  • Multi-turn conversations
  • Structured Q&A with deep memory
Highlights:
  • Handles context windows up to 32k tokens
  • Excellent performance with retrieval
  • Useful in traceable enterprise workflows
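The large context window is what makes the RAG pipelines above practical: retrieved chunks are concatenated straight into the prompt. A minimal sketch of that pattern; the model ID and the hard-coded "retrieved" chunks are assumptions standing in for a real vector store:

```python
# Long-context RAG pattern: stuff retrieved chunks into the prompt,
# relying on the large (32k-token) context window. Chunks and model ID
# are illustrative assumptions.
from groq import Groq

client = Groq()

retrieved_chunks = [
    "Policy doc: refunds are processed within 14 days.",
    "FAQ: refunds require the original order number.",
]  # in practice, fetched from a vector store

context = "\n\n".join(retrieved_chunks)
question = "How long do refunds take?"

response = client.chat.completions.create(
    model="mixtral-8x7b-32768",
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(response.choices[0].message.content)
```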
A distilled version of DeepSeek-Qwen focused on multilingual and general-purpose reasoning (a multilingual sketch follows the highlights).
Use Cases:
  • Multilingual bots
  • Exploratory assistants
  • Agents requiring broad knowledge bases
Highlights:
  • Multilingual capabilities
  • Efficient reasoning and summarization
  • Suitable for global applications
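Multilingual use requires no special configuration; prompting in the target language is enough. A minimal sketch, with the model ID as an illustrative assumption:

```python
# Multilingual prompting: the model answers in the language of the
# prompt. Model ID is an illustrative assumption.
from groq import Groq

client = Groq()

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-32b",
    messages=[{
        "role": "user",
        "content": "¿Cuáles son las ventajas de la inferencia acelerada por hardware?",
    }],
)
print(response.choices[0].message.content)  # reply comes back in Spanish
```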
A distilled LLaMA variant served via Groq that uses speculative decoding (SpecDec) to accelerate token generation, well suited to decision-making flows (a JSON-output sketch follows the highlights).
Use Cases:
  • Decision support agents
  • Evaluation-driven orchestration
  • Agents requiring logic + policy-based outputs
Highlights:
  • Designed for decision-heavy flows
  • Combines speed with deep reasoning
  • Enterprise-grade inferencing at sub-100ms latency
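Decision-support agents usually need machine-readable outputs. One way to get them is Groq's JSON mode, sketched below; the model ID and the two-key schema are assumptions, and JSON mode generally requires the word "JSON" to appear in the prompt:

```python
# Policy-style decision returned as strict JSON via JSON mode.
# Model ID and schema are illustrative assumptions.
import json

from groq import Groq

client = Groq()

response = client.chat.completions.create(
    model="llama-3.3-70b-specdec",
    messages=[
        {"role": "system", "content": "Return a JSON object with keys "
                                      "'decision' ('approve' or 'deny') and 'reason'."},
        {"role": "user", "content": "Refund request: item arrived broken, reported within 14 days."},
    ],
    response_format={"type": "json_object"},
)

decision = json.loads(response.choices[0].message.content)
print(decision["decision"], "-", decision["reason"])
```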
An open-source GPT variant (20B parameters) optimized for balanced performance and efficiency on Groq hardware.
Use Cases:
  • General-purpose conversational agents
  • Fast inference for customer support and FAQs
  • Lightweight reasoning at scale
Highlights:
  • Mid-sized open-source LLM
  • Optimized for Groq inferencing speed
  • Ideal for real-time interactive applications
A large-scale GPT-oss model (120B parameters) optimized for Groq, delivering deeper reasoning and broader coverage.
Use Cases:
  • Knowledge-heavy assistants
  • Multi-turn conversational flows
  • Enterprise orchestration agents
Highlights:
  • Large-scale reasoning capabilities
  • Supports complex queries with high accuracy
  • Maintains sub-100ms latency on the Groq runtime (see the timing sketch below)
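Latency claims like these are easy to verify yourself: time the first streamed token (time to first token) rather than the full completion. A minimal sketch; the model ID is an illustrative assumption:

```python
# Measure time to first token (TTFT) over a streaming call.
# Model ID is an illustrative assumption.
import time

from groq import Groq

client = Groq()

start = time.perf_counter()
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        ttft_ms = (time.perf_counter() - start) * 1000
        print(f"time to first token: {ttft_ms:.0f} ms")
        break
```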
⚡ With Groq, agents in Lyzr get sub-100ms inference latency, making them ideal for real-time apps where user experience and responsiveness are critical.