Groq Models
Groq-hosted LLMs available in Lyzr with ultra-fast performance, ideal for real-time agent interactions
Groq is known for hardware-accelerated inference, delivering blazing-fast response times for latency-sensitive applications. The models served through Groq in Lyzr come pre-integrated and require no additional setup.
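Under the hood, Groq exposes an OpenAI-compatible chat completions API. As a rough sketch of what a call looks like, here is the request body that would be sent on your behalf (the endpoint URL and model ID are Groq's public values; the helper function is hypothetical, and in Lyzr this wiring is handled for you):

```python
import json

# Groq's OpenAI-compatible chat completions endpoint (public URL;
# Lyzr makes this call for you, so no extra setup is required).
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(model: str, user_message: str, stream: bool = False) -> dict:
    """Assemble the JSON body for a Groq chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,  # True -> token-by-token streaming
    }

body = build_chat_request("llama-3.3-70b-versatile",
                          "Summarize our refund policy.", stream=True)
print(json.dumps(body, indent=2))
```

Sending the body to `GROQ_URL` additionally requires an `Authorization: Bearer <API key>` header, which Lyzr manages internally.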
LLaMA 3.3 70B Versatile
A state-of-the-art LLaMA model served on Groq’s hardware, optimized for general-purpose use with high speed.
Use Cases:
- High-speed chat agents
- Real-time customer interaction bots
- Lightweight RAG-based knowledge assistants
Highlights:
- Ultra-fast response time (token streaming in milliseconds)
- Balanced accuracy and generation speed
- Great for user-facing experiences
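Token streaming arrives as server-sent events in the OpenAI-compatible format. A minimal sketch of collecting streamed deltas into a full reply (the `data:` line layout is the standard OpenAI-compatible SSE framing; the parser itself is an illustrative helper, not a Lyzr API):

```python
import json

def parse_sse_tokens(raw_lines):
    """Extract streamed token deltas from an OpenAI-compatible SSE
    chat stream. Each event line looks like 'data: {...}' and the
    stream ends with 'data: [DONE]'."""
    tokens = []
    for line in raw_lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        tokens.append(delta.get("content", ""))
    return "".join(tokens)

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(parse_sse_tokens(sample))  # -> Hello
```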
LLaMA 3.1 8B Instant
A smaller LLaMA model optimized for lightweight, near-instant responses.
Use Cases:
- Instant query resolution
- Embedded LLM features in apps
- Low-cost multi-turn agents
Highlights:
- Minimal latency (ideal for mobile/web)
- Lightweight for cost-effective scaling
- Suitable for basic inferencing needs
Mixtral 8x7B 32k
A mixture-of-experts model from Mistral AI, capable of handling long contexts with top-tier throughput on Groq.
Use Cases:
- Long-context RAG pipelines
- Multi-turn conversations
- Structured Q&A with deep memory
Highlights:
- Handles contexts up to 32k tokens
- Excellent performance with retrieval
- Useful in traceable enterprise workflows
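For long-context RAG pipelines, retrieved chunks still have to fit inside the 32k-token window. A rough sketch of a greedy context packer (the ~4 characters-per-token estimate and the function name are assumptions for illustration; use the model's real tokenizer in production):

```python
def pack_context(chunks: list[str], budget_tokens: int = 32_000,
                 chars_per_token: int = 4) -> list[str]:
    """Greedily keep retrieved chunks until the estimated token budget
    is spent. Token counts are estimated at ~4 chars/token, a coarse
    heuristic rather than the model's actual tokenizer."""
    packed, used = [], 0
    for chunk in chunks:
        cost = len(chunk) // chars_per_token + 1
        if used + cost > budget_tokens:
            break  # stop before overflowing the context window
        packed.append(chunk)
        used += cost
    return packed

# A tiny budget forces truncation: only the first chunk fits.
docs = ["a" * 40, "b" * 40, "c" * 40]
print(pack_context(docs, budget_tokens=12))
```

In practice you would also reserve part of the budget for the system prompt, chat history, and the model's reply.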
DeepSeek R1 Distill Qwen 32B
A version of DeepSeek R1 distilled into Qwen 32B, focused on multilingual and general-purpose reasoning.
Use Cases:
- Multilingual bots
- Exploratory assistants
- Agents requiring broad knowledge bases
Highlights:
- Multilingual capabilities
- Efficient reasoning and summarization
- Suitable for global applications
DeepSeek R1 Distill LLaMA 70B SpecDec
A DeepSeek R1 distillation of LLaMA 70B served via Groq with speculative decoding (SpecDec) for faster token generation, well suited to decision-making flows.
Use Cases:
- Decision support agents
- Evaluation-driven orchestration
- Agents requiring logic + policy-based outputs
Highlights:
- Strong reasoning for decision-heavy flows
- Speculative decoding combines speed with deep reasoning
- Enterprise-grade inference at sub-100ms latency
⚡ With Groq, agents in Lyzr get sub-100ms inference latency, making it ideal for real-time apps where user experience and responsiveness are critical.
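As an illustration of how the trade-offs above might translate into a model choice, here is a hypothetical routing helper (the model IDs are Groq's public identifiers; the selection logic is an example, not a Lyzr API):

```python
def pick_groq_model(needs_long_context: bool = False,
                    needs_reasoning: bool = False,
                    latency_critical: bool = False) -> str:
    """Map coarse agent requirements to one of the Groq model IDs above.
    The priority order (context > reasoning > latency) is an example
    policy, not a Lyzr recommendation."""
    if needs_long_context:
        return "mixtral-8x7b-32768"  # 32k context, MoE throughput
    if needs_reasoning:
        return "deepseek-r1-distill-llama-70b-specdec"  # reasoning + SpecDec speed
    if latency_critical:
        return "llama-3.1-8b-instant"  # smallest, lowest latency
    return "llama-3.3-70b-versatile"  # balanced general-purpose default

print(pick_groq_model(latency_critical=True))  # -> llama-3.1-8b-instant
```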