Groq is known for its hardware-accelerated inferencing, offering blazing-fast response times ideal for latency-sensitive applications. The models served through Groq in Lyzr come pre-integrated and require no additional setup.
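Within Lyzr these models are pre-integrated, so an agent only needs the model selected in its configuration. For readers curious what the underlying call looks like, below is a minimal sketch using Groq's own Python SDK directly (not Lyzr-specific code); the model ID and prompt are illustrative assumptions.

```python
# Minimal chat completion against a Groq-hosted model.
# Assumes GROQ_API_KEY is set in the environment; the model ID is
# an illustrative assumption.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize hardware-accelerated inference in one sentence."},
    ],
)
print(response.choices[0].message.content)
```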
A state-of-the-art LLaMA model served on Groq's hardware, optimized for high-speed, general-purpose use (see the streaming sketch after the highlights below).
Use Cases:
  • High-speed chat agents
  • Real-time customer interaction bots
  • Lightweight RAG-based knowledge assistants
Highlights:
  • Ultra-fast response time (token streaming in milliseconds)
  • Balanced accuracy and generation speed
  • Great for user-facing experiences
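The token-streaming highlight above can be exercised directly with the SDK's streaming mode. A minimal sketch, again assuming the Groq Python SDK and an illustrative model ID:

```python
# Stream tokens as they are generated instead of waiting for the
# full completion. Model ID is an illustrative assumption.
from groq import Groq

client = Groq()

stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Greet the user in one sentence."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry only role/metadata
        print(delta, end="", flush=True)
print()
```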
A smaller LLaMA model optimized for extremely lightweight, instant responses (a multi-turn sketch follows the highlights).
Use Cases:
  • Instant query resolution
  • Embedded LLM features in apps
  • Low-cost multi-turn agents
Highlights:
  • Minimal latency (ideal for mobile/web)
  • Lightweight for cost-effective scaling
  • Suitable for basic inferencing needs
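For the low-cost multi-turn use case above, conversation state is simply the running message list resent on each call. A minimal sketch, assuming a small "instant" LLaMA variant; the model ID and dialogue are illustrative assumptions:

```python
# Low-cost multi-turn loop: append each exchange to the history
# and resend it. Model ID is an illustrative assumption.
from groq import Groq

client = Groq()
history = [{"role": "system", "content": "You are a terse helpdesk bot."}]

for user_turn in ["My invoice is missing.", "Where do I download it?"]:
    history.append({"role": "user", "content": user_turn})
    reply = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=history,
    )
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print(f"> {user_turn}\n{answer}\n")
```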
A mixture-of-experts Mistral model capable of handling large contexts with top-tier throughput on Groq (a long-context sketch follows the highlights).
Use Cases:
  • Long-context RAG pipelines
  • Multi-turn conversations
  • Structured Q&A with deep memory
Highlights:
  • Handles context windows up to 32k tokens
  • Excellent performance with retrieval
  • Useful in traceable enterprise workflows
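The large context window is what makes the RAG pipelines above practical: retrieved chunks are concatenated straight into the prompt. A minimal sketch of that pattern; the model ID and the hard-coded "retrieved" chunks are assumptions standing in for a real vector store:

```python
# Long-context RAG pattern: stuff retrieved chunks into the prompt,
# relying on the large (32k-token) context window. Chunks and model ID
# are illustrative assumptions.
from groq import Groq

client = Groq()

retrieved_chunks = [
    "Policy doc: refunds are processed within 14 days.",
    "FAQ: refunds require the original order number.",
]  # in practice, fetched from a vector store

context = "\n\n".join(retrieved_chunks)
question = "How long do refunds take?"

response = client.chat.completions.create(
    model="mixtral-8x7b-32768",
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(response.choices[0].message.content)
```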
A distilled version of DeepSeek-Qwen focused on multilingual and general-purpose reasoning (a multilingual sketch follows the highlights).
Use Cases:
  • Multilingual bots
  • Exploratory assistants
  • Agents requiring broad knowledge bases
Highlights:
  • Multilingual capabilities
  • Efficient reasoning and summarization
  • Suitable for global applications
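Multilingual use requires no special configuration; prompting in the target language is enough. A minimal sketch, with the model ID as an illustrative assumption:

```python
# Multilingual prompting: the model answers in the language of the
# prompt. Model ID is an illustrative assumption.
from groq import Groq

client = Groq()

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-32b",
    messages=[{
        "role": "user",
        "content": "¿Cuáles son las ventajas de la inferencia acelerada por hardware?",
    }],
)
print(response.choices[0].message.content)  # reply comes back in Spanish
```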
A distilled LLaMA variant served via Groq that uses speculative decoding (SpecDec) to accelerate token generation, well suited to decision-making flows (a JSON-output sketch follows the highlights).
Use Cases:
  • Decision support agents
  • Evaluation-driven orchestration
  • Agents requiring logic + policy-based outputs
Highlights:
  • Designed for decision-heavy flows
  • Combines speed with deep reasoning
  • Enterprise-grade inferencing at sub-100ms latency
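Decision-support agents usually need machine-readable outputs. One way to get them is Groq's JSON mode, sketched below; the model ID and the two-key schema are assumptions, and JSON mode generally requires the word "JSON" to appear in the prompt:

```python
# Policy-style decision returned as strict JSON via JSON mode.
# Model ID and schema are illustrative assumptions.
import json

from groq import Groq

client = Groq()

response = client.chat.completions.create(
    model="llama-3.3-70b-specdec",
    messages=[
        {"role": "system", "content": "Return a JSON object with keys "
                                      "'decision' ('approve' or 'deny') and 'reason'."},
        {"role": "user", "content": "Refund request: item arrived broken, reported within 14 days."},
    ],
    response_format={"type": "json_object"},
)

decision = json.loads(response.choices[0].message.content)
print(decision["decision"], "-", decision["reason"])
```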
An open-source GPT variant (20B parameters) optimized for balanced performance and efficiency on Groq hardware.
Use Cases:
  • General-purpose conversational agents
  • Fast inference for customer support and FAQs
  • Lightweight reasoning at scale
Highlights:
  • Mid-sized open-source LLM
  • Optimized for Groq inferencing speed
  • Ideal for real-time interactive applications
A large-scale GPT-oss model (120B parameters) optimized for Groq, delivering deeper reasoning and broader coverage.
Use Cases:
  • Knowledge-heavy assistants
  • Multi-turn conversational flows
  • Enterprise orchestration agents
Highlights:
  • Large-scale reasoning capabilities
  • Supports complex queries with high accuracy
  • Maintains sub-100ms latency on the Groq runtime (see the timing sketch below)
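Latency claims like these are easy to verify yourself: time the first streamed token (time to first token) rather than the full completion. A minimal sketch; the model ID is an illustrative assumption:

```python
# Measure time to first token (TTFT) over a streaming call.
# Model ID is an illustrative assumption.
import time

from groq import Groq

client = Groq()

start = time.perf_counter()
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        ttft_ms = (time.perf_counter() - start) * 1000
        print(f"time to first token: {ttft_ms:.0f} ms")
        break
```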
⚡ With Groq, agents in Lyzr get sub-100ms inference latency, making them ideal for real-time apps where user experience and responsiveness are critical.