AI agents today are processing millions of queries, managing large-scale knowledge bases, and operating under intense concurrency demands.
For an agent builder platform that supports 100+ live AI agents across industries, choosing the right vector database is critical.
At scale, even slight inefficiencies in search latency or indexing can hurt agent performance and user experience. To meet production demands, we needed a solution that could deliver both speed and cost efficiency without compromising reliability.
Today, you’ll find out how Lyzr Agent Studio optimized its vector search stack with Qdrant. We’ll explore why it was the right fit, how it elevated system performance, and the real-world impact across customer use cases.
Initial Setup with Weaviate & other vector databases
In the initial phase, Lyzr Agent Studio integrated Weaviate as the core vector database, along with experiments on other platforms like Pinecone to benchmark early-stage performance.
The system was designed to handle a knowledge base of approximately 1,000 to 1,500 entries, comprising a mix of short-form content, technical briefs, and structured records.
The setup operated under controlled development conditions:
Parameter | Details |
Deployment Type | Single-node or small-cluster (Weaviate and other vector databases) |
Embedding Model | Sentence-transformer (768 dimensions) |
Concurrent Agents | 10 to 20 knowledge search agents |
Query Rate per Agent | 5-10 queries per minute |
Traffic Pattern | Steady, no significant spikes |
In this environment, Weaviate and Pinecone consistently performed well.
Query latency stayed between 80 ms and 150 ms, and vector search results were highly relevant within the given domain context. Indexing of the dataset completed within hours, aided by Weaviate’s HNSW-based indexing and Pinecone’s managed vector infrastructure.
Performance remained optimal under:
- A limited corpus (sub-2,000 records).
- Moderate agent concurrency typical of early-stage validation.
- Query loads staying well within standard operational limits.
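For a sense of scale, the aggregate load implied by the setup table can be worked out directly. The sketch below multiplies the agent count by the per-agent query rate. The function name is illustrative, and the inputs are simply the ranges quoted in the table.

```python
def aggregate_qpm(agents: tuple[int, int], per_agent_qpm: tuple[int, int]) -> tuple[int, int]:
    """Return the (low, high) total queries per minute across all agents."""
    low = agents[0] * per_agent_qpm[0]
    high = agents[1] * per_agent_qpm[1]
    return low, high


# Early-stage ranges from the setup table: 10-20 agents, 5-10 queries/min each.
low, high = aggregate_qpm((10, 20), (5, 10))
print(f"Aggregate load: {low}-{high} queries per minute")
print(f"Roughly {low / 60:.1f}-{high / 60:.1f} queries per second")
```

That puts the early-stage system at 50 to 200 queries per minute in total, which is the baseline the later growth figures should be read against.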
Challenges Faced During System Growth
As the system scaled, limitations with Weaviate and other databases like Pinecone began to surface.
The knowledge base expanded from around 1,500 to over 2,500 entries, while agent concurrency increased beyond 100 knowledge search agents. This added both volume and operational complexity, as agents were generating a much higher query load across the system.
Key issues encountered:
- Increased query latency: Query response times grew from an earlier average of 80-150ms to 300-500ms under higher load. With over 1,000 agent queries per minute during peak times, agents experienced noticeable delays in fetching relevant vectors.
- System under strain during concurrency spikes: Agents began facing slowdowns and occasional timeouts when retrieving search results, which disrupted downstream reasoning and decision-making flows.
- Indexing slowdowns: Scaling beyond 2,500 entries led to longer indexing and update times, especially during bulk ingestion. The re-indexing process consumed higher memory and CPU, causing performance drops during high activity periods.
Evaluation of Alternative Vector Databases
With growing data volume and rising agent concurrency, it became clear that a more scalable and efficient vector database was needed.
The goal was to find a solution that could handle heavier loads while maintaining fast response times and reducing operational overhead. This led to a thorough evaluation of alternative vector databases based on key performance criteria.
Criteria | Focus Area | Impact on System |
Scalability & Distributed Computing | Horizontal scaling, clustering | Support growing datasets and high agent concurrency |
Indexing Performance | Ingestion speed, update efficiency | Reduce downtime and enable faster bulk data updates |
Query Latency & Throughput | Search response under load | Ensure agents maintain fast, real-time responses |
Consistency & Reliability | Handling concurrency & failures | Avoid timeouts and failed queries during peak usage |
Resource Efficiency | CPU, memory, and storage usage | Optimize infrastructure costs while scaling workload |
Benchmark Results | Real-world load simulation | Validate sustained performance under >1,000 QPM loads |
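A load simulation of the kind listed in the last row could be sketched roughly as follows. This is not the harness used for the evaluation, only a minimal illustration: `run_load_test` and `fake_search` are hypothetical names, and the stub would be replaced by a real call to whichever vector database is under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def run_load_test(search_fn, queries, concurrency=16):
    """Send all queries through a thread pool and summarize the outcome."""

    def timed_call(query):
        start = time.perf_counter()
        try:
            search_fn(query)
            ok = True
        except Exception:
            ok = False  # count timeouts and errors instead of aborting the run
        return ok, (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, queries))
    wall_seconds = time.perf_counter() - wall_start

    latencies = [ms for ok, ms in results if ok]
    return {
        "total": len(results),
        "failed": sum(1 for ok, _ in results if not ok),
        "median_ms": median(latencies) if latencies else None,
        "queries_per_minute": len(results) / wall_seconds * 60,
    }


def fake_search(query):
    """Stand-in for a real vector search call (assumption, not a real client)."""
    if query.endswith("#fail"):
        raise TimeoutError("simulated timeout")
    time.sleep(0.001)
    return []


summary = run_load_test(fake_search, [f"q{i}" for i in range(200)] + ["q#fail"])
print(summary)
```

Counting failures separately from latency matters here, because the timeouts described earlier would otherwise vanish from an average.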
Transition to Qdrant
At Lyzr Agent Studio, users can easily connect to a variety of vector databases to support advanced RAG (Retrieval-Augmented Generation) and agent workflows.
The platform offers native integrations with leading vector databases, enabling efficient storage, retrieval, and querying of embeddings.
But after evaluating scalability challenges with earlier implementations, Qdrant was chosen as the core vector database to power storage and search capabilities within Lyzr Agent Studio, delivering greater reliability and faster performance at scale.
Why Qdrant?
- Sustained performance at scale: Qdrant demonstrated the ability to maintain high request throughput and consistently low query latency, even under demanding workloads generated by hundreds of concurrent agents.
- Optimized vector indexing and retrieval: Leveraging HNSW-based indexing, Qdrant enabled faster ingestion of data and quicker nearest neighbor searches, while supporting incremental updates without disrupting live operations.
- Resource and cost optimization: By requiring fewer compute and memory resources to achieve similar or improved performance compared to earlier solutions, Qdrant offered a more cost-effective scaling strategy as data volume and agent demands increased.
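To make the indexing and retrieval points concrete, here is a minimal sketch using the `qdrant-client` Python package. It does not reproduce Lyzr's actual configuration. The collection name, the cosine distance, the HNSW values, and the `embed` callable are all assumptions, and the 768-dimension default is borrowed from the earlier setup table.

```python
def build_points(entries, embed):
    """Turn knowledge-base entries into plain dicts with id, vector, and payload."""
    return [
        {
            "id": i,
            "vector": embed(entry["text"]),
            "payload": {"text": entry["text"], "url": entry.get("url")},
        }
        for i, entry in enumerate(entries)
    ]


def index_and_search(entries, embed, query, dim=768, limit=3):
    """Create a collection, upsert the entries, and run one similarity query."""
    # Imported lazily so build_points works without the client installed.
    from qdrant_client import QdrantClient, models

    client = QdrantClient(":memory:")  # local mode, for experiments only
    client.create_collection(
        collection_name="knowledge_base",  # hypothetical name
        vectors_config=models.VectorParams(size=dim, distance=models.Distance.COSINE),
        # Illustrative HNSW settings; tune against your own data.
        hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100),
    )
    # Upserts can be repeated later to add or overwrite points incrementally.
    client.upsert(
        collection_name="knowledge_base",
        points=[models.PointStruct(**p) for p in build_points(entries, embed)],
    )
    response = client.query_points(
        collection_name="knowledge_base", query=embed(query), limit=limit
    )
    return response.points
```

Calling `index_and_search(entries, embed, "how do I reset my password?")` with a real embedding function returns scored points whose `payload` carries the stored text and URL. For anything beyond experiments, the client would point at a running Qdrant server rather than the in-memory mode.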
Use Case 1: NTT Data – Change Request Management Agent
In an enterprise environment with NTT Data, the Change Request Management Agent was initially powered by Cosmos DB within Azure infrastructure.
Tech Stack:
- Hosting: Azure Cloud (enterprise-grade environment)
- Vector Database: Initially Cosmos DB with Azure, later migrated to Qdrant for improved vector search performance
- Agents: Change Request Management Agent (deployed for automating and managing IT change requests)
- Framework: React for the frontend, FastAPI for the backend.
While Cosmos DB provided ease of integration with the existing Azure ecosystem, it exhibited limitations in vector search accuracy. The available indexing techniques restricted the agent’s ability to surface highly relevant information for complex change request workflows.
The transition to Qdrant addressed these challenges. With Qdrant’s advanced vector indexing capabilities and support for scalable deployments, the agent achieved:
- Higher search accuracy on change request records.
- Improved reliability across large volumes of historical and live data.
- Simplified horizontal scaling, enabling smooth handling of increased agent activity as project demands grew.
Use Case 2: NPD – Customer Support Agent
NPD deployed a customer support agent using Lyzr Agent Studio to automate troubleshooting assistance on its website. The agent leverages a knowledge base containing website URLs, troubleshooting documents, and product-specific resources.
How it works:
Users submit troubleshooting questions, and the agent responds with precise answers while directing them to the correct URL for the product or documentation they need.
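The retrieval step behind this flow can be illustrated with a toy example. The sketch below uses a brute-force cosine similarity scan in place of Qdrant's HNSW index, with made-up 3-dimensional vectors and placeholder URLs. None of these values come from the NPD deployment.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query_vec, knowledge_base):
    """Return the entry whose embedding is most similar to the query."""
    return max(knowledge_base, key=lambda entry: cosine(query_vec, entry["vector"]))


# Toy embeddings and placeholder URLs, for illustration only.
kb = [
    {"url": "https://example.com/products/router", "vector": [0.9, 0.1, 0.0]},
    {"url": "https://example.com/support/reset-password", "vector": [0.1, 0.9, 0.2]},
]
hit = best_match([0.2, 0.8, 0.1], kb)
print(hit["url"])
```

Storing the URL alongside each vector is what lets the agent answer a question and link to the right page in a single lookup.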
Tech Stack:
- Agents
  - Customer support agents: help users reach the right product URL for their needs, and troubleshoot any issues they face with an existing product
  - 6 agents for each website
- Knowledge Base
  - 6 knowledge bases for 6 NPD websites
  - Entire website scraped (URL, content)
  - Vector DB: Qdrant
- Hosting
  - Frontend: NPD’s own frontend
  - Backend: Lyzr Studio (direct API call), using a router to hide the API key
- Modules
  - Knowledge Base
  - Short Term Memory
To meet the demands for fast, accurate query handling, the agent required a vector database optimized for contextual search and low-latency performance. Lyzr Agent Studio integrated Qdrant as the vector backend, delivering:
- High accuracy in matching user queries to the right URLs and support content.
- Fast vector search across thousands of knowledge base entries.
- Seamless scalability, allowing the agent to serve increasing user traffic without performance degradation.
Performance Gains and Results with Qdrant
The transition to Qdrant resulted in measurable improvements across multiple operational parameters for Lyzr Agent Studio.
1. Performance Comparison
Metric | Weaviate | Pinecone | Qdrant |
Avg Query Latency (at 100 agents) | 300-500ms | 250-450ms | 20-50ms (P99) |
Indexing Time (2,500+ entries) | ~3 hours | ~2.5 hours | ~1.5 hours |
Query Throughput (QPS) | ~80 QPS | ~100 QPS | 250+ QPS |
Resource Utilization (CPU/Memory) | High | Medium-High | Low-Medium |
Horizontal Scalability | Moderate | Moderate | Highly Scalable |
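The relative gains implied by the table can be computed with simple arithmetic. The snippet below only restates the table's own figures, which are approximate ("~" and "250+"), so the ratios should be read as rough estimates.

```python
# Figures copied from the comparison table; all are approximate.
throughput_qps = {"Weaviate": 80, "Pinecone": 100, "Qdrant": 250}
indexing_hours = {"Weaviate": 3.0, "Pinecone": 2.5, "Qdrant": 1.5}


def speedup(baseline: float, new: float) -> float:
    """How many times larger `new` is than `baseline`."""
    return new / baseline


def reduction_pct(baseline: float, new: float) -> float:
    """Percentage by which `new` is smaller than `baseline`."""
    return (baseline - new) / baseline * 100


for name in ("Weaviate", "Pinecone"):
    tp = speedup(throughput_qps[name], throughput_qps["Qdrant"])
    idx = reduction_pct(indexing_hours[name], indexing_hours["Qdrant"])
    print(f"vs {name}: {tp:.1f}x throughput, {idx:.0f}% less indexing time")

# Unit check: 250 queries per second expressed per minute.
print(f"{throughput_qps['Qdrant'] * 60:,} queries per minute")
```

The unit check is worth keeping in mind: the table reports throughput per second, while the load figures elsewhere in this post are per minute.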
2. Latency Improvements
- Achieved P99 latency of 20ms even with over 1 million vectors in production.
- Query performance remained consistent during peak workloads, reducing prior spikes in response times experienced with other vector databases.
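For readers less familiar with the metric, a P99 figure can be derived from raw latency samples as sketched below. This uses the nearest-rank definition, which is one of several common conventions, and the sample data is synthetic rather than taken from production.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in ms: 98 fast queries and two slow outliers.
latencies_ms = [12.0] * 98 + [45.0, 180.0]
print("mean:", sum(latencies_ms) / len(latencies_ms))
print("p99: ", percentile(latencies_ms, 99))
```

A P99 value says that 99% of queries finished at or under that time, so it is a stricter claim than an average and is more sensitive to the spikes described above.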
3. Throughput and Scaling Success
- Sustained handling of 250+ queries per second across distributed agents.
- System stability remained intact under concurrent load from 100+ live knowledge search agents.
4. Cost Reductions
- Reduced compute and storage requirements led to an overall 30% decrease in infrastructure costs compared to earlier deployments.
- Improved index efficiency allowed for a leaner cluster footprint without sacrificing performance.
5. Stability & Reliability
At over 1,000 queries per minute, Weaviate slows down at times and Pinecone stays mostly stable with occasional latency spikes, while Qdrant stays consistently stable. At 100+ live agents, both Weaviate and Pinecone show some delays, whereas Qdrant runs smoothly.
Wrapping Up
Using Qdrant has strengthened how vector search supports agents under heavy workloads. Faster search, greater stability, and easier scaling have all contributed to smoother agent performance across real-world applications.
The key takeaway? The right vector database is critical to keeping agents fast and reliable as data and usage grow.
With Qdrant as its core vector database, Lyzr Agent Studio continues to help teams build and operate AI agents at scale.
Ready to build? Try Lyzr Agent Studio today.