Scaling AI Agents with Qdrant at Lyzr Agent Studio

AI agents today are processing millions of queries, managing large-scale knowledge bases, and operating under intense concurrency demands.

For an agent builder platform that supports 100+ live AI agents across industries, choosing the right vector database is critical.

At scale, even slight inefficiencies in search latency or indexing can impact agent performance and user experience. To meet production demands, we needed a solution that could deliver both speed and cost efficiency without compromising reliability.

Today, you’ll find out how Lyzr Agent Studio optimized its vector search stack with Qdrant. We’ll explore why it was the right fit, how it elevated system performance, and the real-world impact across customer use cases.

Initial Setup with Weaviate & other vector databases

In the initial phase, Lyzr Agent Studio integrated Weaviate as the core vector database, along with experiments on other platforms like Pinecone to benchmark early-stage performance.

The system was designed to handle a knowledge base of approximately 1,000 to 1,500 entries, comprising a mix of short-form content, technical briefs, and structured records.

The setup operated under controlled development conditions:

Parameter	Details
Deployment Type	Single-node or small-cluster (Weaviate and other vector databases)
Embedding Model	Sentence-transformer (768 dimensions)
Concurrent Agents	10 to 20 knowledge search agents
Query Rate per Agent	5-10 queries per minute
Traffic Pattern	Steady, no significant spikes

In this environment, Weaviate and Pinecone consistently performed well.

Query latency remained between 80ms and 150ms, and vector search results were highly relevant within the given domain context. Indexing of the dataset completed within hours, aided by Weaviate’s HNSW-based indexing and Pinecone’s managed vector infrastructure.

Performance remained optimal under:

A limited corpus (sub-2,000 records).
Moderate agent concurrency typical of early-stage validation.
Query loads staying well within standard operational limits.

Challenges Faced During System Growth

As the system scaled, limitations with Weaviate and other databases like Pinecone began to surface.

The knowledge base expanded from around 1,500 to over 2,500 entries, while agent concurrency increased beyond 100 knowledge search agents. This added both volume and operational complexity, as agents were generating a much higher query load across the system.

Key issues encountered:

Increased query latency: Query response times grew from an earlier average of 80-150ms to 300-500ms under higher load. With over 1,000 agent queries per minute during peak times, agents experienced noticeable delays in fetching relevant vectors.
System under strain during concurrency spikes: Agents began facing slowdowns and occasional timeouts when retrieving search results, which disrupted downstream reasoning and decision-making flows.
Indexing slowdowns: Scaling beyond 2,500 entries led to longer indexing and update times, especially during bulk ingestion. The re-indexing process consumed higher memory and CPU, causing performance drops during high activity periods.
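
To make the jump in load concrete, here is a rough back-of-the-envelope sketch in Python. The early-phase numbers come from the setup table; the growth-phase calculation assumes the per-agent rate stayed at 5-10 queries per minute, which the article does not state explicitly.

```python
def query_load_per_minute(agents: tuple[int, int], rate: tuple[int, int]) -> tuple[int, int]:
    """Return the (low, high) aggregate queries per minute for a range of
    agent counts and a range of per-agent query rates."""
    return agents[0] * rate[0], agents[1] * rate[1]


# Early phase, from the setup table: 10-20 agents at 5-10 queries/min each.
early = query_load_per_minute((10, 20), (5, 10))

# Growth phase: 100 agents. The per-agent rate is ASSUMED unchanged.
scaled = query_load_per_minute((100, 100), (5, 10))

print(early)   # (50, 200)
print(scaled)  # (500, 1000)
```

Under that assumption, 100 agents already reach 1,000 queries per minute at the upper end, so any growth beyond 100 agents lines up with the reported peaks of over 1,000 queries per minute.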

Evaluation of Alternative Vector Databases

With growing data volume and rising agent concurrency, it became clear that a more scalable and efficient vector database was needed.

The goal was to find a solution that could handle heavier loads while maintaining fast response times and reducing operational overhead. This led to a thorough evaluation of alternative vector databases based on key performance criteria.

Criteria	Focus Area	Impact on System
Scalability & Distributed Computing	Horizontal scaling, clustering	Support growing datasets and high agent concurrency
Indexing Performance	Ingestion speed, update efficiency	Reduce downtime and enable faster bulk data updates
Query Latency & Throughput	Search response under load	Ensure agents maintain fast, real-time responses
Consistency & Reliability	Handling concurrency & failures	Avoid timeouts and failed queries during peak usage
Resource Efficiency	CPU, memory, and storage usage	Optimize infrastructure costs while scaling workload
Benchmark Results	Real-world load simulation	Validate sustained performance under >1,000 QPM loads
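
A load simulation of this kind can be sketched in a few lines of Python. This is a minimal, hypothetical harness, not the benchmark Lyzr actually ran: `fake_search` is a stand-in for a real vector database call, and the loop is sequential, so it measures per-query latency rather than behavior under true concurrency.

```python
import math
import time
from typing import Callable, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def run_benchmark(search: Callable[[int], object], num_queries: int) -> dict:
    """Call search(i) sequentially; report latency percentiles (ms) and throughput."""
    latencies_ms = []
    started = time.perf_counter()
    for i in range(num_queries):
        t0 = time.perf_counter()
        search(i)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - started
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "queries_per_minute": num_queries / elapsed * 60,
    }


def fake_search(i: int) -> list:
    """Hypothetical stand-in for a vector database query (sleeps about 1 ms)."""
    time.sleep(0.001)
    return []


if __name__ == "__main__":
    print(run_benchmark(fake_search, num_queries=200))
```

Swapping `fake_search` for a real client call against each candidate database, and running several such loops in parallel, would bring the sketch closer to the >1,000 QPM scenario in the table.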

Transition to Qdrant

At Lyzr Agent Studio, users can easily connect to a variety of vector databases to support advanced RAG (Retrieval-Augmented Generation) and agent workflows.

The platform offers native integrations with leading vector databases, enabling efficient storage, retrieval, and querying of embeddings.

But after evaluating scalability challenges with earlier implementations, Qdrant was chosen as the core vector database to power storage and search capabilities within Lyzr Agent Studio, delivering greater reliability and faster performance at scale.

Why Qdrant?

Sustained performance at scale: Qdrant demonstrated the ability to maintain high request throughput and consistently low query latency, even under demanding workloads generated by hundreds of concurrent agents.
Optimized vector indexing and retrieval: Leveraging HNSW-based indexing, Qdrant enabled faster ingestion of data and quicker nearest neighbor searches, while supporting incremental updates without disrupting live operations.
Resource and cost optimization: By requiring fewer compute and memory resources to achieve similar or improved performance compared to earlier solutions, Qdrant offered a more cost-effective scaling strategy as data volume and agent demands increased.

Use Case 1: NTT Data – Change Request Management Agent

In an enterprise environment with NTT Data, the Change Request Management Agent was initially powered by Cosmos DB within Azure infrastructure.

Tech Stack:

Hosting: Azure Cloud (enterprise-grade environment)
Vector Database: Initially Cosmos DB with Azure, later migrated to Qdrant for improved vector search performance
Agents: Change Request Management Agent (deployed for automating and managing IT change requests)
Framework: React for the frontend, FastAPI for the backend

While Cosmos DB provided ease of integration with the existing Azure ecosystem, it exhibited limitations in vector search accuracy. The available indexing techniques restricted the agent’s ability to surface highly relevant information for complex change request workflows.

The transition to Qdrant addressed these challenges. With Qdrant’s advanced vector indexing capabilities and support for scalable deployments, the agent achieved:

Higher search accuracy on change request records.
Improved reliability across large volumes of historical and live data.
Simplified horizontal scaling, enabling smooth handling of increased agent activity as project demands grew.

Use Case 2: NPD – Customer Support Agent

NPD deployed a customer support agent using Lyzr Agent Studio to automate troubleshooting assistance on its website. The agent leverages a knowledge base containing website URLs, troubleshooting documents, and product-specific resources.

How it works:

Users submit troubleshooting questions, and the agent responds with precise answers while directing them to the correct URL for the product or documentation they need.
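
The matching step can be illustrated with a toy Python sketch. Everything here is hypothetical: the URLs are placeholders, the 3-dimensional vectors stand in for real embeddings, and the search is an exact brute-force scan rather than the HNSW index a vector database such as Qdrant would use.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], entries: list[dict], k: int = 1) -> list[dict]:
    """Exact nearest-neighbor search: return the k entries most similar to query."""
    ranked = sorted(
        entries,
        key=lambda entry: cosine_similarity(query, entry["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical knowledge base entries: each stores a URL alongside its vector.
knowledge_base = [
    {"url": "https://example.com/products/router", "vector": [0.9, 0.1, 0.0]},
    {"url": "https://example.com/support/reset-password", "vector": [0.1, 0.9, 0.1]},
    {"url": "https://example.com/support/billing", "vector": [0.0, 0.2, 0.9]},
]

# Pretend embedding of a login-related troubleshooting question.
query_vector = [0.2, 0.8, 0.1]
best = top_k(query_vector, knowledge_base, k=1)[0]
print(best["url"])
```

Storing the URL next to each vector is what lets the agent return a link together with its answer.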

Tech Stack:

Agents

Customer support agents: help users reach the right product URL for their needs, and troubleshoot issues they face with existing products
6 agents for each website

Knowledge Base

6 knowledge bases for the 6 NPD websites
Entire website scraped (URL, content)

Vector DB

Qdrant

Hosting

Frontend: NPD’s own frontend
Backend: Lyzr Studio (direct API call), using a router to hide the API key

Modules

Knowledge Base
Short Term Memory
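
The router mentioned under Hosting keeps the API key on the server so that it never reaches the browser. The article does not describe how it is built; the Python sketch below only illustrates the general idea, and the header name `x-api-key` and the environment variable `AGENT_API_KEY` are assumptions, not Lyzr's actual interface.

```python
import os

# Headers a browser client must not be allowed to set or override (illustrative list).
BLOCKED_CLIENT_HEADERS = {"x-api-key", "authorization"}


def build_upstream_headers(client_headers: dict[str, str], api_key: str) -> dict[str, str]:
    """Drop any client-supplied credentials, then attach the server-side key."""
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in BLOCKED_CLIENT_HEADERS
    }
    forwarded["x-api-key"] = api_key  # assumed header name, for illustration only
    return forwarded


# The key lives only on the server, for example in an environment variable.
server_key = os.environ.get("AGENT_API_KEY", "demo-key")
incoming = {"Content-Type": "application/json", "X-Api-Key": "spoofed-by-client"}
print(build_upstream_headers(incoming, server_key))
```

A real router would wrap this in an HTTP endpoint that forwards the request body to the agent API and relays the response back to the frontend.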

To meet the demands for fast, accurate query handling, the agent required a vector database optimized for contextual search and low-latency performance. Lyzr Agent Studio integrated Qdrant as the vector backend, delivering:

High accuracy in matching user queries to the right URLs and support content.
Fast vector search across thousands of knowledge base entries.
Seamless scalability, allowing the agent to serve increasing user traffic without performance degradation.

Performance Gains and Results with Qdrant

The transition to Qdrant resulted in measurable improvements across multiple operational parameters for Lyzr Agent Studio.

1. Performance Comparison

Metric	Weaviate	Pinecone	Qdrant
Avg Query Latency (at 100 agents)	300-500ms	250-450ms	20-50ms (P99)
Indexing Time (2,500+ entries)	~3 hours	~2.5 hours	~1.5 hours
Query Throughput (QPS)	~80 QPS	~100 QPS	250+ QPS
Resource Utilization (CPU/Memory)	High	Medium-High	Low-Medium
Horizontal Scalability	Moderate	Moderate	Highly Scalable
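
The relative gains implied by the table can be worked out with a short Python sketch. The inputs are the table's own figures; note that the Qdrant latency is reported as P99 while the row is labeled as an average, so the latency ratios are indicative rather than like-for-like, and 250 QPS is a lower bound.

```python
# Figures copied from the comparison table; ranges are (low, high).
latency_ms = {"Weaviate": (300, 500), "Pinecone": (250, 450), "Qdrant": (20, 50)}
indexing_hours = {"Weaviate": 3.0, "Pinecone": 2.5, "Qdrant": 1.5}
throughput_qps = {"Weaviate": 80, "Pinecone": 100, "Qdrant": 250}


def ratio_range(baseline: tuple[float, float], target: tuple[float, float]) -> tuple[float, float]:
    """Return the (most conservative, most optimistic) ratio between two ranges."""
    return baseline[0] / target[1], baseline[1] / target[0]


for name in ("Weaviate", "Pinecone"):
    low, high = ratio_range(latency_ms[name], latency_ms["Qdrant"])
    indexing = indexing_hours[name] / indexing_hours["Qdrant"]
    throughput = throughput_qps["Qdrant"] / throughput_qps[name]
    print(
        f"vs {name}: latency {low:.1f}x-{high:.1f}x lower, "
        f"indexing {indexing:.2f}x faster, throughput {throughput:.2f}x higher"
    )

# 250 queries per second expressed in the per-minute unit used earlier in the article.
qdrant_qpm = throughput_qps["Qdrant"] * 60
print(qdrant_qpm)  # 15000
```

At 15,000 queries per minute, the reported Qdrant throughput sits well above the >1,000 QPM load that the evaluation criteria set out to validate.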

2. Latency Improvements

Achieved P99 latency of 20ms even with over 1 million vectors in production.
Query performance remained consistent during peak workloads, reducing prior spikes in response times experienced with other vector databases.

3. Throughput and Scaling Success

Sustained handling of 250+ queries per second across distributed agents.
System stability remained intact under concurrent load from 100+ live knowledge search agents.

4. Cost Reductions

Reduced compute and storage requirements led to an overall 30% decrease in infrastructure costs compared to earlier deployments.
Improved index efficiency allowed for a leaner cluster footprint without sacrificing performance.

5. Stability & Reliability

At over 1,000 queries per minute, Weaviate slowed down at times and Pinecone stayed mostly stable with occasional latency spikes, while Qdrant remained consistently stable. At 100+ live agents, both Weaviate and Pinecone showed some delays, whereas Qdrant ran smoothly.

Wrapping Up

Using Qdrant has strengthened how vector search supports agents under heavy workloads. Faster search, greater stability, and easier scaling have all contributed to smoother agent performance across real-world applications.

The key takeaway? The right vector database is critical to keeping agents fast and reliable as data and usage grow.

With Qdrant as its core vector database, Lyzr Agent Studio continues to help teams build and operate AI agents at scale.

Ready to build? Try Lyzr Agent Studio today.
