Every week, somewhere in a Slack channel or Discord server, a developer asks the same question: "Should I use ChromaDB, Pinecone, or pgvector?" The responses follow a predictable pattern. Pinecone advocates cite managed infrastructure. ChromaDB defenders point to local development simplicity. PostgreSQL loyalists invoke the gospel of "one less dependency."
They're all missing the point. And with context windows now reaching 200k tokens and beyond, the real bottleneck has shifted away from where the vectors are stored.
The vector database selection question has become a proxy war for larger architectural anxieties—anxieties that the choice of vector store won't actually resolve. The real challenge isn't picking the right database. It's understanding what you're actually building and whether your retrieval strategy deserves the complexity you're about to introduce.
How Vector Search Actually Works
Before comparing implementations, it helps to understand what these systems are doing under the hood.
Vector databases store embeddings—dense numerical representations of text, images, or other data—and enable similarity search across them. When you embed a query and search for the nearest neighbors, you're asking: "Which stored vectors are closest to this one in high-dimensional space?"
The naive approach computes the distance from the query to every stored vector. Cost grows linearly with dataset size, so a scan that is perfectly acceptable for tens of thousands of vectors becomes too slow for interactive use as the collection grows toward millions, with the exact threshold depending on dimensionality, hardware, and latency budget. Every serious vector store solves this with approximate nearest neighbor (ANN) algorithms that trade perfect accuracy for speed.
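To make the naive approach concrete, here is a minimal sketch of exact search in plain Python. The helper names (cosine_distance, brute_force_search) and the random 8-dimensional data are illustrative assumptions, not part of any vector database's API.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_search(query, vectors, k=3):
    """Exact k-nearest-neighbor search: one distance computation per stored vector."""
    scored = [(cosine_distance(query, v), i) for i, v in enumerate(vectors)]
    scored.sort()  # smallest distance first
    return [i for _, i in scored[:k]]


# Toy data: 1,000 random vectors standing in for real embeddings.
random.seed(0)
store = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
query = store[42]  # querying with a stored vector should return itself first
top = brute_force_search(query, store, k=3)
```

Every query touches every vector, which is exactly the linear cost that ANN indexes exist to avoid.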
The dominant approaches:
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each node connects to nearby vectors. Search starts at the top layer (sparse, long-range connections) and descends through increasingly dense layers, narrowing the search space at each level. This is what ChromaDB, pgvector, and many others use.
IVF (Inverted File Index) clusters vectors into buckets during indexing, then searches only the most relevant clusters at query time. Pinecone uses a proprietary variant of this approach, and pgvector offers an IVF-based index type (ivfflat) alongside HNSW.
Both methods achieve similar recall rates (typically 95-99% of true nearest neighbors) when properly tuned. The differences lie in memory usage, index build time, and operational characteristics—not retrieval quality.
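The IVF idea, and the recall tradeoff that comes with it, can be sketched in plain Python. This is a toy under stated assumptions: the clustering is a few rounds of basic k-means, the data is random, and the function names (build_ivf, ivf_search, recall_at_k) and the nprobe parameter name are chosen for illustration. It is not how Pinecone or pgvector implement their indexes.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(vectors, centroids):
    """Put each vector's index into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets


def build_ivf(vectors, n_clusters, iterations=5, seed=0):
    """Cluster vectors with a few k-means rounds; return (centroids, buckets)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iterations):
        buckets = assign(vectors, centroids)
        for c, members in enumerate(buckets):
            if members:  # an empty bucket keeps its previous centroid
                columns = zip(*(vectors[i] for i in members))
                centroids[c] = [sum(col) / len(members) for col in columns]
    return centroids, assign(vectors, centroids)


def ivf_search(query, vectors, centroids, buckets, k=5, nprobe=2):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    return sorted(candidates, key=lambda i: dist(query, vectors[i]))[:k]


def exact_search(query, vectors, k=5):
    """Ground truth: scan everything."""
    return sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))[:k]


def recall_at_k(queries, vectors, centroids, buckets, k, nprobe):
    """Fraction of true nearest neighbors that the approximate search returns."""
    hits = 0
    for q in queries:
        truth = set(exact_search(q, vectors, k))
        hits += len(truth & set(ivf_search(q, vectors, centroids, buckets, k, nprobe)))
    return hits / (k * len(queries))


rng = random.Random(1)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
queries = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]
centroids, buckets = build_ivf(vectors, n_clusters=8)
recall_narrow = recall_at_k(queries, vectors, centroids, buckets, k=5, nprobe=1)
recall_full = recall_at_k(queries, vectors, centroids, buckets, k=5, nprobe=8)
```

Probing every bucket degenerates into an exact scan, so recall_full is 1.0. Probing fewer buckets is faster but can miss neighbors that sit just across a cluster boundary, which is the tuning knob behind the recall figures quoted above.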
The Managed vs. Self-Hosted Divide
Pinecone's value proposition is clear: you don't manage infrastructure. They handle scaling, replication, and the operational complexity that comes with distributed systems. For teams with limited DevOps capacity shipping user-facing AI products, this matters.
But "managed" comes with constraints. Vendor lock-in is real: Pinecone's API differs enough from alternatives that migration requires non-trivial refactoring. Costs scale with usage in ways that surprise teams accustomed to fixed infrastructure expenses. And the network round trip between your application and Pinecone's servers can add tens to hundreds of milliseconds to every query, depending on region and network path, which compounds painfully when you're making multiple retrieval calls per user interaction.
pgvector takes the opposite approach: your vectors live in PostgreSQL, alongside your relational data. For applications already running Postgres, this means no new infrastructure, no additional authentication layer, and transactional consistency between your vectors and metadata. You can JOIN your similarity search results directly against user tables. The operational simplicity is substantial.
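As a rough illustration of that JOIN, the sketch below builds a parameterized query in Python. The schema (a documents table with owner_id, body, and embedding columns, plus a users table) is hypothetical, as are the helper names. The <=> operator is pgvector's cosine distance operator, and the %(name)s placeholders assume a psycopg-style driver.

```python
def to_pgvector_literal(embedding):
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema, for illustration only:
#   documents(id, owner_id, body, embedding vector(3))
#   users(id, email)
SIMILAR_DOCS_SQL = """
SELECT d.id,
       d.body,
       u.email,
       d.embedding <=> %(query)s::vector AS cosine_distance
FROM documents AS d
JOIN users AS u ON u.id = d.owner_id
WHERE u.id = %(user_id)s
ORDER BY d.embedding <=> %(query)s::vector
LIMIT %(limit)s
"""


def similar_docs_params(query_embedding, user_id, limit=5):
    """Build the parameter dict for SIMILAR_DOCS_SQL."""
    return {
        "query": to_pgvector_literal(query_embedding),
        "user_id": user_id,
        "limit": limit,
    }


params = similar_docs_params([0.1, 0.2, 0.3], user_id=7)
```

With a psycopg connection this would run as cursor.execute(SIMILAR_DOCS_SQL, params). The point is that the access filter and the similarity ranking happen in one statement against one source of truth.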
The tradeoff is that pgvector's performance ceiling is lower. Once you exceed a few million vectors, Postgres starts to strain. Specialized vector databases handle billion-scale indices that would crush a general-purpose relational database.
ChromaDB occupies the middle ground—a dedicated vector store that runs locally or self-hosted. It's become the default for prototyping because pip install chromadb gets you from zero to working retrieval in under five minutes. The embedded mode runs in-process, eliminating network overhead entirely. For development and small-to-medium scale production, ChromaDB offers an excellent balance of capability and simplicity.
Real Developer Scenarios
Where the abstract becomes concrete.
Scenario 1: The RAG prototype. A team is building retrieval-augmented generation over internal documentation. The corpus is 50,000 chunks, there is one developer, and something needs to be working by the end of the week. ChromaDB is the obvious choice: local, fast, no accounts to create, no infrastructure to provision. Migration can happen later.
Scenario 2: The production chat application. Thousands of concurrent users querying against millions of documents. Latency matters—every additional 100ms degrades user experience. The team needs geographic distribution and automatic failover but lacks dedicated infrastructure engineers. Pinecone (or Weaviate, or Qdrant's managed offering) starts making sense.
Scenario 3: The multi-tenant SaaS. An AI feature bolted onto an existing product already running PostgreSQL. User data lives in Postgres. Access control enforced at the database level. Adding Pinecone means syncing data between systems, implementing duplicate permission logic, and accepting eventual consistency between the source of truth and the vector index. pgvector keeps everything in one place, even if it means accepting performance constraints.
Scenario 4: The semantic search engine. One hundred million documents, sub-50ms latency requirements. None of the options above work without significant engineering investment. Custom deployments, specialized hardware, distributed systems expertise. This is where Milvus, Vespa, or custom solutions enter the conversation.
The point is that context determines the right choice. There's no universally correct vector database any more than there's a universally correct programming language.
What the Vendors Won't Tell You
The uncomfortable truth: for most RAG applications, the vector database is not the bottleneck.
The embedding model matters more than the storage layer. The difference between a generic embedding model and one fine-tuned for a specific domain often exceeds the difference between any two vector databases. Off-the-shelf embeddings on specialized content leave retrieval quality on the table regardless of which vector store sits underneath. And with token budget constraints tightening as teams scale their AI features, poor embeddings mean wasted tokens on irrelevant retrievals.
Chunking strategy matters more than the storage layer. Semantic chunking—splitting documents along meaning boundaries rather than arbitrary token counts—affects retrieval quality more than index architecture. Chunk too small and context disappears. Chunk too large and irrelevant content dilutes the signal. The "lost in the middle" problem, where LLMs struggle to use information buried in long contexts, makes this worse: bad chunks don't just reduce recall, they actively degrade generation quality.
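A very rough approximation of boundary-aware chunking is to split on sentence ends and pack whole sentences into chunks under a size budget. The sketch below does only that. It assumes word count as a stand-in for token count and a simple regex as a stand-in for real sentence detection, so treat it as a starting point rather than true semantic chunking. The function name chunk_by_sentences is illustrative.

```python
import re


def chunk_by_sentences(text, max_words=40):
    """Pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk
    rather than being cut mid-sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))  # budget exceeded: close the chunk
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


example = chunk_by_sentences("A b c. D e f. G h i.", max_words=6)
# example == ["A b c. D e f.", "G h i."]
```

Compared with cutting every N tokens, no chunk ends mid-sentence, which is the minimum bar for keeping meaning intact. Splitting on topic shifts would need an embedding or model-based boundary detector on top of this.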
Reranking matters more than initial retrieval. A two-stage pipeline—fast vector search followed by a smaller, higher-quality reranking model—typically outperforms pouring resources into vector database optimization. Cohere's reranker, cross-encoders, or even an LLM-as-judge can transform mediocre retrieval into excellent results.
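The shape of a two-stage pipeline is simple enough to sketch. In the toy below, both scoring functions are stand-ins invented for illustration: word_overlap plays the role of fast vector similarity, and phrase_aware_score plays the role of a slower, more accurate reranker such as a cross-encoder. Only the control flow is the point.

```python
def word_overlap(query, doc):
    """Stage-one stand-in for vector similarity: count of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def phrase_aware_score(query, doc):
    """Stage-two stand-in for a reranker: rewards documents containing the exact phrase."""
    bonus = 10 if query.lower() in doc.lower() else 0
    return bonus + word_overlap(query, doc)


def two_stage_search(query, corpus, cheap_score, expensive_score, fetch_k=20, final_k=3):
    """Fetch fetch_k candidates cheaply, then rerank only those with the costly scorer."""
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:fetch_k]
    reranked = sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)
    return reranked[:final_k]


docs = [
    "index tuning for postgres vector search",
    "a guide to vector index tuning",
    "vector search basics",
    "cooking with cast iron",
]
query = "vector index tuning"
results = two_stage_search(query, docs, word_overlap, phrase_aware_score, fetch_k=3, final_k=2)
```

The first stage cannot tell the top two documents apart, since both share three words with the query. The second stage promotes the one that actually contains the phrase, and it only had to score three candidates to do so.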
This doesn't mean vector database choice is irrelevant. At scale, operational characteristics diverge significantly. But developers often obsess over database selection while neglecting the retrieval pipeline components that actually determine output quality.
Where the Industry Is Headed
The vector database market is consolidating and commoditizing simultaneously.
On one front, traditional databases are adding vector capabilities. PostgreSQL has pgvector. Redis has RediSearch with vector similarity. Elasticsearch offers dense vector fields. MongoDB announced vector search. The "why add another database?" argument gets stronger as incumbents close the feature gap.
On another front, specialized vector databases are differentiating on features beyond pure retrieval. Weaviate integrates embedding generation. Pinecone is adding serverless architectures. Chroma is building developer tooling around the core database. The competition is moving from "best similarity search" to "best developer experience for AI applications."
A third trend worth watching: hybrid search combining keyword and vector approaches. Pure semantic search has blind spots—exact matches for product codes, numerical searches, boolean filters. The most effective retrieval systems blend traditional search with vector similarity, and databases that handle both natively (Elasticsearch, Vespa, increasingly others) have an architectural advantage.
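One common way to blend the two result lists is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) across rankings. The article does not prescribe a fusion method, so take this as one reasonable option. The document IDs are made up, and k=60 is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical results for the same query from two retrievers.
keyword_ranking = ["sku-123", "doc-a", "doc-b"]  # exact match on a product code wins
vector_ranking = ["doc-a", "doc-c", "sku-123"]   # semantic neighbors
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
# fused == ["doc-a", "sku-123", "doc-c", "doc-b"]
```

Because RRF uses only rank positions, it sidesteps the problem that keyword scores and vector distances live on incomparable scales.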
The unsolved problems are more interesting than the solved ones. Multi-modal search—querying images with text, or videos with audio—remains early-stage in production systems. Real-time index updates at scale still require careful engineering. Cross-lingual retrieval has improved but isn't solved. Explainability—why did this result rank higher than that one?—is essentially non-existent in vector search.
The Honest Assessment
Working with all three major options (ChromaDB, Pinecone, pgvector) and several others reveals a pattern.
Start with the simplest option that could possibly work. For most teams, that means ChromaDB locally or pgvector if Postgres is already in the stack. Migration later costs less than premature optimization now.
The retrieval pipeline deserves investment before the database does. Embedding quality, semantic chunking, reranking, and query expansion move the needle more than swapping one vector store for another. Context compression techniques—stripping irrelevant content before it ever hits the context window—often deliver more token efficiency gains than any database migration.
Vendor benchmarks deserve skepticism. They benchmark scenarios optimized for their architecture. Real workloads differ. If performance matters, benchmark actual queries against actual data.
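A homegrown benchmark does not need to be elaborate. The sketch below times any search callable over a list of real queries and reports latency percentiles. The benchmark function and the dummy fake_search workload are illustrative assumptions; in practice search_fn would wrap a call to whichever store is under evaluation.

```python
import statistics
import time


def benchmark(search_fn, queries, warmup=3):
    """Time search_fn over queries and return latency percentiles in milliseconds."""
    for q in queries[:warmup]:
        search_fn(q)  # warm caches and connections before measuring
    latencies_ms = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points; index 94 is the 95th percentile, index 98 the 99th.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(latencies_ms),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(latencies_ms),
    }


def fake_search(q):
    """Dummy workload standing in for a real vector store query."""
    return sorted(range(1000), key=lambda i: (i * q) % 97)[:5]


stats = benchmark(fake_search, list(range(1, 51)))
```

Tail percentiles matter more than averages here, since a retrieval call that is usually fast but occasionally slow is what users actually notice.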
Operational reality matters more than feature lists. Can the team debug this system at 3 AM when it breaks? Managed services earn their cost when the answer is "no."
The vector store is plumbing. Important plumbing, but plumbing nonetheless. The value lives in what gets built on top of it. Pick something reasonable, ship the product, and optimize when evidence points to what needs optimizing.
That evidence rarely points to the database.