"RAG Core Study (9/26) — Vector DB Showdown: FAISS / Chroma / Qdrant / Milvus / Weaviate / Pinecone / pgvector"
With the embedding chosen, the next decision is where to store the vectors.
Vector-DB choice is hard to undo. Index formats, APIs, and operational models all differ; migration means re-indexing. Part 9 compares 2026's seven mainstream choices — FAISS / Chroma / Qdrant / Milvus / Weaviate / Pinecone / pgvector — on four axes: ANN algorithm (HNSW/IVF), metadata-filter expressiveness, update/delete support, and operational cost. The crucial question is how Part 7's metadata design meets each DB's actual support.
0. Prerequisites
- Part 7 metadata — pre vs post filter, cardinality.
- Part 8 embeddings — dimension affects storage and latency.
- ANN (Approximate Nearest Neighbor) trades exact KNN for speed.
1. Learning Objectives
- State the one-line difference between HNSW, IVF, and FLAT.
- Read a table of when to pick which vector DB.
- Understand how metadata-filter expressiveness meets Part 7's design.
- Explain why update/delete is decisive in RAG operations.
2. Key Summary
A vector DB is ANN index + metadata filter + operational API. FAISS is a library — fast but not a DB. Chroma is the local-prototyping standard. Qdrant is the Rust-based self-host standard, with the strongest payload filters. Milvus is for large-scale distributed ops. Weaviate offers modular + GraphQL + RBAC. Pinecone is fully managed SaaS, the fastest setup. pgvector is a PostgreSQL extension, the most natural fit for existing DB shops. Three-line selection rule: managed + immediate → Pinecone, self-host + feature-rich → Qdrant or Milvus, existing Postgres + small scale → pgvector. HNSW is the default ANN; the decisive feature is whether the DB supports Filtered HNSW to integrate metadata filtering with search.
3. Intuition — Same Corpus, Seven Operations
The same 1M-chunk × 1024d corpus, indexed in seven tools, gives seven different setups for cost, filter support, and disaster recovery.
All seven can retrieve. The differences live in operations — who manages, how to back up and restore, and how expressive the filter language is.
4. Definitions — Seven Mainstream DBs (2026)
Filter expressiveness on a 0–3 scale — 0=none, 1=weak, 2=medium, 3=strong.
| DB | Form | ANN | Metadata filter (0-3) | Update/delete | License | Operation |
|---|---|---|---|---|---|---|
| FAISS | Library | HNSW, IVF, PQ | 0 (separate ID list) | ❌ (rebuild) | MIT | Embedded in code |
| Chroma | Embedded DB | HNSW | 1 (dict-based) | ✅ | Apache 2.0 | Local / self-host |
| Qdrant | Server | HNSW (Filtered) | 3 (payload, range/match) | ✅ | Apache 2.0 | Self-host or cloud |
| Milvus | Server (distributed) | HNSW, IVF, DiskANN | 2 (boolean, range) | ✅ | Apache 2.0 | Self-host (etcd + pulsar) |
| Weaviate | Server | HNSW | 3 (GraphQL, RBAC) | ✅ | BSD-3 | Self-host or cloud |
| Pinecone | SaaS | proprietary (HNSW-like) | 3 (mongo-style) | ✅ | Commercial | Fully managed |
| pgvector | Postgres extension | HNSW, IVFFlat | 3 (full SQL WHERE) | ✅ (SQL UPDATE/DELETE) | PostgreSQL | Added to existing Postgres |
Decisive axes — how managed the operation is and how expressive the metadata filter language is. FAISS is a library, so it lacks DB-grade features.
5. Math — ANN Trade-offs
FLAT (exact KNN):
$$\text{Query} = \mathcal{O}(N \cdot d), \quad \text{Build} = \mathcal{O}(N)$$
Fine for small N (< 100K); linear cost at scale.
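To make the O(N · d) cost concrete, here is a minimal sketch of FLAT search in plain Python. The helper names `cosine` and `flat_search` and the four-vector toy corpus are illustrative assumptions, not part of any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flat_search(vectors, query, k):
    """Exact KNN: score every vector (N * d multiply-adds), then keep the top k."""
    scored = [(cosine(v, query), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical toy corpus: two vectors near the query, two far away.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(flat_search(corpus, [1.0, 0.05], k=2))  # [0, 1]
```

Every query touches every vector, which is why this stays practical only for small N.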
HNSW (Hierarchical Navigable Small World):
$$\text{Query} \approx \mathcal{O}(\log N), \quad \text{Build} = \mathcal{O}(N \cdot M \cdot \log N), \quad \text{Memory} \approx N \cdot d \cdot 4 + N \cdot M \cdot 8$$
\(M\) = neighbours per node (typically 16–64). Achieves ≥ 95% recall at ms-scale latency.
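The memory term is easy to sanity-check with arithmetic. The sketch below plugs the 1M × 1024d corpus from §3 into the formula; `hnsw_memory_bytes` is a hypothetical helper and M = 16 is an assumed value from the low end of the typical range.

```python
def hnsw_memory_bytes(n: int, d: int, m: int) -> int:
    """Memory ~ N*d*4 (float32 vectors) + N*M*8 (graph links), per the formula above."""
    return n * d * 4 + n * m * 8

total = hnsw_memory_bytes(n=1_000_000, d=1024, m=16)
print(f"{total / 1e9:.2f} GB")  # 4.22 GB
```

The vectors dominate: the graph links add only about 3% here, so dimension matters more than M for memory.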
IVF (Inverted File Index):
$$\text{Query} \approx \mathcal{O}(n_{\text{probe}} \cdot \frac{N}{n_{\text{list}}}), \quad \text{Build} = \mathcal{O}(N \cdot n_{\text{list}})$$
\(n_{\text{list}}\) = number of clusters, \(n_{\text{probe}}\) = clusters scanned per query. Disk-friendly, strong at scale (>100M).
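The role of \(n_{\text{list}}\) and \(n_{\text{probe}}\) can be seen in a toy IVF. This is a sketch under simplifying assumptions: centroids are sampled from the data instead of being trained with k-means, and all names (`build_ivf`, `ivf_search`) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, n_list, seed=0):
    """Assign each vector to its nearest centroid (centroids sampled, no k-means refinement)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_list)
    lists = [[] for _ in range(n_list)]
    for i, v in enumerate(vectors):
        nearest = min(range(n_list), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(i)
    return centroids, lists

def ivf_search(vectors, centroids, lists, query, k, n_probe):
    """Scan only the n_probe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:n_probe] for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k]

# Hypothetical random corpus: 200 vectors of dimension 8.
data_rng = random.Random(1)
vectors = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
centroids, lists = build_ivf(vectors, n_list=10)
query = [0.0] * 8

fast = ivf_search(vectors, centroids, lists, query, k=5, n_probe=2)    # scans 2 of 10 lists
full = ivf_search(vectors, centroids, lists, query, k=5, n_probe=10)   # scans everything
print(len(set(fast) & set(full)), "of 5 exact neighbours found with n_probe=2")
```

Raising `n_probe` moves the result toward exact search at the price of scanning more lists, which is the recall/latency dial the formula describes.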
Choice table:
| Index | Fit N | Recall | Latency | Memory |
|---|---|---|---|---|
| FLAT | < 100K | 100% | Slow | Low |
| HNSW | 100K – 100M | 95–99% | Fast | High (in-memory) |
| IVF + PQ | > 100M | 90–95% | Medium | Low (compressed) |
| DiskANN | > 1B | 90–97% | Medium | Low (SSD-based) |
Most RAG indices are < 100M chunks → HNSW is the default.
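The choice table reduces to a few thresholds. A small sketch, assuming the table's boundaries are treated as hard cut-offs (in practice they are rough guides) and using a hypothetical function name:

```python
def choose_index(n_chunks: int) -> str:
    """Map corpus size to the index family suggested by the choice table."""
    if n_chunks < 100_000:
        return "FLAT"
    if n_chunks <= 100_000_000:
        return "HNSW"
    if n_chunks <= 1_000_000_000:
        return "IVF+PQ"
    return "DiskANN"

for n in (50_000, 1_000_000, 500_000_000, 2_000_000_000):
    print(n, choose_index(n))
```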
6. Walkthrough — Same Query Across Seven DBs
6.1 Chroma (local prototyping)
import chromadb

# emb1, emb2, query_emb (vectors) and chunk1, chunk2 (texts) come from the embedding step.
client = chromadb.PersistentClient(path="./chroma_db")
col = client.get_or_create_collection("rag", metadata={"hnsw:space": "cosine"})

col.add(
    ids=["c001", "c002"],
    embeddings=[emb1, emb2],
    metadatas=[{"version": "3.2", "security_level": "internal"}, ...],
    documents=[chunk1, chunk2],
)

# Chroma expects a single operator at the top level of `where`,
# so multiple conditions are combined with $and.
results = col.query(
    query_embeddings=[query_emb],
    n_results=5,
    where={"$and": [
        {"security_level": {"$in": ["public", "internal"]}},
        {"version": "3.2"},
    ]},
)
Pros: instant start. Cons: no distribution, latency degrades past 1M.
6.2 Qdrant (Rust server, top payload filters)
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, Filter, FieldCondition, MatchAny, MatchValue

# Assumes the "rag" collection already exists.
client = QdrantClient("localhost", port=6333)

client.upsert(collection_name="rag", points=[
    PointStruct(id=1, vector=emb1, payload={"version": "3.2", "security_level": "internal"}),
])

# Same predicate as the Chroma query: version = 3.2 AND security_level IN (public, internal).
# Recent qdrant-client releases deprecate search() in favour of query_points().
hits = client.search(
    collection_name="rag",
    query_vector=query_emb,
    query_filter=Filter(must=[
        FieldCondition(key="version", match=MatchValue(value="3.2")),
        FieldCondition(key="security_level", match=MatchAny(any=["public", "internal"])),
    ]),
    limit=5,
)
Pros: Filtered HNSW — pre-filter operates together with ANN search. Range, nested, geo all supported.
6.3 Pinecone (SaaS, namespace split)
from pinecone import Pinecone

pc = Pinecone(api_key=API_KEY)
index = pc.Index("rag")

index.upsert(
    vectors=[("c001", emb1, {"version": "3.2", "security_level": "internal"})],
    namespace="acme",  # tenant_id as namespace → strong isolation
)

results = index.query(
    namespace="acme",
    vector=query_emb,
    top_k=5,
    filter={"security_level": {"$in": ["public", "internal"]}, "version": "3.2"},
    include_metadata=True,
)
Pros: managed instantly; namespaces for tenant isolation. Cons: cost, vendor lock-in.
6.4 pgvector (Postgres extension)
CREATE EXTENSION vector;

CREATE TABLE rag_chunks (
    id             TEXT PRIMARY KEY,
    embedding      vector(1024),
    version        TEXT,
    security_level TEXT,
    document_id    TEXT,
    chunk_text     TEXT
);

-- ANN index on the vector column, plus a B-tree index for the metadata filter.
CREATE INDEX ON rag_chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON rag_chunks (security_level, version);

-- query_emb stands for the bound query vector (for example $1::vector).
-- <=> is cosine distance, so 1 - distance gives cosine similarity.
SELECT id, document_id, chunk_text,
       1 - (embedding <=> query_emb) AS similarity
FROM rag_chunks
WHERE security_level IN ('public', 'internal')
  AND version = '3.2'
ORDER BY embedding <=> query_emb
LIMIT 5;
Pros: full SQL — JOIN, transactions, backups, permissions — all existing Postgres. Cons: HNSW memory pressure at scale; tuning effort.
7. Variants
7.1 Filtered HNSW vs post-filter HNSW
- What changes: apply filters during ANN traversal (Qdrant, Weaviate) vs apply filters after ANN (FAISS, some Chroma setups).
- Why use it: when the filter passes only a small fraction of points (permissions are the typical case), post-filtering can discard every candidate the ANN step returned. Filtered HNSW keeps searching among eligible points and remains stable.
- What becomes possible: permission + RAG with empty rate near zero.
- Where it fits: multi-tenant, strict permissions, 4+ security tiers.
- Limits: heavier index build. Chroma lacks it; FAISS requires an external ID list.
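The empty-candidate effect is easy to reproduce. The sketch below uses exact search on a toy corpus, so it is not HNSW itself (a real Filtered HNSW applies the predicate during graph traversal); it only contrasts filtering after top-k with filtering before it. All names and data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def post_filter_search(points, query, k, allowed):
    """Take the k nearest points first, then drop those that fail the filter."""
    nearest = sorted(points, key=lambda p: sq_dist(p["vec"], query))[:k]
    return [p["id"] for p in nearest if allowed(p)]

def pre_filter_search(points, query, k, allowed):
    """Restrict to points that pass the filter, then take the k nearest."""
    eligible = [p for p in points if allowed(p)]
    nearest = sorted(eligible, key=lambda p: sq_dist(p["vec"], query))[:k]
    return [p["id"] for p in nearest]

# Ten "secret" chunks sit closest to the query; two "public" chunks sit farther away.
points = [{"id": f"s{i}", "vec": [0.1 * i, 0.0], "level": "secret"} for i in range(10)]
points += [
    {"id": "p0", "vec": [5.0, 0.0], "level": "public"},
    {"id": "p1", "vec": [6.0, 0.0], "level": "public"},
]

def is_public(point):
    return point["level"] == "public"

print(post_filter_search(points, [0.0, 0.0], 5, is_public))  # []
print(pre_filter_search(points, [0.0, 0.0], 5, is_public))   # ['p0', 'p1']
```

With post-filtering the user sees nothing even though two permitted chunks exist, which is the failure that motivates filter-aware traversal.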
7.2 IVF + PQ — compressed index for huge corpora
- What changes: PQ (Product Quantization) compresses vectors to 8–16 bytes.
- Why use it: 1B × 1024d ≈ 4 TB (float32) → 16 GB compressed; memory-fit.
- What becomes possible: very large search at sane cost.
- Where it fits: Milvus `IVF_PQ`, FAISS `IndexIVFPQ`; > 100M chunks.
- Limits: 5–10 pp recall loss. Compensate with a reranker (Part 13).
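The storage figures in this variant follow from simple arithmetic. A sketch, assuming 16-byte PQ codes (the upper end of the quoted range) and ignoring codebook and ID overhead; the helper names are hypothetical.

```python
def raw_bytes(n: int, d: int, bytes_per_float: int = 4) -> int:
    """Uncompressed float32 storage: N * d * 4 bytes."""
    return n * d * bytes_per_float

def pq_bytes(n: int, code_bytes: int) -> int:
    """PQ storage: one fixed-size code per vector (codebooks not counted)."""
    return n * code_bytes

n, d = 1_000_000_000, 1024
raw = raw_bytes(n, d)
compressed = pq_bytes(n, code_bytes=16)

print(f"raw: {raw / 1e12:.3f} TB")          # raw: 4.096 TB
print(f"PQ:  {compressed / 1e9:.0f} GB")    # PQ:  16 GB
print(f"ratio: {raw // compressed}x")       # ratio: 256x
```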
7.3 DiskANN — SSD-resident ANN
- What changes: index lives on SSD; traversal hits storage.
- Why use it: side-steps memory limits with SSD.
- What becomes possible: 1B+ chunks on a single node.
- Where it fits: Milvus DiskANN, OpenSearch DiskANN.
- Limits: SSD I/O-bound; NVMe required, HDD impossible.
7.4 pgvector + existing RDB integration
- What changes: vector results JOIN directly with source tables via SQL.
- Why use it: the most natural way to bolt RAG onto an existing OLTP system.
- What becomes possible: transactions, backups, permissions, monitoring — all Postgres standard.
- Where it fits: small scale (< 10M), existing Postgres ops.
- Limits: HNSW memory pressure; a separate read replica is recommended.
7.5 Hybrid serving — Pinecone + local evaluation
- What changes: production on Pinecone, evaluation/experimentation on local Chroma or Qdrant.
- Why use it: managed cost only where it matters.
- What becomes possible: fast experimentation alongside stable production.
- Where it fits: startups to mid-size.
- Limits: keeping embeddings and chunks in sync across environments.
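One way to watch for drift between the two environments is to compare a digest of chunk IDs and embeddings. This is only a sketch: `corpus_fingerprint` is a hypothetical helper, and it assumes both sides hold bit-identical float values (re-embedding with a different model or precision would change the digest).

```python
import hashlib
import json

def corpus_fingerprint(records):
    """Hash chunk IDs and embeddings in ID order so two environments can compare one digest."""
    digest = hashlib.sha256()
    for rec in sorted(records, key=lambda r: r["id"]):
        digest.update(json.dumps([rec["id"], rec["embedding"]]).encode("utf-8"))
    return digest.hexdigest()

production = [{"id": "c001", "embedding": [0.1, 0.2]}, {"id": "c002", "embedding": [0.3, 0.4]}]
local_eval = [{"id": "c002", "embedding": [0.3, 0.4]}, {"id": "c001", "embedding": [0.1, 0.2]}]
stale_eval = [{"id": "c001", "embedding": [0.1, 0.2]}]

print(corpus_fingerprint(production) == corpus_fingerprint(local_eval))  # True
print(corpus_fingerprint(production) == corpus_fingerprint(stale_eval))  # False
```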
8. Limits and Failure Modes
8.1 Persistence missing — the FAISS trap
- Why intrinsic: FAISS is a library, so without an explicit save to disk (for example `faiss.write_index`) the index vanishes on process exit. Part 7 metadata also lives separately.
- Diagnosis: lifting a demo script into production loses the index on first restart.
- Mitigation: FAISS for prototypes only; Chroma or higher in production.
- Later part: Part 16 (experiment automation — FAISS for in-memory eval).
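The pattern that avoids this trap is to persist the vectors and their metadata together and reload both on start-up. The sketch below is a stand-in that uses plain lists and JSON rather than FAISS; with FAISS, the vector half would go through `faiss.write_index` and `faiss.read_index`, and the metadata sidecar would still be your responsibility. File layout and function names are assumptions.

```python
import json
import os
import tempfile

def save_index(path, vectors, metadata):
    """Write vectors and metadata to one file, then rename so readers never see a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"vectors": vectors, "metadata": metadata}, f)
    os.replace(tmp_path, path)

def load_index(path):
    """Reload vectors and metadata, as a freshly started process would."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["vectors"], data["metadata"]

with tempfile.TemporaryDirectory() as tmpdir:
    index_path = os.path.join(tmpdir, "index.json")
    save_index(index_path, [[0.1, 0.2]], [{"id": "c001", "version": "3.2"}])
    restored_vectors, restored_metadata = load_index(index_path)

print(restored_metadata[0]["id"])  # c001
```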
8.2 Missing update/delete — index rot
- Why intrinsic: forgetting to delete old chunks on document update leaves old + new co-indexed — same as Part 7 §8.4 (version inconsistency).
- Diagnosis: the same `document_id` with different `version` values appearing together in top-K.
- Mitigation: atomic transactions (pgvector, Qdrant) or staged blue-green indices.
- Later part: Part 22 (RAG operations).
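The diagnosis above can be automated as a check on retrieved hits. A minimal sketch, assuming each hit is a dict carrying `document_id` and `version` from its metadata; the function name is hypothetical.

```python
from collections import defaultdict

def find_version_conflicts(hits):
    """Return document_ids that appear in the result set under more than one version."""
    versions = defaultdict(set)
    for hit in hits:
        versions[hit["document_id"]].add(hit["version"])
    return {doc: sorted(seen) for doc, seen in versions.items() if len(seen) > 1}

top_k = [
    {"document_id": "doc-1", "version": "3.1"},
    {"document_id": "doc-1", "version": "3.2"},
    {"document_id": "doc-2", "version": "3.2"},
]
print(find_version_conflicts(top_k))  # {'doc-1': ['3.1', '3.2']}
```

Running a check like this on sampled production queries gives an early signal that stale chunks were not deleted.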
8.3 Cardinality blow-up — metadata index bloat
- Why intrinsic: indexing all high-cardinality fields makes the metadata index larger than the vectors.
- Diagnosis: abnormal build time, disk surge.
- Mitigation: choose indexed fields declaratively. Qdrant builds a payload index only for the fields you declare, and Pinecone's pod-based indexes support selective metadata indexing.
- Later part: Part 16 (cardinality monitoring).
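Before declaring indexed fields, it helps to measure how many distinct values each metadata field has. The sketch below is one possible heuristic: the `max_ratio` threshold of 0.1 and the field `chunk_hash` are assumptions chosen for illustration, not values from any DB's documentation.

```python
def field_cardinality(records):
    """Count distinct values per metadata field."""
    values = {}
    for record in records:
        for key, value in record.items():
            values.setdefault(key, set()).add(value)
    return {key: len(seen) for key, seen in values.items()}

def fields_to_index(records, max_ratio=0.1):
    """Keep fields whose distinct-value count is at most max_ratio of the record count."""
    limit = max_ratio * len(records)
    return sorted(k for k, count in field_cardinality(records).items() if count <= limit)

# Hypothetical metadata: two versions, three security levels, one unique hash per chunk.
records = [
    {
        "version": "3.2" if i % 2 else "3.1",
        "security_level": ["public", "internal", "secret"][i % 3],
        "chunk_hash": f"h{i}",
    }
    for i in range(100)
]
print(field_cardinality(records))
print(fields_to_index(records))  # ['security_level', 'version']
```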
8.4 Distributed-ops burden — the Milvus/Weaviate trap
- Why intrinsic: Milvus runs etcd + pulsar + MinIO + 5+ services. Without an ops team, maintenance itself is a burden.
- Diagnosis: small team self-hosting Milvus → higher incident rate and recovery time.
- Mitigation: under 1B chunks and SaaS allowed → Zilliz Cloud (managed Milvus) or Pinecone; self-host only with dedicated ops.
- Later part: Part 22 (RAG ops cost).
8.5 Vendor lock-in — the Pinecone trap
- Why intrinsic: Pinecone's proprietary index format is hard to export. Migration = full re-index.
- Diagnosis: months-long migration when cost rises.
- Mitigation: back up raw embeddings in object storage — re-use them at migration to skip re-embedding.
- Later part: Part 16 (backup patterns).
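A raw-embedding backup can be as simple as one JSON line per chunk. In this sketch a local file stands in for object storage, and `iter_batches` shows how a migration job might read the backup back in upsert-sized batches; the record layout and function names are assumptions.

```python
import json
import os
import tempfile

def export_embeddings(path, records):
    """Write one JSON object per line: id, raw embedding, and metadata."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def iter_batches(path, batch_size):
    """Stream the backup in fixed-size batches, ready to upsert into a new DB."""
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            batch.append(json.loads(line))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch

backup = [{"id": f"c{i:03d}", "embedding": [0.1 * i, 0.2], "version": "3.2"} for i in range(5)]

with tempfile.TemporaryDirectory() as tmpdir:
    backup_path = os.path.join(tmpdir, "embeddings.jsonl")
    export_embeddings(backup_path, backup)
    batch_sizes = [len(batch) for batch in iter_batches(backup_path, batch_size=2)]

print(batch_sizes)  # [2, 2, 1]
```

Because the vectors are stored as produced by the embedding model, a migration only has to rebuild the index, not call the embedding API again.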
8.6 Common Pitfalls
- "FAISS is fast, so production = FAISS." §8.1. No persistence, no filters, no ops.
- "Metadata filters are the same everywhere." Expressiveness varies from 0 to 3 (§4); verify Part 7's design against the DB you pick.
- "Drop it into Pinecone and forget it." §8.5. Raw-embedding backups are mandatory.
- "Distributed = safer." §8.4. Without ops staff, distributed is less safe.
- "HNSW defaults are enough." \(M\), `ef_construction`, and `ef_search` decide recall and latency.
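Tuning those HNSW parameters needs a recall measurement to tune against. A minimal sketch of recall@k, comparing the IDs an ANN index returns with the IDs from exact (FLAT) search on the same query; the function name and sample IDs are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

exact_top5 = ["c1", "c2", "c3", "c4", "c5"]   # ground truth from exact search
approx_top5 = ["c1", "c2", "c9", "c4", "c5"]  # what the ANN index returned
print(recall_at_k(approx_top5, exact_top5))   # 0.8
```

Averaging this over a sample of queries while sweeping `ef_search` shows where extra latency stops buying recall.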
9. Settled Conclusions
Q1. Which of the seven is fully managed and immediate?
Pinecone — minutes to set up, zero infra to run. Chapter: §4, §7.5.
Q2. Which self-host DB has the strongest payload-filter expressiveness?
Qdrant — Rust-based, Filtered HNSW couples pre-filter with ANN, supports range and nested. Chapter: §4, §6.2.
Q3. State the HNSW vs IVF choice rule in one line.
\(N < 100M\) → HNSW (in-memory, fast); \(N > 100M\) → IVF+PQ or DiskANN (compressed / disk). Chapter: §5.
Q4. Why is Filtered HNSW decisive for permission filtering?
Post-filtering can leave the candidate set empty when all are out-of-bounds. Filtered HNSW applies filters during traversal, holding empty-rate near zero. Chapter: §7.1, Part 7 §8.1.
Q5. The simplest mitigation for vendor lock-in?
Back up raw embedding vectors in object storage; re-index by re-using them with the new DB. Chapter: §8.5.
10. Further Reading
Primary
- Malkov, Y., Yashunin, D. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. TPAMI 2018. arXiv:1603.09320.
- Jégou, H. et al. Product Quantization for Nearest Neighbor Search. TPAMI 2011 (IVF+PQ foundation).
- Subramanya, S. J. et al. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. NeurIPS 2019.
- Pinecone. HNSW Internals blog series (2024).
- Qdrant. Filtered HNSW: How payload filters integrate with ANN search (2023 blog).
Official docs
- FAISS: https://github.com/facebookresearch/faiss/wiki
- Chroma: https://docs.trychroma.com/
- Qdrant: https://qdrant.tech/documentation/
- Milvus: https://milvus.io/docs
- Weaviate: https://weaviate.io/developers/weaviate
- Pinecone: https://docs.pinecone.io/
- pgvector: https://github.com/pgvector/pgvector
Supporting
- Author note Chapter 8 — vector DBs.
- Author note Chapter 35 §6 — Index Partitioning, Multi-collection.
Cheat Sheet
| Scenario | First choice | Second | Note |
|---|---|---|---|
| Managed, immediate | Pinecone | Weaviate Cloud | namespace tenant split |
| Self-host, feature-rich | Qdrant | Weaviate | Filtered HNSW |
| Large-scale distributed | Milvus | Vespa | requires ops team |
| Existing Postgres integration | pgvector | — | < 10M chunks |
| Local prototyping | Chroma | FAISS | migrate before prod |
| RBAC + GraphQL | Weaviate | — | modular ML |
| Embedded SDK | Chroma | FAISS | library form |
Design rule of thumb: ops model (managed vs self-host) → scale → filter expressiveness → integration environment — narrow in that order.
Bridge — What's Next
Next — RAG Core Study (10/26) — Dense Retrieval Deep Dive.
With the vector DB chosen, the next question is what kind of retrieval runs inside it. Part 10 covers dense retrieval — bi-encoder, DPR (Karpukhin 2020), query–document asymmetry, top-K, and similarity functions — with formulas and code.
Series overview: Series index