"RAG Core Study (9/26) — Vector DB Showdown: FAISS / Chroma / Qdrant / Milvus / Weaviate / Pinecone / pgvector"

With the embedding chosen, the next decision is where to store the vectors.

Vector-DB choice is hard to undo. Index formats, APIs, and operational models all differ; migration means re-indexing. Part 9 compares 2026's seven mainstream choices — FAISS / Chroma / Qdrant / Milvus / Weaviate / Pinecone / pgvector — on four axes: ANN algorithm (HNSW/IVF), metadata-filter expressiveness, update/delete support, and operational cost. The crucial question is how Part 7's metadata design meets each DB's actual support.


0. Prerequisites

  • Part 7 metadata — pre vs post filter, cardinality.
  • Part 8 embeddings — dimension affects storage and latency.
  • ANN (Approximate Nearest Neighbor) trades exact KNN for speed.

1. Learning Objectives

  1. State the one-line difference between HNSW, IVF, and FLAT.
  2. Read a table of when to pick which vector DB.
  3. Understand how metadata-filter expressiveness meets Part 7's design.
  4. Explain why update/delete is decisive in RAG operations.

2. Key Summary

A vector DB is an ANN index plus a metadata filter plus an operational API. The seven options, one line each:

  • FAISS is a library: fast, but not a DB.
  • Chroma is the local-prototyping standard.
  • Qdrant is the Rust-based self-host standard, with the strongest payload filters.
  • Milvus targets large-scale distributed operations.
  • Weaviate offers a modular design, GraphQL, and RBAC.
  • Pinecone is fully managed SaaS with the fastest setup.
  • pgvector is a PostgreSQL extension, the most natural fit for teams already running Postgres.

Three-line selection rule:

  • Managed + immediate → Pinecone.
  • Self-host + feature-rich → Qdrant or Milvus.
  • Existing Postgres + small scale → pgvector.

HNSW is the default ANN; the decisive feature is whether the DB supports Filtered HNSW, which integrates metadata filtering with the search itself.


3. Intuition — Same Corpus, Seven Operations

The same 1M-chunk × 1024d corpus, indexed in seven tools, gives seven different setups for cost, filter support, and disaster recovery.


All seven can retrieve. The differences live in operations — who manages, how to back up and restore, and how expressive the filter language is.


4. Definitions — Seven Mainstream DBs (2026)

Filter expressiveness on a 0–3 scale — 0=none, 1=weak, 2=medium, 3=strong.

| DB | Form | ANN | Metadata filter (0-3) | Update/delete | License | Operation |
|---|---|---|---|---|---|---|
| FAISS | Library | HNSW, IVF, PQ | 0 (separate ID list) | ❌ (rebuild) | MIT | Embedded in code |
| Chroma | Embedded DB | HNSW | 1 (dict-based) | ✅ | Apache 2.0 | Local / self-host |
| Qdrant | Server | HNSW (Filtered) | 3 (payload, range/match) | ✅ | Apache 2.0 | Self-host or cloud |
| Milvus | Server (distributed) | HNSW, IVF, DiskANN | 2 (boolean, range) | ✅ | Apache 2.0 | Self-host (etcd + pulsar) |
| Weaviate | Server | HNSW | 3 (GraphQL, ABAC) | ✅ | BSD-3 | Self-host or cloud |
| Pinecone | SaaS | proprietary (HNSW-like) | 3 (mongo-style) | ✅ | Commercial | Fully managed |
| pgvector | Postgres extension | HNSW, IVF-flat | 3 (full SQL WHERE) | ✅ (SQL UPDATE/DELETE) | PostgreSQL | Added to existing Postgres |

Decisive axes — how managed the operation is and how expressive the metadata filter language is. FAISS is a library, so it lacks DB-grade features.


5. Math — ANN Trade-offs

FLAT (exact KNN):

$$\text{Query} = \mathcal{O}(N \cdot d), \quad \text{Build} = \mathcal{O}(N)$$

Fine for small N (< 100K); linear cost at scale.
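To make the O(N · d) cost concrete, here is a minimal pure-Python sketch of FLAT search. The function names and the four-vector toy corpus are illustrative assumptions, not part of any library; a real system would use vectorised code, but the shape of the work is the same: score every stored vector, then keep the best k.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flat_search(query, vectors, k):
    """Exact KNN: score all N vectors (N * d work), then keep the top k."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy corpus (hypothetical values): index 0 matches the query exactly.
corpus = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
top2 = flat_search([1.0, 0.0, 0.0], corpus, k=2)
print(top2)
```

Every query touches every vector, which is why this approach stops being practical as N grows.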

HNSW (Hierarchical Navigable Small World):

$$\text{Query} \approx \mathcal{O}(\log N), \quad \text{Build} = \mathcal{O}(N \cdot M \cdot \log N), \quad \text{Memory} \approx N \cdot d \cdot 4 + N \cdot M \cdot 8$$

\(M\) = neighbours per node (typically 16–64). Achieves ≥ 95% recall at ms-scale latency.
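As a rough sanity check, the memory formula above can be evaluated for the 1M-chunk × 1024d corpus from §3. This is a sketch under stated assumptions: M = 16 is picked from the typical range, the function name is hypothetical, and real engines add their own overhead on top.

```python
def hnsw_memory_bytes(n, d, m, bytes_per_float=4, bytes_per_link=8):
    """Estimate HNSW memory as N*d*4 (vectors) + N*M*8 (graph links)."""
    vectors = n * d * bytes_per_float
    links = n * m * bytes_per_link
    return vectors + links


# 1M chunks, 1024 dimensions, M = 16 (assumed).
estimate = hnsw_memory_bytes(n=1_000_000, d=1024, m=16)
print(f"{estimate / 1e9:.3f} GB")
```

The vectors dominate: the graph links add only about 3% here, so the embedding dimension from Part 8 is the main memory lever.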

IVF (Inverted File Index):

$$\text{Query} \approx \mathcal{O}(n_{\text{probe}} \cdot \frac{N}{n_{\text{list}}}), \quad \text{Build} = \mathcal{O}(N \cdot n_{\text{list}})$$

\(n_{\text{list}}\) = number of clusters, \(n_{\text{probe}}\) = clusters scanned per query. Disk-friendly, strong at scale (>100M).
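The IVF query cost can be read as "how many vectors get scanned per query". A small sketch, assuming evenly sized clusters; the values for n_list and n_probe below are hypothetical, chosen only to show the order of magnitude.

```python
def ivf_vectors_scanned(n, n_list, n_probe):
    """Approximate vectors compared per query: n_probe * N / n_list."""
    return n_probe * n / n_list


n = 100_000_000       # corpus size
scanned = ivf_vectors_scanned(n, n_list=10_000, n_probe=16)
fraction = scanned / n
print(scanned, fraction)
```

Raising n_probe scans more clusters, which improves recall and raises latency in direct proportion.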

Choice table:

| Index | Fit N | Recall | Latency | Memory |
|---|---|---|---|---|
| FLAT | < 100K | 100% | Slow | Low |
| HNSW | 100K – 100M | 95–99% | Fast | High (in-memory) |
| IVF + PQ | > 100M | 90–95% | Medium | Low (compressed) |
| DiskANN | > 1B | 90–97% | Medium | Low (SSD-based) |

Most RAG indices are < 100M chunks, so HNSW is the default.
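The choice table can be written down as a simple lookup. This is only a sketch of the table's thresholds; the function name is hypothetical, and real decisions also depend on memory budget and recall targets.

```python
def pick_index(n_vectors):
    """Map corpus size to the index family suggested by the choice table."""
    if n_vectors < 100_000:
        return "FLAT"
    if n_vectors <= 100_000_000:
        return "HNSW"
    if n_vectors <= 1_000_000_000:
        return "IVF+PQ"
    return "DiskANN"


for size in (50_000, 1_000_000, 500_000_000, 2_000_000_000):
    print(size, pick_index(size))
```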


6. Walkthrough — Same Query Across Seven DBs

6.1 Chroma (local prototyping)

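import chromadb

# emb1, emb2, query_emb are embedding vectors (lists of floats) from the Part 8 model;
# chunk1, chunk2 are the corresponding chunk texts.
client = chromadb.PersistentClient(path="./chroma_db")
col = client.get_or_create_collection("rag", metadata={"hnsw:space": "cosine"})

col.add(
    ids=["c001", "c002"],
    embeddings=[emb1, emb2],
    metadatas=[{"version": "3.2", "security_level": "internal"}, ...],  # second entry elided
    documents=[chunk1, chunk2],
)

# Chroma accepts only one top-level key in `where`, so combine conditions with $and.
results = col.query(
    query_embeddings=[query_emb],
    n_results=5,
    where={"$and": [
        {"security_level": {"$in": ["public", "internal"]}},
        {"version": {"$eq": "3.2"}},
    ]},
)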

Pros: instant start. Cons: no distribution, latency degrades past 1M.

6.2 Qdrant (Rust server, top payload filters)

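from qdrant_client import QdrantClient
from qdrant_client.models import (
    FieldCondition, Filter, MatchAny, MatchValue, PointStruct,
)

# Assumes a collection named "rag" already exists with 1024-d cosine vectors.
client = QdrantClient("localhost", port=6333)

client.upsert(collection_name="rag", points=[
    PointStruct(id=1, vector=emb1, payload={"version": "3.2", "security_level": "internal"}),
])

# Same filter as the other examples: version 3.2, security level public or internal.
# query_points is the current search entry point; older clients used search(query_vector=...).
hits = client.query_points(
    collection_name="rag",
    query=query_emb,
    query_filter=Filter(must=[
        FieldCondition(key="version", match=MatchValue(value="3.2")),
        FieldCondition(key="security_level", match=MatchAny(any=["public", "internal"])),
    ]),
    limit=5,
).points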

Pros: Filtered HNSW — pre-filter operates together with ANN search. Range, nested, geo all supported.

6.3 Pinecone (SaaS, namespace split)

from pinecone import Pinecone

pc = Pinecone(api_key=API_KEY)
index = pc.Index("rag")

index.upsert(
    vectors=[("c001", emb1, {"version": "3.2", "security_level": "internal"})],
    namespace="acme",     # tenant_id as namespace → strong isolation
)

results = index.query(
    namespace="acme",
    vector=query_emb,
    top_k=5,
    filter={"security_level": {"$in": ["public", "internal"]}, "version": "3.2"},
    include_metadata=True,
)

Pros: managed instantly; namespaces for tenant isolation. Cons: cost, vendor lock-in.

6.4 pgvector (Postgres extension)

CREATE EXTENSION vector;

CREATE TABLE rag_chunks (
    id TEXT PRIMARY KEY,
    embedding vector(1024),
    version TEXT,
    security_level TEXT,
    document_id TEXT,
    chunk_text TEXT
);

CREATE INDEX ON rag_chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON rag_chunks (security_level, version);

-- $1 is the query embedding, bound as a vector parameter by the client driver.
SELECT id, document_id, chunk_text,
       1 - (embedding <=> $1::vector) AS similarity
FROM rag_chunks
WHERE security_level IN ('public', 'internal')
  AND version = '3.2'
ORDER BY embedding <=> $1::vector
LIMIT 5;

Pros: full SQL — JOIN, transactions, backups, permissions — all existing Postgres. Cons: HNSW memory pressure at scale; tuning effort.


7. Variants

7.1 Filtered HNSW vs post-filter HNSW

  • What changes: apply filters during ANN traversal (Qdrant, Weaviate) vs apply filters after ANN (FAISS, some Chroma setups).
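  • Why use it: when the filter passes only a small fraction of the corpus (strict permissions, for example), post-filtering can discard every candidate in the ANN top-K. Filtered HNSW evaluates the filter during traversal, so it keeps exploring until it finds permitted neighbours and remains stable.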
  • What becomes possible: permission + RAG with empty rate near zero.
  • Where it fits: multi-tenant, strict permissions, 4+ security tiers.
  • Limits: heavier index build. Chroma lacks it; FAISS requires an external ID list.
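The empty-candidate problem is easy to reproduce without any vector DB. The sketch below uses exact search over precomputed scores, not a real Filtered HNSW, and the synthetic corpus (1,000 chunks, only every 100th visible to the user) is an assumption chosen to make the effect obvious.

```python
def post_filter_search(chunks, k, allowed):
    """Take the global top-k by score first, then drop chunks the user may not see."""
    top_k = sorted(chunks, key=lambda c: c["score"], reverse=True)[:k]
    return [c for c in top_k if c["security_level"] in allowed]


def pre_filter_search(chunks, k, allowed):
    """Restrict to permitted chunks first, then take the top-k among them."""
    permitted = [c for c in chunks if c["security_level"] in allowed]
    return sorted(permitted, key=lambda c: c["score"], reverse=True)[:k]


# Synthetic corpus: chunk 0 is the most similar; 1% of chunks are "public".
chunks = [
    {
        "id": i,
        "score": 1.0 - i / 1000,
        "security_level": "public" if i % 100 == 99 else "restricted",
    }
    for i in range(1000)
]

post = post_filter_search(chunks, k=5, allowed={"public"})
pre = pre_filter_search(chunks, k=5, allowed={"public"})
print(len(post), [c["id"] for c in pre])
```

Post-filtering returns nothing because the five nearest chunks are all restricted, while filtering first still returns five permitted chunks. Filtered HNSW aims for the second behaviour without giving up the ANN speed-up.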

7.2 IVF + PQ — compressed index for huge corpora

  • What changes: PQ (Product Quantization) compresses vectors to 8–16 bytes.
  • Why use it: 1B × 1024d ≈ 4 TB (float32) → about 16 GB at 16 bytes per vector, which fits in memory.
  • What becomes possible: very large search at sane cost.
  • Where it fits: Milvus IVF_PQ, FAISS IndexIVFPQ. > 100M chunks.
  • Limits: 5–10 pp recall loss. Compensate with a reranker (Part 13).
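The arithmetic behind the 4 TB → 16 GB figure, written out under two assumptions: float32 storage (4 bytes per dimension) and 16-byte PQ codes. Codebooks and per-vector ID overhead are ignored, so real footprints will be somewhat larger.

$$10^{9} \times 1024 \times 4\ \text{bytes} = 4.096 \times 10^{12}\ \text{bytes} \approx 4\ \text{TB}$$

$$10^{9} \times 16\ \text{bytes} = 1.6 \times 10^{10}\ \text{bytes} = 16\ \text{GB}, \qquad \frac{1024 \times 4}{16} = 256\times\ \text{compression}$$

With 8-byte codes the compressed size halves to 8 GB, at a further cost in recall.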

7.3 DiskANN — SSD-resident ANN

  • What changes: index lives on SSD; traversal hits storage.
  • Why use it: side-steps memory limits with SSD.
  • What becomes possible: 1B+ chunks on a single node.
  • Where it fits: Milvus DiskANN, OpenSearch DiskANN.
  • Limits: SSD I/O-bound; NVMe is effectively required, and HDD latency makes it impractical.

7.4 pgvector + existing RDB integration

  • What changes: vector results JOIN directly with source tables via SQL.
  • Why use it: the most natural way to bolt RAG onto an existing OLTP system.
  • What becomes possible: transactions, backups, permissions, monitoring — all Postgres standard.
  • Where it fits: small scale (< 10M), existing Postgres ops.
  • Limits: HNSW memory pressure; a separate read replica is recommended.

7.5 Hybrid serving — Pinecone + local evaluation

  • What changes: production on Pinecone, evaluation/experimentation on local Chroma or Qdrant.
  • Why use it: managed cost only where it matters.
  • What becomes possible: fast experimentation alongside stable production.
  • Where it fits: startups to mid-size.
  • Limits: keeping embeddings and chunks in sync across environments.

8. Limits and Failure Modes

8.1 Persistence missing — the FAISS trap

  • Why intrinsic: FAISS is a library. Unless you explicitly save the index (for example with faiss.write_index), it vanishes on process exit. Part 7 metadata also lives separately.
  • Diagnosis: lifting a demo script into production loses the index on first restart.
  • Mitigation: FAISS for prototypes only; Chroma or higher in production.
  • Later part: Part 16 (experiment automation — FAISS for in-memory eval).

8.2 Missing update/delete — index rot

  • Why intrinsic: forgetting to delete old chunks on document update leaves old + new co-indexed — same as Part 7 §8.4 (version inconsistency).
  • Diagnosis: same document_id with different version appearing together in top-K.
  • Mitigation: atomic transactions (pgvector, Qdrant) or staged blue-green indices.
  • Later part: Part 22 (RAG operations).
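The diagnosis above can be automated as a check on each top-K result list. A minimal sketch: the hit structure (dicts with document_id and version keys) and the sample values are assumptions modelled on the metadata fields used in §6.

```python
from collections import defaultdict


def find_version_conflicts(hits):
    """Return {document_id: sorted versions} for documents that appear
    in the result list under more than one version."""
    versions = defaultdict(set)
    for hit in hits:
        versions[hit["document_id"]].add(hit["version"])
    return {doc: sorted(v) for doc, v in versions.items() if len(v) > 1}


# Hypothetical top-K: "policy-guide" is indexed under both 3.1 and 3.2.
top_k = [
    {"id": "c001", "document_id": "policy-guide", "version": "3.1"},
    {"id": "c417", "document_id": "policy-guide", "version": "3.2"},
    {"id": "c090", "document_id": "faq", "version": "3.2"},
]
conflicts = find_version_conflicts(top_k)
print(conflicts)
```

A non-empty result is a signal that an update path skipped the delete step for that document.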

8.3 Cardinality blow-up — metadata index bloat

  • Why intrinsic: indexing all high-cardinality fields makes the metadata index larger than the vectors.
  • Diagnosis: abnormal build time, disk surge.
  • Mitigation: choose indexed fields declaratively. Qdrant builds payload indexes only for the fields you declare, and Pinecone's pod-based indexes support selective metadata indexing.
  • Later part: Part 16 (cardinality monitoring).
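Before declaring which fields to index, it helps to measure cardinality on a sample of the metadata. A sketch under assumptions: the threshold is arbitrary, the field names (including chunk_hash) are hypothetical, and the function only flags fields for review rather than deciding for you.

```python
def field_cardinality(metadatas):
    """Count distinct values per metadata field."""
    values = {}
    for md in metadatas:
        for key, value in md.items():
            values.setdefault(key, set()).add(value)
    return {key: len(distinct) for key, distinct in values.items()}


def high_cardinality_fields(metadatas, threshold):
    """Fields whose distinct-value count exceeds the threshold."""
    counts = field_cardinality(metadatas)
    return sorted(key for key, n in counts.items() if n > threshold)


# Sample metadata: chunk_hash is unique per chunk, the other fields are not.
sample = [
    {"security_level": "internal", "version": "3.2", "chunk_hash": f"h{i}"}
    for i in range(100)
]
print(field_cardinality(sample))
print(high_cardinality_fields(sample, threshold=10))
```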

8.4 Distributed-ops burden — the Milvus/Weaviate trap

  • Why intrinsic: Milvus runs etcd + pulsar + MinIO + 5+ services. Without an ops team, maintenance itself is a burden.
  • Diagnosis: small team self-hosting Milvus → higher incident rate and recovery time.
  • Mitigation: under 1B chunks and SaaS allowed → Zilliz Cloud (managed Milvus) or Pinecone; self-host only with dedicated ops.
  • Later part: Part 22 (RAG ops cost).

8.5 Vendor lock-in — the Pinecone trap

  • Why intrinsic: Pinecone's proprietary index format is hard to export. Migration = full re-index.
  • Diagnosis: months-long migration when cost rises.
  • Mitigation: back up raw embeddings in object storage — re-use them at migration to skip re-embedding.
  • Later part: Part 16 (backup patterns).
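The backup itself can be as plain as one JSON object per line. A minimal sketch: the record layout (id, vector, metadata) and the file name are assumptions, and a local temporary file stands in for object storage here.

```python
import json
import os
import tempfile


def backup_embeddings(records, path):
    """Write one JSON object per line: id, vector, and metadata."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def restore_embeddings(path):
    """Read the JSONL backup into a list of records."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records; real vectors would have 1024 dimensions.
records = [
    {"id": "c001", "vector": [0.1, 0.2, 0.3], "metadata": {"version": "3.2"}},
    {"id": "c002", "vector": [0.4, 0.5, 0.6], "metadata": {"version": "3.2"}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.jsonl")
    backup_embeddings(records, path)
    restored = restore_embeddings(path)

print(restored == records)
```

Because the format is DB-neutral, the restored records can be fed to any target's upsert call during a migration, and the embedding model never has to be re-run.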

8.6 Common Pitfalls

  • "FAISS is fast, so production = FAISS." §8.1. No persistence, no filters, no ops.
  • "Metadata filters are the same everywhere." Expressiveness varies from 0 to 3; verify Part 7's design.
  • "Drop it into Pinecone and forget it." §8.5. Raw-embedding backups are mandatory.
  • "Distributed = safer." §8.4. Without ops staff, distributed is less safe.
  • "HNSW defaults are enough." \(M\), ef_construction, ef_search decide recall and latency.

9. Settled Conclusions

Q1. Which of the seven is fully managed and immediate?

Pinecone — minutes to set up, zero infra to run. Chapter: §4, §7.5.

Q2. Which self-host DB has the strongest payload-filter expressiveness?

Qdrant — Rust-based, Filtered HNSW couples pre-filter with ANN, supports range and nested. Chapter: §4, §6.2.

Q3. State the HNSW vs IVF choice rule in one line.

\(N < 100M\) → HNSW (in-memory, fast); \(N > 100M\) → IVF+PQ or DiskANN (compressed / disk). Chapter: §5.

Q4. Why is Filtered HNSW decisive for permission filtering?

Post-filtering can leave the candidate set empty when every top-K candidate fails the permission check. Filtered HNSW applies filters during traversal, holding the empty rate near zero. Chapter: §7.1, Part 7 §8.1.

Q5. The simplest mitigation for vendor lock-in?

Back up raw embedding vectors in object storage; re-index by re-using them with the new DB. Chapter: §8.5.


10. Further Reading

Primary

  • Malkov, Y., Yashunin, D. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. TPAMI 2018. arXiv:1603.09320.
  • Jégou, H. et al. Product Quantization for Nearest Neighbor Search. TPAMI 2011 (IVF+PQ foundation).
  • Subramanya, S. J. et al. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. NeurIPS 2019.
  • Pinecone. HNSW Internals blog series (2024).
  • Qdrant. Filtered HNSW: How payload filters integrate with ANN search (2023 blog).

Official docs

  • FAISS: https://github.com/facebookresearch/faiss/wiki
  • Chroma: https://docs.trychroma.com/
  • Qdrant: https://qdrant.tech/documentation/
  • Milvus: https://milvus.io/docs
  • Weaviate: https://weaviate.io/developers/weaviate
  • Pinecone: https://docs.pinecone.io/
  • pgvector: https://github.com/pgvector/pgvector

Supporting

  • Author note Chapter 8 — vector DBs.
  • Author note Chapter 35 §6 — Index Partitioning, Multi-collection.

Cheat Sheet

| Scenario | First choice | Second | Note |
|---|---|---|---|
| Managed, immediate | Pinecone | Weaviate Cloud | namespace tenant split |
| Self-host, feature-rich | Qdrant | Weaviate | Filtered HNSW |
| Large-scale distributed | Milvus | Vespa | requires ops team |
| Existing Postgres integration | pgvector | | < 10M chunks |
| Local prototyping | Chroma | FAISS | migrate before prod |
| RBAC + GraphQL | Weaviate | | modular ML |
| Embedded SDK | Chroma | FAISS | library form |

Design rule of thumb: ops model (managed vs self-host) → scale → filter expressiveness → integration environment — narrow in that order.
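The narrowing order can be sketched as a small decision function. This is a deliberately simplified encoding of the cheat sheet's first choices; the function name and parameters are hypothetical, and it ignores second choices and the filter-expressiveness step, which does not change the first pick among the remaining options.

```python
def pick_vector_db(managed, needs_distributed=False,
                   existing_postgres=False, n_chunks=0):
    """Narrow in order: ops model, then scale, then integration environment."""
    if managed:
        return "Pinecone"                     # managed, immediate
    if needs_distributed:
        return "Milvus"                       # large-scale distributed, needs ops team
    if existing_postgres and n_chunks < 10_000_000:
        return "pgvector"                     # existing Postgres, small scale
    return "Qdrant"                           # self-host, feature-rich default


print(pick_vector_db(managed=True))
print(pick_vector_db(managed=False, existing_postgres=True, n_chunks=2_000_000))
print(pick_vector_db(managed=False, needs_distributed=True))
print(pick_vector_db(managed=False))
```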


Bridge — What's Next

Next — RAG Core Study (10/26) — Dense Retrieval Deep Dive.

With the vector DB chosen, the next question is what kind of retrieval runs inside it. Part 10 covers dense retrieval — bi-encoder, DPR (Karpukhin 2020), query–document asymmetry, top-K, and similarity functions — with formulas and code.
