"RAG Core Study (9/26) — Vector DB Showdown: FAISS / Chroma / Qdrant / Milvus / Weaviate / Pinecone / pgvector"

May 17, 2026

With the embedding chosen, the next decision is where to store the vectors.

Vector-DB choice is hard to undo. Index formats, APIs, and operational models all differ; migration means re-indexing. Part 9 compares 2026's seven mainstream choices — FAISS / Chroma / Qdrant / Milvus / Weaviate / Pinecone / pgvector — on four axes: ANN algorithm (HNSW/IVF), metadata-filter expressiveness, update/delete support, and operational cost. The crucial question is how Part 7's metadata design meets each DB's actual support.

0. Prerequisites

Part 7 metadata — pre vs post filter, cardinality.
Part 8 embeddings — dimension affects storage and latency.
ANN (Approximate Nearest Neighbor) trades exact KNN for speed.

1. Learning Objectives

State the one-line difference between HNSW, IVF, and FLAT.
Read a table of when to pick which vector DB.
Understand how metadata-filter expressiveness meets Part 7's design.
Explain why update/delete is decisive in RAG operations.

2. Key Summary

A vector DB is an ANN index plus a metadata filter plus an operational API.
FAISS is a library: fast, but not a database.
Chroma is the usual choice for local prototyping.
Qdrant is a Rust-based server and the usual self-host choice, with the strongest payload filters.
Milvus targets large-scale distributed operation.
Weaviate offers a modular architecture, a GraphQL API, and RBAC.
Pinecone is fully managed SaaS with the fastest setup.
pgvector is a PostgreSQL extension, the most natural fit for teams already running Postgres.
Three-line selection rule: managed and immediate → Pinecone; self-hosted and feature-rich → Qdrant or Milvus; existing Postgres at small scale → pgvector.
HNSW is the default ANN index; the decisive feature is whether the DB supports Filtered HNSW, which integrates metadata filtering into the search itself.

3. Intuition — Same Corpus, Seven Operations

The same 1M-chunk × 1024d corpus, indexed in seven tools, gives seven different setups for cost, filter support, and disaster recovery.

All seven can retrieve. The differences live in operations — who manages, how to back up and restore, and how expressive the filter language is.

4. Definitions — Seven Mainstream DBs (2026)

Filter expressiveness on a 0–3 scale — 0=none, 1=weak, 2=medium, 3=strong.

DB	Form	ANN	Metadata filter (0-3)	Update/delete	License	Operation
FAISS	Library	HNSW, IVF, PQ	0 (separate ID list)	❌ (rebuild)	MIT	Embedded in code
Chroma	Embedded DB	HNSW	1 (dict-based)	✅	Apache 2.0	Local / self-host
Qdrant	Server	HNSW (Filtered)	3 (payload, range/match)	✅	Apache 2.0	Self-host or cloud
Milvus	Server (distributed)	HNSW, IVF, DiskANN	2 (boolean, range)	✅	Apache 2.0	Self-host (etcd + pulsar)
Weaviate	Server	HNSW	3 (GraphQL, RBAC)	✅	BSD-3	Self-host or cloud
Pinecone	SaaS	proprietary (HNSW-like)	3 (mongo-style)	✅	Commercial	Fully managed
pgvector	Postgres extension	HNSW, IVFFlat	3 (full SQL WHERE)	✅ (SQL UPDATE/DELETE)	PostgreSQL	Added to existing Postgres

Decisive axes — how managed the operation is and how expressive the metadata filter language is. FAISS is a library, so it lacks DB-grade features.

5. Math — ANN Trade-offs

FLAT (exact KNN):

$$\text{Query} = \mathcal{O}(N \cdot d), \quad \text{Build} = \mathcal{O}(N)$$

Fine for small N (< 100K); linear cost at scale.
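To make the FLAT cost concrete, here is a minimal pure-Python sketch of exact top-k search by cosine similarity. The corpus ids and the 3-dimensional vectors are hypothetical toy values; a real system would use an optimized library rather than Python loops. The point is only that every query touches all N vectors across all d dimensions.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flat_search(query, vectors, k):
    """Exact top-k: score every stored vector, so cost grows as N * d."""
    scored = ((cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items())
    return [vec_id for _, vec_id in heapq.nlargest(k, scored)]


# Toy 3-dimensional corpus (hypothetical ids and values).
corpus = {
    "c001": [1.0, 0.0, 0.0],
    "c002": [0.9, 0.1, 0.0],
    "c003": [0.0, 1.0, 0.0],
    "c004": [0.0, 0.0, 1.0],
}
top2 = flat_search([1.0, 0.05, 0.0], corpus, k=2)
print(top2)  # ['c001', 'c002']
```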

HNSW (Hierarchical Navigable Small World):

$$\text{Query} \approx \mathcal{O}(\log N), \quad \text{Build} = \mathcal{O}(N \cdot M \cdot \log N), \quad \text{Memory} \approx N \cdot d \cdot 4 + N \cdot M \cdot 8$$

$M$ = neighbours per node (typically 16–64). With tuned parameters, HNSW typically reaches ≥ 95% recall at millisecond-scale latency.
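As a worked instance of the memory estimate, the sketch below plugs in the 1M × 1024d corpus from §3 with $M = 16$. The function name is illustrative, and the result is only the rough estimate given by the formula; real implementations add per-layer and bookkeeping overhead.

```python
def hnsw_memory_bytes(n_vectors, dim, m, bytes_per_float=4, bytes_per_link=8):
    """Estimate from the formula above: N*d*4 for vectors plus N*M*8 for graph links."""
    vector_bytes = n_vectors * dim * bytes_per_float
    link_bytes = n_vectors * m * bytes_per_link
    return vector_bytes + link_bytes


total = hnsw_memory_bytes(n_vectors=1_000_000, dim=1024, m=16)
print(f"{total / 1e9:.3f} GB")  # 4.224 GB
```

The raw vectors dominate: the graph links add roughly 3% on top of the 4.096 GB of float32 data at this dimension.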

IVF (Inverted File Index):

$$\text{Query} \approx \mathcal{O}(n_{\text{probe}} \cdot \frac{N}{n_{\text{list}}}), \quad \text{Build} = \mathcal{O}(N \cdot n_{\text{list}})$$

$n_{\text{list}}$ = number of clusters, $n_{\text{probe}}$ = clusters scanned per query. Disk-friendly, strong at scale (>100M).
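The IVF query cost can be checked with simple arithmetic. This sketch assumes evenly sized clusters and ignores the cost of comparing the query against the $n_{\text{list}}$ centroids; the parameter values are illustrative, not recommendations.

```python
def ivf_vectors_scanned(n_vectors, n_list, n_probe):
    """Expected vectors compared per query, assuming evenly sized clusters."""
    return n_probe * n_vectors / n_list


n = 100_000_000
scanned = ivf_vectors_scanned(n, n_list=16_384, n_probe=32)
fraction = scanned / n
print(scanned)   # 195312.5
print(fraction)  # 0.001953125
```

With these settings a query scans about 0.2% of the corpus, which is the whole trade: raising $n_{\text{probe}}$ buys recall at a proportional cost in vectors scanned.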

Choice table:

Index	Fit N	Recall	Latency	Memory
FLAT	< 100K	100%	Slow	Low
HNSW	100K – 100M	95–99%	Fast	High (in-memory)
IVF + PQ	> 100M	90–95%	Medium	Low (compressed)
DiskANN	> 1B	90–97%	Medium	Low (SSD-based)

Most RAG indices are < 100M chunks → HNSW is the default.
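The choice table can be read as a lookup on corpus size. The sketch below encodes its boundaries directly; the function name is hypothetical, and the thresholds are the table's rules of thumb rather than hard limits.

```python
def pick_index(n_chunks):
    """Map corpus size to the index family suggested by the choice table."""
    if n_chunks < 100_000:
        return "FLAT"
    if n_chunks <= 100_000_000:
        return "HNSW"
    if n_chunks <= 1_000_000_000:
        return "IVF+PQ"
    return "DiskANN"


for size in (50_000, 1_000_000, 500_000_000, 2_000_000_000):
    print(size, pick_index(size))
```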

6. Walkthrough: Same Query Across Four of the Seven DBs

6.1 Chroma (local prototyping)

import chromadb
client = chromadb.PersistentClient(path="./chroma_db")
col = client.get_or_create_collection("rag", metadata={"hnsw:space": "cosine"})

col.add(
    ids=["c001", "c002"],
    embeddings=[emb1, emb2],
    metadatas=[{"version": "3.2", "security_level": "internal"}, ...],
    documents=[chunk1, chunk2],
)

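# Chroma expects a single top-level operator, so multiple conditions go under $and.
results = col.query(
    query_embeddings=[query_emb],
    n_results=5,
    where={"$and": [
        {"security_level": {"$in": ["public", "internal"]}},
        {"version": "3.2"},
    ]},
)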

Pros: instant start. Cons: no distribution, latency degrades past 1M.

6.2 Qdrant (Rust server, top payload filters)

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, Filter, FieldCondition, MatchAny, MatchValue

client = QdrantClient("localhost", port=6333)

client.upsert(collection_name="rag", points=[
    PointStruct(id=1, vector=emb1, payload={"version": "3.2", "security_level": "internal"}),
])

# query_points supersedes the deprecated client.search in recent qdrant-client releases.
hits = client.query_points(
    collection_name="rag",
    query=query_emb,
    query_filter=Filter(must=[
        FieldCondition(key="version", match=MatchValue(value="3.2")),
        # Same permission set as the other examples: public or internal.
        FieldCondition(key="security_level", match=MatchAny(any=["public", "internal"])),
    ]),
    limit=5,
).points

Pros: Filtered HNSW — pre-filter operates together with ANN search. Range, nested, geo all supported.

6.3 Pinecone (SaaS, namespace split)

from pinecone import Pinecone

pc = Pinecone(api_key=API_KEY)
index = pc.Index("rag")

index.upsert(
    vectors=[("c001", emb1, {"version": "3.2", "security_level": "internal"})],
    namespace="acme",     # tenant_id as namespace → strong isolation
)

results = index.query(
    namespace="acme",
    vector=query_emb,
    top_k=5,
    filter={"security_level": {"$in": ["public", "internal"]}, "version": "3.2"},
    include_metadata=True,
)

Pros: managed instantly; namespaces for tenant isolation. Cons: cost, vendor lock-in.

6.4 pgvector (Postgres extension)

CREATE EXTENSION vector;

CREATE TABLE rag_chunks (
    id TEXT PRIMARY KEY,
    embedding vector(1024),
    version TEXT,
    security_level TEXT,
    document_id TEXT,
    chunk_text TEXT
);

CREATE INDEX ON rag_chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON rag_chunks (security_level, version);

-- query_emb stands for a bound query parameter of type vector(1024).
SELECT id, document_id, chunk_text,
       1 - (embedding <=> query_emb) AS similarity
FROM rag_chunks
WHERE security_level IN ('public', 'internal')
  AND version = '3.2'
ORDER BY embedding <=> query_emb
LIMIT 5;

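Pros: full SQL (JOIN, transactions, backups, permissions), all on existing Postgres. Cons: HNSW memory pressure at scale; tuning effort; with an HNSW index the WHERE clause is applied after the index scan, so a selective filter can return fewer than LIMIT rows (pgvector 0.8.0 adds iterative index scans to mitigate this).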
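In the SQL above, `query_emb` is a placeholder for the query vector. One way to pass it from Python is as pgvector's text form, `[x1,x2,...]`, cast to `vector`. The sketch below only builds that literal and the parameterized statement. It assumes a psycopg-style driver that accepts `%(name)s` placeholders, and the helper name is hypothetical. The toy 3-value vector is for illustration only; a real query must match the column's 1024 dimensions.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of floats as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Same query as above, with the vector passed as a bound parameter.
QUERY_SQL = """
SELECT id, document_id, chunk_text,
       1 - (embedding <=> %(q)s::vector) AS similarity
FROM rag_chunks
WHERE security_level IN ('public', 'internal')
  AND version = '3.2'
ORDER BY embedding <=> %(q)s::vector
LIMIT 5;
"""

params = {"q": to_pgvector_literal([0.1, 0.2, 0.3])}
print(params["q"])  # [0.1,0.2,0.3]
```

With a live connection, the statement would be run as something like `cursor.execute(QUERY_SQL, params)`.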

7. Variants

7.1 Filtered HNSW vs post-filter HNSW

What changes: apply filters during ANN traversal (Qdrant, Weaviate) vs apply filters after ANN (FAISS, some Chroma setups).
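Why use it: when a filter passes only a small fraction of the corpus, as permission filters often do, post-filtering can discard every ANN candidate and return nothing. Filtered HNSW evaluates the filter during traversal, so it keeps searching until it has enough permitted neighbours.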
What becomes possible: permission + RAG with empty rate near zero.
Where it fits: multi-tenant, strict permissions, 4+ security tiers.
Limits: heavier index build. Chroma lacks it; FAISS requires an external ID list.
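The empty-result failure is easy to reproduce. The sketch below uses exact search as a stand-in for the ANN step (it is not an HNSW implementation) on a hypothetical corpus where the permitted tenant's chunks are all less similar to the query than everyone else's. Post-filtering searches first and then discards; pre-filtering restricts the candidate set before searching, which is the behaviour Filtered HNSW approximates.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, items, k):
    """Exact top-k by dot product; stands in for the ANN step."""
    return sorted(items, key=lambda item: dot(query, item["vector"]), reverse=True)[:k]


def post_filter_search(query, items, k, allowed):
    """Search first, then drop candidates the caller may not see."""
    return [it for it in top_k(query, items, k) if it["tenant"] in allowed]


def pre_filter_search(query, items, k, allowed):
    """Restrict to permitted items first, then search."""
    permitted = [it for it in items if it["tenant"] in allowed]
    return top_k(query, permitted, k)


# Hypothetical corpus: 1,000 chunks, only 10 of which belong to tenant "acme".
items = []
for i in range(1000):
    if i < 10:
        # Tenant "acme": only weakly similar to the query below.
        items.append({"id": f"c{i:04d}", "tenant": "acme", "vector": [0.1, 1.0 - i * 0.01]})
    else:
        items.append({"id": f"c{i:04d}", "tenant": "other", "vector": [1.0, i * 0.0001]})

query = [1.0, 0.0]
post = post_filter_search(query, items, k=5, allowed={"acme"})
pre = pre_filter_search(query, items, k=5, allowed={"acme"})
print(len(post), len(pre))  # 0 5
```

Over-fetching (asking the ANN step for many more than k candidates) reduces the problem for post-filtering but cannot eliminate it when the permitted share of the corpus is small.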

7.2 IVF + PQ — compressed index for huge corpora

What changes: PQ (Product Quantization) compresses vectors to 8–16 bytes.
Why use it: 1B × 1024d ≈ 4 TB (float32) → 16 GB at 16 bytes per vector, which fits in memory.
What becomes possible: very large search at sane cost.
Where it fits: Milvus IVF_PQ, FAISS IndexIVFPQ. > 100M chunks.
Limits: 5–10 pp recall loss. Compensate with a reranker (Part 13).
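The storage figures above follow from straightforward arithmetic. This sketch assumes 16 sub-vectors with 8-bit codes (16 bytes per vector) and ignores codebook and inverted-list overhead; the function names are illustrative.

```python
def raw_bytes(n_vectors, dim, bytes_per_float=4):
    """Uncompressed float32 storage."""
    return n_vectors * dim * bytes_per_float


def pq_bytes(n_vectors, n_subvectors, bits_per_code=8):
    """Each vector is stored as n_subvectors codes of bits_per_code bits."""
    return n_vectors * n_subvectors * bits_per_code // 8


n = 1_000_000_000
raw = raw_bytes(n, dim=1024)               # 4_096_000_000_000 bytes, about 4.1 TB
compressed = pq_bytes(n, n_subvectors=16)  # 16_000_000_000 bytes, 16 GB
ratio = raw // compressed
print(raw, compressed, ratio)  # 4096000000000 16000000000 256
```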

7.3 DiskANN — SSD-resident ANN

What changes: index lives on SSD; traversal hits storage.
Why use it: side-steps memory limits with SSD.
What becomes possible: 1B+ chunks on a single node.
Where it fits: Milvus DiskANN, OpenSearch DiskANN.
Limits: SSD I/O-bound; NVMe is effectively required, and HDD latency makes it impractical.

7.4 pgvector + existing RDB integration

What changes: vector results JOIN directly with source tables via SQL.
Why use it: the most natural way to bolt RAG onto an existing OLTP system.
What becomes possible: transactions, backups, permissions, monitoring — all Postgres standard.
Where it fits: small scale (< 10M), existing Postgres ops.
Limits: HNSW memory pressure; a separate read replica is recommended.

7.5 Hybrid serving — Pinecone + local evaluation

What changes: production on Pinecone, evaluation/experimentation on local Chroma or Qdrant.
Why use it: managed cost only where it matters.
What becomes possible: fast experimentation alongside stable production.
Where it fits: startups to mid-size.
Limits: keeping embeddings and chunks in sync across environments.

8. Limits and Failure Modes

8.1 Persistence missing — the FAISS trap

Why intrinsic: FAISS is a library. Unless the index is explicitly serialized (for example with faiss.write_index), it vanishes on process exit. Part 7 metadata also has to be stored separately.
Diagnosis: lifting a demo script into production loses the index on first restart.
Mitigation: FAISS for prototypes only; Chroma or higher in production.
Later part: Part 16 (experiment automation — FAISS for in-memory eval).

8.2 Missing update/delete — index rot

Why intrinsic: forgetting to delete old chunks on document update leaves old + new co-indexed — same as Part 7 §8.4 (version inconsistency).
Diagnosis: same document_id with different version appearing together in top-K.
Mitigation: atomic transactions (pgvector, Qdrant) or staged blue-green indices.
Later part: Part 22 (RAG operations).
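The diagnosis in §8.2 can be automated as a check on retrieved results. The sketch below assumes each hit carries `document_id` and `version` metadata fields, as in the walkthrough schemas; the function name and sample hits are hypothetical.

```python
from collections import defaultdict


def find_version_conflicts(hits):
    """Return {document_id: sorted versions} for documents seen with more than one version."""
    versions = defaultdict(set)
    for hit in hits:
        versions[hit["document_id"]].add(hit["version"])
    return {doc: sorted(v) for doc, v in versions.items() if len(v) > 1}


top_k_hits = [
    {"id": "c001", "document_id": "doc-A", "version": "3.1"},
    {"id": "c101", "document_id": "doc-A", "version": "3.2"},
    {"id": "c205", "document_id": "doc-B", "version": "3.2"},
]
conflicts = find_version_conflicts(top_k_hits)
print(conflicts)  # {'doc-A': ['3.1', '3.2']}
```

Running a check like this over a sample of production queries gives an early signal that stale chunks were not deleted on update.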

8.3 Cardinality blow-up — metadata index bloat

Why intrinsic: indexing all high-cardinality fields makes the metadata index larger than the vectors.
Diagnosis: abnormal build time, disk surge.
Mitigation: choose indexed fields declaratively. Qdrant builds payload indexes only for the fields you declare, and Pinecone's pod-based indexes support selective metadata indexing.
Later part: Part 16 (cardinality monitoring).
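One way to choose indexed fields is to measure cardinality before building anything. The sketch below counts distinct values per metadata field and keeps only fields below a threshold. The 10% ratio is an arbitrary assumption for illustration, the field names are hypothetical, and metadata values are assumed to be hashable scalars.

```python
def field_cardinality(payloads):
    """Count distinct values per metadata field."""
    distinct = {}
    for payload in payloads:
        for field, value in payload.items():
            distinct.setdefault(field, set()).add(value)
    return {field: len(values) for field, values in distinct.items()}


def fields_to_index(payloads, max_ratio=0.1):
    """Keep fields whose distinct-value count is at most max_ratio of the row count."""
    n = len(payloads)
    card = field_cardinality(payloads)
    return sorted(f for f, c in card.items() if c <= max_ratio * n)


# Hypothetical payloads: two low-cardinality fields and one unique-per-chunk field.
payloads = [
    {"security_level": "internal" if i % 2 else "public", "version": "3.2", "chunk_hash": f"h{i}"}
    for i in range(100)
]
print(field_cardinality(payloads))  # {'security_level': 2, 'version': 1, 'chunk_hash': 100}
print(fields_to_index(payloads))    # ['security_level', 'version']
```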

8.4 Distributed-ops burden — the Milvus/Weaviate trap

Why intrinsic: Milvus runs etcd + pulsar + MinIO + 5+ services. Without an ops team, maintenance itself is a burden.
Diagnosis: small team self-hosting Milvus → higher incident rate and recovery time.
Mitigation: under 1B chunks and SaaS allowed → Zilliz Cloud (managed Milvus) or Pinecone; self-host only with dedicated ops.
Later part: Part 22 (RAG ops cost).

8.5 Vendor lock-in — the Pinecone trap

Why intrinsic: Pinecone's proprietary index format is hard to export. Migration = full re-index.
Diagnosis: months-long migration when cost rises.
Mitigation: back up raw embeddings in object storage — re-use them at migration to skip re-embedding.
Later part: Part 16 (backup patterns).
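A raw-embedding backup can be as simple as one record per chunk in a portable format. The sketch below writes JSON Lines to a local temporary file as a stand-in for object storage. The record layout and the model name are hypothetical, and a production backup would more likely use a compact binary format.

```python
import json
import os
import tempfile


def backup_embeddings(path, records):
    """Write one JSON object per line: chunk id, embedding model name, raw vector."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def load_embeddings(path):
    """Read the backup back into a list of records."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [
    {"id": "c001", "model": "example-embed-v1", "vector": [0.1, 0.2, 0.3]},
    {"id": "c002", "model": "example-embed-v1", "vector": [0.4, 0.5, 0.6]},
]
backup_path = os.path.join(tempfile.mkdtemp(), "embeddings.jsonl")
backup_embeddings(backup_path, records)
restored = load_embeddings(backup_path)
print(restored == records)  # True
```

Storing the model name next to each vector matters at migration time: vectors from different embedding models cannot share an index.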

8.6 Common Pitfalls

"FAISS is fast, so production = FAISS." §8.1. No persistence, no filters, no ops.
"Metadata filters are the same everywhere." Expressiveness ranges from 0 to 3 (§4); verify each DB against Part 7's design.
"Drop it into Pinecone and forget it." §8.5. Raw-embedding backups are mandatory.
"Distributed = safer." §8.4. Without ops staff, distributed is less safe.
"HNSW defaults are enough." $M$, ef_construction, ef_search decide recall and latency.

9. Settled Conclusions

Q1. Which of the seven is fully managed and immediate?

Pinecone — minutes to set up, zero infra to run. Chapter: §4, §7.5.

Q2. Which self-host DB has the strongest payload-filter expressiveness?

Qdrant — Rust-based, Filtered HNSW couples pre-filter with ANN, supports range and nested. Chapter: §4, §6.2.

Q3. State the HNSW vs IVF choice rule in one line.

$N < 100M$ → HNSW (in-memory, fast); $N > 100M$ → IVF+PQ or DiskANN (compressed / disk). Chapter: §5.

Q4. Why is Filtered HNSW decisive for permission filtering?

Post-filtering can leave the result set empty when none of the ANN candidates passes the permission filter. Filtered HNSW applies filters during traversal, holding the empty rate near zero. Chapter: §7.1, Part 7 §8.1.

Q5. The simplest mitigation for vendor lock-in?

Back up raw embedding vectors in object storage; re-index by re-using them with the new DB. Chapter: §8.5.

10. Further Reading

Primary

Malkov, Y., Yashunin, D. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. TPAMI 2018. arXiv:1603.09320.
Jégou, H. et al. Product Quantization for Nearest Neighbor Search. TPAMI 2011 (IVF+PQ foundation).
Subramanya, S. J. et al. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. NeurIPS 2019.
Pinecone. HNSW Internals blog series (2024).
Qdrant. Filtered HNSW: How payload filters integrate with ANN search (2023 blog).

Official docs

FAISS: https://github.com/facebookresearch/faiss/wiki
Chroma: https://docs.trychroma.com/
Qdrant: https://qdrant.tech/documentation/
Milvus: https://milvus.io/docs
Weaviate: https://weaviate.io/developers/weaviate
Pinecone: https://docs.pinecone.io/
pgvector: https://github.com/pgvector/pgvector

Supporting

Author note Chapter 8 — vector DBs.
Author note Chapter 35 §6 — Index Partitioning, Multi-collection.

Cheat Sheet

Scenario	First choice	Second	Note
Managed, immediate	Pinecone	Weaviate Cloud	namespace tenant split
Self-host, feature-rich	Qdrant	Weaviate	Filtered HNSW
Large-scale distributed	Milvus	Vespa	requires ops team
Existing Postgres integration	pgvector	—	< 10M chunks
Local prototyping	Chroma	FAISS	migrate before prod
RBAC + GraphQL	Weaviate	—	modular ML
Embedded SDK	Chroma	FAISS	library form

Design rule of thumb: ops model (managed vs self-host) → scale → filter expressiveness → integration environment — narrow in that order.
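The narrowing order can be sketched as a small decision function. This is a simplification of the cheat sheet, not a complete selector: it covers only the ops-model, scale, and integration steps, the function name is hypothetical, and the 100M boundary for moving to Milvus is an assumption borrowed from the §5 index table.

```python
def choose_db(managed, n_chunks, has_postgres=False):
    """Narrow in the order ops model -> scale -> integration, per the cheat sheet."""
    if managed:
        return "Pinecone"
    if has_postgres and n_chunks < 10_000_000:
        return "pgvector"
    if n_chunks > 100_000_000:
        return "Milvus"
    return "Qdrant"


print(choose_db(managed=True, n_chunks=1_000_000))                       # Pinecone
print(choose_db(managed=False, n_chunks=2_000_000, has_postgres=True))   # pgvector
print(choose_db(managed=False, n_chunks=500_000_000))                    # Milvus
print(choose_db(managed=False, n_chunks=20_000_000, has_postgres=True))  # Qdrant
```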

Bridge — What's Next

Next — RAG Core Study (10/26) — Dense Retrieval Deep Dive.

With the vector DB chosen, the next question is what kind of retrieval runs inside it. Part 10 covers dense retrieval — bi-encoder, DPR (Karpukhin 2020), query–document asymmetry, top-K, and similarity functions — with formulas and code.
