"RAG Core Study (21/26) — Adaptive Top-K and Conditional Reranking"
Giving 20 documents to an easy question is wasteful. Giving 3 documents to a hard question is too few.
Adaptive Top-K means retrieval depth changes by query rather than staying fixed. Conditional Reranking means the reranker only runs when the candidate set is ambiguous enough to justify the extra cost. Together, they form a practical answer to one recurring production problem: not every question deserves the same retrieval budget.
0. Prerequisites
- Part 13 rerankers
- Part 20 dynamic weighting
- Part 16 experiment automation
1. Learning Objectives
- Explain why fixed top-K is often too blunt for production RAG.
- Understand how adaptive top-K changes retrieval depth by query difficulty.
- Distinguish adaptive depth from conditional reranking.
- Relate these choices to token budget and latency.
2. Core Summary
If every query uses the same top-K, the system over-retrieves for easy questions and under-retrieves for hard ones. Adaptive Top-K uses query type, score gap, and retrieval confidence to choose a deeper or shallower search. Conditional Reranking uses similar signals to decide whether reranking is worth the cost. These are not only cost-saving tricks. They are question-dependent retrieval policies.
3. Intuition — Easy and Hard Questions Need Different Depth
- Easy question: “What does PR-2024-Q3 conclude?”
  A strong exact match may make top-3 enough.
- Hard question: “How did the external-sharing exception change legal review?”
  This may require multiple supporting documents and deeper retrieval.
A fixed K cannot express that difference well.
4. Definitions — Core Terms of Depth Control
| Term | Definition |
|---|---|
| Adaptive Top-K | Choose retrieval depth per query instead of globally |
| Conditional Reranking | Run the reranker only when ambiguity is high enough |
| Token Budget | The total context allowance available to generation |
| Confidence Threshold | A cutoff that decides whether to retrieve or rerank more |
5. Mechanism — When to Search Deeper and When to Stop
Typical signals include:
- query type complexity
- top-1 vs top-2 score gap
- Dense/Sparse agreement
- total context cost relative to token budget
These signals let the pipeline choose not just what to retrieve, but how far to go.
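Two of these signals can be computed directly from the retrieval results. The sketch below is a minimal illustration, assuming each retriever returns a best-first list of `(doc_id, score)` pairs; the names `RetrievalSignals` and `compute_signals`, and the use of Jaccard overlap as the agreement measure, are assumptions made here and not part of the original series.

```python
from dataclasses import dataclass


@dataclass
class RetrievalSignals:
    score_gap: float   # top-1 score minus top-2 score (dense retriever)
    agreement: float   # overlap between dense and sparse top-n sets, 0..1


def compute_signals(dense_hits, sparse_hits, n=5):
    """dense_hits / sparse_hits: lists of (doc_id, score), best first."""
    scores = [score for _, score in dense_hits]
    # With fewer than two hits there is no gap to measure; treat it as 0.0.
    score_gap = scores[0] - scores[1] if len(scores) >= 2 else 0.0

    dense_ids = {doc_id for doc_id, _ in dense_hits[:n]}
    sparse_ids = {doc_id for doc_id, _ in sparse_hits[:n]}
    union = dense_ids | sparse_ids
    # Jaccard overlap of the two top-n sets; empty results count as no agreement.
    agreement = len(dense_ids & sparse_ids) / len(union) if union else 0.0
    return RetrievalSignals(score_gap=score_gap, agreement=agreement)
```

A large score gap together with high agreement suggests a shallow search may be enough; a small gap or low agreement is a reason to go deeper or to rerank.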
6. Walkthrough — A Small Adaptive Policy
6.1 Choosing top-K
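def choose_top_k(query_type, confidence):
    # Comparison questions need several supporting documents regardless of confidence.
    if query_type == "comparison":
        return 12
    # High confidence: a shallow search is enough.
    if confidence > 0.8:
        return 4
    # Low confidence: search deeper.
    if confidence < 0.4:
        return 15
    # Middle ground.
    return 8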
6.2 Conditional reranking
def should_rerank(confidence, candidate_count):
    return confidence < 0.75 and candidate_count >= 5
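One way to wire this gate into a pipeline is sketched below. The wrapper `maybe_rerank` and the `rerank_fn` callable are hypothetical names introduced for illustration; `rerank_fn` stands in for whatever reranker the system actually uses.

```python
def should_rerank(confidence, candidate_count):
    # Same gate as in the walkthrough.
    return confidence < 0.75 and candidate_count >= 5


def maybe_rerank(query, candidates, confidence, rerank_fn):
    """Apply rerank_fn only when the gate fires; otherwise keep retrieval order.

    Returns (candidates, reranked_flag) so the caller can log how often
    the reranker actually ran.
    """
    if should_rerank(confidence, len(candidates)):
        return rerank_fn(query, candidates), True
    return list(candidates), False
```

Returning the flag alongside the results makes it easy to measure the rerank rate, which is the number that determines how much latency the gate saves.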
6.3 Linking to token budget
if total_context_tokens > 3000:
    top_k = min(top_k, 5)
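A hard cap on K is the simplest form of budget control. An alternative, sketched here under assumptions, is to keep ranked documents in order until the budget is used up. The function `fit_to_budget` is hypothetical, and the default token counter is a crude whitespace word count that should be replaced by the generator's real tokenizer.

```python
def fit_to_budget(docs, token_budget=3000, count_tokens=None):
    """Keep ranked docs in order until the next one would exceed the budget."""
    if count_tokens is None:
        # Crude stand-in (assumption): whitespace word count, not real tokens.
        count_tokens = lambda text: len(text.split())
    kept, used = [], 0
    for doc in docs:
        cost = count_tokens(doc)
        if used + cost > token_budget:
            break  # stop at the first doc that does not fit, preserving rank order
        kept.append(doc)
        used += cost
    return kept, used
```

Stopping at the first document that does not fit keeps the ranking intact; skipping it and trying smaller documents further down would pack more text but would favour short documents over relevant ones.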
Self-explanation: Why is Adaptive Top-K more than a simple cost-optimisation switch?
7. Variants and Use Cases
7.1 Confidence-based Top-K
What changes
Retrieval goes deeper only when the current result looks uncertain.
Why it matters
It avoids spending the same retrieval depth on clearly resolved and unresolved queries.
What it enables
You can preserve quality while shrinking average retrieval cost.
Limit and next step
If confidence estimation is weak, the policy can become unreliable.
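A minimal sketch of this idea is staged deepening: start shallow and widen the search only while confidence stays low. The names `retrieve_adaptively`, `search_fn`, and `confidence_fn` are hypothetical; the depths and threshold reuse the values from the walkthrough, and how confidence is estimated is left to the caller.

```python
def retrieve_adaptively(query, search_fn, confidence_fn,
                        depths=(4, 8, 15), threshold=0.8):
    """Try increasing depths; stop as soon as confidence clears the threshold.

    search_fn(query, k) -> list of hits (assumed signature)
    confidence_fn(hits) -> float in 0..1 (assumed signature)
    """
    hits, used_k = [], 0
    for k in depths:
        hits = search_fn(query, k)
        used_k = k
        if confidence_fn(hits) >= threshold:
            break  # result looks resolved; no need to go deeper
    return hits, used_k
```

Each extra stage costs another retrieval call, so this only pays off when most queries resolve at the first depth.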
7.2 Query-type-based Top-K
Procedure and comparison questions often need deeper context than identifier-heavy lookup questions.
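In its simplest form this is a lookup table. The sketch below assumes the query type has already been classified upstream; the table name, the `"lookup"` and `"procedure"` labels, and every depth except the comparison value of 12 from the walkthrough are illustrative assumptions.

```python
# Depth per query type. Only "comparison": 12 comes from the walkthrough;
# the other entries are placeholder values to be tuned on real data.
DEPTH_BY_QUERY_TYPE = {
    "lookup": 4,       # identifier-heavy, exact-match questions
    "procedure": 10,   # multi-step answers usually span several documents
    "comparison": 12,
}
DEFAULT_DEPTH = 8


def depth_for(query_type):
    """Return the retrieval depth for a query type, falling back to a default."""
    return DEPTH_BY_QUERY_TYPE.get(query_type, DEFAULT_DEPTH)
```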
7.3 Conditional reranking
When the candidate set is already obvious, reranking may add cost without meaningful gain. When ambiguity is high, reranking often matters a lot.
8. Limits and Failure Modes
8.1 Bad confidence estimates create bad depth decisions
False confidence can cause the system to stop too early.
8.2 Lower K is not always lower latency overall
If reduced depth causes repeated corrective retrieval later, total latency may increase instead of decrease.
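A back-of-the-envelope calculation shows how this can happen. All numbers below are invented for illustration, and the model assumes at most one corrective retrieval per query.

```python
def expected_latency_ms(first_pass_ms, retry_ms, retry_rate):
    """Average latency when a fraction of queries needs one corrective retrieval."""
    return first_pass_ms + retry_rate * retry_ms


# Hypothetical figures: the shallow policy is faster per pass but retries more often.
shallow = expected_latency_ms(first_pass_ms=120, retry_ms=400, retry_rate=0.30)
deep = expected_latency_ms(first_pass_ms=180, retry_ms=400, retry_rate=0.05)

print(f"shallow: {shallow:.0f} ms, deep: {deep:.0f} ms")
```

With these assumed figures the shallow policy averages about 240 ms and the deeper one about 200 ms, so the policy with the cheaper first pass is slower overall.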
8.3 Over-optimising for token budget can weaken evidence quality
An aggressive depth cap can cut away necessary supporting documents.
8.4 Next step — Adaptive depth naturally raises a new question
If the system can retrieve more or less depending on uncertainty, then it also needs a better notion of search confidence itself. That is Part 22.
8.5 Common Pitfalls
| # | Pitfall | Symptom | Fast Check |
|---|---|---|---|
| 1 | one fixed K for everything | mixed under/over-retrieval | evaluate by query type |
| 2 | reranker always on | unnecessary latency | gate reranking by ambiguity |
| 3 | ignoring token budget | bloated contexts | track context size explicitly |
| 4 | unclear confidence threshold | unstable behaviour | document thresholds and rationale |
| 5 | no online validation | rollout surprises | shadow-test adaptive policy first |
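The first fast check, evaluating by query type, can be sketched as a per-type recall breakdown. The function name and the record layout `(query_type, retrieved_ids, relevant_ids)` are assumptions for illustration; any labelled evaluation set with a query-type field would work.

```python
from collections import defaultdict


def recall_by_query_type(records):
    """records: iterable of (query_type, retrieved_ids, relevant_ids) tuples."""
    totals = defaultdict(lambda: [0, 0])  # query_type -> [found, relevant]
    for query_type, retrieved, relevant in records:
        relevant = set(relevant)
        totals[query_type][0] += len(relevant & set(retrieved))
        totals[query_type][1] += len(relevant)
    # Skip types with no relevant documents to avoid dividing by zero.
    return {
        query_type: found / total
        for query_type, (found, total) in totals.items()
        if total > 0
    }
```

If recall at a fixed K is high for lookup queries and low for comparison queries, that gap is the evidence that a single K is under-retrieving for the harder type.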
9. Self-check — Answer Before Looking
Q1. What is the purpose of Adaptive Top-K?
Answer: To match retrieval depth to question difficulty and uncertainty.
Why: Not all questions require the same amount of supporting context.
Q2. What is Conditional Reranking?
Answer: Running the reranker only when the candidate set is ambiguous enough to justify it.
Why: Reranking costs time and does not always improve already obvious cases.
Q3. Why can a smaller K still backfire?
Answer: Because shallow retrieval may trigger later corrective searches or weak grounding.
Why: Savings in one stage can create losses in another.
Cheat Sheet — One-page Summary
Definitions
- Adaptive Top-K: query-dependent retrieval depth
- Conditional Reranking: ambiguity-triggered rerank step
- Token Budget: maximum usable context window for generation
Minimal code
top_k = choose_top_k(query_type, confidence)
When to use what
| Situation | Policy |
|---|---|
| clear exact-match query | smaller K |
| comparison/procedure query | larger K |
| uncertain candidate ranking | rerank |
Bridge to the Next Part
Adaptive retrieval depth makes confidence central. Part 22 focuses directly on search confidence and corrective RAG.