"RAG Core Study (21/26) — Adaptive Top-K and Conditional Reranking"

Giving 20 documents to an easy question is wasteful. Giving 3 documents to a hard question is risky: the answer may be missing the evidence it needs.

Adaptive Top-K means retrieval depth changes by query rather than staying fixed. Conditional Reranking means the reranker only runs when the candidate set is ambiguous enough to justify the extra cost. Together, they form a practical answer to one recurring production problem: not every question deserves the same retrieval budget.


0. Prerequisites

  • Part 13 rerankers
  • Part 20 dynamic weighting
  • Part 16 experiment automation

1. Learning Objectives

  1. Explain why fixed top-K is often too blunt for production RAG.
  2. Understand how adaptive top-K changes retrieval depth by query difficulty.
  3. Distinguish adaptive depth from conditional reranking.
  4. Relate these choices to token budget and latency.

2. Key Summary

If every query uses the same top-K, the system over-retrieves for easy questions and under-retrieves for hard ones. Adaptive Top-K uses query type, score gap, and retrieval confidence to choose a deeper or shallower search. Conditional Reranking uses similar signals to decide whether reranking is worth the cost. These are not only cost-saving tricks. They are question-dependent retrieval policies.


3. Intuition — Easy and Hard Questions Need Different Depth

  • Easy question: “What does PR-2024-Q3 conclude?”
    A strong exact match may make top-3 enough.

  • Hard question: “How did the external-sharing exception change legal review?”
    This may require multiple supporting documents and deeper retrieval.

A fixed K cannot express that difference well.


4. Definitions — Core Terms of Depth Control

| Term | Definition |
|---|---|
| Adaptive Top-K | Choose retrieval depth per query instead of globally |
| Conditional Reranking | Run the reranker only when ambiguity is high enough |
| Token Budget | The total context allowance available to generation |
| Confidence Threshold | A cutoff that decides whether to retrieve or rerank more |

5. Mechanism — When to Search Deeper and When to Stop

Typical signals include:

  1. query type complexity
  2. top-1 vs top-2 score gap
  3. Dense/Sparse agreement
  4. total context cost relative to token budget

These signals let the pipeline choose not just what to retrieve, but how far to go.
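The signals above can be computed from the raw retrieval output. The following is a minimal sketch, not the author's implementation: `RetrievalSignals`, `compute_signals`, the `(doc_id, score)` hit format, and the `token_counts` lookup are all assumptions made for illustration. Dense/Sparse agreement is measured on document ids rather than scores, because dense similarity and sparse (for example BM25) scores are not on the same scale.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (doc_id, score), hypothetical hit format


@dataclass
class RetrievalSignals:
    score_gap: float     # top-1 score minus top-2 score (dense list)
    agreement: float     # Jaccard overlap of dense and sparse top-n ids
    budget_ratio: float  # candidate context tokens / token budget


def compute_signals(
    dense_hits: List[Hit],
    sparse_hits: List[Hit],
    token_counts: Dict[str, int],
    token_budget: int,
    n: int = 5,
) -> RetrievalSignals:
    """Both hit lists are assumed to be sorted best-first."""
    scores = [score for _, score in dense_hits]
    # With fewer than two hits there is no gap to measure; treat it as 0 (uncertain).
    score_gap = scores[0] - scores[1] if len(scores) >= 2 else 0.0

    dense_ids = {doc_id for doc_id, _ in dense_hits[:n]}
    sparse_ids = {doc_id for doc_id, _ in sparse_hits[:n]}
    union = dense_ids | sparse_ids
    agreement = len(dense_ids & sparse_ids) / len(union) if union else 0.0

    total_tokens = sum(token_counts.get(doc_id, 0) for doc_id, _ in dense_hits)
    return RetrievalSignals(score_gap, agreement, total_tokens / token_budget)
```

How these three numbers are combined into a single confidence value is a separate design choice that this sketch leaves open.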


6. Walkthrough — A Small Adaptive Policy

6.1 Choosing top-K

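def choose_top_k(query_type: str, confidence: float) -> int:
    """Pick a retrieval depth from the query type and a 0-1 confidence score."""
    # Comparison questions need several supporting documents, whatever the confidence.
    if query_type == "comparison":
        return 12
    # High confidence: a strong match already exists, so a shallow search is enough.
    if confidence > 0.8:
        return 4
    # Low confidence: widen the search to recover evidence the top ranks may miss.
    if confidence < 0.4:
        return 15
    # Everything in between gets the default depth.
    return 8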

6.2 Conditional reranking

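def should_rerank(confidence: float, candidate_count: int) -> bool:
    # Rerank only when the ranking looks uncertain and there are enough candidates to reorder.
    return confidence < 0.75 and candidate_count >= 5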
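To show where this gate might sit in a pipeline, here is a small sketch. It repeats `should_rerank` from 6.2 so the block is self-contained. `maybe_rerank`, `toy_rerank`, and the `(doc_id, score)` candidate format are hypothetical names introduced for illustration; a real system would pass its own reranker (such as a cross-encoder) as `rerank_fn`.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (doc_id, retrieval score), hypothetical format
RerankFn = Callable[[str, List[Candidate]], List[Candidate]]


def should_rerank(confidence: float, candidate_count: int) -> bool:
    return confidence < 0.75 and candidate_count >= 5


def maybe_rerank(
    query: str,
    candidates: List[Candidate],
    confidence: float,
    rerank_fn: RerankFn,
) -> Tuple[List[Candidate], bool]:
    """Return (candidates, reranked_flag); the reranker is called only when the gate opens."""
    if not should_rerank(confidence, len(candidates)):
        return candidates, False
    return rerank_fn(query, candidates), True


def toy_rerank(query: str, candidates: List[Candidate]) -> List[Candidate]:
    # Stand-in reranker for the demo: move ids containing the query term to the front.
    return sorted(candidates, key=lambda c: 0 if query in c[0] else 1)
```

Returning the flag alongside the candidates makes it easy to log how often the reranker actually ran, which is the number that tells you whether the gate is saving anything.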

6.3 Linking to token budget

# If the retrieved context would exceed 3000 tokens, cap the depth at 5.
if total_context_tokens > 3000:
    top_k = min(top_k, 5)
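A fixed cap is the simplest way to respect the budget. A different technique, shown here only as a sketch, is to keep ranked documents until the budget is full, so that depth follows actual document length. `fit_to_budget` and the `(doc_id, token_count)` input format are assumptions for illustration; the 3000-token default mirrors the snippet above.

```python
from typing import List, Tuple


def fit_to_budget(
    ranked_docs: List[Tuple[str, int]],
    token_budget: int = 3000,
) -> Tuple[List[str], int]:
    """ranked_docs is a best-first list of (doc_id, token_count).

    Keeps the longest ranked prefix that fits inside the budget.
    """
    kept: List[str] = []
    used = 0
    for doc_id, tokens in ranked_docs:
        if used + tokens > token_budget:
            break  # stop at the first overflow so rank order is preserved
        kept.append(doc_id)
        used += tokens
    return kept, used
```

Stopping at the first overflow, rather than skipping ahead to smaller documents, keeps the context a strict prefix of the ranking. Whether that is preferable to filling the remaining space with lower-ranked short documents depends on how much you trust the ranking.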

Self-explanation: Why is Adaptive Top-K more than a simple cost-optimisation switch?


7. Variants and Use Cases

7.1 Confidence-based Top-K

What changes
Retrieval goes deeper only when the current result looks uncertain.

Why it matters
It avoids spending the same retrieval depth on clearly resolved and unresolved queries.

What it enables
You can preserve quality while shrinking average retrieval cost.

Limit and next step
If confidence estimation is weak, the policy can become unreliable.
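One way to realise confidence-based depth is a deepening loop: search shallow first and go deeper only while the result still looks uncertain. This is a sketch under assumptions. `retrieve_adaptively`, `search_fn`, and `confidence_fn` are hypothetical names, the depths 4, 8, and 15 are borrowed from the walkthrough in 6.1, and the confidence function is left to the caller precisely because, as noted above, a weak estimator makes the whole policy unreliable.

```python
from typing import Callable, List, Sequence, Tuple


def retrieve_adaptively(
    query: str,
    search_fn: Callable[[str, int], List[float]],
    confidence_fn: Callable[[List[float]], float],
    depths: Sequence[int] = (4, 8, 15),
    threshold: float = 0.8,
) -> Tuple[List[float], int]:
    """Search at increasing depths until the result looks confident enough.

    search_fn(query, k) returns the top-k hits (here: a best-first list of scores).
    Returns the final hits and the depth that produced them.
    """
    if not depths:
        raise ValueError("depths must contain at least one value")
    hits: List[float] = []
    k = depths[0]
    for k in depths:
        hits = search_fn(query, k)
        if confidence_fn(hits) >= threshold:
            break  # confident enough: stop before paying for a deeper search
    return hits, k
```

Note that every extra iteration is another retrieval call, so the saving only materialises if most queries stop at the first depth.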

7.2 Query-type-based Top-K

Procedure and comparison questions often need deeper context than identifier-heavy lookup questions.
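A query-type policy can be as small as a lookup table in front of the retriever. The sketch below is illustrative only: the keyword rules in `classify_query` are a deliberately naive stand-in for a real classifier, the identifier pattern is an assumption modelled on ids like PR-2024-Q3, and the depth of 10 for procedure questions is an invented value (4, 8, and 12 echo the walkthrough in 6.1).

```python
import re

# Hypothetical depth table; tune per corpus.
K_BY_TYPE = {"lookup": 4, "procedure": 10, "comparison": 12}
DEFAULT_K = 8

# Assumed identifier shape: two or more capitals, a hyphen, then digits (e.g. PR-2024).
ID_PATTERN = re.compile(r"\b[A-Z]{2,}-\d+")


def classify_query(query: str) -> str:
    """Very rough keyword heuristic, for illustration only."""
    lowered = query.lower()
    if any(word in lowered for word in ("compare", " vs ", "difference")):
        return "comparison"
    if lowered.startswith("how do") or "steps" in lowered:
        return "procedure"
    if ID_PATTERN.search(query):
        return "lookup"
    return "other"


def top_k_for(query: str) -> int:
    return K_BY_TYPE.get(classify_query(query), DEFAULT_K)
```

Keeping the table separate from the classifier means the depths can be tuned from evaluation results without touching the classification logic.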

7.3 Conditional reranking

When the candidate set is already obvious, reranking may add cost without meaningful gain. When ambiguity is high, reranking often matters a lot.


8. Limits and Failure Modes

8.1 Bad confidence estimates create bad depth decisions

False confidence can cause the system to stop too early.

8.2 Lower K is not always lower latency overall

If reduced depth causes repeated corrective retrieval later, total latency may increase instead of decrease.
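A quick expected-value calculation makes the trade-off concrete. The numbers below are invented for illustration, not measurements, and `expected_latency_ms` is a hypothetical helper: it assumes each query pays for one first pass plus, with some probability, one corrective retrieval.

```python
def expected_latency_ms(first_pass_ms: float, retry_ms: float, retry_rate: float) -> float:
    """Expected latency when a fraction of queries need one corrective retrieval."""
    return first_pass_ms + retry_rate * retry_ms


# Illustrative numbers only.
shallow = expected_latency_ms(first_pass_ms=120, retry_ms=400, retry_rate=0.35)
deep = expected_latency_ms(first_pass_ms=200, retry_ms=400, retry_rate=0.05)
print(f"shallow K: {shallow:.0f} ms, deep K: {deep:.0f} ms")
```

With these assumed values the shallow policy averages about 260 ms against about 220 ms for the deeper one, even though its first pass is faster. The retry rate is the number to measure before concluding that a smaller K is cheaper.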

8.3 Over-optimising for token budget can weaken evidence quality

An aggressive depth cap can cut away necessary supporting documents.

8.4 Next step — Adaptive depth naturally raises a new question

If the system can retrieve more or less depending on uncertainty, then it also needs a better notion of search confidence itself. That is Part 22.


8.5 Common Pitfalls

| # | Pitfall | Symptom | Fast Check |
|---|---|---|---|
| 1 | one fixed K for everything | mixed under/over-retrieval | evaluate by query type |
| 2 | reranker always on | unnecessary latency | gate reranking by ambiguity |
| 3 | ignoring token budget | bloated contexts | track context size explicitly |
| 4 | unclear confidence threshold | unstable behaviour | document thresholds and rationale |
| 5 | no online validation | rollout surprises | shadow-test adaptive policy first |
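The fast check for pitfall 1, evaluating by query type, needs very little code. This is a minimal sketch assuming evaluation records of the form `(query_type, hit)`, where `hit` is true when a relevant document appeared in the retrieved top-K; `hit_rate_by_type` and that record format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hit_rate_by_type(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of queries per type for which a relevant document was retrieved."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for query_type, hit in records:
        totals[query_type] += 1
        hits[query_type] += int(hit)
    return {query_type: hits[query_type] / totals[query_type] for query_type in totals}
```

A single aggregate hit rate can hide the pattern this table warns about: lookup queries scoring near 1.0 while comparison queries sit far lower under the same fixed K.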

9. Self-check — Answer Before Looking

Q1. What is the purpose of Adaptive Top-K?

Answer: To match retrieval depth to question difficulty and uncertainty.
Why: Not all questions require the same amount of supporting context.

Q2. What is Conditional Reranking?

Answer: Running the reranker only when the candidate set is ambiguous enough to justify it.
Why: Reranking costs time and does not always improve already obvious cases.

Q3. Why can a smaller K still backfire?

Answer: Because shallow retrieval may trigger later corrective searches or weak grounding.
Why: Savings in one stage can create losses in another.


Cheat Sheet — One-page Summary

Definitions

  • Adaptive Top-K: query-dependent retrieval depth
  • Conditional Reranking: ambiguity-triggered rerank step
  • Token Budget: maximum usable context window for generation

Minimal code

top_k = choose_top_k(query_type, confidence)

When to use what

| Situation | Policy |
|---|---|
| clear exact-match query | smaller K |
| comparison/procedure query | larger K |
| uncertain candidate ranking | rerank |


Bridge to the Next Part

Adaptive retrieval depth makes confidence central. Part 22 focuses directly on search confidence and corrective RAG.
