"The Real Failure Mode of AI Research Agents — They Don't Get It Wrong, They Just Don't Finish"

Premature return and how completion contracts fix it


Key Summary

  • The dominant failure mode of AI agents in production isn't hallucination — it's premature return: delivering results before the task is actually complete
  • Root cause: the agent was never given explicit completion criteria
  • Fixed with a two-stage verification flow (collect + verify) and an explicit completion contract

Background

An orchestrator agent requested a product research task from a researcher sub-agent: find candidate air conditioner models matching specific capacity, price range, and installation constraints, then collect official specs.

What came back was an empty result padded with a lengthy "next steps" list. On the surface it looked like a successful response. In reality, nothing had been completed.

The Core Insight: Premature Return

When people think of LLM agent failures, hallucination comes to mind first. But in actual operations, the more frequent failure is premature return — the agent decides the task is done before the requirements are met and hands back results.

What makes it worse: the agent packages "plans for future work" as if they were actual research findings.

Root Cause: No Completion Criteria

The system provided role definitions, constraints, and tool usage instructions — but never specified what state constitutes "done." Without a completion contract, the agent had no way to distinguish between a valid output and an incomplete one.

The Fix: Completion Contracts

Explicitly define what "not done" looks like:

  • Output is empty or contains only placeholders
  • Output is a research plan or TODO list
  • Claims lack source URLs
  • Unverified items are not explicitly marked as such
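
The "not done" conditions above can also be checked mechanically before an orchestrator accepts a result. What follows is a minimal sketch, not the checker used in this post: the `Claim` structure, the `contract_violations` function, and the marker lists are hypothetical names chosen for illustration. It also assumes that a claim labeled "no data available" is allowed to have no URL.

```python
import re
from dataclasses import dataclass, field

URL_PATTERN = re.compile(r"https?://\S+")
# Hypothetical markers; tune these to the phrasing your agent tends to produce.
PLAN_MARKERS = ("next steps", "todo", "i will", "plan to")
PLACEHOLDERS = ("tbd", "n/a", "placeholder", "...")
ALLOWED_STATUSES = {"verified", "unverified", "no data available"}


@dataclass
class Claim:
    text: str
    source_urls: list[str] = field(default_factory=list)
    status: str = ""  # expected to be one of ALLOWED_STATUSES


def contract_violations(summary: str, claims: list[Claim]) -> list[str]:
    """Return the reasons the output counts as 'not done' (empty list = done)."""
    violations = []
    stripped = summary.strip().lower()
    # Rule 1: empty output, or nothing but a placeholder.
    if not claims or not stripped or stripped in PLACEHOLDERS:
        violations.append("output is empty or placeholder-only")
    # Rule 2: plan-like wording with no findings behind it.
    if not claims and any(marker in stripped for marker in PLAN_MARKERS):
        violations.append("output is a plan or TODO list, not findings")
    for claim in claims:
        # Rule 3: claims need a source URL (assumption: except "no data available").
        has_url = any(URL_PATTERN.match(url) for url in claim.source_urls)
        if claim.status != "no data available" and not has_url:
            violations.append(f"claim lacks a source URL: {claim.text!r}")
        # Rule 4: every claim must carry an explicit verification label.
        if claim.status not in ALLOWED_STATUSES:
            violations.append(f"claim has no verification label: {claim.text!r}")
    return violations
```

Returning the list of reasons rather than a bare boolean lets the caller tell the agent exactly which part of the contract it missed.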

Enforce a two-stage verification flow:

Stage 1 (Candidate Collection): Multi-keyword search, official document filtering, minimum 3 candidates secured

Stage 2 (Verification & Synthesis): Official spec sheet confirmation, cross-verification from at least 2 sources, classification as [verified / unverified / no data available]
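
The two stages can be sketched as two small functions with a hard gate between them. This is an illustrative outline under stated assumptions: `collect_candidates`, `classify`, and the injected `search` callable are hypothetical names, the official-document filtering step is left out, and the rule that "verified" needs an official spec sheet plus at least two distinct sources overall is one possible reading of the Stage 2 description.

```python
from typing import Callable, Optional

MIN_CANDIDATES = 3  # Stage 1: minimum number of candidates
MIN_SOURCES = 2     # Stage 2: minimum number of distinct sources


def collect_candidates(keywords: list[str],
                       search: Callable[[str], list[str]]) -> list[str]:
    """Stage 1: run every keyword, de-duplicate, and refuse to proceed when short."""
    seen: dict[str, None] = {}
    for keyword in keywords:
        for model in search(keyword):
            seen.setdefault(model, None)  # dict preserves first-seen order
    candidates = list(seen)
    if len(candidates) < MIN_CANDIDATES:
        raise ValueError(f"only {len(candidates)} candidates; need {MIN_CANDIDATES}")
    return candidates


def classify(official_url: Optional[str], confirming_urls: list[str]) -> str:
    """Stage 2: label one candidate from the evidence gathered for it."""
    sources = set(confirming_urls)
    if official_url:
        sources.add(official_url)
    if not sources:
        return "no data available"
    # Assumption: "verified" requires the official spec sheet and MIN_SOURCES overall.
    if official_url and len(sources) >= MIN_SOURCES:
        return "verified"
    return "unverified"
```

Raising in Stage 1 instead of returning a short list means a thin search can never be passed along as a finished result.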

Results

Metric              | Before (Failure)            | After (Success)
Output content      | No substance + future plans | Concrete model list + URLs + spec table
Official docs       | None referenced             | Direct verification from official sites
Unverified handling | Silently omitted            | Explicitly labeled

Pitfalls and Caveats

  • If you define entry conditions, you must also define exit conditions.
  • Enumerating failure modes upfront teaches the agent the boundary between "good output" and "bad output."
  • A two-stage flow (collect + verify) is more reliable than a single pass.
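
On the orchestrator side, an exit condition is only useful if something enforces it. The sketch below shows one possible way to do that: re-dispatch the task with the specific violations attached until the output passes. The retry-with-feedback loop is an assumption rather than something this post describes, and `run_until_done`, `run_agent`, and `find_violations` are hypothetical names.

```python
from typing import Callable


def run_until_done(task: str,
                   run_agent: Callable[[str], str],
                   find_violations: Callable[[str], list[str]],
                   max_attempts: int = 3) -> str:
    """Re-dispatch the task until the output passes the exit check."""
    prompt = task
    for _ in range(max_attempts):
        output = run_agent(prompt)
        violations = find_violations(output)
        if not violations:
            return output  # exit condition met
        # Feed the specific failures back instead of retrying blindly.
        feedback = "\n".join(f"- {violation}" for violation in violations)
        prompt = f"{task}\n\nYour previous answer was not done:\n{feedback}"
    raise RuntimeError(f"contract still violated after {max_attempts} attempts")
```

Capping the attempts keeps a sub-agent that cannot satisfy the contract from looping forever. The failure surfaces as an error rather than as a plausible-looking empty result.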

Takeaway

We're good at telling agents what to do, but bad at telling them when they're done. In this case, making the completion contract explicit was enough to turn an empty response padded with future plans into a sourced model list with every unverified item labeled.
