Agent Self-Improvement Harness (11/12) — U-tag Dialectical Modeling: Observe-Hypothesize-Verify

A 3-stage user tagging system that blocks confirmation bias and detects "declared value ≠ enacted value" divergence


Summary

  • Every user judgment passes through three validation stages: observation → hypothesis → verified
  • The contradictions field tracks disconfirming evidence, structurally preventing confirmation bias
  • Detects the pattern where a user states "growth is my top priority" while consistently choosing stability in practice

Background

When an AI agent claims to "understand" a user, the implementation usually stops at profile storage. A tag is attached — "this user prefers Python", "this user wants concise answers" — and that is the end of it.

The problem: those tags are never validated. A single observation gets frozen as permanent fact. Users change. Initial observations can be wrong. Tags remain.

The deeper problem is confirmation bias. Once a tag like "this user prefers data-driven decisions" is in place, the agent selectively perceives evidence that confirms it. Decisions made on intuition are discarded or marked as exceptions.

The U-tag (User Tag) system implemented in OpenClaw addresses this with a dialectical approach. The core design principle: every judgment must carry built-in room for refutation.


Body

1. U-tag Schema

U-tag is not a simple key-value store. Each tag carries a status, an evidence list, and a contradictions list.

{
  "tag": "prefers_data_over_intuition",
  "status": "observation | hypothesis | verified",
  "source": "observed | declared",
  "evidence": [],
  "contradictions": [],
  "verified_at": null,
  "last_updated": null
}

The source field distinguishes tag origin. observed means the tag was derived from behavioral observation. declared means it was derived from something the user explicitly stated. This distinction is essential when cross-comparing declared values against enacted values.
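
As an illustration, the schema above could be mirrored in a small Python type that rejects unknown status or source values. This is a minimal sketch, not OpenClaw's actual code: the class name UTag, the constants, and the default values are assumptions made for this example.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

# Allowed values, taken from the schema shown above.
VALID_STATUSES = ("observation", "hypothesis", "verified")
VALID_SOURCES = ("observed", "declared")


@dataclass
class UTag:
    """One user tag with the same fields as the JSON schema (hypothetical class)."""
    tag: str
    status: str = "observation"   # assumed default: a new tag is a single observation
    source: str = "observed"      # assumed default
    evidence: list = field(default_factory=list)
    contradictions: list = field(default_factory=list)
    verified_at: Optional[str] = None
    last_updated: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail early on values outside the enumerations the schema lists.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


new_tag = UTag(tag="prefers_data_over_intuition")
record = asdict(new_tag)  # plain dict with the same keys as the JSON schema
```

Keeping contradictions as a required part of the type, rather than an optional extra, matches the design principle that every judgment carries room for refutation.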

2. 3-Stage Lifecycle — observation, hypothesis, verified

Every user judgment passes through three stages.

Observation — single instance. One observed behavior. Not yet a judgment.

{
  "tag": "prefers_data_over_intuition",
  "status": "observation",
  "evidence": [
    { "stage": "initial", "event": "Requested quantitative analysis for investment decision" }
  ]
}

At this stage, the agent does not use the tag in response generation. It records only. The first principle: a single observation does not justify a judgment.

Hypothesis — repeated pattern confirmed. When the same pattern appears two or more times, the tag is promoted to hypothesis.

{
  "tag": "prefers_data_over_intuition",
  "status": "hypothesis",
  "evidence": [
    { "stage": "initial", "event": "Requested quantitative analysis for investment decision" },
    { "stage": "follow_up", "event": "Requested benchmark comparison for technology selection" }
  ]
}

At the hypothesis stage, the agent uses the tag as a reference signal — slightly prioritizing data-inclusive responses — but does not treat it as definitive.

Verified — validation complete. A hypothesis that holds over a sustained period without contradictions is promoted to verified.

{
  "tag": "prefers_data_over_intuition",
  "status": "verified",
  "evidence": [...],
  "contradictions": [],
  "verified_at": "verified_stage"
}

Only verified tags set the agent's defaults; hypothesis tags remain a soft reference signal. Once this tag is verified, data-grounded responses become the default and responses based on intuition alone are deprioritized.
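
The promotion rules described in this section can be sketched as a single function. This is a sketch under stated assumptions: the 14-day hold period is an invented value (the post gives no number), and the hypothesis_since field and the promote function name are hypothetical helpers that are not part of the schema.

```python
from datetime import datetime, timedelta

MIN_EVIDENCE = 2                   # "two or more times", per the text
HOLD_PERIOD = timedelta(days=14)   # assumed value; the post does not specify one


def promote(tag: dict, now: datetime) -> dict:
    """Advance a tag by at most one stage if the promotion rule allows it."""
    status = tag["status"]
    if status == "observation":
        if len(tag.get("evidence", [])) >= MIN_EVIDENCE:
            tag["status"] = "hypothesis"
            # Hypothetical helper field: when the hold period started.
            tag["hypothesis_since"] = now.isoformat()
    elif status == "hypothesis":
        # If the start time was never stored, start the clock now.
        since = datetime.fromisoformat(
            tag.setdefault("hypothesis_since", now.isoformat())
        )
        held_long_enough = now - since >= HOLD_PERIOD
        if held_long_enough and not tag.get("contradictions"):
            tag["status"] = "verified"
            tag["verified_at"] = now.isoformat()
    tag["last_updated"] = now.isoformat()
    return tag
```

A tag moves one stage per call, so a burst of evidence cannot jump straight from observation to verified.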

3. The contradictions Field — Tracking Disconfirming Evidence and Automatic Demotion

The core innovation in U-tag is the contradictions field. Every tag carries a built-in disconfirmation record.

Consider a verified tag for "prefers data over intuition". If the user makes a significant decision based on intuition without data, this is recorded as follows:

{
  "tag": "prefers_data_over_intuition",
  "status": "verified",
  "contradictions": [
    {
      "stage": "review",
      "event": "Chose technology stack based on intuition, no benchmark consulted",
      "severity": "medium"
    }
  ]
}

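When a contradiction is recorded, the tag's status is demoted automatically according to severity. In the example above, the medium-severity entry moves the tag from verified back to hypothesis: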

severity | Action
low      | Record only. Status unchanged.
medium   | Demote verified → hypothesis. Additional observation required.
high     | Demote hypothesis → observation. Effective reset.

Severity criteria:

  • low: Deviation on a minor decision (e.g., using intuition for a peripheral choice)
  • medium: Deviation on a significant decision (e.g., choosing a technology stack without data)
  • high: Explicit statement or action that negates the core premise of the tag

This structure blocks confirmation bias. The agent does not collect only evidence that confirms a tag — it systematically collects evidence that denies it as well.
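
The severity rules can be expressed as a lookup table plus one recording function. This is a minimal sketch rather than the actual implementation. The table in this section does not say what a high-severity contradiction does to a verified tag; the sketch assumes it also resets the tag to observation. The names DEMOTION and record_contradiction are hypothetical.

```python
# Status transitions per severity. Statuses not listed stay unchanged.
DEMOTION = {
    "low": {},                                   # record only
    "medium": {"verified": "hypothesis"},
    "high": {
        "hypothesis": "observation",
        "verified": "observation",               # assumption: not stated in the post
    },
}


def record_contradiction(tag: dict, event: str, severity: str,
                         stage: str = "review") -> dict:
    """Append a contradiction entry, then apply the demotion for its severity."""
    if severity not in DEMOTION:
        raise ValueError(f"unknown severity: {severity!r}")
    tag.setdefault("contradictions", []).append(
        {"stage": stage, "event": event, "severity": severity}
    )
    # Look up the new status; fall back to the current one if no rule applies.
    tag["status"] = DEMOTION[severity].get(tag["status"], tag["status"])
    return tag
```

The entry is appended before any status change, so even a low-severity deviation leaves a trace that later reviews can count.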

4. Detecting "Declared Value ≠ Enacted Value"

The most powerful application of U-tag is detecting divergence between what users say they value and what their behavior reveals.

People frequently act differently from how they describe themselves. A user states "growth is my top priority" but consistently chooses stability in actual decisions. A user states "efficiency matters most" but invests time pursuing perfection.

When contradictions accumulate on source: "declared" tags, this divergence becomes visible.

{
  "tag": "prioritizes_growth_over_stability",
  "status": "verified",
  "source": "declared",
  "evidence": [
    { "stage": "initial", "event": "Explicitly stated 'growth is my top priority'" }
  ],
  "contradictions": [
    {
      "stage": "decision_a",
      "event": "Given a new job opportunity vs. current job stability — chose stability",
      "severity": "high"
    },
    {
      "stage": "decision_b",
      "event": "Given new technology adoption vs. proven technology — chose proven",
      "severity": "medium"
    }
  ]
}

As contradictions accumulate, the agent recognizes the divergence. Critically, the agent does not judge. It does not say "you appear to prefer stability over growth."

Instead, the agent presents both options in parallel — a growth-oriented response alongside a stability-oriented response — and lets the user choose. This is the operational output of confirmation bias prevention.
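
One way to picture the parallel presentation is a function that returns either one framing or both. This is a sketch only: the function name options_to_present, its parameters, and the threshold of two contradictions are assumptions for illustration, not values from the system described here.

```python
def options_to_present(
    tag: dict,
    declared_option: str,
    enacted_option: str,
    min_contradictions: int = 2,   # assumed threshold, not from the post
) -> list[str]:
    """Return one option normally, or both when a declared tag has diverged."""
    diverged = (
        tag.get("source") == "declared"
        and len(tag.get("contradictions", [])) >= min_contradictions
    )
    if diverged:
        # Offer both framings side by side and leave the choice to the user.
        return [declared_option, enacted_option]
    return [declared_option]


growth_tag = {
    "tag": "prioritizes_growth_over_stability",
    "source": "declared",
    "contradictions": [
        {"stage": "decision_a", "severity": "high"},
        {"stage": "decision_b", "severity": "medium"},
    ],
}
options = options_to_present(
    growth_tag, "growth-oriented plan", "stability-oriented plan"
)
```

The function returns options and never a label for the user, which keeps the output on the side of presenting choices rather than issuing a verdict.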

5. Query Patterns — Tag Retrieval and Application

How the agent queries U-tags during response generation is part of the design.

def get_active_tags(user_id: str, min_status: str = "hypothesis") -> list[dict]:
    """
    Returns only tags at or above min_status.
    observation tags are internal tracking only — not used in response generation.
    """
    tags = load_utags(user_id)
    status_rank = {"observation": 0, "hypothesis": 1, "verified": 2}
    threshold = status_rank[min_status]
    return [t for t in tags if status_rank[t["status"]] >= threshold]


def get_conflicted_tags(user_id: str) -> list[dict]:
    """
    Returns declared tags that carry contradictions.
    Used to detect declared value ≠ enacted value divergence.
    """
    tags = load_utags(user_id)
    return [
        t for t in tags
        if t.get("source") == "declared" and len(t.get("contradictions", [])) > 0
    ]

get_active_tags returns the trustworthy tag set referenced during response generation. get_conflicted_tags identifies domains where the user's statements and behavior diverge, flagging situations that require presenting multiple perspectives.
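
Both functions rely on load_utags, which the post does not show. A minimal stand-in, assuming one JSON file per user holding a list of tag objects, might look like the following. The utags directory, the file naming, and the base_dir parameter are hypothetical choices for this sketch.

```python
import json
from pathlib import Path

UTAG_DIR = Path("utags")   # hypothetical storage location


def load_utags(user_id: str, base_dir: Path = UTAG_DIR) -> list[dict]:
    """Load all tags for one user; a user with no file has no tags yet."""
    path = base_dir / f"{user_id}.json"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

Returning an empty list for an unknown user lets both query functions work unchanged for first-time users.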


Design Decisions and Known Limitations

Why the hypothesis stage is necessary. The initial design used a 2-stage structure: promote to verified on two observations. The problem: two consecutive observations do not establish a fundamental tendency. After introducing the intermediate hypothesis stage with a minimum hold period, the rate of incorrectly promoted tags dropped measurably.

Limits of automatic contradiction detection. Detecting every contradiction automatically is not tractable in the current architecture. Whether a given behavior contradicts a specific tag depends on context. The current implementation records contradictions only when a behavior explicitly conflicts with a tag. Subtle contradictions can be missed — this is a known limitation.

Does the agent judge the user? U-tag is not an evaluation tool. It is an adaptation mechanism for improving response quality. When contradictions accumulate, the agent does not surface a verdict. It expands the perspective range it presents.


Closing

Core design principles of U-tag dialectical modeling:

  1. 3-stage lifecycle. Do not judge from a single observation. observation → hypothesis → verified.
  2. Built-in disconfirmation. The contradictions field systematically collects evidence that negates each tag.
  3. Declared ≠ enacted detection. The source field separates statement-derived tags from behavior-derived tags, enabling divergence detection.
  4. Output is options, not verdicts. The output of confirmation bias prevention is not a correction — it is parallel presentation of multiple perspectives.

Understanding a user is not fixing a profile. It is observing, forming hypotheses, verifying, and tracking disconfirmation. The same method science uses to understand the world: never finalize, continuously validate.

Series overview: Series index
