Agent Self-Improvement Harness (10/12) — Session Archive: Preserving Knowledge from Deleted Sessions

April 5, 2026

How session-archive.py extracts only what matters from vanishing context

Summary

AI agent sessions are deleted or reset, but the knowledge generated within them does not have to be discarded
session-archive.py automatically extracts solutions, decision rationale, and recurring patterns from .deleted and .reset sessions
Full code and tool output are excluded — both are recoverable from git or by re-execution

Problem Definition

In production AI agent operations, sessions disappear frequently. Context window overflow triggers a reset. A session is manually cleaned up. An error causes an abrupt stop. The cause varies; the result is the same — session contents are gone.

The issue is that sessions contain knowledge that is expensive to reconstruct: "this bug was fixed this way," "the design was changed for this reason," "this pattern is recurring." When a session disappears, the next session rediscovers all of it from scratch.

session-archive.py was designed to address this after the problem came up repeatedly in OpenClaw. OpenClaw is currently in stable operation; the Hermes migration is in a separate validation phase.

Body

1. What to Extract and What Not to Extract

The design of session-archive.py is defined less by what to store and more by what not to store.

Extracted:
- Solutions: how a specific problem was resolved. High cost to reproduce.
- Decision rationale: why B was chosen over A. Cannot be reconstructed once context is lost.
- Recurring patterns: problem types likely to surface again.

Not extracted:
- Full source code: exists in git. Recoverable via commit log and diff.
- Tool output: re-executable. Storing it only consumes space.
- The full trial-and-error process: storing every failed attempt made before reaching the final solution produces noise. Only the conclusion is retained.

The governing principle: "Is this recoverable elsewhere?" If recoverable from git, do not store. If recoverable by re-execution, do not store. Store only what becomes unrecoverable when the session disappears.
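The recoverability test can be sketched as a simple filter. This is a minimal illustration that assumes each candidate item has already been labeled with a kind. The `Candidate` type, the kind labels, and `should_archive` are hypothetical names, not taken from session-archive.py. The hard part, labeling items in the first place, is not shown.

```python
from dataclasses import dataclass

# Hypothetical kinds that can be recovered elsewhere once the session is gone.
RECOVERABLE_FROM = {
    "code": "git",           # full source: commit log and diff
    "tool_output": "rerun",  # re-execute the tool
}

# Of the session-only kinds, keep conclusions; failed attempts are noise.
ARCHIVABLE_KINDS = {"solution", "decision", "pattern"}


@dataclass
class Candidate:
    kind: str  # hypothetical label assigned by an upstream classifier
    text: str


def should_archive(candidate: Candidate) -> bool:
    """Store only what becomes unrecoverable when the session disappears."""
    if candidate.kind in RECOVERABLE_FROM:
        return False  # recoverable from git or by re-execution: do not store
    return candidate.kind in ARCHIVABLE_KINDS
```

Anything not explicitly archivable, such as a `"failed_attempt"` item or an unknown label, is dropped. Under this sketch, the filter errs toward storing less.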

2. Target Sessions: .deleted and .reset

session-archive.py does not process all sessions. Active sessions are left untouched. Only already-terminated sessions are processed.

Two session states are targeted:

- .deleted: sessions the user explicitly deleted, typically for cleanup purposes
- .reset: sessions terminated involuntarily due to context overflow or similar conditions

In both cases, session contents are either already gone or scheduled to disappear. The archive extracts the essential content before that happens.
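A sketch of the selection step follows. It assumes that terminated sessions are files whose names end in `.deleted` or `.reset`. The post does not describe the on-disk layout, so the directory structure and the `find_terminated_sessions` helper are illustrative only.

```python
from pathlib import Path

# Assumption: terminated sessions carry a filename suffix,
# e.g. "abc123.jsonl.deleted" or "abc123.jsonl.reset".
TARGET_SUFFIXES = {".deleted": "deleted", ".reset": "reset"}


def find_terminated_sessions(sessions_dir: Path) -> list[tuple[Path, str]]:
    """Return (path, status) for terminated sessions; active sessions are skipped."""
    found = []
    for path in sorted(sessions_dir.iterdir()):
        if not path.is_file():
            continue
        status = TARGET_SUFFIXES.get(path.suffix)  # last suffix only
        if status is not None:
            found.append((path, status))
    return found
```

A file such as `abc123.jsonl` has the suffix `.jsonl`, so it is treated as active and left untouched.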

3. staging/session-archive.json Schema

Extracted entries are written to staging/session-archive.json. The staging step exists because extracted content is not immediately promoted to the final knowledge store — it requires validation first.

{
  "archived_at": "2026-04-06T14:30:00Z",
  "source_session": "session_id",
  "source_status": "deleted",
  "entries": [
    {
      "type": "solution",
      "summary": "Resolved lock contention during BM25 index update",
      "detail": "Write to a tmp file first, then perform an atomic rename before committing to the target path",
      "tags": ["bm25", "file-lock", "concurrency"]
    },
    {
      "type": "decision",
      "summary": "Why TTL was set to 30 minutes",
      "detail": "5 minutes caused excessive I/O from frequent updates; 1 hour risked serving stale data",
      "tags": ["cache", "ttl", "performance"]
    }
  ]
}

Each entry consists of type (solution / decision / pattern), summary (single-line description), detail (concrete content), and tags (search keys). While in staging, entries are unvalidated. A separate process verifies each entry and moves approved ones to the final knowledge store.
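The schema maps naturally onto a small dataclass plus a writer for the staging file. This is a minimal sketch: `Entry` and `write_staging` are hypothetical names, and each call writes one record and replaces the file. The checks are structural only, because the post leaves real validation to a separate process. The write goes through a temporary file and a rename, which is atomic on POSIX filesystems, so a crash cannot leave a half-written staging file.

```python
import json
import os
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path

VALID_TYPES = {"solution", "decision", "pattern"}


@dataclass
class Entry:
    type: str
    summary: str
    detail: str
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Structural checks only; whether the content is correct is decided later.
        if self.type not in VALID_TYPES:
            raise ValueError(f"unknown entry type: {self.type!r}")
        if not self.summary.strip() or "\n" in self.summary:
            raise ValueError("summary must be a single non-empty line")


def write_staging(path: Path, session_id: str, status: str, entries: list[Entry]) -> dict:
    """Write one archive record in the staging schema and return it."""
    record = {
        "archived_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source_session": session_id,
        "source_status": status,  # "deleted" or "reset"
        "entries": [asdict(e) for e in entries],
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    os.replace(tmp, path)  # atomic on POSIX when tmp and path share a filesystem
    return record
```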

4. Retention Policy and Recovery Flow

The retention policy for archived entries is straightforward: staging validation passes → promote to final store → index. Entries that fail validation or are flagged as duplicates are discarded.
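The promotion step can be illustrated as follows. The post does not describe how validation works or how duplicates are detected, so both are assumptions here. `is_valid` is a caller-supplied predicate. Two entries count as duplicates when they share a type and a normalized summary. `promote` and the tag-index layout are hypothetical.

```python
from typing import Callable


def _dedup_key(entry: dict) -> tuple[str, str]:
    # Hypothetical duplicate rule: same type and same summary after
    # lowercasing and collapsing whitespace.
    return entry["type"], " ".join(entry["summary"].lower().split())


def promote(
    staged: list[dict],
    store: list[dict],
    index: dict[str, list[int]],
    is_valid: Callable[[dict], bool],
) -> tuple[int, int]:
    """Move valid, non-duplicate staged entries into the store and index their tags.

    Returns (promoted, discarded).
    """
    seen = {_dedup_key(e) for e in store}
    promoted = discarded = 0
    for entry in staged:
        key = _dedup_key(entry)
        if not is_valid(entry) or key in seen:
            discarded += 1  # failed validation or duplicate: drop it
            continue
        store.append(entry)
        seen.add(key)
        position = len(store) - 1
        for tag in entry.get("tags", []):
            index.setdefault(tag, []).append(position)
        promoted += 1
    return promoted, discarded
```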

The recovery flow:

- When a new session starts, query the final store by relevant tags.
- Inject the matched entries into the context.
- The agent references prior decisions and solutions as it proceeds.

This flow maintains knowledge continuity across sessions. Even after a reset, solutions validated in a previous session are immediately available to the next one.
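The recovery flow reduces to a tag lookup followed by rendering the hits into text the agent can read. In this sketch, `query_by_tags` scans the store linearly and ranks entries by tag overlap. `render_context` produces a plain bulleted block. Both are hypothetical helpers; a real store would more likely use a prebuilt tag index than a linear scan.

```python
def query_by_tags(store: list[dict], tags: set[str], limit: int = 5) -> list[dict]:
    """Return stored entries sharing at least one tag, most overlapping first."""
    scored = []
    for position, entry in enumerate(store):
        overlap = len(tags & set(entry.get("tags", [])))
        if overlap:
            scored.append((-overlap, position, entry))
    scored.sort(key=lambda item: (item[0], item[1]))  # ties keep store order
    return [entry for _, _, entry in scored[:limit]]


def render_context(entries: list[dict]) -> str:
    """Format matched entries as a text block to include in the new session."""
    lines = ["Prior knowledge from archived sessions:"]
    for e in entries:
        lines.append(f"- [{e['type']}] {e['summary']}: {e['detail']}")
    return "\n".join(lines)
```

For example, a new session working on the BM25 indexer might call `query_by_tags(store, {"bm25", "file-lock"})` and pass the rendered text in as part of its initial context.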

5. "Sessions Disappear. Knowledge Doesn't Have To."

The core premise of this design is that sessions and knowledge are distinct things.

A session is ephemeral. It is a temporary execution environment for a specific task. It naturally terminates when the task is done. Permanently preserving every session is expensive and impractical.

Knowledge generated within a session is different. "This problem is solved this way" is reusable information. Discarding it along with the session is waste.

session-archive.py separates the two. Sessions are allowed to disappear naturally. Only the knowledge is extracted.

Engineering Decisions

Initial approach: store full session contents → Storage grew quickly. The majority of stored content was tool call logs and code output. The actually useful "why was this decision made" accounted for less than 5% of the total.
Extraction criteria started as keyword matching → Sentences containing "solution," "decision," or "reason" were extracted. Precision was low: a sentence like "tried this solution, but it failed" also matched "solution."
Attempted direct promotion to the final store without staging → Incorrectly extracted entries became entrenched as knowledge. An intermediate validation step proved necessary.

Each decision concretized a design principle: the criteria for what not to store, the introduction of staging, tag-based indexing — all of these emerged from problems encountered in actual operation.
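The keyword-matching precision problem is easy to reproduce. The sketch below shows a naive extractor matching a failed attempt, then adds a crude failure-marker filter for contrast. The post does not say what replaced keyword matching. `FAILURE_MARKERS` and `filtered_extract` are therefore hypothetical and should not be read as the fix session-archive.py actually uses.

```python
import re

KEYWORDS = ("solution", "decision", "reason")

# Hypothetical failure markers: a crude mitigation for illustration only.
FAILURE_MARKERS = re.compile(r"\b(failed|did not work|didn't work|reverted)\b", re.IGNORECASE)


def naive_extract(sentences: list[str]) -> list[str]:
    """Keep any sentence containing an extraction keyword."""
    return [s for s in sentences if any(k in s.lower() for k in KEYWORDS)]


def filtered_extract(sentences: list[str]) -> list[str]:
    """Same as naive_extract, but drop sentences that report a failed attempt."""
    return [s for s in naive_extract(sentences) if not FAILURE_MARKERS.search(s)]


sentences = [
    "Tried this solution, but it failed.",
    "Solution: write to a tmp file, then atomically rename.",
    "The reason for a 30-minute TTL is I/O cost.",
    "Ran the test suite.",
]
print(naive_extract(sentences))     # includes the failed attempt
print(filtered_extract(sentences))  # failed attempt removed
```

Even with the filter, precision remains limited. A fixed marker list cannot anticipate every way a failure is phrased, which is why a validation step after extraction still matters.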

Closing

Operating AI agents leads to a recurring situation: solving the same problem from scratch again. It was clearly resolved in a previous session, but the session is gone, so the next session has no record of it.

session-archive.py reduces this discontinuity. It does not preserve full sessions — it extracts only the essential knowledge produced within them. Code belongs in git. Tool output can be re-run. What exists only in the session — why a decision was made, what approach solved the problem — that is what gets preserved.

Sessions can disappear. The knowledge just needs to remain.

Series overview: Series index


MaJu Tech Notes