"Tools & Sandboxing — Where the Agent Acts (Harness Series 4/6)"

The model reasons. The harness acts. A bad tool call can damage the user's system. This article covers the safety architecture around the action surface.

Series Roadmap (6 parts)

  1. What Is Harness Engineering?
  2. Context Engineering
  3. Memory Systems
  4. Tools & Sandboxing ← this article
  5. Multi-Provider Routing
  6. Evaluation & Ops

1. Tools = The Model's Hands

LLMs only reason. The hands — reading files, running commands, calling APIs — are separate components. Tools.

From the Claude Code analysis paper (arXiv 2604.14228): "When your agent reads a file, the harness decides whether the read is allowed, what happens to the result, and how much of the response fits in the next prompt. The model never touches the file system directly."

That's the foundational principle. The harness always sits between the model and the system.


2. Tool Specs — How a Model Calls Tools

Function Calling (OpenAI)

{
  "name": "read_file",
  "description": "Read a file from disk",
  "parameters": {
    "type": "object",
    "properties": {
      "path": {"type": "string", "description": "Path of the file to read"}
    },
    "required": ["path"]
  }
}
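To make the "harness sits in between" idea concrete, here is a minimal sketch of a dispatcher that validates a model-issued call against a spec like the one above before anything runs. The registry, the `dispatch` function, the shape of the `call` dict, and the stub handler are all hypothetical names chosen for illustration; real SDKs deliver tool calls in their own formats.

```python
from typing import Any, Callable

# Hypothetical registry: tool name -> (spec, handler). Not any SDK's API.
TOOLS: dict[str, tuple[dict[str, Any], Callable[..., str]]] = {}

_JSON_TYPES = {"string": str, "integer": int, "boolean": bool,
               "object": dict, "array": list}


def register(spec: dict[str, Any], handler: Callable[..., str]) -> None:
    TOOLS[spec["name"]] = (spec, handler)


def dispatch(call: dict[str, Any]) -> dict[str, Any]:
    """Validate a model-issued call against its spec, then run the handler."""
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    spec, handler = TOOLS[name]
    params = spec.get("parameters", {})
    props = params.get("properties", {})
    for key in params.get("required", []):
        if key not in args:
            return {"ok": False, "error": f"missing argument: {key}"}
    for key, value in args.items():
        if key not in props:
            return {"ok": False, "error": f"unexpected argument: {key}"}
        expected = _JSON_TYPES.get(props[key].get("type"))
        if expected and not isinstance(value, expected):
            return {"ok": False, "error": f"{key} must be {props[key]['type']}"}
    try:
        return {"ok": True, "result": handler(**args)}
    except Exception as exc:  # a handler failure becomes data for the model
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


READ_FILE_SPEC = {
    "name": "read_file",
    "description": "Read a file from disk",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}
# Stub handler for illustration; a real one would go through permission checks.
register(READ_FILE_SPEC, lambda path: f"<contents of {path}>")
```

Every rejection is returned as a structured error rather than raised, so the model can see what went wrong and correct its next call.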

Tool Use (Anthropic)

Same idea, but the schema goes under input_schema instead of parameters, and the model's response carries each call as a separate tool_use content block. Used by Claude Code and Cursor.

MCP (Model Context Protocol)

  • Anthropic's protocol, released November 2024
  • A server exposes tools; any client can use them
  • Adopted by Claude Code, Cursor, and Codex CLI

Skills (CLAUDE.md pattern)

  • Encapsulates reusable procedures, not raw tools
  • Triggered by keywords
  • Used by Claude Code and OpenClaw

3. Permission Systems — What to Block

Claude Code: 7 permission modes + ML-based classifier (per arXiv 2604.14228)

Claude Code's 7-Mode System (2026)

  1. default: prompt for approval before edits and shell commands (read-only tools run without asking)
  2. acceptEdits: auto-approve file edits, confirm the rest
  3. bypassPermissions: skip all prompts (dangerous)
  4. plan: read-only analysis and planning, no edits or command execution
  5. doNotAllow: explicit denylist
  6. allow: explicit allowlist
  7. dynamic: ML classifier decides safety

Permission Design Principles

  • Default deny, allow via allowlist
  • Directory-level isolation: /Users/foo/project/** allowed, ~/.ssh blocked
  • Bash commands matched by prefix: git status allowed, git push --force blocked
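The three principles above can be sketched as a small policy check. This is an illustration under stated assumptions, not Claude Code's actual matcher: the `Policy` class and the `can_read` / `can_run` helpers are hypothetical, deny rules are assumed to win over allow rules, and anything unmatched is denied.

```python
import fnmatch
import os
from dataclasses import dataclass, field


@dataclass
class Policy:
    # All four lists are hypothetical fields for this sketch.
    allow_paths: list[str] = field(default_factory=list)   # glob patterns
    deny_paths: list[str] = field(default_factory=list)
    allow_cmd_prefixes: list[str] = field(default_factory=list)
    deny_cmd_prefixes: list[str] = field(default_factory=list)


def _matches_path(path: str, patterns: list[str]) -> bool:
    return any(fnmatch.fnmatch(path, os.path.expanduser(p)) for p in patterns)


def _has_prefix(command: str, prefix: str) -> bool:
    # Compare whole tokens so "git status" does not match "git statusx".
    cmd, pre = command.split(), prefix.split()
    return cmd[: len(pre)] == pre


def can_read(policy: Policy, path: str) -> bool:
    # normpath collapses "../" so a rule cannot be sidestepped lexically.
    path = os.path.normpath(os.path.expanduser(path))
    if _matches_path(path, policy.deny_paths):
        return False                                  # deny wins
    return _matches_path(path, policy.allow_paths)    # default deny


def can_run(policy: Policy, command: str) -> bool:
    if any(_has_prefix(command, p) for p in policy.deny_cmd_prefixes):
        return False
    return any(_has_prefix(command, p) for p in policy.allow_cmd_prefixes)
```

Prefix matching is easy to reason about but easy to evade: `git push origin main --force` slips past a `git push --force` deny rule, which is one reason to pair rules like these with a sandbox rather than rely on them alone.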

Permission Rules Pattern (Claude Code settings.json)

{
  "permissions": {
    "allow": ["Read(./**)", "Bash(git status)", "Bash(npm test:*)"],
    "deny": ["Read(~/.ssh/**)", "Bash(rm -rf:*)", "Bash(curl:*)"]
  }
}

4. Sandboxing — Levels of Isolation

4-1. Process Isolation

  • Tool runs in a separate process
  • A crashing tool does not take down the main agent process
  • Lightweight, but weak protection: the tool still shares the user's filesystem and network

4-2. Container (Docker)

  • Filesystem separation
  • Network policy possible
  • Memory/CPU limits
  • Per-tool spin-up is also viable

4-3. VM Isolation

Cursor 3 (April 2026): "Cursor Cloud Agents run autonomous coding tasks in isolated Linux VMs with full dev environments"

  • Full OS isolation
  • Strongest protection, heaviest
  • Becoming standard for cloud agents

4-4. Worktree (Git)

  • Same repo, separate branch directory
  • Main worktree untouched
  • Supported by Claude Code and Cursor 3

CLAUDE.md recommends: "execution: ... Use git worktree for risky changes."


5. Per-Tool Security Patterns

File Reads

  • Path validation (block ../)
  • Block symlink traversal
  • Size limits (large files chunked)
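A minimal sketch of the three file-read checks above, assuming a single allowed root directory. `safe_read`, `ReadDenied`, and the 256 KiB cap are illustrative choices, not values from any particular harness.

```python
from pathlib import Path

MAX_BYTES = 256 * 1024  # assumed per-read cap; tune for your context budget


class ReadDenied(Exception):
    """Raised when a requested path fails validation."""


def safe_read(root: str, requested: str, offset: int = 0,
              limit: int = MAX_BYTES) -> bytes:
    root_path = Path(root).resolve()
    # resolve() collapses ".." and follows symlinks, so the containment
    # check below is made against the real target, not the spelled path.
    target = (root_path / requested).resolve()
    if not target.is_relative_to(root_path):
        raise ReadDenied(f"{requested!r} resolves outside the allowed root")
    if not target.is_file():
        raise ReadDenied(f"{requested!r} is not a regular file")
    with target.open("rb") as fh:
        fh.seek(offset)
        # Large files are read one chunk at a time; callers page with offset.
        return fh.read(min(limit, MAX_BYTES))
```

Resolving first and checking containment second handles `../` and symlink escapes with the same test. Note that `Path.is_relative_to` requires Python 3.9 or later.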

Bash Execution

  • Command allowlist
  • Pattern denylist (rm -rf /, fork bombs, sudo, etc.)
  • Timeout (default 2 minutes)
  • stdout/stderr size cap
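The four Bash guards above could be combined roughly as follows. The allowlist contents, the denylist substrings, and the 10,000-character cap are assumptions for illustration; only the 2-minute default timeout comes from the list above.

```python
import shlex
import subprocess

ALLOWED_PROGRAMS = {"git", "ls", "cat", "echo"}      # assumed allowlist
DENY_SUBSTRINGS = ("rm -rf /", "sudo ", ":(){")      # rough pattern denylist


def _cap(text: str, limit: int) -> str:
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} chars]"


def run_command(command: str, timeout_s: float = 120.0,
                max_chars: int = 10_000, allowed=ALLOWED_PROGRAMS) -> dict:
    if any(bad in command for bad in DENY_SUBSTRINGS):
        return {"ok": False, "error": "blocked by denylist"}
    argv = shlex.split(command)
    if not argv or argv[0] not in allowed:
        return {"ok": False, "error": "program not on allowlist"}
    try:
        # No shell is involved: argv runs directly, so "; rm -rf ~"
        # appended to an allowed command is just an odd argument.
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": f"timed out after {timeout_s}s"}
    except OSError as exc:
        return {"ok": False, "error": f"could not start: {exc}"}
    return {
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        "stdout": _cap(proc.stdout, max_chars),
        "stderr": _cap(proc.stderr, max_chars),
    }
```

Substring denylists are a backstop, not a boundary: they catch obvious mistakes, while the allowlist and the sandbox carry the real weight.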

Web Fetch / URL

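  • Block localhost and loopback addresses (SSRF prevention)
  • Block private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and link-local 169.254.0.0/16, where cloud metadata endpoints live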
  • Force HTTPS
  • Redirect limit (loop prevention)
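A sketch of the address checks above using only the standard library. `check_url` is a hypothetical helper; it resolves the hostname and rejects any URL whose host maps to a loopback, private, link-local, or otherwise non-public address.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def _addresses(host: str) -> list:
    try:
        return [ipaddress.ip_address(host)]      # literal IP, no DNS lookup
    except ValueError:
        infos = socket.getaddrinfo(host, None)
        return [ipaddress.ip_address(info[4][0]) for info in infos]


def check_url(url: str) -> tuple[bool, str]:
    parts = urlparse(url)
    if parts.scheme != "https":
        return False, "only https is allowed"
    if not parts.hostname:
        return False, "missing host"
    try:
        addrs = _addresses(parts.hostname)
    except socket.gaierror:
        return False, "host does not resolve"
    for addr in addrs:
        if (addr.is_loopback or addr.is_private or addr.is_link_local
                or addr.is_reserved or addr.is_multicast
                or addr.is_unspecified):
            return False, f"blocked address: {addr}"
    return True, "ok"
```

Two caveats this sketch does not cover: the check has to be repeated on every redirect hop (with a hop limit), and the fetcher should connect to the address that was validated, since a hostname can resolve differently between check and fetch.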

Database / API

  • Read-only credentials (when possible)
  • Rate limiting
  • Query timeout

6. MCP Security (April 2026)

MCP Servers Are External Code

  • Installing a third-party MCP server runs the server's code on your machine
  • Trust verification required
  • Prefer official servers (Anthropic, OpenAI, Vercel, etc.)

MCP Token Limits (Claude Code)

  • Tool results capped at 25K tokens, warning at 10K
  • Server-opt-in up to 500K characters, persisted to disk
  • Prevents context blow-up

MCP Permission Delegation

When adding an MCP server, explicitly verify which tools it exposes. Don't auto-allow.


7. Hooks — Pre/Post Tool Guardrails

CLAUDE.md pattern:

  • PreToolUse(Bash|Write|Edit): input-stage guardrail. Block dangerous patterns + denied paths up front
  • PostToolUse(Edit|Write): detect debug statements + credential leaks

Claude Code hook system: lifecycle events including PreToolUse, PostToolUse, Notification, and Stop. Each runs user-defined shell commands.

Real Hook Examples

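# PreToolUse (Bash): Claude Code passes the tool call as JSON on stdin (requires jq).
cmd=$(jq -r '.tool_input.command // empty')
if [[ "$cmd" =~ "rm -rf" ]]; then
  echo "BLOCKED: rm -rf detected" >&2
  exit 2  # exit code 2 blocks the call; stderr is fed back to the model
fi

# PostToolUse (Edit|Write): separate script, flags hardcoded credentials in the written file.
file=$(jq -r '.tool_input.file_path // empty')
if [[ -n "$file" ]] && grep -q "API_KEY=" "$file"; then
  echo "WARNING: hardcoded credential" >&2
fi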

8. Isolation Level by Use Case

Use case | Isolation | Reason
Local personal project (single user) | Process + permissions | Light, sufficient
Team coding assistant (multi-user) | Worktree + container | Mistake isolation, fast
Cloud agent (untrusted code) | VM | Full isolation required
MCP server hosting | Container per server | Inter-server isolation
AutoML / Code Interpreter | VM + network block | Egress prevention

9. Failure Handling — Tool Loop Reliability

Tools fail. Network, permissions, missing file, timeout.

Failure Classification (CLAUDE.md docs/patterns/error-handling.md)

  • Transient (retry possible): network blip, timeout
  • Permanent (retry pointless): missing file, syntax error
  • Permission: needs more user authorization
  • Resource: out of tokens, disk full

Retry Policy

  • Transient: exponential backoff (wait 1s → 2s → 4s, up to 3 retries)
  • Permanent: report immediately, try a different approach
  • Permission: hand back to user
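The classification and retry policy can be sketched together. `FailureKind`, `ToolError`, and `call_with_retry` are hypothetical names, and the sketch assumes the 1s → 2s → 4s schedule means up to three retries after the first attempt.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class FailureKind(Enum):
    TRANSIENT = "transient"     # retry possible
    PERMANENT = "permanent"     # retry pointless
    PERMISSION = "permission"   # needs user authorization
    RESOURCE = "resource"       # out of tokens, disk full


class ToolError(Exception):
    def __init__(self, kind: FailureKind, message: str):
        super().__init__(message)
        self.kind = kind


def call_with_retry(fn: Callable[[], T], max_retries: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry only transient failures, doubling the wait each time."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ToolError as err:
            # Everything except TRANSIENT goes straight back to the caller:
            # to the model (permanent) or to the user (permission).
            if err.kind is not FailureKind.TRANSIENT or attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)   # 1s, 2s, 4s
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the schedule can be tested without waiting; production code would leave it at the default.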

3-Failure Circuit Breaker

CLAUDE.md: "execution: 3-failure circuit breaker → revisit architecture"

If the same tool fails 3×, rethink the strategy. Infinite retry just burns tokens for the same result.
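One way to enforce this is a per-tool failure counter that refuses further calls once the threshold is hit. The sketch below assumes "fails 3×" means three consecutive failures (a success resets the count); `ToolCircuitBreaker` and `CircuitOpen` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Callable


class CircuitOpen(Exception):
    """Raised instead of calling a tool that has failed too many times."""


class ToolCircuitBreaker:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self._failures: dict[str, int] = defaultdict(int)

    def call(self, tool_name: str, fn: Callable[..., Any],
             *args: Any, **kwargs: Any) -> Any:
        if self._failures[tool_name] >= self.threshold:
            # Surface this to the model as "change approach", not "try again".
            raise CircuitOpen(
                f"{tool_name} failed {self.threshold} times; rethink the strategy"
            )
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures[tool_name] += 1
            raise
        self._failures[tool_name] = 0   # a success closes the circuit
        return result

    def reset(self, tool_name: str) -> None:
        self._failures[tool_name] = 0
```

Counting per tool name keeps one broken tool from blocking the others; a stricter variant could key on tool name plus arguments.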


10. Tool Output Handling

Big Results

  • 50K SQL rows → only first 100 in context, rest persisted to file
  • Model sees "Result has 50K rows in /tmp/result.csv. Schema: ..."
  • Additional tool calls fetch slices on demand
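A sketch of the spill-to-disk pattern above: keep a preview in context, write the full result to a file, and let a second tool fetch slices. `summarize_rows` and `read_slice` are hypothetical helpers, and CSV is just one convenient on-disk format.

```python
import csv
import os
import tempfile
from itertools import islice
from typing import Optional, Sequence


def summarize_rows(columns: Sequence[str], rows: Sequence[Sequence],
                   preview: int = 100, out_dir: Optional[str] = None) -> dict:
    """Return a preview for the context; spill large results to a CSV file."""
    schema = ", ".join(columns)
    if len(rows) <= preview:
        return {"preview": list(rows), "path": None,
                "message": f"Result has {len(rows)} rows. Schema: {schema}"}
    fd, path = tempfile.mkstemp(suffix=".csv", dir=out_dir)
    with os.fdopen(fd, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(columns)
        writer.writerows(rows)
    message = (f"Result has {len(rows)} rows in {path}. "
               f"Showing the first {preview}. Schema: {schema}")
    return {"preview": list(rows[:preview]), "path": path, "message": message}


def read_slice(path: str, start: int, count: int) -> list:
    """Follow-up tool: fetch rows [start, start + count) from the spilled file."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)                      # skip the header row
        return list(islice(reader, start, start + count))
```

The message is what the model reads; the preview and the path are there so it can decide whether a follow-up `read_slice` call is worth the tokens.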

Parse Failures

  • Invalid JSON → return raw + error so the model can retry with more info
  • Retry uses a different representation (e.g., explicit schema)
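For the parse-failure case, a minimal sketch of "return raw + error": instead of raising, the harness hands the model the decoder's complaint and the (capped) raw text. `parse_tool_json` and the 2,000-character cap are illustrative assumptions.

```python
import json


def parse_tool_json(raw: str, max_raw: int = 2000) -> dict:
    """Parse tool output as JSON; on failure, return the error plus the raw text."""
    try:
        return {"ok": True, "data": json.loads(raw)}
    except json.JSONDecodeError as err:
        return {
            "ok": False,
            "error": f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}",
            "raw": raw[:max_raw],   # capped so a bad payload cannot flood the context
        }
```

With the position and message in hand, the model can ask for the same data in a different representation on the next attempt.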

Non-Deterministic Results

  • Same input, different output → caching impossible
  • Force determinism in the tool itself (random seed, fixed timestamp)

Bottom Line

Area | Recommendation
Permissions | Default deny + allowlist
Isolation | Match risk (process → worktree → container → VM)
MCP | Trust verification + token limits + explicit allow
Hooks | PreToolUse / PostToolUse both
Failures | Classify + exponential backoff + 3-failure circuit breaker

The single takeaway: "Models want to call tools. The harness's job is to define which tools and how to call them."

Part 5 (next) goes one level above tool calls — deciding which model handles which task: Multi-Provider Routing.


First-Party Sources

  • "Dive into Claude Code": arxiv.org/abs/2604.14228
  • Anthropic MCP spec: modelcontextprotocol.io
  • Cursor 3 cloud agents: cursor.com/cloud (April 2026)
  • Martin Fowler harness article: martinfowler.com/articles/harness-engineering.html
