"Tools & Sandboxing — Where the Agent Acts (Harness Series 4/6)"

April 29, 2026

The model reasons. The harness acts. A bad tool call can damage the user's system. This article covers the safety architecture around the action surface.

Series Roadmap (6 parts)

What Is Harness Engineering?
Context Engineering
Memory Systems
Tools & Sandboxing ← this article
Multi-Provider Routing
Evaluation & Ops

1. Tools = The Model's Hands

LLMs only reason. The hands — reading files, running commands, calling APIs — are separate components. Tools.

From the Claude Code analysis paper (arXiv 2604.14228): "When your agent reads a file, the harness decides whether the read is allowed, what happens to the result, and how much of the response fits in the next prompt. The model never touches the file system directly."

That's the foundational principle. The harness always sits between the model and the system.

2. Tool Specs — How a Model Calls Tools

Function Calling (OpenAI)

{
  "name": "read_file",
  "description": "Read a file from disk",
  "parameters": {
    "type": "object",
    "properties": {
      "path": {"type": "string", "description": "Path of the file to read"}
    },
    "required": ["path"]
  }
}

Tool Use (Anthropic)

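Nearly the same structure, except that the JSON Schema goes under input_schema instead of parameters, and responses carry each call as a separate tool_use content block. Used by Claude Code and Cursor.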
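Whichever spec the provider uses, the harness does the same work on its side: parse the call, look up the tool, check the arguments, run it, and hand back a result or an error. A minimal Python sketch of that step follows. The registry, the `dispatch` function, and the simplified `{"name": ..., "arguments": {...}}` call shape are illustrative assumptions, not any provider's actual wire format.

```python
import json
from typing import Any, Callable

# Hypothetical registry: tool name -> (required argument names, handler).
TOOLS: dict[str, tuple[list[str], Callable[..., Any]]] = {}

def register(name: str, required: list[str], handler: Callable[..., Any]) -> None:
    TOOLS[name] = (required, handler)

def dispatch(raw_call: str) -> dict[str, Any]:
    """Validate a model-emitted call and run it; errors come back as data."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict) or not isinstance(call.get("arguments", {}), dict):
        return {"ok": False, "error": "call must be an object with object arguments"}
    name, args = call.get("name"), call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    required, handler = TOOLS[name]
    missing = [key for key in required if key not in args]
    if missing:
        return {"ok": False, "error": f"missing arguments: {missing}"}
    try:
        return {"ok": True, "result": handler(**args)}
    except Exception as exc:  # the model sees the failure instead of a crash
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}

# Stand-in handler for illustration; a real one would pass through permission checks.
register("read_file", ["path"], lambda path: f"<contents of {path}>")
```

Returning errors as data rather than raising keeps the tool loop alive: the model gets a message it can react to on the next turn.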

MCP (Model Context Protocol)

Anthropic's protocol, released November 2024
A server exposes tools; any client can use them
Adopted by Claude Code, Cursor, and Codex CLI

Skills (CLAUDE.md pattern)

Encapsulates reusable procedures, not raw tools
Triggered by keywords
Used by Claude Code and OpenClaw

3. Permission Systems — What to Block

Claude Code: 7 permission modes + ML-based classifier (per arXiv 2604.14228)

Claude Code's 7-Mode System (2026)

default — confirm every tool call
acceptEdits — auto file edits, confirm rest
bypassPermissions — auto everything (dangerous)
plan — no tool use, planning only
doNotAllow — explicit denylist
allow — explicit allowlist
dynamic — ML classifier decides safety

Permission Design Principles

Default deny, allow via allowlist
Directory-level isolation: /Users/foo/project/** allowed, ~/.ssh blocked
Bash commands matched by prefix: git status allowed, git push --force blocked
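The three principles above can be sketched as a small default-deny checker. This is an illustration only: the rule lists, the function names, and the expansion of ~/.ssh to /Users/foo/.ssh are assumptions, and the path check is purely lexical (it does not resolve symlinks).

```python
import posixpath

# Example rules mirroring the principles above; a real harness would load them from config.
ALLOWED_DIRS = ["/Users/foo/project"]
DENIED_DIRS = ["/Users/foo/.ssh"]
ALLOWED_COMMANDS = ["git status", "npm test"]
DENIED_COMMANDS = ["git push --force", "rm -rf"]
SHELL_METACHARS = set(";|&`$<>\n")

def _inside(path: str, directory: str) -> bool:
    # Normalize first so "project/../.ssh" cannot sneak past the prefix test.
    path, directory = posixpath.normpath(path), posixpath.normpath(directory)
    return path == directory or path.startswith(directory + "/")

def path_allowed(path: str) -> bool:
    """Deny rules win; anything not explicitly allowed is denied."""
    if any(_inside(path, d) for d in DENIED_DIRS):
        return False
    return any(_inside(path, d) for d in ALLOWED_DIRS)

def _has_prefix(command: str, prefix: str) -> bool:
    return command == prefix or command.startswith(prefix + " ")

def command_allowed(command: str) -> bool:
    if SHELL_METACHARS & set(command):
        return False  # refuse chaining such as "git status && rm -rf /"
    normalized = " ".join(command.split())
    if any(_has_prefix(normalized, p) for p in DENIED_COMMANDS):
        return False
    return any(_has_prefix(normalized, p) for p in ALLOWED_COMMANDS)
```

Prefix matching alone is easy to bypass with command chaining, which is why the sketch rejects shell metacharacters before it looks at prefixes.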

Permission Pattern (Claude Code settings.json)

{
  "permissions": {
    "allow": ["Read(./**)", "Bash(git status)", "Bash(npm test:*)"],
    "deny": ["Read(~/.ssh/*)", "Bash(rm -rf:*)", "Bash(curl http*)"]
  }
}

4. Sandboxing — Levels of Isolation

4-1. Process Isolation

Tool runs in a separate process
A crashing tool does not take down the main agent process
Lightweight, but weak protection: the tool still shares the filesystem and network with the agent

4-2. Container (Docker)

Filesystem separation
Network policy can be enforced per container
Memory/CPU limits
Spinning up a fresh container per tool call is also viable
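The container properties listed above map onto ordinary `docker run` flags. The sketch below only builds the argument list; the `/workspace` mount point, the default limits, and the function name are assumptions chosen for illustration.

```python
def docker_argv(image: str, command: list[str], workdir: str,
                memory: str = "512m", cpus: str = "1.0") -> list[str]:
    """Build a `docker run` invocation for a single, throwaway tool call."""
    return [
        "docker", "run", "--rm",            # remove the container afterwards
        "--network", "none",                # network policy: no network at all
        "--memory", memory,                 # memory limit
        "--cpus", cpus,                     # CPU limit
        "--read-only",                      # read-only root filesystem
        "--volume", f"{workdir}:/workspace:ro",  # project mounted read-only
        "--workdir", "/workspace",
        image, *command,
    ]
```

A caller could pass the result to `subprocess.run(..., timeout=120)` so that each tool call gets a fresh container and a hard time limit.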

4-3. VM Isolation

Cursor 3 (April 2026): "Cursor Cloud Agents run autonomous coding tasks in isolated Linux VMs with full dev environments"

Full OS isolation
Strongest protection, heaviest
Becoming standard for cloud agents

4-4. Worktree (Git)

Same repository, separate working directory checked out on its own branch
Main worktree untouched
Supported by Claude Code and Cursor 3

CLAUDE.md recommends: "execution: ... Use git worktree for risky changes."
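For reference, creating such a worktree is a single git command. The helper names below are hypothetical, and the sketch assumes git is installed and that `repo` is an existing repository.

```python
import subprocess

def worktree_argv(repo: str, branch: str, path: str) -> list[str]:
    # `git worktree add -b <branch> <path>` creates the branch and checks it out at <path>.
    return ["git", "-C", repo, "worktree", "add", "-b", branch, path]

def create_worktree(repo: str, branch: str, path: str) -> None:
    """Give the agent its own checkout; the main worktree stays untouched."""
    subprocess.run(worktree_argv(repo, branch, path), check=True)
```

When the risky change is merged or abandoned, `git worktree remove <path>` cleans the directory up again.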

5. Per-Tool Security Patterns

File Reads

Path validation (block ../)
Block symlink traversal
Size limits (large files chunked)
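A minimal sketch of the three file-read checks, assuming Python 3.9+ for `Path.is_relative_to`. The function name and the 64 KiB chunk size are illustrative assumptions.

```python
from pathlib import Path

MAX_CHUNK_BYTES = 64 * 1024  # assumed limit; larger files are read in chunks

def safe_read(root: Path, requested: str, offset: int = 0) -> bytes:
    """Read at most one chunk of a file that must live under `root`."""
    root = root.resolve()
    # resolve() collapses "../" segments and follows symlinks, so the check
    # below is made against the real location of the file.
    target = (root / requested).resolve()
    if not target.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside the allowed directory")
    with target.open("rb") as handle:
        handle.seek(offset)
        return handle.read(MAX_CHUNK_BYTES)
```

The model can request the next chunk by calling the tool again with a larger `offset`.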

Bash Execution

Command allowlist
Pattern denylist (rm -rf /, fork bombs, sudo, etc.)
Timeout (default 2 minutes)
stdout/stderr size cap
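The four Bash controls can be combined in one wrapper. This is a sketch under assumptions: the allowlist, the denied substrings, and the 30,000-character output cap are example values, and the command is run without a shell so that metacharacters are never interpreted.

```python
import shlex
import subprocess

ALLOWED_PROGRAMS = {"git", "ls", "echo"}        # example allowlist
DENIED_SUBSTRINGS = ("rm -rf", "sudo", ":(){")  # example denylist (":(){" starts a fork bomb)
TIMEOUT_SECONDS = 120                           # the 2-minute default from the list above
MAX_OUTPUT_CHARS = 30_000                       # assumed cap

def run_command(command: str, timeout: float = TIMEOUT_SECONDS) -> dict:
    if any(bad in command for bad in DENIED_SUBSTRINGS):
        return {"ok": False, "error": "command matches a denied pattern"}
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return {"ok": False, "error": f"cannot parse command: {exc}"}
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        return {"ok": False, "error": "program is not on the allowlist"}
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": f"timed out after {timeout}s"}
    except OSError as exc:  # e.g. the program is not installed
        return {"ok": False, "error": str(exc)}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout[:MAX_OUTPUT_CHARS],
        "stderr": proc.stderr[:MAX_OUTPUT_CHARS],
    }
```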

Web Fetch / URL

Block localhost (SSRF prevention)
Block private and link-local IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
Force HTTPS
Redirect limit (loop prevention)
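A sketch of the URL checks using only the standard library. The function name is an assumption, and the resolver is injectable so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_allowed(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow only HTTPS URLs whose host resolves to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, parsed.port or 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    for info in infos:
        address = ipaddress.ip_address(info[4][0])
        if (address.is_private or address.is_loopback or address.is_link_local
                or address.is_reserved or address.is_multicast or address.is_unspecified):
            return False  # covers localhost, RFC 1918 ranges, and 169.254.x.x metadata endpoints
    return bool(infos)
```

Two caveats apply: every redirect hop has to be validated the same way, and a hostname can resolve differently between the check and the actual fetch, so a stricter harness would connect to the address it validated.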

Database / API

Read-only credentials (when possible)
Rate limiting
Query timeout
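With SQLite as a stand-in database, read-only access and a query timeout look roughly like this. The function names and the 2-second budget are assumptions; a server database would use a read-only role and its own statement timeout instead.

```python
import sqlite3
import time
from pathlib import Path

def open_read_only(db_path: str) -> sqlite3.Connection:
    # mode=ro makes SQLite reject every write on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)

def run_query(conn: sqlite3.Connection, sql: str, max_seconds: float = 2.0) -> list:
    deadline = time.monotonic() + max_seconds
    # The handler runs every 1000 VM instructions; a non-zero return aborts
    # the running statement with an sqlite3 error.
    conn.set_progress_handler(lambda: 1 if time.monotonic() > deadline else 0, 1000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 1)  # clear the handler
```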

6. MCP Security (April 2026)

MCP Servers Are External Code

Installing a third-party MCP server runs the server's code on your machine
Trust verification required
Prefer official servers (Anthropic, OpenAI, Vercel, etc.)

MCP Token Limits (Claude Code)

Tool results capped at 25K tokens, warning at 10K
Servers can opt in to results of up to 500K characters, which are persisted to disk
Prevents context blow-up
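The warn-then-cap behavior can be sketched in a few lines. This is not Claude Code's implementation: the function name is hypothetical, and the 4-characters-per-token estimate is a crude assumption standing in for a real tokenizer.

```python
import math

WARN_TOKENS = 10_000
MAX_TOKENS = 25_000
CHARS_PER_TOKEN = 4  # rough assumption; a real harness would use the model's tokenizer

def cap_tool_result(text: str) -> dict:
    """Flag large results and truncate anything over the hard cap."""
    estimated = math.ceil(len(text) / CHARS_PER_TOKEN)
    if estimated <= MAX_TOKENS:
        return {"text": text, "estimated_tokens": estimated,
                "warning": estimated > WARN_TOKENS, "truncated": False}
    kept = text[: MAX_TOKENS * CHARS_PER_TOKEN]
    notice = f"\n[truncated: about {estimated} tokens, limit {MAX_TOKENS}]"
    return {"text": kept + notice, "estimated_tokens": estimated,
            "warning": True, "truncated": True}
```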

MCP Permission Delegation

When adding an MCP server, explicitly verify which tools it exposes. Don't auto-allow.

7. Hooks — Pre/Post Tool Guardrails

CLAUDE.md pattern:

PreToolUse(Bash|Write|Edit): input-stage guardrail. Block dangerous patterns and denied paths up front
PostToolUse(Edit|Write): detect debug statements and credential leaks

Claude Code hook system: 4 lifecycle events. Each runs user-defined shell commands.

Real Hook Examples

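# PreToolUse hook for Bash. Claude Code sends the tool call as JSON on stdin (requires jq).
cmd=$(jq -r '.tool_input.command // empty')
if [[ "$cmd" == *"rm -rf"* ]]; then
  echo "BLOCKED: rm -rf detected" >&2
  exit 2  # exit code 2 blocks the call and feeds stderr back to the model
fi

# PostToolUse hook for Edit|Write: scan the file that was just written.
file=$(jq -r '.tool_input.file_path // empty')
if [[ -f "$file" ]] && grep -q "API_KEY=" "$file"; then
  echo "WARNING: hardcoded credential in $file" >&2
fi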

8. Isolation Level by Use Case

Use case	Isolation	Reason
Local personal project (single user)	Process + permissions	Light, sufficient
Team coding assistant (multi-user)	Worktree + container	Mistake isolation, fast
Cloud agent (untrusted code)	VM	Full isolation required
MCP server hosting	Container per server	Inter-server isolation
AutoML / Code Interpreter	VM + network block	Egress prevention

9. Failure Handling — Tool Loop Reliability

Tools fail: network errors, denied permissions, missing files, timeouts.

Failure Classification (CLAUDE.md `docs/patterns/error-handling.md`)

Transient (retry possible): temporary network errors, timeouts
Permanent (retry pointless): missing file, syntax error
Permission: needs additional authorization from the user
Resource: out of tokens, disk full

Retry Policy

Transient: exponential backoff (wait 1s → 2s → 4s, max 3 retries)
Permanent: report immediately, try a different approach
Permission: hand back to user
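The classification and retry policy fit in one small wrapper. The mapping from Python exception types to failure classes is an assumption made for illustration, and the sketch reads the backoff schedule as one initial attempt followed by up to three retries.

```python
import errno
import time
from enum import Enum

class FailureKind(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"
    PERMISSION = "permission"
    RESOURCE = "resource"

def classify(exc: Exception) -> FailureKind:
    # Order matters: these are all OSError subclasses, so test the specific ones first.
    if isinstance(exc, PermissionError):
        return FailureKind.PERMISSION
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return FailureKind.TRANSIENT
    if isinstance(exc, MemoryError) or (isinstance(exc, OSError) and exc.errno == errno.ENOSPC):
        return FailureKind.RESOURCE
    return FailureKind.PERMANENT

def call_with_retry(tool, delays=(1.0, 2.0, 4.0), sleep=time.sleep):
    """Retry transient failures with backoff; re-raise everything else at once."""
    for delay in (*delays, None):  # None marks the final attempt
        try:
            return tool()
        except Exception as exc:
            if classify(exc) is not FailureKind.TRANSIENT or delay is None:
                raise
            sleep(delay)
```

The caller decides what to do with a re-raised error: report a permanent failure, or hand a permission failure back to the user.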

3-Failure Circuit Breaker

CLAUDE.md: "execution: 3-failure circuit breaker → revisit architecture"

If the same tool fails 3×, rethink the strategy. Infinite retry just burns tokens for the same result.
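A minimal sketch of such a breaker, counting consecutive failures per tool. The class and method names are illustrative assumptions.

```python
from collections import defaultdict

class CircuitBreaker:
    """Opens after `threshold` consecutive failures of the same tool."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._failures: dict[str, int] = defaultdict(int)

    def record_success(self, tool: str) -> None:
        self._failures[tool] = 0  # a success resets the streak

    def record_failure(self, tool: str) -> None:
        self._failures[tool] += 1

    def is_open(self, tool: str) -> bool:
        return self._failures[tool] >= self.threshold
```

When `is_open` returns true, the harness stops calling the tool and tells the model to pick a different approach instead of retrying.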

10. Tool Output Handling

Big Results

50K SQL rows → only the first 100 go into context; the rest is persisted to a file
Model sees "Result has 50K rows in /tmp/result.csv. Schema: ..."
Additional tool calls fetch slices on demand
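The persist-and-preview pattern can be sketched with the standard library. Function names and the CSV format are assumptions; the message mirrors the one quoted above.

```python
import csv
import itertools
import tempfile

def persist_and_preview(columns: list[str], rows: list[tuple], preview: int = 100) -> dict:
    """Write the full result to disk and return only a small preview."""
    with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="",
                                     encoding="utf-8", delete=False) as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)
    message = f"Result has {len(rows)} rows in {handle.name}. Schema: {', '.join(columns)}"
    return {"path": handle.name, "preview": rows[:preview], "message": message}

def read_slice(path: str, start: int, count: int) -> list[list[str]]:
    """Follow-up tool: fetch `count` rows starting at data row `start`."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        return list(itertools.islice(reader, start, start + count))
```

Values come back from `read_slice` as strings, since CSV carries no type information.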

Parse Failures

Invalid JSON → return raw + error so the model can retry with more info
Retry uses a different representation (e.g., explicit schema)

Non-Deterministic Results

Same input, different output → caching impossible
Force determinism in the tool itself (random seed, fixed timestamp)
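One way to force determinism is to make the source of randomness an explicit argument, so that identical arguments produce identical output and can serve as a cache key. Both function names below are hypothetical.

```python
import hashlib
import json
import random

def sample_rows(rows: list, k: int, seed: int) -> list:
    """Sampling tool whose output depends only on its arguments."""
    return random.Random(seed).sample(rows, k)

def cache_key(tool: str, args: dict) -> str:
    # sort_keys makes the key independent of argument order.
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```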

Bottom Line

Area	Recommendation
Permissions	Default deny + allowlist
Isolation	Match risk (process → worktree → container → VM)
MCP	Trust verification + token limits + explicit allow
Hooks	PreToolUse / PostToolUse both
Failures	Classify + exponential backoff + 3-failure circuit breaker

The single takeaway: "Models want to call tools. The harness's job is to define which tools and how to call them."

Part 5 (next) goes one level above tool calls — deciding which model handles which task: Multi-Provider Routing.

First-Party Sources

"Dive into Claude Code": arxiv.org/abs/2604.14228
Anthropic MCP spec: modelcontextprotocol.io
Cursor 3 cloud agents: cursor.com/cloud (April 2026)
Martin Fowler harness article: martinfowler.com/articles/harness-engineering.html