"Tools & Sandboxing — Where the Agent Acts (Harness Series 4/6)"
The model reasons. The harness acts. A bad tool call can damage the user's system. This article covers the safety architecture around the action surface.
Series Roadmap (6 parts)
- What Is Harness Engineering?
- Context Engineering
- Memory Systems
- Tools & Sandboxing ← this article
- Multi-Provider Routing
- Evaluation & Ops
1. Tools = The Model's Hands
LLMs only reason. The hands (reading files, running commands, calling APIs) are separate components: tools.
From the Claude Code analysis paper (arXiv 2604.14228): "When your agent reads a file, the harness decides whether the read is allowed, what happens to the result, and how much of the response fits in the next prompt. The model never touches the file system directly."
That's the foundational principle: the harness always sits between the model and the system.
2. Tool Specs — How a Model Calls Tools
Function Calling (OpenAI)
{
"name": "read_file",
"description": "Read a file from disk",
"parameters": {
"type": "object",
"properties": {
"path": {"type": "string"}
},
"required": ["path"]
}
}
Tool Use (Anthropic)
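Same structure, except the schema sits under input_schema instead of parameters. The model's calls come back as separate tool_use content blocks, and results go back as tool_result blocks. Used by Claude Code and Cursor.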
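To make the difference concrete, here is a minimal sketch. It converts the OpenAI-style definition above into Anthropic's shape and builds the `tool_result` block a harness returns after running a `tool_use` call. The helper names and the `toolu_01` id are made up; only the dictionary shapes follow the two providers' tool formats.

```python
from typing import Any


def openai_to_anthropic(fn: dict[str, Any]) -> dict[str, Any]:
    """Convert an OpenAI function definition into an Anthropic tool definition."""
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Same JSON Schema, different key: "parameters" becomes "input_schema".
        "input_schema": fn["parameters"],
    }


def tool_result_block(tool_use: dict[str, Any], output: str,
                      is_error: bool = False) -> dict[str, Any]:
    """Build the tool_result content block that answers one tool_use block."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],  # must echo the id the model generated
        "content": output,
        "is_error": is_error,
    }


read_file = {
    "name": "read_file",
    "description": "Read a file from disk",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

tool = openai_to_anthropic(read_file)
# A tool_use block as it might appear in an assistant message (id is made up).
call = {"type": "tool_use", "id": "toolu_01", "name": "read_file",
        "input": {"path": "README.md"}}
result = tool_result_block(call, "file contents here")
```

The harness, not the model, runs `read_file` between those two steps. The model only ever sees the `tool_result` block.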
MCP (Model Context Protocol)
- Anthropic's protocol, released November 2024
- A server exposes tools; any client can use them
- Adopted by Claude Code, Cursor, and Codex CLI
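Under the hood, MCP messages are JSON-RPC 2.0. A client discovers tools with `tools/list` and invokes one with `tools/call`. Below is a toy dispatcher as a sketch. The method names, the `inputSchema` field, and the `content`/`isError` result shape follow the MCP specification. The in-memory registry, the hypothetical `echo` tool, and the absence of a transport (stdio or HTTP) and of the initialization handshake are simplifications. A real server would use an MCP SDK.

```python
from typing import Any, Callable

# Hypothetical in-memory registry: name -> (description, JSON Schema, handler).
TOOLS: dict[str, tuple[str, dict, Callable[[dict], str]]] = {
    "echo": (
        "Echo the given text back",
        {"type": "object", "properties": {"text": {"type": "string"}},
         "required": ["text"]},
        lambda args: args["text"],
    ),
}


def _error(req_id: Any, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC 2.0 request for tools/list or tools/call."""
    method, params = request["method"], request.get("params", {})
    if method == "tools/list":
        result = {"tools": [{"name": n, "description": d, "inputSchema": s}
                            for n, (d, s, _) in TOOLS.items()]}
    elif method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            # -32602 is JSON-RPC's "Invalid params" code.
            return _error(request["id"], -32602, f"Unknown tool: {name}")
        text = TOOLS[name][2](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # -32601 is JSON-RPC's "Method not found" code.
        return _error(request["id"], -32601, f"Method not found: {method}")
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}
```

Any client that speaks these messages can use the server, which is the "server exposes tools; any client can use them" property.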
Skills (CLAUDE.md pattern)
- Encapsulates reusable procedures, not raw tools
- Triggered by keywords
- Used by Claude Code and OpenClaw
3. Permission Systems — What to Block
Claude Code: 7 permission modes + ML-based classifier (per arXiv 2604.14228)
Claude Code's 7-Mode System (2026)
- default — confirm every tool call
- acceptEdits — auto file edits, confirm rest
- bypassPermissions — auto everything (dangerous)
- plan — no tool use, planning only
- doNotAllow — explicit denylist
- allow — explicit allowlist
- dynamic — ML classifier decides safety
Permission Design Principles
- Default deny, allow via whitelist
- Directory-level isolation: /Users/foo/project/** allowed, ~/.ssh blocked
- Bash commands matched by prefix: git status allowed, git push --force blocked
Permission Pattern (Claude Code settings.json)
{
"permissions": {
"allow": ["Read(./**)", "Bash(git status)", "Bash(npm test:*)"],
"deny": ["Read(~/.ssh/*)", "Bash(rm -rf:*)", "Bash(curl http*)"]
}
}
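The allow/deny lists above can be read as a small rule engine. The sketch below is a hypothetical evaluator for the `Tool(pattern)` strings. Deny rules are checked first, and anything unmatched falls through to "ask" (default deny). A trailing `:*` is treated as a prefix wildcard, and other patterns use shell-style globbing via `fnmatch`. Claude Code's real matcher has its own semantics; this only illustrates the precedence logic.

```python
import fnmatch
import re

RULES = {
    "allow": ["Read(./**)", "Bash(git status)", "Bash(npm test:*)"],
    "deny": ["Read(~/.ssh/*)", "Bash(rm -rf:*)", "Bash(curl http*)"],
}

_RULE = re.compile(r"^(\w+)\((.*)\)$")  # "Tool(pattern)"


def _matches(rule: str, tool: str, arg: str) -> bool:
    m = _RULE.match(rule)
    if not m or m.group(1) != tool:
        return False
    pattern = m.group(2)
    if pattern.endswith(":*"):  # prefix rule, e.g. "npm test:*"
        return arg.startswith(pattern[:-2])
    return fnmatch.fnmatchcase(arg, pattern)  # glob rule, e.g. "./**"


def decide(tool: str, arg: str) -> str:
    """Return 'deny', 'allow', or 'ask'. Deny wins; unmatched calls are never auto-allowed."""
    if any(_matches(r, tool, arg) for r in RULES["deny"]):
        return "deny"
    if any(_matches(r, tool, arg) for r in RULES["allow"]):
        return "allow"
    return "ask"
```

With these rules, `git push --force` and every `Write` call land on "ask". Note that a path like `./../etc/passwd` still matches `./**`, so paths must be normalized before rules are applied.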
4. Sandboxing — Levels of Isolation
4-1. Process Isolation
- Tool runs in a separate process
- A crashing tool won't take down the main agent process
- Lightweight, weak protection
4-2. Container (Docker)
- Filesystem separation
- Network policy possible
- Memory/CPU limits
- Per-tool spin-up is also viable
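As an illustration of those knobs, here is a sketch that builds a `docker run` command for a single tool invocation. The flags (`--rm`, `--memory`, `--cpus`, `--read-only`, `--tmpfs`, `-v`, `-w`, `--network none`) are standard Docker CLI options. The image name, mount layout, and limits are assumptions to adapt.

```python
import shlex


def sandboxed_command(cmd: str, workdir: str, image: str = "python:3.12-slim",
                      memory: str = "512m", cpus: str = "1.0",
                      allow_network: bool = False) -> list[str]:
    """Build argv for running `cmd` in a throwaway, resource-limited container."""
    argv = [
        "docker", "run", "--rm",             # remove the container when the tool exits
        "--memory", memory, "--cpus", cpus,  # cap RAM and CPU
        "--read-only", "--tmpfs", "/tmp",    # immutable root fs, scratch space in /tmp
        "-v", f"{workdir}:/workspace",       # bind-mounted project dir stays writable
        "-w", "/workspace",
    ]
    if not allow_network:
        argv += ["--network", "none"]        # no egress unless explicitly requested
    return argv + [image, "sh", "-c", cmd]


# Usage: print the command instead of running it.
print(shlex.join(sandboxed_command("pytest -q", "/tmp/project")))
```

Returning an argv list (rather than a shell string) lets the harness pass it straight to `subprocess.run` without another layer of shell quoting.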
4-3. VM Isolation
Cursor 3 (April 2026): "Cursor Cloud Agents run autonomous coding tasks in isolated Linux VMs with full dev environments"
- Full OS isolation
- Strongest protection, heaviest
- Becoming standard for cloud agents
4-4. Worktree (Git)
- Same repo, separate working directory checked out on its own branch
- Main worktree untouched
- Supported by Claude Code and Cursor 3
CLAUDE.md recommends: "execution: ... Use git worktree for risky changes."
5. Per-Tool Security Patterns
File Reads
- Path validation (block ../)
- Block symlink traversal
- Size limits (large files chunked)
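Both the `../` check and the symlink check fall out of one step: resolve the requested path to its real location, then require it to stay under the allowed root. Here is a minimal sketch, assuming a single project root and a hypothetical 1 MB read limit:

```python
from pathlib import Path

MAX_BYTES = 1_000_000  # hypothetical per-read size limit


def safe_read(root: str, requested: str, max_bytes: int = MAX_BYTES) -> str:
    """Read a file only if its real location stays inside root."""
    root_real = Path(root).resolve()
    # resolve() follows symlinks and collapses "..", so both escapes show up here.
    target = (root_real / requested).resolve()
    if not target.is_relative_to(root_real):  # Python 3.9+
        raise PermissionError(f"path escapes project root: {requested}")
    with open(target, "rb") as f:
        data = f.read(max_bytes + 1)  # one extra byte tells us the file is oversize
    text = data[:max_bytes].decode("utf-8", errors="replace")
    return text + "\n[truncated]" if len(data) > max_bytes else text
```

Checking and then opening leaves a small race window (a symlink swapped in between the two steps). Stricter implementations open relative to a directory handle and refuse symlinks, for example via `os.open` with `O_NOFOLLOW`.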
Bash Execution
- Command allowlist
- Pattern denylist (rm -rf /, fork bombs, sudo, etc.)
- Timeout (default 2 minutes)
- stdout/stderr size cap
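Here is a sketch of the execution side. It assumes a hypothetical denylist, the 2-minute default timeout above, and a hypothetical 30K-character output cap. `subprocess.run` enforces the timeout, and each stream is capped before it reaches the model. It runs commands through POSIX `sh` for portability; swap in `bash` if your tools need it. A regex denylist is a backstop, not a security boundary; the allowlist and the sandbox do the real work.

```python
import re
import subprocess

# Hypothetical denylist: catches obvious disasters only.
DENY = [re.compile(p) for p in (
    r"\brm\s+-rf\s+/(\s|$)",       # rm -rf / (the filesystem root)
    r":\(\)\s*\{\s*:\|:&\s*\};:",  # classic fork bomb
    r"(^|[;&|]\s*)sudo\b",         # privilege escalation
)]
MAX_OUTPUT = 30_000  # hypothetical per-stream character cap


def _cap(text: str) -> str:
    return text if len(text) <= MAX_OUTPUT else text[:MAX_OUTPUT] + "\n[output truncated]"


def run_shell(cmd: str, timeout_s: float = 120.0) -> dict:
    """Run cmd via sh with a denylist check, a timeout, and capped output."""
    if any(p.search(cmd) for p in DENY):
        return {"ok": False, "error": "blocked by denylist"}
    try:
        proc = subprocess.run(["sh", "-c", cmd], capture_output=True,
                              text=True, errors="replace", timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": f"timed out after {timeout_s}s"}
    return {"ok": proc.returncode == 0, "exit_code": proc.returncode,
            "stdout": _cap(proc.stdout), "stderr": _cap(proc.stderr)}
```

Returning a structured result instead of raising lets the model see why a command failed and adjust.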
Web Fetch / URL
- Block localhost (SSRF prevention)
- Block private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
- Force HTTPS
- Redirect limit (loop prevention)
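Here is a sketch of the URL checks using the standard library's `ipaddress` module. It resolves the hostname and rejects any address that is not globally routable, which covers localhost, the private ranges, and link-local addresses such as the 169.254.169.254 cloud metadata endpoint. It is a pre-flight check only. A production fetcher must also connect to the IP it validated and re-run the check on every redirect hop, or DNS rebinding can slip past it.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def check_url(url: str) -> str:
    """Return url if it is HTTPS and every resolved address is public; else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https:// URLs are allowed")
    if not parts.hostname:
        raise ValueError("URL has no host")
    # Check every A/AAAA answer: one non-public address is enough to reject.
    for *_, sockaddr in socket.getaddrinfo(parts.hostname, parts.port or 443,
                                           proto=socket.IPPROTO_TCP):
        ip = ipaddress.ip_address(sockaddr[0])
        # is_global is False for loopback, RFC 1918 ranges, link-local, etc.
        if not ip.is_global or ip.is_multicast:
            raise ValueError(f"{parts.hostname} resolves to non-public address {ip}")
    return url
```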
Database / API
- Read-only credentials (when possible)
- Rate limiting
- Query timeout
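Rate limiting is commonly done with a token bucket. Here is a minimal single-threaded sketch. The rate and burst size are placeholders, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

When `try_acquire()` returns False, the harness can either wait or report "rate limited" to the model as a transient failure.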
6. MCP Security (April 2026)
MCP Servers Are External Code
- Installing a third-party MCP server runs the server's code on your machine
- Trust verification required
- Prefer official servers (anthropic, openai, vercel, etc.)
MCP Token Limits (Claude Code)
- Tool results capped at 25K tokens, warning at 10K
- Server-opt-in up to 500K characters, persisted to disk
- Prevents context blow-up
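Here is a sketch of how a harness might enforce caps like these before a tool result enters the context. Token counting uses a rough 4-characters-per-token estimate (an assumption; real harnesses count with the model's tokenizer), and the notice text is hypothetical.

```python
WARN_TOKENS, MAX_TOKENS = 10_000, 25_000  # the warning and cap values quoted above
CHARS_PER_TOKEN = 4  # crude heuristic, not a real tokenizer
NOTICE_BUDGET = 100  # tokens reserved for the truncation notice


def approx_tokens(text: str) -> int:
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def cap_tool_result(text: str) -> tuple[str, list[str]]:
    """Return (text that enters the context, warnings for the harness log)."""
    n = approx_tokens(text)
    warnings = [f"large tool result: ~{n} tokens"] if n > WARN_TOKENS else []
    if n <= MAX_TOKENS:
        return text, warnings
    keep = (MAX_TOKENS - NOTICE_BUDGET) * CHARS_PER_TOKEN
    notice = f"\n[truncated: ~{n} tokens, showing the first ~{MAX_TOKENS - NOTICE_BUDGET}]"
    return text[:keep] + notice, warnings
```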
MCP Permission Delegation
When adding an MCP server, explicitly verify which tools it exposes. Don't auto-allow.
7. Hooks — Pre/Post Tool Guardrails
CLAUDE.md pattern:
- PreToolUse(Bash|Write|Edit): input-stage guardrail. Block dangerous patterns and denied paths up front.
- PostToolUse(Edit|Write): detect debug statements and credential leaks.
Claude Code hook system: 4 lifecycle events. Each runs user-defined shell commands.
Real Hook Examples
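PreToolUse hook on Bash. Claude Code passes the hook input as JSON on stdin (parsed here with jq), and exit code 2 blocks the call and feeds stderr back to Claude:
cmd=$(jq -r '.tool_input.command // empty')
if [[ "$cmd" =~ "rm -rf" ]]; then
echo "BLOCKED: rm -rf detected" >&2
exit 2
fi
PostToolUse hook on Edit|Write (the edit has already happened, so this only warns):
file=$(jq -r '.tool_input.file_path // empty')
if [[ -n "$file" ]] && grep -q "API_KEY=" "$file"; then
echo "WARNING: hardcoded credential in $file" >&2
fi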
8. Isolation Level by Use Case
| Use case | Isolation | Reason |
|---|---|---|
| Local personal project (single user) | Process + permissions | Light, sufficient |
| Team coding assistant (multi-user) | Worktree + container | Mistake isolation, fast |
| Cloud agent (untrusted code) | VM | Full isolation required |
| MCP server hosting | Container per server | Inter-server isolation |
| AutoML / Code Interpreter | VM + network block | Egress prevention |
9. Failure Handling — Tool Loop Reliability
Tools fail. Network, permissions, missing file, timeout.
Failure Classification (CLAUDE.md docs/patterns/error-handling.md)
- Transient (retry possible): transient network errors
- Permanent (retry pointless): missing file, syntax error
- Permission: needs more user authorization
- Resource: out of tokens, disk full
Retry Policy
- Transient: exponential backoff (1s → 2s → 4s, max 3 attempts)
- Permanent: report immediately, try a different approach
- Permission: hand back to user
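The classification and retry policy above fit in a few lines. Here is a sketch with hypothetical exception types, one per failure class, and an injectable `sleep` so tests don't wait. It reads the 1s → 2s → 4s schedule as the waits between attempts, that is, up to three retries after the first call (an interpretation of the policy as stated).

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


# Hypothetical exception types, one per failure class.
class TransientError(Exception): pass     # transient network errors
class PermanentError(Exception): pass     # missing file, syntax error
class PermissionNeeded(Exception): pass   # needs more user authorization
class ResourceExhausted(Exception): pass  # out of tokens, disk full


def call_with_retry(fn: Callable[[], T],
                    delays: Sequence[float] = (1.0, 2.0, 4.0),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry TransientError with exponential backoff; any other error propagates at once."""
    for delay in delays:
        try:
            return fn()
        except TransientError:
            sleep(delay)  # wait 1s, then 2s, then 4s before the next attempt
    return fn()  # last attempt: a TransientError here now reaches the caller
```

The caller maps whatever propagates. A `PermanentError` goes back to the model so it can try a different approach, and a `PermissionNeeded` goes to the user.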
3-Failure Circuit Breaker
CLAUDE.md: "execution: 3-failure circuit breaker → revisit architecture"
If the same tool fails 3×, rethink the strategy. Infinite retry just burns tokens for the same result.
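A per-tool circuit breaker is a counter plus a threshold. In this sketch (class and method names are hypothetical), three consecutive failures of the same tool open the breaker. The harness then stops calling that tool and surfaces the problem instead of retrying, and a success resets the count. Counting consecutive rather than total failures is an assumption; the rule as quoted doesn't say.

```python
from collections import defaultdict


class CircuitBreaker:
    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures: defaultdict[str, int] = defaultdict(int)  # tool -> consecutive failures

    def allow(self, tool: str) -> bool:
        """False once `tool` has failed `threshold` times in a row."""
        return self.failures[tool] < self.threshold

    def record(self, tool: str, ok: bool) -> None:
        if ok:
            self.failures[tool] = 0  # any success closes the breaker again
        else:
            self.failures[tool] += 1
```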
10. Tool Output Handling
Big Results
- 50K SQL rows → only the first 100 go into context; the rest is persisted to a file
- Model sees "Result has 50K rows in /tmp/result.csv. Schema: ..."
- Additional tool calls fetch slices on demand
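Here is a sketch of the spill-to-file pattern for tabular results, using only the standard library. The 100-row preview matches the example above. The temp-file location, the summary wording, and the `read_slice` follow-up tool are assumptions.

```python
import csv
import itertools
import os
import tempfile
from typing import Optional, Sequence

PREVIEW_ROWS = 100


def summarize_rows(columns: Sequence[str], rows: Sequence[Sequence[object]],
                   out_dir: Optional[str] = None) -> str:
    """Return the text the model sees; spill to a CSV file when there are too many rows."""
    def fmt(rs):
        return "\n".join(",".join(map(str, r)) for r in rs)

    if len(rows) <= PREVIEW_ROWS:
        return f"Schema: {', '.join(columns)}\n{fmt(rows)}"
    fd, path = tempfile.mkstemp(suffix=".csv", dir=out_dir)
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
    return (f"Result has {len(rows)} rows in {path}. Schema: {', '.join(columns)}.\n"
            f"First {PREVIEW_ROWS} rows:\n{fmt(rows[:PREVIEW_ROWS])}")


def read_slice(path: str, start: int, stop: int) -> list[list[str]]:
    """Hypothetical follow-up tool: fetch data rows [start, stop) from the spilled file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return list(itertools.islice(reader, start, stop))
```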
Parse Failures
- Invalid JSON → return raw + error so the model can retry with more info
- Retry uses a different representation (e.g., explicit schema)
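Here is a sketch of the parse-failure path. When tool output that should be JSON fails to parse, the harness returns the raw text together with the parser's error location, so the model's retry has something concrete to work with. The result shape and the 2,000-character raw limit are hypothetical.

```python
import json
from typing import Any


def parse_tool_json(raw: str, max_raw: int = 2_000) -> dict[str, Any]:
    try:
        return {"ok": True, "data": json.loads(raw)}
    except json.JSONDecodeError as e:
        return {
            "ok": False,
            # msg, lineno and colno pinpoint where parsing failed.
            "error": f"invalid JSON: {e.msg} at line {e.lineno}, column {e.colno}",
            "raw": raw[:max_raw],  # bounded, so a huge blob can't flood the context
        }
```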
Non-Deterministic Results
- Same input, different output → caching impossible
- Force determinism in the tool itself (random seed, fixed timestamp)
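One way to force determinism is to make the tool take its randomness and its clock as inputs instead of reaching for global state. Here is a sketch with a hypothetical `sample_rows` tool and an arbitrary pinned timestamp:

```python
import random
from datetime import datetime, timezone
from typing import Optional, Sequence

FIXED_NOW = datetime(2026, 1, 1, tzinfo=timezone.utc)  # pinned timestamp for reproducible runs


def sample_rows(rows: Sequence[str], k: int, seed: int = 0,
                now: Optional[datetime] = None) -> dict:
    rng = random.Random(seed)  # private generator; global random state is untouched
    ts = (now or FIXED_NOW).isoformat()
    return {"sampled_at": ts, "rows": rng.sample(list(rows), k)}
```

Same arguments, same output, so the result can be cached and replayed in evals.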
Bottom Line
| Area | Recommendation |
|---|---|
| Permissions | Default deny + whitelist |
| Isolation | Match risk (process → worktree → container → VM) |
| MCP | Trust verification + token limits + explicit allow |
| Hooks | PreToolUse / PostToolUse both |
| Failures | Classify + exponential backoff + 3-failure circuit breaker |
The single takeaway: "Models want to call tools. The harness's job is to define which tools and how to call them."
Part 5 (next) goes one level above tool calls — deciding which model handles which task: Multi-Provider Routing.
First-Party Sources
- "Dive into Claude Code": arxiv.org/abs/2604.14228
- Anthropic MCP spec: modelcontextprotocol.io
- Cursor 3 cloud agents: cursor.com/cloud (April 2026)
- Martin Fowler harness article: martinfowler.com/articles/harness-engineering.html