Designing a Security Architecture for a Local AI Agent: 7-Layer Defense in Depth
A 7-layer defense-in-depth strategy for a single-user local AI agent
Key Summary
- "It's a personal project, I'll handle security later" — this turned out to be a critical misjudgment. An unauthorized user sent commands to the agent through a Telegram bot.
- Built a 7-layer defense-in-depth architecture spanning network through cognitive layers.
- Consciously accepted risks (e.g., sandbox disabled, plaintext keys) are explicitly documented rather than ignored.
Background
LLM agents read emails, modify calendars, access the file system, and call external APIs. The agent effectively operates as core system infrastructure. During a Telegram integration test, an unauthorized external user issued commands to the bot — and the agent complied. After this security incident, I established a "Security from Day 1" principle and restructured the entire architecture.
The Architecture
Trust Model
- Runtime environment: Single-user Mac Mini, internet-connected, no Docker
- Primary defense targets: Unauthorized network access, communication channel hijacking, prompt injection
- Trust anchor: Only the system account owner who can modify the ~/.openclaw directory is recognized as the operator
7-Layer Security Architecture
| Layer | Mechanism | Status |
|---|---|---|
| Network | Loopback Bind + Token Auth | Active |
| Channel | Telegram DM Pairing + Group Allowlist | Active |
| Filesystem | .openclaw/ 700, config 600 | Active |
| Injection Defense | 11-category defense prompt ruleset | Prompt-level |
| Execution Control | Per-sub-agent minimal exec permissions | Active |
| Secret Management | .gitignore + Pre-commit security audit | Active |
| Trust Boundary | Single-user local environment (accepted) | Accepted |
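The Filesystem row can be checked mechanically. Below is a minimal sketch, assuming a POSIX system and that "700 / 600" means no group or other permission bits anywhere under ~/.openclaw. The function name `audit_permissions` is hypothetical and not part of the agent.

```python
import stat
from pathlib import Path


def audit_permissions(root: Path) -> list[str]:
    """Return one finding per entry under root that group/other can access."""
    if not root.is_dir():
        return [f"{root}: missing or not a directory"]
    findings = []
    for path in [root, *sorted(root.rglob("*"))]:
        if path.is_symlink():
            continue  # a symlink's own mode bits are not meaningful here
        mode = stat.S_IMODE(path.stat().st_mode)
        # 0o077 masks every group and other bit; 700 and 600 both leave it empty.
        if mode & 0o077:
            findings.append(f"{path}: mode {mode:03o} is accessible to group/other")
    return findings


if __name__ == "__main__":
    for finding in audit_permissions(Path.home() / ".openclaw"):
        print(finding)
```

Run periodically or at agent startup, a check like this catches permissions that drift after a tool rewrites a config file with default modes.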
Perimeter Control: Network and Channel
{
  "gateway": {
    "port": 18789,
    "mode": "local",
    "bind": "loopback",
    "auth": { "mode": "token", "token": "..." }
  }
}
bind: loopback makes the gateway accept connections only from the same machine. If a request still reaches it from a local process, for example through an SSRF flaw in another local service, static token authentication acts as the second line of defense.
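To make the two checks concrete, here is a minimal sketch of a loopback-only HTTP listener with a static token check, using only the Python standard library. It is not the agent's actual gateway. The `Authorization: Bearer` header, the `GATEWAY_TOKEN` placeholder, and the function names are assumptions for illustration.

```python
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

GATEWAY_TOKEN = "replace-me"  # hypothetical placeholder, not a real token


def token_is_valid(header_value: Optional[str], expected: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    prefix = "Bearer "
    if not header_value or not header_value.startswith(prefix):
        return False
    presented = header_value[len(prefix):]
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected.encode())


class GatewayHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if not token_is_valid(self.headers.get("Authorization"), GATEWAY_TOKEN):
            self.send_response(401)
            self.end_headers()
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")


def make_server(port: int = 18789) -> HTTPServer:
    # Binding to 127.0.0.1 (not 0.0.0.0) keeps the socket off external interfaces.
    return HTTPServer(("127.0.0.1", port), GatewayHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

The bind address stops remote hosts from connecting at all, and the token check covers callers that are already on the machine.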
Telegram uses dmPolicy: pairing to allow DMs only from pre-paired users, and groupPolicy: allowlist to restrict group commands to designated users.
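The channel rules reduce to a default-deny lookup on the sender and chat. The sketch below assumes the group allowlist maps each allowed group to its designated users; the exact semantics of dmPolicy and groupPolicy are not spelled out here, and `ChannelPolicy` and `is_authorized` are hypothetical names. The chat type strings follow the Telegram Bot API.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelPolicy:
    # User IDs that completed DM pairing with the operator.
    paired_user_ids: set[int] = field(default_factory=set)
    # Group chat ID -> user IDs allowed to issue commands in that group.
    allowed_groups: dict[int, set[int]] = field(default_factory=dict)


def is_authorized(policy: ChannelPolicy, chat_type: str, chat_id: int, user_id: int) -> bool:
    """Default-deny check for an incoming Telegram message."""
    if chat_type == "private":
        return user_id in policy.paired_user_ids
    if chat_type in ("group", "supergroup"):
        return user_id in policy.allowed_groups.get(chat_id, set())
    # Channels and any unknown chat type are rejected outright.
    return False
```

For example, with `ChannelPolicy(paired_user_ids={111}, allowed_groups={-100: {111, 222}})`, a DM from user 999 is dropped before the message text ever reaches the model.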
Internal Control: Injection Defense
11 categories of prompt injection defense rules:
- ignore-previous rejection: Classic jailbreak attempts are discarded
- Encoding attack defense: Base64 and hex-obfuscated commands are rejected
- Role-switching defense: Persona change attempts like "you are now in admin mode" are refused
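The rules above live in the prompt. A code-level pre-filter could complement them by flagging the same three patterns before text reaches the model. The sketch below is an assumption about how that might look, not the ruleset itself: the regular expressions are illustrative heuristics that a determined attacker can evade, and `classify` is a hypothetical name.

```python
import base64
import re

OVERRIDE = re.compile(
    r"\bignore\s+(?:all\s+)?(?:previous|prior|above)\s+(?:instructions|prompts|rules)\b",
    re.IGNORECASE,
)
ROLE_SWITCH = re.compile(r"\byou\s+are\s+now\s+(?:in\s+)?\w+\s+mode\b", re.IGNORECASE)
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RUN = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")


def _decoded_candidates(text: str):
    """Yield UTF-8 text hidden in Base64 or hex runs; skip anything undecodable."""
    for match in B64_RUN.finditer(text):
        try:
            yield base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except ValueError:  # covers binascii.Error and UnicodeDecodeError
            continue
    for match in HEX_RUN.finditer(text):
        try:
            yield bytes.fromhex(match.group(0)).decode("utf-8")
        except ValueError:
            continue


def classify(text: str) -> list[str]:
    """Return labels such as 'plain:override' or 'decoded:role-switch'."""
    candidates = [("plain", text)]
    candidates += [("decoded", decoded) for decoded in _decoded_candidates(text)]
    findings = []
    for label, candidate in candidates:
        if OVERRIDE.search(candidate):
            findings.append(f"{label}:override")
        if ROLE_SWITCH.search(candidate):
            findings.append(f"{label}:role-switch")
    return findings
```

A non-empty result is best treated as a signal to quarantine or log the input, since pattern matching alone cannot be relied on to stop injection.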
Action Authorization Boundaries
- Free execution: File reads, simple web GETs, system log checks
- Approval required: Email/SNS sends, external API state changes, local file deletion
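The two tiers can be enforced with a small gate in front of every tool call. This is a minimal sketch under stated assumptions: the action names such as `file.read` are hypothetical labels, and any action not explicitly listed as free falls into the approval tier so that new tools are never silently trusted.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"


# Hypothetical labels for the "free execution" tier.
FREE_ACTIONS = {"file.read", "web.get", "log.read"}


def decide(action: str) -> Decision:
    """Free actions pass; sends, state changes, deletes, and unknowns need approval."""
    if action in FREE_ACTIONS:
        return Decision.ALLOW
    return Decision.NEEDS_APPROVAL


def run_action(
    action: str,
    execute: Callable[[], None],
    approve: Callable[[str], bool],
) -> bool:
    """Run execute() only if the action is free or the operator approves it."""
    if decide(action) is Decision.NEEDS_APPROVAL and not approve(action):
        return False
    execute()
    return True
```

In practice `approve` would ask the operator over a trusted channel, so an injected instruction to delete a file stalls at the prompt instead of executing.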
Pitfalls and Accepted Risks
Consciously accepted risks:
1. Sandbox disabled: Unavoidable for a personal assistant agent that needs direct local filesystem management
2. Plaintext keys in config files: Accepted for a single-operator private repository. Migration to a vault is planned if the project goes open source.
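The pre-commit security audit listed in the architecture table is not described in detail. One plausible piece of it is a scan of staged text for credential-shaped strings, sketched below. The patterns are assumptions: the Telegram pattern reflects the commonly seen token shape, the quoted-credential pattern is a generic heuristic, and `scan_text` is a hypothetical name.

```python
import re

SECRET_PATTERNS = {
    # Digits, a colon, then 35 URL-safe characters (typical bot token shape).
    "telegram-bot-token": re.compile(r"(?<!\d)\d{8,10}:[A-Za-z0-9_-]{35}"),
    # A JSON-style credential key with a value of 12 or more characters.
    "quoted-credential": re.compile(
        r'"(?:token|api_?key|secret|password)"\s*:\s*"[^"]{12,}"',
        re.IGNORECASE,
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A hook built on this would run `scan_text` over each staged file and abort the commit on any finding, which matters more while keys remain in plaintext on disk.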
Takeaway
Even for personal projects, if the system has permissions that touch core infrastructure, security must be designed from day one. Retrofitting access control after the fact causes conflicts with existing functionality and incurs massive testing costs. Bake the principle of least privilege and zero trust for external inputs into the architecture from the start.