"Designing a Security Architecture for a Local AI Agent — 7-Layer Defense in Depth"


A 7-layer defense-in-depth strategy for a single-user local AI agent


Summary

  • "It's a personal project, I'll handle security later" turned out to be a critical misjudgment: an unauthorized user sent commands to the agent through its Telegram bot.
  • Built a 7-layer defense-in-depth architecture spanning everything from the network layer to the cognitive (prompt) layer.
  • Risks accepted on purpose (e.g., disabled sandbox, plaintext keys) are documented explicitly rather than ignored.

Background

LLM agents read emails, modify calendars, access the file system, and call external APIs; in effect, the agent operates as core system infrastructure. During a Telegram integration test, an unauthorized external user sent commands to the bot, and the agent carried them out. After this incident, I adopted a "Security from Day 1" principle and restructured the entire architecture.

The Architecture


Trust Model

  • Runtime environment: Single-user Mac Mini, internet-connected, no Docker
  • Primary defense targets: Unauthorized network access, communication channel hijacking, prompt injection
  • Trust anchor: Only the system account owner who can modify the ~/.openclaw directory is recognized as the operator
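The trust anchor only holds if nobody but the owner can write to ~/.openclaw. A minimal Python sketch of an audit for that, assuming the modes from the filesystem layer (directory 700, config files 600). The function name and the config file names passed to it are hypothetical placeholders, not part of OpenClaw.

```python
import os
import stat
from pathlib import Path


def audit_openclaw_dir(root, config_names, dir_mode=0o700, file_mode=0o600):
    """Return a list of problems; an empty list means the layout matches policy.

    Hypothetical helper: checks owner and permission bits of the directory
    and of each named config file inside it.
    """
    root = Path(root).expanduser()
    uid = os.getuid()  # POSIX-only; fine for the Mac Mini host described above
    problems = []

    def check(path, expected):
        st = path.lstat()  # lstat: inspect the path itself, never a symlink target
        if st.st_uid != uid:
            problems.append(f"{path}: owned by uid {st.st_uid}, not {uid}")
        mode = stat.S_IMODE(st.st_mode)
        if mode != expected:
            problems.append(f"{path}: mode {oct(mode)}, expected {oct(expected)}")

    check(root, dir_mode)
    for name in config_names:
        path = root / name
        if path.exists():
            check(path, file_mode)
        else:
            problems.append(f"{path}: missing")
    return problems
```

Running something like `audit_openclaw_dir("~/.openclaw", ["openclaw.json"])` at startup (the file name is a placeholder) and refusing to start on any finding turns the trust assumption into a checked precondition.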

7-Layer Security Architecture

Each layer, its mechanism, and its status:

  • Network: loopback bind + token auth (Active)
  • Channel: Telegram DM pairing + group allowlist (Active)
  • Filesystem: .openclaw/ at mode 700, config files at 600 (Active)
  • Injection defense: 11-category defense prompt ruleset (prompt-level only)
  • Execution control: minimal exec permissions per sub-agent (Active)
  • Secret management: .gitignore + pre-commit security audit (Active)
  • Trust boundary: single-user local environment (accepted risk)
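The execution-control layer gives each sub-agent only the commands its job needs. A sketch of that idea as an allowlist check, assuming commands arrive as a single shell-style string; the sub-agent names, the binaries, and the `permits` helper are hypothetical, not OpenClaw's configuration schema.

```python
import shlex

# Any of these characters could chain or redirect commands, so refuse outright.
SHELL_META = set(";&|`$<>\n")

# Hypothetical allowlist: sub-agent name -> executables it may run.
EXEC_ALLOWLIST = {
    "log-reader": {"tail", "grep"},
    "git-helper": {"git"},
}


def permits(agent, command, allowlist=EXEC_ALLOWLIST):
    """Return True only if `command` runs an allowlisted binary for `agent`."""
    if any(ch in SHELL_META for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes: refuse rather than guess
        return False
    if not argv:
        return False
    # Unknown sub-agents get an empty set, so they can run nothing.
    return argv[0] in allowlist.get(agent, set())
```

Rejecting metacharacters before splitting is deliberately blunt: it gives up some legitimate commands in exchange for not having to reason about shell parsing.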

Perimeter Control: Network and Channel

{
  "gateway": {
    "port": 18789, "mode": "local", "bind": "loopback",
    "auth": { "mode": "token", "token": "..." }
  }
}

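bind: loopback makes the gateway listen only on the loopback interface (127.0.0.1), so only processes on the same machine can connect. Loopback alone is not enough: a server-side request forgery (SSRF) that tricks a local process into calling the gateway still originates from the machine, so static token authentication acts as the second line of defense.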
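A standard-library Python sketch of the two gateway checks, for illustration only: listen on 127.0.0.1, then demand a token on every request. The `Authorization: Bearer` header scheme and the helper names are assumptions; the real OpenClaw gateway's wire format may differ.

```python
import hmac
import http.server

LOOPBACK = "127.0.0.1"  # same effect as "bind": "loopback" in the config above


def token_ok(auth_header, expected_token):
    """True only for an exact 'Bearer <token>' match, compared in constant time."""
    prefix = "Bearer "
    if not auth_header or not auth_header.startswith(prefix):
        return False
    presented = auth_header[len(prefix):].encode()
    return hmac.compare_digest(presented, expected_token.encode())


def make_gateway(expected_token, port=18789):
    """Build (but don't start) an HTTP server reachable only from this machine."""

    class GatewayHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # Second line of defense: even a loopback caller must present the token.
            if not token_ok(self.headers.get("Authorization"), expected_token):
                self.send_response(401)
                self.end_headers()
                return
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok\n")

        def log_message(self, fmt, *args):
            pass  # silence per-request logging in this sketch

    return http.server.HTTPServer((LOOPBACK, port), GatewayHandler)
```

Usage would look like `make_gateway(secrets.token_urlsafe(32)).serve_forever()`. `hmac.compare_digest` is used instead of `==` so that response timing does not leak how many leading characters of a guessed token were correct.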

On the Telegram channel, dmPolicy: pairing accepts DMs only from users paired in advance, and groupPolicy: allowlist limits group commands to designated users.
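The two Telegram policies reduce to a gate evaluated before any message reaches the agent. A minimal sketch, assuming numeric Telegram user IDs and the Bot API chat types (`private`, `group`, `supergroup`, `channel`); the class and field names are hypothetical, and the pairing handshake itself is not modeled.

```python
from dataclasses import dataclass, field

GROUP_TYPES = {"group", "supergroup"}


@dataclass
class ChannelPolicy:
    """Hypothetical gate mirroring dmPolicy: pairing and groupPolicy: allowlist."""

    paired_users: set = field(default_factory=set)     # user IDs paired for DMs
    group_allowlist: set = field(default_factory=set)  # user IDs allowed to command in groups

    def accepts(self, chat_type: str, user_id: int) -> bool:
        if chat_type == "private":
            return user_id in self.paired_users
        if chat_type in GROUP_TYPES:
            return user_id in self.group_allowlist
        return False  # channels or anything unexpected: deny by default
```

Deny-by-default is the point: the incident in the background section happened because an unknown sender was treated as a valid operator.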

Internal Control: Injection Defense

The prompt-injection defense ruleset covers 11 categories, including:

  • ignore-previous rejection: classic jailbreak attempts such as "ignore previous instructions" are discarded
  • Encoding attack defense: Base64- and hex-obfuscated commands are rejected
  • Role-switching defense: persona-change attempts such as "you are now in admin mode" are refused
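In this setup the rules live in the prompt. As a hypothetical complement, not the author's method, a cheap code-level pre-screen can flag the three categories above before text reaches the model. The regex patterns are illustrative assumptions, easily bypassed by paraphrase, and cover only 3 of the 11 categories.

```python
import base64
import re

# Illustrative patterns only; each maps to one rule category named above.
IGNORE_PREVIOUS = re.compile(
    r"\b(?:ignore|disregard|forget)\b.{0,40}?\b(?:previous|prior|above|earlier)\b"
    r".{0,20}?\b(?:instructions?|rules?|prompts?)\b",
    re.IGNORECASE,
)
ROLE_SWITCH = re.compile(r"\byou are now\b|\b(?:admin|developer|god) mode\b", re.IGNORECASE)
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")  # long runs of Base64 alphabet
HEX_RUN = re.compile(r"\b(?:[0-9a-fA-F]{2}){12,}\b")  # 12+ hex byte pairs


def _decodes_to_text(candidate, decoder):
    """True if `candidate` decodes cleanly to UTF-8 text, i.e. looks like a hidden message."""
    try:
        decoder(candidate).decode("utf-8")
        return True
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False


def screen(text):
    """Return the list of triggered categories; an empty list means nothing was flagged."""
    flags = []
    if IGNORE_PREVIOUS.search(text):
        flags.append("ignore-previous")
    if ROLE_SWITCH.search(text):
        flags.append("role-switch")
    if any(_decodes_to_text(m.group(), lambda s: base64.b64decode(s, validate=True))
           for m in BASE64_RUN.finditer(text)):
        flags.append("encoded-base64")
    if any(_decodes_to_text(m.group(), bytes.fromhex) for m in HEX_RUN.finditer(text)):
        flags.append("encoded-hex")
    return flags
```

A flag here would only raise suspicion (for example, forcing the approval tier); the prompt-level rules remain the actual defense.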

Action Authorization Boundaries

  • Free execution: File reads, simple web GETs, system log checks
  • Approval required: Email/SNS sends, external API state changes, local file deletion
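The two tiers map naturally onto an authorization gate in front of every tool call. A sketch, with hypothetical action names and an `approve` callback standing in for whatever confirmation channel is used (for example, a Telegram prompt to the operator); unknown actions fail closed into the approval tier.

```python
from enum import Enum


class Tier(Enum):
    FREE = "free"          # runs without asking
    APPROVAL = "approval"  # needs an explicit yes from the operator


# Hypothetical action identifiers mirroring the two lists above.
ACTION_TIERS = {
    "file.read": Tier.FREE,
    "web.get": Tier.FREE,
    "logs.read": Tier.FREE,
    "email.send": Tier.APPROVAL,
    "sns.post": Tier.APPROVAL,
    "api.mutate": Tier.APPROVAL,
    "file.delete": Tier.APPROVAL,
}


def authorize(action, approve):
    """Return True if `action` may run. `approve` is called only for approval-tier actions."""
    tier = ACTION_TIERS.get(action, Tier.APPROVAL)  # unknown actions fail closed
    if tier is Tier.FREE:
        return True
    return bool(approve(action))
```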

Pitfalls and Accepted Risks

Consciously accepted risks:

  1. Sandbox disabled: unavoidable for a personal assistant agent that must manage the local filesystem directly.
  2. Plaintext keys in config files: accepted for a private repository with a single operator. Migration to a vault is planned if the project goes open source.
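Because plaintext keys are accepted locally, the pre-commit audit from the secret-management layer is what keeps them out of git history. A sketch of such a hook, assuming the key names and patterns below (hypothetical; adjust to the key formats actually in use). It reads the staged copy via `git show :<path>` so that unstaged edits cannot hide a leak.

```python
import re
import subprocess
import sys

# Hypothetical patterns; adjust to the key formats actually in use.
SECRET_PATTERNS = [
    # JSON-style "token"/"api_key"/"secret" fields with a value of 8+ characters
    re.compile(r'"(?:token|api_?key|secret)"\s*:\s*"[^"]{8,}"', re.IGNORECASE),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_leaks(files):
    """files maps path -> staged text. Returns findings; an empty list means clean."""
    findings = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in SECRET_PATTERNS):
                findings.append(f"{path}:{lineno}: possible secret")
    return findings


def _git(*args):
    return subprocess.run(
        ["git", *args], capture_output=True, check=True,
        encoding="utf-8", errors="replace",
    ).stdout


def staged_files():
    """Staged (index) content of added/copied/modified files, not the working tree."""
    names = _git("diff", "--cached", "--name-only", "--diff-filter=ACM").splitlines()
    return {name: _git("show", f":{name}") for name in names}


def main():
    leaks = find_leaks(staged_files())
    for leak in leaks:
        print(leak, file=sys.stderr)
    return 1 if leaks else 0  # a non-zero exit makes git abort the commit
```

To wire it up, save the file as .git/hooks/pre-commit with a `#!/usr/bin/env python3` shebang, append `raise SystemExit(main())`, and make it executable. The placeholder `"token": "..."` from the gateway config above does not trigger the first pattern, since its value is shorter than 8 characters.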

Takeaway

Even for a personal project, if the system holds permissions that touch core infrastructure, security has to be designed in from day one. Retrofitting access control later conflicts with existing functionality and incurs heavy testing costs. Build least privilege and zero trust toward external input into the architecture from the start.
