All Agents Went Silent Simultaneously — Claude Code OAuth Token Expiry: Outage Analysis and Recovery

4월 04, 2026

Root cause analysis of an extended-period outage and five prevention measures

Key Summary

Multiple agents connected to Telegram and Discord stopped responding simultaneously for an extended period — cause: Claude Code OAuth token expiry
This is not an isolated case — GitHub Issue #12447 has 24+ comments and 22+ upvotes; no official Anthropic response as of the time of filing
Mitigation: long-lived token via claude setup-token (1-year validity) + watchdog auth checks + remote re-authentication capability

Background

Operating multiple agents around the clock exposes infrastructure-level failures that are quieter and more destructive than code bugs.

In this incident, every agent connected to Telegram and Discord stopped responding at the same time. tmux sessions were alive and messages were being received, but all Claude API calls were failing silently.

The anomaly was only detected after an extended period had elapsed.

The root cause was straightforward: OAuth token expiry.

Analysis

1. Failure Timeline

Phase	State
Outage onset	All agents stop responding
Outage duration	Extended period of undetected silence
Recovery	`claude /login` re-authentication → full restart → restored

The symptoms appeared severe, but the cause was a single expired auth token.

2. Why This Happens

The default authentication method for Claude Code under a Max subscription is OAuth. The token TTL is 2–4 hours, and automatic refresh can silently fail.

When refresh fails, every active session loses authentication simultaneously.

Three structural weaknesses contribute to this:

No expiry warning is issued before the token expires
There is no way to query the token's remaining TTL
All in-flight tasks are left in an incomplete state when expiry occurs

3. Community Reports — GitHub Issue Status

This failure mode has been widely reported on GitHub.

Issue	Description	Status
#12447	OAuth token expiry interrupts autonomous workflows	24+ comments, 22+ upvotes
#33811	Both login and logout return 401 after token expiry	Open
#36911	Token expires multiple times per day	Open
#19456	macOS Keychain permission error prevents refresh	Open

No official Anthropic response exists as of the time these issues were filed.

4. Resolution: `claude setup-token` — Long-Lived Token

Claude Code provides a setup-token command that generates an OAuth token valid for one year.

claude setup-token

The generated token follows the format sk-ant-oat01-xxxxx and is applied via environment variable:

export CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-xxxxx

Prerequisites:

A Claude Pro or Max subscription is required
Token generation is CLI-only (not available in the Desktop app)
The token must be stored securely

5. Claude Code Authentication Priority (Official Docs)

Claude Code resolves authentication in the following order. Select the method appropriate for your environment.

Priority	Method	Description
1	Cloud Provider	Bedrock / Vertex / Foundry environment variables
2	ANTHROPIC_AUTH_TOKEN	Bearer token (for LLM gateways)
3	ANTHROPIC_API_KEY	Console API key
4	apiKeyHelper	Dynamic / rotating credential script
5	OAuth (/login)	Default for Pro / Max / Team / Enterprise subscribers

Priority 5 (OAuth) is the default. Any subscriber without explicit configuration lands here automatically — and this is the source of the expiry failure.

6. Prevention — Five Measures

After recovery, the following five controls were implemented to prevent recurrence.

Run claude setup-token → set a long-lived auth token (1-year validity)
Add an auth check to the watchdog → immediate alert on expiry detection
/auth command → remotely verify authentication status from Telegram or Discord
/login command → issue a remote re-authentication URL (mobile tap-to-authenticate)
Claude Auth status on the dashboard → single-pane visibility into auth state across all agents

Architectural Implications

Authentication is a single point of failure (SPOF). When multiple agents share the same OAuth token, a single expiry takes down the entire fleet.
The industry-standard pattern is dual-key rotation. A Blue/Green approach — where the old and new keys are simultaneously valid for a short overlap — is the safe model. Claude Code does not support this natively, making the long-lived token the practical workaround.
Without a watchdog, outages go undetected. If there is no alert when an agent goes silent, an extended period of non-responsiveness can pass unnoticed. A watchdog with an auth check closes this gap.

Conclusion

Production automation systems fail. The measure of a resilient system is not whether failures occur, but whether recovery is fast and recurrence is structurally prevented. This incident demonstrates how an invisible dependency — authentication — can become the single point of failure for an entire fleet.

For anyone running Claude Code agents around the clock: claude setup-token is not optional.

Sources: - GitHub Issue #12447 — OAuth token expiration disrupts autonomous workflows - GitHub Issue #33811 — OAuth token expired, no recovery path - Claude Code Authentication Docs

이 블로그 검색

MaJu Tech Notes