Autonomous Agent Safety

Three mandatory prerequisites before shipping any autonomous agent: kill switch, deduplication, circuit breaker. Derived from the runaway-loop incident that burned about $5 in API calls and could only be stopped by manually deleting the webhook.

Origin: SuperNova runaway loop incident
Domain: agents · safety

Before any autonomous agent feature ships, verify three things:

1. Kill switch

Hard stop mechanism — DB flag, admin endpoint, or similar. Checked between every action in the agent loop. Not a prompt instruction. Prompts are suggestions, not enforcement.
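
A minimal sketch of the idea, not the SuperNova implementation. The KillSwitch class is a hypothetical in-memory stand-in for a DB flag or admin endpoint, and run_agent is an illustrative loop. The point it shows: the flag is checked in code before every action, so an agent that is mid-run stops at the next step.

```python
from typing import Callable, Iterable, List


class KillSwitch:
    """Hypothetical in-memory stand-in for a DB flag or admin endpoint."""

    def __init__(self) -> None:
        self._tripped = False

    def trip(self) -> None:
        self._tripped = True

    def is_tripped(self) -> bool:
        return self._tripped


def run_agent(actions: Iterable[Callable[[], str]], kill_switch: KillSwitch) -> List[str]:
    """Run actions in order, checking the kill switch before each one."""
    log: List[str] = []
    for action in actions:
        # Enforced in code between actions, not requested in a prompt.
        if kill_switch.is_tripped():
            log.append("stopped by kill switch")
            break
        log.append(action())
    return log


switch = KillSwitch()


def step_one() -> str:
    return "step one done"


def step_two() -> str:
    # Simulates an operator flipping the flag while the agent is running.
    switch.trip()
    return "step two done"


def step_three() -> str:
    return "step three done"


log = run_agent([step_one, step_two, step_three], switch)
```

Here step_three never runs. In a real deployment the flag would live somewhere an operator can reach without redeploying, which is an assumption about your setup.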

2. Message deduplication

Any webhook-triggered agent must deduplicate incoming messages. Webhook providers (Telegram, Stripe, etc.) retry delivery when the handler takes too long to respond. Without dedup, one message becomes N parallel invocations, each running its own retry loop.
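
A sketch of dedup by message ID, under stated assumptions: Deduplicator and handle_webhook are hypothetical names, the ID is whatever unique identifier the provider attaches to each delivery, and the in-memory dict only works for a single process. Parallel workers would need a shared store with an atomic insert-if-absent.

```python
import time
from typing import Dict, List, Optional


class Deduplicator:
    """Remembers message IDs for ttl_seconds (hypothetical, single-process only)."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self._ttl = ttl_seconds
        self._seen: Dict[str, float] = {}

    def first_time(self, message_id: str, now: Optional[float] = None) -> bool:
        """Return True only the first time an ID is seen within the TTL."""
        now = time.monotonic() if now is None else now
        # Forget IDs older than the TTL so the table does not grow forever.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self._ttl}
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True


def handle_webhook(dedup: Deduplicator, message_id: str, started: List[str]) -> str:
    """Start the agent once per message; acknowledge retries without re-running."""
    if not dedup.first_time(message_id):
        return "duplicate ignored"
    started.append(message_id)  # stand-in for launching the agent
    return "accepted"


dedup = Deduplicator()
started: List[str] = []
# The provider delivers the same message three times (original plus two retries).
results = [handle_webhook(dedup, "update-1001", started) for _ in range(3)]
```

Only one agent run starts for the three deliveries. Recording the ID before starting the work is deliberate: a retry that arrives while the first run is still in progress is also rejected.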

3. Circuit breaker

After N failed attempts at the same action, stop and report. Never retry indefinitely. maxRounds is necessary but not sufficient: it caps total rounds, so you also need per-action failure counting.
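
A sketch of per-action failure counting, assuming a threshold of 2 consecutive failures. PerActionBreaker, CircuitOpen, and fill_form are hypothetical names for illustration. The outer range(10) plays the role of a maxRounds-style cap, and the breaker stops the loop well before that cap is reached.

```python
from collections import defaultdict
from typing import Callable, Dict


class CircuitOpen(Exception):
    """Raised when an action has failed too many times in a row."""


class PerActionBreaker:
    """Counts consecutive failures per action name (hypothetical helper)."""

    def __init__(self, max_failures: int = 2) -> None:
        self.max_failures = max_failures
        self.failures: Dict[str, int] = defaultdict(int)

    def call(self, action_name: str, fn: Callable[[], str]) -> str:
        count = self.failures[action_name]
        if count >= self.max_failures:
            # Refuse to try again; the caller should stop and report.
            raise CircuitOpen(f"{action_name} failed {count} times; stopping")
        try:
            result = fn()
        except Exception:
            self.failures[action_name] += 1
            raise
        self.failures[action_name] = 0  # a success resets the streak
        return result


def fill_form() -> str:
    # Stand-in for an action that can never succeed within its timeout.
    raise TimeoutError("tool timed out")


breaker = PerActionBreaker(max_failures=2)
attempts = 0
outcome = ""
for _ in range(10):  # total-rounds cap, analogous to maxRounds
    try:
        breaker.call("fill_form", fill_form)
    except CircuitOpen as exc:
        outcome = f"reported: {exc}"
        break
    except TimeoutError:
        attempts += 1
```

The failing action is attempted twice, then the third round reports instead of retrying. With only the rounds cap, the same doomed action would have run all ten times.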

Bonus: timeout budget

Before shipping, calculate: (number of actions) × (time per action) vs configured timeout. If the total exceeds the timeout, the feature will fail on every attempt and trigger a retry cascade.
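
The same arithmetic as a pre-ship check, using the numbers from the incident reference (12 fields, 10s per field, 90s tool timeout). The function names are hypothetical, and the per-action time is an estimate you have to measure yourself.

```python
def required_seconds(actions: int, seconds_per_action: float) -> float:
    """Total time the agent needs if every action runs sequentially."""
    return actions * seconds_per_action


def fits_budget(actions: int, seconds_per_action: float, timeout_seconds: float) -> bool:
    """True only if the whole run finishes strictly inside the timeout."""
    return required_seconds(actions, seconds_per_action) < timeout_seconds


# Incident numbers: 12 fields x 10s/field against a 90s tool timeout.
needed = required_seconds(12, 10)    # 120 seconds
ok = fits_budget(12, 10, 90)         # False: 120s does not fit in 90s
overrun = needed - 90                # 30 seconds over budget
```

A check like this can run in CI or as an assertion at startup, so a configuration that cannot succeed is rejected before it reaches a retry loop.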

Incident reference

SuperNova Stagehand form fill. 12 fields × 10s/field = 120s. Tool timeout = 90s. Guaranteed failure → retry loop → Telegram webhook retry → cascading parallel loops for 10+ minutes. No kill switch. The only fix was deleting the webhook. Cost: ~$5 API burn, 15 minutes of incident response, manual webhook deletion.

Checklist (copy for PR review)

Can I stop this agent mid-execution? How?
Are incoming messages deduplicated?
What happens after 2 consecutive failures of the same action?
What's the timeout budget? (actions × time/action < timeout)
What's the max concurrent invocations possible?