Reliability & cost control

Budgets cap what an agent is allowed to spend. This layer makes sure a squad never spends tokens on work that isn't real — no retry storms when an upstream limit is hit, no busy-waiting after the goal is done, no agents waking each other in circles. Together with budgets, it keeps your token spend tied to progress, not to history.

Why this exists

Left unchecked, an autonomous squad has three ways to burn tokens with nothing to show for it: retrying through an upstream usage limit, looping after the work is already finished, and re-sending ever-growing context on every wake. SLAW closes all three by default — you don't have to configure anything for the protections below to be on.

The five safeguards

SLAW ships five mechanisms that work together. Each targets a specific way a squad can waste tokens.

1. Upstream circuit breaker

When the model provider returns a shared-resource error — usage limit, rate limit, or overload — SLAW trips an instance-wide breaker instead of letting every agent retry on its own. While the breaker is open, the scheduler stops waking agents and the UI shows a clear banner:

Paused — Claude usage limit reached, resumes ~HH:MM.

The resume time is read from the provider's reset hint where available; otherwise, SLAW backs off exponentially at the instance level. This converts what would be dozens of failing retries per minute into a single, quiet pause: the difference between amplifying a usage limit and respecting it.
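As a rough illustration of the idea, and not SLAW's actual implementation, an instance-wide breaker can be sketched as a single shared object that the scheduler consults before every wake. The class name `InstanceBreaker`, its methods, and the 60-second base and one-hour ceiling are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceBreaker:
    """Hypothetical breaker shared by every agent on the instance."""
    base_delay: float = 60.0     # assumed first backoff, in seconds
    max_delay: float = 3600.0    # assumed backoff ceiling, in seconds
    open_until: float = 0.0      # timestamp at which wakes may resume
    consecutive_trips: int = 0

    def trip(self, now: float, reset_hint: Optional[float] = None) -> float:
        """Open the breaker after a usage-limit, rate-limit, or overload error."""
        self.consecutive_trips += 1
        if reset_hint is not None:
            # Prefer the provider's own reset time when it supplies one.
            self.open_until = reset_hint
        else:
            # Otherwise back off exponentially, once for the whole instance.
            delay = self.base_delay * 2 ** (self.consecutive_trips - 1)
            self.open_until = now + min(delay, self.max_delay)
        return self.open_until

    def allows_wake(self, now: float) -> bool:
        """The scheduler checks this before waking any agent."""
        return now >= self.open_until

    def record_success(self) -> None:
        """A successful upstream call resets the backoff sequence."""
        self.consecutive_trips = 0
```

Because the state lives in one place, a single failing call pauses every agent; no agent needs its own retry loop.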

2. Idle stand-down (quiescence)

A squad with nothing to do should cost nothing. Before waking an agent, SLAW checks whether it has any actionable work — assigned non-terminal issues, queued continuations, pending approvals, or blockers it owns. If there's nothing to act on, the heartbeat is skipped rather than spent on a "just checking" wake.

When every goal-linked issue is terminal and no approvals or blockers remain, the squad enters an idle state and heartbeats drop to a slow keep-alive. The UI surfaces "Squad idle — goal complete." This is the natural "the work is done, stop" signal.
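The two checks described above could look something like the following sketch. The `AgentWorkload` fields, the function names, and the set of terminal statuses are hypothetical; SLAW's real data model may differ.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed terminal statuses, for illustration only.
TERMINAL_STATUSES = {"done", "cancelled"}


@dataclass
class AgentWorkload:
    """Hypothetical snapshot of what one agent could act on right now."""
    open_assigned_issues: int = 0   # assigned issues not in a terminal state
    queued_continuations: int = 0
    pending_approvals: int = 0
    owned_blockers: int = 0


def has_actionable_work(w: AgentWorkload) -> bool:
    """Pre-wake check: skip the heartbeat when every counter is zero."""
    return any((
        w.open_assigned_issues,
        w.queued_continuations,
        w.pending_approvals,
        w.owned_blockers,
    ))


def squad_is_idle(goal_issue_statuses: Iterable[str],
                  pending_approvals: int,
                  open_blockers: int) -> bool:
    """Squad-level check: every goal-linked issue is terminal and nothing is pending."""
    all_terminal = all(s in TERMINAL_STATUSES for s in goal_issue_statuses)
    return all_terminal and pending_approvals == 0 and open_blockers == 0
```

The per-agent check decides whether a single heartbeat is worth spending; the squad-level check decides when to drop to the slow keep-alive.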

3. Wake-loop detection

Agents wake each other through comments and assignments. SLAW prevents that from becoming a self-sustaining loop:

Duplicate wake requests within a window are coalesced to one.
A per-issue wake-cycle guard watches for an A→B→A→B pattern; if two agents ping-pong on an issue without a status change or new human input, further auto-wakes on that issue are suppressed and it's flagged for human attention.
A self-comment never re-wakes its own author, even on a status change the author caused.
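The three rules above can be combined in one guard, sketched below under stated assumptions. `WakeGuard`, the 30-second coalescing window, and the four-step alternation threshold are illustrative choices, not values taken from SLAW.

```python
from collections import defaultdict, deque


class WakeGuard:
    """Hypothetical per-instance guard that decides whether a wake goes through."""

    def __init__(self, coalesce_window: float = 30.0, pingpong_length: int = 4):
        self.coalesce_window = coalesce_window   # assumed window, in seconds
        self.pingpong_length = pingpong_length   # assumed A,B,A,B threshold
        self._last_wake = {}                     # (issue, target) -> timestamp
        self._senders = defaultdict(lambda: deque(maxlen=pingpong_length))
        self.flagged = set()                     # issues needing human attention

    def note_progress(self, issue_id: str) -> None:
        """Call on a status change or new human input: resets the cycle guard."""
        self._senders[issue_id].clear()
        self.flagged.discard(issue_id)

    def should_wake(self, issue_id: str, sender: str, target: str, now: float) -> bool:
        if sender == target:
            return False                         # a self-comment never re-wakes its author
        if issue_id in self.flagged:
            return False                         # auto-wakes suppressed until a human steps in
        key = (issue_id, target)
        last = self._last_wake.get(key)
        if last is not None and now - last < self.coalesce_window:
            return False                         # duplicate within the window: coalesced
        history = self._senders[issue_id]
        history.append(sender)
        if self._is_pingpong(history):
            self.flagged.add(issue_id)
            return False
        self._last_wake[key] = now
        return True

    def _is_pingpong(self, history) -> bool:
        """True when the recent senders strictly alternate between two agents."""
        if len(history) < self.pingpong_length:
            return False
        h = list(history)
        a, b = h[0], h[1]
        if a == b:
            return False
        return all(s == (a if i % 2 == 0 else b) for i, s in enumerate(h))
```

In this sketch, `note_progress` is what distinguishes a productive back-and-forth from a loop: any status change or human input clears the history, so only exchanges with no progress between them trip the guard.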

4. Acceptance gate on `done`

`done` only counts when there's something to show for it. Marking an issue `done` requires evidence: a linked work product, document, or commit reference, or an explicit Operator acceptance. This is configurable per squad:

Mode	Behaviour
`strict`	`done` without a deliverable is rejected with an actionable error.
`advisory`	`done` is accepted, but the absence of a deliverable is recorded so QA or the Operator can see it.

Set the mode with the `SLAW_DONE_ACCEPTANCE_MODE` environment variable. This stops a squad from declaring victory, and standing down, on work that was never actually delivered.
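To make the two modes concrete, here is a minimal sketch of how such a gate might behave. The `Issue` fields, the `mark_done` function, and the `DoneRejected` exception are hypothetical names invented for this example; only the mode values `strict` and `advisory` come from the table above.

```python
from dataclasses import dataclass, field
from typing import List


class DoneRejected(Exception):
    """Raised in strict mode when `done` has no supporting evidence."""


@dataclass
class Issue:
    title: str
    evidence: List[str] = field(default_factory=list)  # work products, documents, commit refs
    operator_accepted: bool = False
    status: str = "in_progress"
    missing_deliverable: bool = False                   # recorded for QA or the Operator


def mark_done(issue: Issue, mode: str) -> Issue:
    """Apply the acceptance gate; `mode` is 'strict' or 'advisory'."""
    if mode not in ("strict", "advisory"):
        raise ValueError(f"unknown acceptance mode: {mode!r}")
    has_evidence = bool(issue.evidence) or issue.operator_accepted
    if not has_evidence and mode == "strict":
        raise DoneRejected(
            f"Cannot mark '{issue.title}' done: link a work product, document, "
            "or commit, or ask the Operator to accept it."
        )
    # Advisory mode accepts the transition but records the gap.
    issue.missing_deliverable = not has_evidence
    issue.status = "done"
    return issue
```

A caller could pass in the value read from `SLAW_DONE_ACCEPTANCE_MODE`, for example via `os.environ`; how SLAW itself reads and defaults the variable is not specified here.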

5. Bounded context

Token cost should scale with new work, not with the full history of an issue. SLAW keeps the prompt bounded:

A pre-flight token budget in the adapter estimates assembled prompt size and summarises the oldest comments (keeping the first, the most recent, and a rolling summary) before sending — so the squad never sends a prompt it knows will fail.
On resume, the adapter sends only the delta, not the entire prompt again, so cached-read cost tracks new content rather than total history.
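The pre-flight step and the delta-only resume can be sketched as follows. This is an illustration under assumptions: the four-characters-per-token estimate, the function names, and the caller-supplied `summarise` callback are all hypothetical, and a real adapter would use the provider's tokenizer and its own summariser.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly four characters per token."""
    return max(1, len(text) // 4)


def bound_comments(comments: List[str],
                   budget: int,
                   summarise: Callable[[List[str]], str]) -> List[str]:
    """Keep the first and most recent comments; fold the oldest middle
    comments into one rolling summary until the estimate fits the budget."""
    if len(comments) <= 2:
        return list(comments)
    first, middle, last = comments[0], list(comments[1:-1]), comments[-1]
    folded: List[str] = []

    def assemble() -> List[str]:
        summary = [summarise(folded)] if folded else []
        return [first] + summary + middle + [last]

    while middle and sum(estimate_tokens(c) for c in assemble()) > budget:
        folded.append(middle.pop(0))   # oldest remaining comment goes into the summary
    return assemble()


def resume_delta(comments: List[str], already_sent: int) -> List[str]:
    """On resume, send only the comments the session has not seen yet."""
    return comments[already_sent:]
```

For example, with a stand-in summariser such as `lambda cs: f"[summary of {len(cs)} comments]"`, a long thread collapses to the first comment, one summary line, the newest few comments that still fit, and the latest comment.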

What this looks like in practice

Failure mode	Without the safeguards	With them
Provider usage limit hit	Every agent retries independently, amplifying the limit	One instance-wide pause until reset (safeguard 1)
Goal already complete	Agents keep waking to "check"	Squad stands down to idle (safeguard 2)
Two agents commenting back and forth	Self-sustaining wake loop	Loop detected and suppressed (safeguard 3)
`done` with no deliverable	Silently accepted	Rejected or recorded (safeguard 4)
Long-running issue with many comments	Full history re-sent every wake	Bounded, delta-only context (safeguard 5)

Next steps

Costs & budgets — set the spend caps these safeguards operate within.
Activity log — where breaker trips, stand-downs, and suppressed loops are recorded.
Governance — pause, override, and approval controls you hold as Operator.
