Agent Spending Cap
A spending cap is a per-agent ceiling, in US dollars, on how much an agent may cost within a calendar month. Once the agent has spent its budget for the month, it stops responding until the next month — or until you raise the limit.
The cap is set per agent via the budget_limit_usd field:
| Value | Meaning |
|---|---|
| null (default) | Unlimited: the agent has no spending cap |
| a number (e.g. 25.00) | The agent may spend up to this many USD per month |
Agents without a limit (budget_limit_usd = null) behave exactly as they did before spending caps were introduced. Setting a cap on one agent never changes any other agent.
Budget vs. token quota
The spending cap is distinct from your plan's token quota. They protect different things:
| | Token quota | Spending cap (budget) |
|---|---|---|
| Unit | Pooled tokens from your plan | US dollars of cost |
| Scope | Whole tenant (the plan's allowance) | A single agent |
| BYOK calls | Skipped — calls made with your own provider key don't draw down the quota | Counted — every call is included, BYOK too |
| Purpose | Fair use of the bundled token pool | Protect the actual bill for an agent |
The key point: the spending cap counts every call the agent makes, including BYOK calls made with your own provider keys. That is deliberate — the cap exists to protect what you actually pay, regardless of where the tokens are billed.
How enforcement works
Enforcement is a per-turn hard stop. A "turn" is one agent response to a user message.
- Before an agent starts a turn, the platform checks how much it has already spent this month.
- If the remaining budget is ≤ 0, the turn does not start. The user receives a graceful message — not a mid-conversation error.
- If there is budget left, the turn runs normally.
A few important properties:
- Turns are never cut off mid-response. The check happens before the turn begins. Any overshoot within a single turn (a turn that starts under budget but runs long) is bounded by the agent's recursion limit.
- Fail-open. If the balance check itself fails (an infrastructure error), the turn is allowed to proceed — a spending cap never blocks an agent because of a platform hiccup.
- Kill switch. Enforcement can be disabled platform-wide with the BUDGET_ENFORCEMENT_ENABLED environment variable.
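The pre-turn check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the names may_start_turn, enforcement_enabled, and fetch_consumed_usd are hypothetical, and the way the environment variable value is parsed is a guess.

```python
import os
from decimal import Decimal
from typing import Callable, Optional


def enforcement_enabled() -> bool:
    # Assumption: any value other than "false" or "0" leaves enforcement on.
    # The real parsing rules for BUDGET_ENFORCEMENT_ENABLED are not documented here.
    value = os.environ.get("BUDGET_ENFORCEMENT_ENABLED", "true")
    return value.strip().lower() not in ("false", "0")


def may_start_turn(
    limit_usd: Optional[Decimal],
    fetch_consumed_usd: Callable[[], Decimal],
) -> bool:
    """Return True if the agent may start its next turn."""
    if not enforcement_enabled():
        return True  # kill switch: enforcement is disabled platform-wide
    if limit_usd is None:
        return True  # a null limit means the agent is unlimited
    try:
        consumed = fetch_consumed_usd()  # hypothetical balance lookup
    except Exception:
        return True  # fail-open: a failed balance check never blocks a turn
    remaining = limit_usd - consumed
    return remaining > 0  # remaining <= 0 means the turn does not start
```

Because the check runs only before a turn begins, a turn that starts with even a small remaining budget is allowed to run to completion.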
Recovery
To bring a stopped agent back, raise its limit (or set it to null for unlimited). The change takes effect immediately — the agent resumes on its next turn.
Endpoints
Both endpoints support hybrid authentication (JWT or API Key) and validate that the agent belongs to your tenant.
GET /api/agents/:id/budget
Return the current month's balance and the effective limit for an agent.
Required permission: telemetry:read.
Spending is an operator concern. A plain member role receives 403 on this endpoint — only roles with telemetry:read (e.g. admin, owner) can read an agent's budget.
Response:
{
  "success": true,
  "data": {
    "limit_usd": 25.0,
    "consumed_usd": 18.42,
    "remaining_usd": 6.58,
    "unlimited": false,
    "period_start": "2026-06-01T00:00:00.000Z",
    "overshoot_note": "The cap may be exceeded by at most the cost of one in-flight call; the hard-stop halts before the next step."
  }
}
For an agent without a cap, the fields reflect the unlimited state:
{
  "success": true,
  "data": {
    "limit_usd": null,
    "consumed_usd": 12.30,
    "remaining_usd": null,
    "unlimited": true,
    "period_start": "2026-06-01T00:00:00.000Z",
    "overshoot_note": null
  }
}
period_start is the ISO start of the current billing month; overshoot_note is a human-readable reminder that the cap can be exceeded by at most one in-flight call (null when the agent is unlimited).
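As an illustration of how a client might read these fields, the sketch below summarizes a response body. The function name describe_budget is hypothetical, and labeling an agent "stopped" when remaining_usd is at or below zero is an inference from the enforcement rule rather than a field the API returns.

```python
import json


def describe_budget(response_text: str) -> str:
    """Summarize the body of a GET /api/agents/:id/budget response."""
    data = json.loads(response_text)["data"]
    if data["unlimited"]:
        # limit_usd and remaining_usd are null for an unlimited agent.
        return f"unlimited, {data['consumed_usd']:.2f} USD consumed this month"
    remaining = data["remaining_usd"]
    # Inferred from the enforcement rule: no turn starts once remaining <= 0.
    state = "stopped" if remaining <= 0 else "active"
    return f"{state}, {remaining:.2f} of {data['limit_usd']:.2f} USD remaining"
```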
PUT /api/agents/:id/budget
Set or update the spending cap for an agent.
Required permission: agents:update.
Request Body:
{
  "limit_usd": 25.0
}
Send "limit_usd": null to remove the cap (make the agent unlimited):
{
  "limit_usd": null
}
PUT .../budget is also the recovery path — raising the limit on a stopped agent lets it resume immediately.
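A request to set or clear the cap could be built like this with Python's standard library. The host in BASE_URL, the helper name build_set_budget_request, and the Bearer form of the Authorization header are assumptions for illustration; check your deployment for the actual base URL and for how a JWT or API key is passed.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # hypothetical host, replace with your own


def build_set_budget_request(
    agent_id: str, limit_usd: Optional[float], token: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT /api/agents/:id/budget request."""
    # limit_usd=None serializes to JSON null, which removes the cap.
    body = json.dumps({"limit_usd": limit_usd}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/agents/{agent_id}/budget",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumption: the credential is sent as a Bearer token.
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it; the caller needs the agents:update permission.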
How it works
Under the hood:
- Every call's cost is appended to an immutable ledger right after the call completes, alongside telemetry.
- The balance is a per-month view over that ledger, so the current spend is always the sum of the month's entries.
- Accumulation happens on every call (BYOK included), which is why the cap reflects true cost.
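The ledger-and-view model above can be sketched in a few lines. This is a toy in-memory version under stated assumptions, not the platform's storage layer: the LedgerEntry and Ledger names are hypothetical, and the month boundary is assumed to be the UTC calendar month.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import List


@dataclass(frozen=True)  # frozen: an entry cannot be modified once written
class LedgerEntry:
    agent_id: str
    cost_usd: Decimal
    byok: bool  # recorded for labeling only; BYOK calls still count
    at: datetime  # assumed to be timezone-aware UTC


class Ledger:
    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []

    def append(self, entry: LedgerEntry) -> None:
        # Append-only: entries are never updated or deleted.
        self._entries.append(entry)

    def consumed_usd(self, agent_id: str, now: datetime) -> Decimal:
        """Sum this agent's entries that fall in the same month as `now`."""
        return sum(
            (
                e.cost_usd
                for e in self._entries
                if e.agent_id == agent_id
                and (e.at.year, e.at.month) == (now.year, now.month)
            ),
            Decimal("0"),
        )
```

Using Decimal rather than float keeps the running total exact, which matters when many small costs are summed.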
Observability
Spending-cap activity is exported as metrics under the zihin-agent-budget meter:
| Metric | Description |
|---|---|
| agent.budget.consumed_usd | Cost accumulated, labeled by agent_id and byok |
| agent.budget.exceeded | Fires when an agent is stopped for exceeding its budget |
| agent.budget.consume_failed | Accumulation error (see below) |
A Work Room event agent_budget_exceeded is emitted when an agent is stopped, so operator dashboards can react in real time.
Watch agent.budget.consume_failed — sustained non-zero values mean cost is being under-counted silently, which can let an agent overspend its cap.