
Agent Spending Cap

A spending cap is a per-agent ceiling, in US dollars, on how much an agent may spend within a calendar month. Once the agent has used up its monthly budget, it stops responding until the next month begins, or until you raise the limit.

The cap is set per agent via the budget_limit_usd field:

| Value | Meaning |
|---|---|
| null (default) | Unlimited: the agent has no spending cap |
| a number (e.g. 25.00) | The agent may spend up to this many USD per month |

Only capped agents are affected

Agents without a limit (budget_limit_usd = null) are never stopped by the spending cap and behave exactly as before. Setting a cap on one agent never affects any other agent.


Budget vs. token quota

The spending cap is distinct from your plan's token quota. They protect different things:

| | Token quota | Spending cap (budget) |
|---|---|---|
| Unit | Pooled tokens from your plan | US dollars of cost |
| Scope | Whole tenant (the plan's allowance) | A single agent |
| BYOK calls | Skipped: calls made with your own provider key don't draw down the quota | Counted: every call is included, BYOK too |
| Purpose | Fair use of the bundled token pool | Protect the actual bill for an agent |

The key point: the spending cap counts every call the agent makes, including BYOK calls made with your own provider keys. That is deliberate — the cap exists to protect what you actually pay, regardless of where the tokens are billed.
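The following minimal Python sketch makes the difference concrete by showing how a single call could be counted against each limit. The `Usage` class and `record_call` function are hypothetical illustrations, not the platform's code. They model only the rule stated above: BYOK calls skip the quota but always count toward the agent's spend.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical counters: the tenant-wide quota pool and one agent's spend."""
    tenant_quota_tokens: int = 0   # plan token quota (whole tenant)
    agent_spend_usd: float = 0.0   # this agent's spending-cap total


def record_call(usage: Usage, tokens: int, cost_usd: float, byok: bool) -> None:
    # BYOK calls are billed to your own provider key, so they skip the plan quota...
    if not byok:
        usage.tenant_quota_tokens += tokens
    # ...but every call, BYOK included, counts toward the agent's spending cap.
    usage.agent_spend_usd += cost_usd


usage = Usage()
record_call(usage, tokens=1_000, cost_usd=0.02, byok=False)
record_call(usage, tokens=5_000, cost_usd=0.10, byok=True)
print(usage.tenant_quota_tokens)        # 1000
print(round(usage.agent_spend_usd, 2))  # 0.12
```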


How enforcement works

Enforcement is a per-turn hard stop. A "turn" is one agent response to a user message.

  1. Before an agent starts a turn, the platform checks how much it has already spent this month.
  2. If the remaining budget is ≤ 0, the turn does not start. The user receives a graceful message — not a mid-conversation error.
  3. If there is budget left, the turn runs normally.
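The three steps can be sketched as a pure function. This is a minimal illustration that rests on a few assumptions. The names `pre_turn_check`, `remaining_budget`, and the `STOP_MESSAGE` wording are hypothetical. The sketch also assumes the consumed amount is the agent's spend so far this month.

```python
from typing import Optional

# Hypothetical wording; the platform's actual message may differ.
STOP_MESSAGE = "This agent has reached its monthly spending limit."


def remaining_budget(limit_usd: Optional[float], consumed_usd: float) -> Optional[float]:
    """Return the remaining USD, or None when the agent is unlimited (limit is null)."""
    if limit_usd is None:
        return None
    return limit_usd - consumed_usd


def pre_turn_check(limit_usd: Optional[float], consumed_usd: float) -> Optional[str]:
    """Run before a turn starts: None lets the turn run, a string is sent instead."""
    remaining = remaining_budget(limit_usd, consumed_usd)
    if remaining is not None and remaining <= 0:
        return STOP_MESSAGE  # step 2: the turn does not start
    return None  # step 3: budget left (or unlimited), run normally
```

For example, an agent capped at 25.00 that has consumed 18.42 gets `None` and runs. Once it reaches 25.00 or more, it receives the stop message instead.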

A few important properties:

  • Turns are never cut off mid-response. The check happens before the turn begins. Any overshoot within a single turn (a turn that starts under budget but runs long) is bounded by the agent's recursion limit.
  • Fail-open. If the balance check itself fails (an infrastructure error), the turn is allowed to proceed — a spending cap never blocks an agent because of a platform hiccup.
  • Kill switch. Enforcement can be disabled platform-wide with the BUDGET_ENFORCEMENT_ENABLED environment variable.
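Here is a sketch of how the fail-open and kill-switch properties could wrap the balance check. The names `may_start_turn` and `get_remaining_usd` are hypothetical. The sketch also assumes that `false` or `0` are the values that disable BUDGET_ENFORCEMENT_ENABLED; check your deployment's configuration for the real parsing rules.

```python
import os
from typing import Callable, Optional


def enforcement_enabled() -> bool:
    # Assumption: an unset variable, or any value other than "false"/"0"
    # (case-insensitive), leaves enforcement on.
    value = os.environ.get("BUDGET_ENFORCEMENT_ENABLED", "true")
    return value.strip().lower() not in ("false", "0")


def may_start_turn(get_remaining_usd: Callable[[], Optional[float]]) -> bool:
    """get_remaining_usd is a hypothetical balance lookup; None means unlimited."""
    if not enforcement_enabled():
        return True  # kill switch: enforcement disabled platform-wide
    try:
        remaining = get_remaining_usd()
    except Exception:
        return True  # fail-open: a failed balance check never blocks the agent
    return remaining is None or remaining > 0
```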

Recovery

To bring a stopped agent back, raise its limit (or set it to null for unlimited). The change takes effect immediately — the agent resumes on its next turn.


Endpoints

Both endpoints support hybrid authentication (JWT or API Key) and validate that the agent belongs to your tenant.

GET /api/agents/:id/budget

Return the current month's balance and the effective limit for an agent.

Required permission: telemetry:read.

Not a member view

Spending is an operator concern. Users with only the plain member role receive 403 Forbidden on this endpoint; only roles that hold telemetry:read (e.g. admin, owner) can read an agent's budget.

Response:

```json
{
  "success": true,
  "data": {
    "limit_usd": 25.0,
    "consumed_usd": 18.42,
    "remaining_usd": 6.58,
    "unlimited": false,
    "period_start": "2026-06-01T00:00:00.000Z",
    "overshoot_note": "The cap may be exceeded by at most the cost of one in-flight call; the hard-stop halts before the next step."
  }
}
```

For an agent without a cap, the fields reflect the unlimited state:

```json
{
  "success": true,
  "data": {
    "limit_usd": null,
    "consumed_usd": 12.30,
    "remaining_usd": null,
    "unlimited": true,
    "period_start": "2026-06-01T00:00:00.000Z",
    "overshoot_note": null
  }
}
```

period_start is the start of the current month, as an ISO 8601 timestamp. overshoot_note is a human-readable reminder that the cap can be exceeded by at most the cost of one in-flight call. It is null when the agent is unlimited.
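Client code could parse this response with a minimal helper like the one below. `BudgetStatus` and `parse_budget` are hypothetical names. Raising an exception on `"success": false` is also an assumption, because this page does not document the error response shape.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetStatus:
    unlimited: bool
    limit_usd: Optional[float]      # None when unlimited
    consumed_usd: float
    remaining_usd: Optional[float]  # None when unlimited
    period_start: str               # ISO 8601 timestamp, kept as a string

    @property
    def stopped(self) -> bool:
        # Mirrors the pre-turn rule: a capped agent with remaining <= 0 won't start a turn.
        return not self.unlimited and self.remaining_usd is not None and self.remaining_usd <= 0


def parse_budget(body: str) -> BudgetStatus:
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("budget request did not succeed")  # assumed error handling
    data = payload["data"]
    return BudgetStatus(
        unlimited=data["unlimited"],
        limit_usd=data["limit_usd"],
        consumed_usd=data["consumed_usd"],
        remaining_usd=data["remaining_usd"],
        period_start=data["period_start"],
    )
```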

PUT /api/agents/:id/budget

Set or update the spending cap for an agent.

Required permission: agents:update.

Request body:

```json
{
  "limit_usd": 25.0
}
```

Send "limit_usd": null to remove the cap (make the agent unlimited):

```json
{
  "limit_usd": null
}
```

Recovery loop

PUT .../budget is also the recovery path — raising the limit on a stopped agent lets it resume immediately.
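Here is a minimal sketch of building both requests with Python's standard library. The host in `BASE_URL` is a placeholder. Sending the JWT as an `Authorization: Bearer` header is an assumption, and API-key authentication may use a different header in your deployment.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder host; use your deployment's URL


def _budget_url(agent_id: str) -> str:
    return f"{BASE_URL}/api/agents/{quote(agent_id, safe='')}/budget"


def get_budget_request(agent_id: str, token: str) -> urllib.request.Request:
    # Requires a caller with telemetry:read.
    return urllib.request.Request(
        _budget_url(agent_id),
        headers={"Authorization": f"Bearer {token}"},  # assumed auth header
        method="GET",
    )


def put_budget_request(agent_id: str, token: str, limit_usd: Optional[float]) -> urllib.request.Request:
    # Requires agents:update. limit_usd=None serializes to JSON null, which removes the cap.
    body = json.dumps({"limit_usd": limit_usd}).encode("utf-8")
    return urllib.request.Request(
        _budget_url(agent_id),
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PUT",
    )
```

To recover a stopped agent, send a PUT request with a higher limit through `urllib.request.urlopen(req)`. The agent then resumes on its next turn.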


How it works

Under the hood:

  • Every call's cost is appended to an immutable ledger right after the call completes, alongside telemetry.
  • The balance is a per-month view over that ledger, so the current spend is always the sum of the month's entries.
  • Accumulation happens on every call (BYOK included), which is why the cap reflects true cost.
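An in-memory stand-in can illustrate the ledger-plus-view design. `LedgerEntry` and `CostLedger` are hypothetical names. The sketch assumes each month starts at 00:00 UTC on the first day, matching the `period_start` shown in the GET response.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)  # frozen: an entry cannot be modified once written
class LedgerEntry:
    agent_id: str
    cost_usd: float
    byok: bool
    recorded_at: datetime  # timezone-aware, UTC


class CostLedger:
    """Append-only in-memory stand-in for the platform's cost ledger (hypothetical)."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []

    def append(self, entry: LedgerEntry) -> None:
        self._entries.append(entry)  # no update or delete operations are offered

    def consumed_usd(self, agent_id: str, now: datetime) -> float:
        # Per-month view: sum this agent's entries (BYOK included) since the month began.
        period_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        return sum(
            e.cost_usd for e in self._entries
            if e.agent_id == agent_id and e.recorded_at >= period_start
        )
```

Entries are only ever appended, so the balance never has to be updated in place. Recomputing the month's sum always agrees with the recorded history.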

Observability

Spending-cap activity is exported as metrics under the zihin-agent-budget meter:

| Metric | Description |
|---|---|
| agent.budget.consumed_usd | Cost accumulated, labeled by agent_id and byok |
| agent.budget.exceeded | Fires when an agent is stopped for exceeding its budget |
| agent.budget.consume_failed | Accumulation error (see below) |

A Work Room event agent_budget_exceeded is emitted when an agent is stopped, so operator dashboards can react in real time.

Alerting

Watch agent.budget.consume_failed — sustained non-zero values mean cost is being under-counted silently, which can let an agent overspend its cap.
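One possible alert rule fires only when the metric has been non-zero for several consecutive samples, so that a single transient failure does not page anyone. It is sketched below in Python. The function name, the sampling interval, and the default window of five samples are all assumptions to tune for your environment.

```python
from typing import Sequence


def consume_failed_alert(samples: Sequence[float], window: int = 5) -> bool:
    """Hypothetical rule: alert when the last `window` samples of
    agent.budget.consume_failed (e.g. per-minute counts) are all non-zero."""
    if len(samples) < window:
        return False  # not enough history to call it sustained
    return all(s > 0 for s in samples[-window:])
```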