Agent Spending Cap

A spending cap is a per-agent ceiling, in US dollars, on how much an agent may cost within a calendar month. Once the agent has spent its budget for the month, it stops responding until the next month — or until you raise the limit.

The cap is set per agent via the budget_limit_usd field:

Value	Meaning
`null` (default)	Unlimited — the agent has no spending cap
a number (e.g. `25.00`)	The agent may spend up to this many USD per month

Only capped agents are affected

Agents without a limit (budget_limit_usd = null) are never stopped by this feature. Setting a cap on one agent never changes any other agent.

Budget vs. token quota

The spending cap is distinct from your plan's token quota. They protect different things:

	Token quota	Spending cap (budget)
Unit	Pooled tokens from your plan	US dollars of cost
Scope	Whole tenant (the plan's allowance)	A single agent
BYOK calls	Skipped — calls made with your own provider key don't draw down the quota	Counted — every call is included, BYOK too
Purpose	Fair use of the bundled token pool	Protect the actual bill for an agent

The key point: the spending cap counts every call the agent makes, including BYOK calls made with your own provider keys. That is deliberate — the cap exists to protect what you actually pay, regardless of where the tokens are billed.
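The difference between the two counters can be illustrated with a small sketch. The `Call` record and both function names are hypothetical and exist only for this example; the only behavior taken from the table above is that BYOK calls are skipped by the token quota and counted by the spending cap.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Call:
    """One model call made by an agent (hypothetical shape)."""
    tokens: int
    cost_usd: float
    byok: bool  # True when the call used the tenant's own provider key


def quota_tokens_used(calls: Iterable[Call]) -> int:
    # Token quota: BYOK calls do not draw down the pooled tokens.
    return sum(c.tokens for c in calls if not c.byok)


def budget_usd_consumed(calls: Iterable[Call]) -> float:
    # Spending cap: every call counts, BYOK included.
    return sum(c.cost_usd for c in calls)


calls = [
    Call(tokens=1000, cost_usd=0.5, byok=False),
    Call(tokens=2000, cost_usd=0.25, byok=True),
]
```

With these two calls, only the first one draws down the quota, while both contribute to the budget.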

How enforcement works

Enforcement is a per-turn hard stop. A "turn" is one agent response to a user message.

Before an agent starts a turn, the platform checks how much it has already spent this month.
If the remaining budget is ≤ 0, the turn does not start. The user receives a graceful message — not a mid-conversation error.
If there is budget left, the turn runs normally.

A few important properties:

Turns are never cut off mid-response. The check happens before the turn begins. Any overshoot within a single turn (a turn that starts under budget but runs long) is bounded by the agent's recursion limit.
Fail-open. If the balance check itself fails (an infrastructure error), the turn is allowed to proceed — a spending cap never blocks an agent because of a platform hiccup.
Kill switch. Enforcement can be disabled platform-wide with the BUDGET_ENFORCEMENT_ENABLED environment variable.

Recovery

To bring a stopped agent back, raise its limit (or set it to null for unlimited). The change takes effect immediately — the agent resumes on its next turn.

Endpoints

Both endpoints support hybrid authentication (JWT or API Key) and validate that the agent belongs to your tenant.

GET /api/agents/:id/budget

Return the current month's balance and the effective limit for an agent.

Required permission: telemetry:read.

Not a member view

Spending is an operator concern. A plain member role receives 403 on this endpoint — only roles with telemetry:read (e.g. admin, owner) can read an agent's budget.

Response:

{
  "success": true,
  "data": {
    "limit_usd": 25.0,
    "consumed_usd": 18.42,
    "remaining_usd": 6.58,
    "unlimited": false,
    "period_start": "2026-06-01T00:00:00.000Z",
    "overshoot_note": "The cap may be exceeded by at most the cost of one in-flight call; the hard-stop halts before the next step."
  }
}

For an agent without a cap, the fields reflect the unlimited state:

{
  "success": true,
  "data": {
    "limit_usd": null,
    "consumed_usd": 12.30,
    "remaining_usd": null,
    "unlimited": true,
    "period_start": "2026-06-01T00:00:00.000Z",
    "overshoot_note": null
  }
}

period_start is the ISO 8601 timestamp marking the start of the current billing month. overshoot_note is a human-readable reminder that the cap can be exceeded by at most one in-flight call; it is null when the agent is unlimited.
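A client might turn this response into a usage fraction, for example to drive a dashboard gauge. The sketch below only parses a response body shaped like the samples above; the function name is hypothetical, and treating a non-positive limit as fully used is an assumption of this example.

```python
import json
from typing import Optional

SAMPLE_CAPPED = """
{
  "success": true,
  "data": {
    "limit_usd": 25.0,
    "consumed_usd": 18.42,
    "remaining_usd": 6.58,
    "unlimited": false,
    "period_start": "2026-06-01T00:00:00.000Z",
    "overshoot_note": "..."
  }
}
"""


def budget_usage_fraction(response_body: str) -> Optional[float]:
    """Return consumed/limit for a capped agent, or None if unlimited."""
    data = json.loads(response_body)["data"]
    if data["unlimited"] or data["limit_usd"] is None:
        return None
    if data["limit_usd"] <= 0:
        # Assumption: a zero limit means the budget is already used up.
        return 1.0
    return data["consumed_usd"] / data["limit_usd"]
```

Calling `budget_usage_fraction(SAMPLE_CAPPED)` gives roughly 0.74, meaning about three quarters of the month's budget is spent.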

PUT /api/agents/:id/budget

Set or update the spending cap for an agent.

Required permission: agents:update.

Request Body:

{
  "limit_usd": 25.0
}

Send "limit_usd": null to remove the cap (make the agent unlimited):

{
  "limit_usd": null
}

Recovery loop

PUT .../budget is also the recovery path — raising the limit on a stopped agent lets it resume immediately.
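As a rough sketch, the request could be built with Python's standard library as shown below. The base URL, the agent ID, and the `Authorization: Bearer` header are assumptions made for illustration; this page states only that JWT or API Key authentication is accepted, not how the credential is sent.

```python
import json
import urllib.request
from typing import Optional


def build_set_budget_request(
    base_url: str,
    agent_id: str,
    limit_usd: Optional[float],
    token: str,
) -> urllib.request.Request:
    """Build (but do not send) a PUT /api/agents/:id/budget request."""
    # None serializes to JSON null, which removes the cap.
    body = json.dumps({"limit_usd": limit_usd}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/agents/{agent_id}/budget",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the credential is sent as a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="PUT",
    )


# Hypothetical host, agent ID, and token, for illustration only.
req = build_set_budget_request(
    "https://api.example.com", "agent-123", None, "YOUR_TOKEN"
)
```

Passing the returned object to `urllib.request.urlopen` would send it. Raising the limit on a stopped agent uses the same call with a larger number in place of `None`.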

How it works

Under the hood:

Every call's cost is appended to an immutable ledger right after the call completes, alongside telemetry.
The balance is a per-month view over that ledger, so the current spend is always the sum of the month's entries.
Accumulation happens on every call (BYOK included), which is why the cap reflects true cost.
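The ledger-and-view model described above can be approximated with an in-memory sketch. All names here are hypothetical, and the month boundary is assumed to be the first day of the month at 00:00 UTC, which matches the period_start values in the sample responses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import List


@dataclass(frozen=True)  # frozen: entries cannot be modified once written
class LedgerEntry:
    agent_id: str
    cost_usd: Decimal
    byok: bool
    at: datetime


class Ledger:
    """Append-only cost ledger with a per-month balance view."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []

    def append(self, entry: LedgerEntry) -> None:
        # Entries are only ever added, never updated or deleted.
        self._entries.append(entry)

    def consumed_this_month(self, agent_id: str, now: datetime) -> Decimal:
        # Current spend is the sum of this month's entries for the agent.
        start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        return sum(
            (e.cost_usd for e in self._entries
             if e.agent_id == agent_id and start <= e.at <= now),
            Decimal("0"),
        )


ledger = Ledger()
utc = timezone.utc
ledger.append(LedgerEntry("a1", Decimal("3.10"), False, datetime(2026, 5, 30, tzinfo=utc)))
ledger.append(LedgerEntry("a1", Decimal("1.25"), False, datetime(2026, 6, 2, tzinfo=utc)))
ledger.append(LedgerEntry("a1", Decimal("0.75"), True, datetime(2026, 6, 3, tzinfo=utc)))
ledger.append(LedgerEntry("a2", Decimal("9.00"), False, datetime(2026, 6, 3, tzinfo=utc)))
```

Queried on 15 June 2026, agent a1's balance covers only the two June entries, BYOK included, while the May entry and the other agent's entry are left out.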

Observability

Spending-cap activity is exported as metrics under the zihin-agent-budget meter:

Metric	Description
`agent.budget.consumed_usd`	Cost accumulated, labeled by `agent_id` and `byok`
`agent.budget.exceeded`	Fires when an agent is stopped for exceeding its budget
`agent.budget.consume_failed`	Accumulation error (see below)

A Work Room event agent_budget_exceeded is emitted when an agent is stopped, so operator dashboards can react in real time.

Alerting

Watch agent.budget.consume_failed — sustained non-zero values mean cost is being under-counted silently, which can let an agent overspend its cap.
