# Cost Optimization

Strategies to reduce LLM costs while maintaining quality.
## Use Auto-Routing

Set `model: "auto"` to let Zihin select the most cost-efficient model for each request. Simple queries route to cheaper models; complex ones use premium models.
```json
{
  "query": "What is 2 + 2?",
  "model": "auto"
}
```
Auto-routing typically reduces costs by 20-40% compared to using a premium model for all requests.
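To make the routing idea concrete, here is a minimal client-side sketch of complexity-based tier selection. Zihin's actual routing runs server-side and its heuristics are not documented here; the marker words, word-count thresholds, and tier names below are illustrative assumptions only.

```python
# Illustrative sketch of complexity-based routing. NOT Zihin's actual
# algorithm: the markers, thresholds, and tier names are assumptions.

def route(query: str) -> str:
    """Pick a model tier from a rough complexity heuristic."""
    complex_markers = ("explain", "analyze", "prove", "design", "refactor")
    words = query.lower().split()
    if any(m in words for m in complex_markers) or len(words) > 50:
        return "flagship"    # heavy reasoning -> premium tier
    if len(words) > 15:
        return "standard"    # mid-length prompts -> mid tier
    return "economical"      # short, simple queries -> cheapest tier

print(route("What is 2 + 2?"))                    # economical
print(route("Analyze this contract for risks"))   # flagship
```

The savings come from the long tail of simple queries: only prompts that trip a complexity signal pay flagship rates.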
## Leverage the Response Cache
Zihin automatically caches identical requests. The cache provides up to 40% cost reduction for repeated queries. Cache behavior:
- Identical query + model = cache hit
- Cache TTL is managed server-side
- No configuration needed
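The cache-hit rule above (identical query + model) can be sketched as a key derivation. The hashing scheme below is an assumption for illustration, not the service's actual implementation; its only point is that the same request always maps to the same key, so normalizing prompts (e.g. trimming whitespace) on your side increases hit rates.

```python
import hashlib
import json

# Sketch of a (query, model) cache key. The real server-side scheme is
# not documented; this just illustrates the "identical pair = hit" rule.
def cache_key(query: str, model: str) -> str:
    payload = json.dumps({"query": query, "model": model}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Identical query + model -> identical key -> cache hit
assert cache_key("What is 2 + 2?", "auto") == cache_key("What is 2 + 2?", "auto")
# Different model -> different key -> cache miss
assert cache_key("What is 2 + 2?", "auto") != cache_key("What is 2 + 2?", "flagship")
```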
## Choose the Right Model Tier
| Tier | Cost | Use When |
|---|---|---|
| `economical` | $ | Simple Q&A, classification, formatting |
| `standard` | $$ | General conversations, content generation |
| `flagship` | $$$ | Complex reasoning, code generation, analysis |
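If you pin tiers explicitly rather than using auto-routing, a small lookup table keeps the choice consistent across your codebase. The task categories below paraphrase the table above; the mapping itself and the `standard` default are assumptions, not part of the Zihin API.

```python
# Hypothetical task-to-tier mapping derived from the tier table above.
TIER_BY_TASK = {
    "classification": "economical",
    "formatting": "economical",
    "conversation": "standard",
    "content_generation": "standard",
    "complex_reasoning": "flagship",
    "code_generation": "flagship",
}

def tier_for(task: str) -> str:
    # Fall back to the mid tier for tasks not listed in the table.
    return TIER_BY_TASK.get(task, "standard")

print(tier_for("classification"))  # economical
```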
## Monitor Usage
Track costs in real-time through:
- Console dashboard — Visual cost breakdowns by model, agent, and time
- Telemetry API — Programmatic access to usage metrics
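As a sketch of what you might do with programmatic usage metrics, the snippet below aggregates per-request cost records by model. The record shape is a hypothetical example; consult the Telemetry API reference for the actual response schema and field names.

```python
from collections import defaultdict

# Hypothetical telemetry records; field names are assumptions, not the
# Telemetry API's documented schema.
sample_records = [
    {"model": "economical", "cost_usd": 0.002},
    {"model": "flagship", "cost_usd": 0.031},
    {"model": "economical", "cost_usd": 0.003},
]

def cost_by_model(records):
    """Sum cost per model so you can spot which tier drives spend."""
    totals = defaultdict(float)
    for record in records:
        totals[record["model"]] += record["cost_usd"]
    return dict(totals)

print(cost_by_model(sample_records))
```

Feeding a rollup like this into alerting makes it easy to catch a workload that silently drifts onto the flagship tier.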
## Token Packs
Purchase token packs for volume discounts. Manage packs at console.zihin.ai > Billing > Token Packs.