Cost Optimization

Strategies to reduce LLM costs while maintaining quality.

Use Auto-Routing

Set model: "auto" to let Zihin select the most cost-efficient model for each request. Simple queries route to cheaper models; complex ones use premium models.

{
  "query": "What is 2 + 2?",
  "model": "auto"
}

Auto-routing typically reduces costs by 20-40% compared to using a premium model for all requests.
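To see where that savings figure comes from, here is a back-of-the-envelope estimate in Python. The per-1K-token prices and the traffic split are hypothetical placeholders for illustration, not Zihin's actual rates or routing behavior:

```python
# Hypothetical per-1K-token prices (placeholders, not actual Zihin rates).
PRICE = {"economical": 0.0005, "standard": 0.003, "flagship": 0.015}

def blended_cost(tokens_per_query, routed_share):
    """Cost per query when traffic is split across tiers.

    routed_share maps tier name -> fraction of queries sent to that tier.
    """
    per_1k = sum(PRICE[tier] * share for tier, share in routed_share.items())
    return tokens_per_query / 1000 * per_1k

# All traffic on the flagship tier vs. an assumed auto-routing split.
all_flagship = blended_cost(800, {"flagship": 1.0})
auto = blended_cost(800, {"economical": 0.2, "standard": 0.2, "flagship": 0.6})
savings = 1 - auto / all_flagship  # ~0.35 with these placeholder numbers
```

With this assumed split, about 35% of spend is saved, which is consistent with the 20-40% range above; the real figure depends entirely on how much of your traffic is simple enough to route down-tier.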

Leverage Response Cache

Zihin automatically caches identical requests. The cache provides up to 40% cost reduction for repeated queries. Cache behavior:

  • Identical query + model = cache hit
  • Cache TTL is managed server-side
  • No configuration needed
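The cache-key rule above (identical query + model = hit) can be mirrored client-side. This sketch memoizes responses by the (query, model) pair purely to illustrate the hit semantics; it is not Zihin's server implementation, and `fake_llm_call` is a stand-in for a real API call:

```python
# Illustrative memo keyed the way the docs describe:
# identical query + model = cache hit. Not the server implementation.
cache = {}
calls = 0

def fake_llm_call(query, model):
    # Stand-in for a real (billed) API call; counts how often we "pay".
    global calls
    calls += 1
    return f"response to {query!r} from {model}"

def ask(query, model):
    key = (query, model)
    if key not in cache:           # miss: one real call
        cache[key] = fake_llm_call(query, model)
    return cache[key]              # hit: identical query + model is free

ask("What is 2 + 2?", "auto")
ask("What is 2 + 2?", "auto")      # same query + model -> cache hit
ask("What is 2 + 2?", "flagship")  # different model -> cache miss
```

Note that changing either the query text or the model produces a different key, so even a trailing whitespace difference in the query is a miss.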

Choose the Right Model Tier

Tier         Cost   Use When
economical   $      Simple Q&A, classification, formatting
standard     $$     General conversations, content generation
flagship     $$$    Complex reasoning, code generation, analysis
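The table maps naturally onto a small selection helper. The tier names follow the table, but the task-type keys and the function itself are an illustrative sketch, not part of the Zihin API:

```python
# Maps task types to the cheapest suitable tier, per the table above.
# The task-type vocabulary here is hypothetical.
TIER_FOR_TASK = {
    "qa": "economical",
    "classification": "economical",
    "formatting": "economical",
    "conversation": "standard",
    "content_generation": "standard",
    "reasoning": "flagship",
    "code_generation": "flagship",
    "analysis": "flagship",
}

def pick_tier(task_type, default="standard"):
    """Return the cheapest tier suited to the task; fall back to standard."""
    return TIER_FOR_TASK.get(task_type, default)
```

Defaulting unknown tasks to the middle tier is a judgment call; if quality matters more than cost for unclassified traffic, default to flagship instead.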

Monitor Usage

Track costs in real-time through:

  • Console dashboard — Visual cost breakdowns by model, agent, and time
  • Telemetry API — Programmatic access to usage metrics
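Once usage records are fetched from the Telemetry API, they can be aggregated locally into the same breakdowns the dashboard shows. The record fields below are hypothetical, since the Telemetry API schema is not documented here:

```python
from collections import defaultdict

# Hypothetical usage records; real Telemetry API field names may differ.
records = [
    {"model": "economical", "agent": "support-bot", "cost_usd": 0.12},
    {"model": "flagship",   "agent": "support-bot", "cost_usd": 1.80},
    {"model": "flagship",   "agent": "analyst",     "cost_usd": 2.40},
]

def cost_by(records, field):
    """Sum cost_usd per distinct value of `field` (e.g. "model" or "agent")."""
    totals = defaultdict(float)
    for record in records:
        totals[record[field]] += record["cost_usd"]
    return dict(totals)

by_model = cost_by(records, "model")
by_agent = cost_by(records, "agent")
```

Grouping by model quickly shows whether premium-tier spend is concentrated in a few agents, which is usually the first place to apply auto-routing.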

Token Packs

Purchase token packs for volume discounts. Manage packs at console.zihin.ai > Billing > Token Packs.