# Cost Optimization

Strategies to reduce LLM costs while maintaining quality.
## Use Auto-Routing

Set `model: "auto"` to let Zihin select the most cost-efficient model for each request. Simple queries route to cheaper models; complex ones use premium models.
```json
{
  "query": "What is 2 + 2?",
  "model": "auto"
}
```
Auto-routing typically reduces costs by 20-40% compared to using a premium model for all requests.
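To make the routing idea concrete, here is a minimal client-side sketch of complexity-based tier selection. Zihin's actual routing runs server-side and its heuristics are not documented here; the marker words, word-count thresholds, and tier names below are illustrative assumptions only.

```python
# Illustrative sketch of complexity-based routing. NOT Zihin's actual
# algorithm: the markers, thresholds, and tier names are assumptions.

def route(query: str) -> str:
    """Pick a model tier from a rough complexity heuristic."""
    complex_markers = ("explain", "analyze", "prove", "design", "refactor")
    words = query.lower().split()
    if any(m in words for m in complex_markers) or len(words) > 50:
        return "flagship"    # heavy reasoning -> premium tier
    if len(words) > 15:
        return "standard"    # mid-length prompts -> mid tier
    return "economical"      # short, simple queries -> cheapest tier

print(route("What is 2 + 2?"))                    # economical
print(route("Analyze this contract for risks"))   # flagship
```

The savings come from the long tail of simple queries: only prompts that trip a complexity signal pay flagship rates.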
## Leverage the Response Cache
Zihin automatically caches identical requests. The cache provides up to 40% cost reduction for repeated queries. Cache behavior:
- Identical query + model = cache hit
- Cache TTL is managed server-side
- No configuration needed
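The cache-hit rule above (identical query + model) can be sketched as a key derivation. The hashing scheme below is an assumption for illustration, not the service's actual implementation; its only point is that the same request always maps to the same key, so normalizing prompts (e.g. trimming whitespace) on your side increases hit rates.

```python
import hashlib
import json

# Sketch of a (query, model) cache key. The real server-side scheme is
# not documented; this just illustrates the "identical pair = hit" rule.
def cache_key(query: str, model: str) -> str:
    payload = json.dumps({"query": query, "model": model}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Identical query + model -> identical key -> cache hit
assert cache_key("What is 2 + 2?", "auto") == cache_key("What is 2 + 2?", "auto")
# Different model -> different key -> cache miss
assert cache_key("What is 2 + 2?", "auto") != cache_key("What is 2 + 2?", "flagship")
```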
## Choose the Right Model Tier
| Tier | Cost | Use When |
|---|---|---|
| `economical` | $ | Simple Q&A, classification, formatting |
| `standard` | $$ | General conversations, content generation |
| `flagship` | $$$ | Complex reasoning, code generation, analysis |
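If you pin tiers explicitly rather than using auto-routing, a small lookup table keeps the choice consistent across your codebase. The task categories below paraphrase the table above; the mapping itself and the `standard` default are assumptions, not part of the Zihin API.

```python
# Hypothetical task-to-tier mapping derived from the tier table above.
TIER_BY_TASK = {
    "classification": "economical",
    "formatting": "economical",
    "conversation": "standard",
    "content_generation": "standard",
    "complex_reasoning": "flagship",
    "code_generation": "flagship",
}

def tier_for(task: str) -> str:
    # Fall back to the mid tier for tasks not listed in the table.
    return TIER_BY_TASK.get(task, "standard")

print(tier_for("classification"))  # economical
```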
## Monitor Usage
Track costs in real-time through:
- Console dashboard — Visual cost breakdowns by model, agent, and time
- Telemetry API — Programmatic access to usage metrics
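As a sketch of what you might do with programmatic usage metrics, the snippet below aggregates per-request cost records by model. The record shape is a hypothetical example; consult the Telemetry API reference for the actual response schema and field names.

```python
from collections import defaultdict

# Hypothetical telemetry records; field names are assumptions, not the
# Telemetry API's documented schema.
sample_records = [
    {"model": "economical", "cost_usd": 0.002},
    {"model": "flagship", "cost_usd": 0.031},
    {"model": "economical", "cost_usd": 0.003},
]

def cost_by_model(records):
    """Sum cost per model so you can spot which tier drives spend."""
    totals = defaultdict(float)
    for record in records:
        totals[record["model"]] += record["cost_usd"]
    return dict(totals)

print(cost_by_model(sample_records))
```

Feeding a rollup like this into alerting makes it easy to catch a workload that silently drifts onto the flagship tier.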
## Token Packs
Purchase token packs for volume discounts. Manage packs at console.zihin.ai > Billing > Token Packs.