
FAQ — Frequently Asked Questions

Quick answers to the most common questions about the Zihin platform — covering the LLM API, AI agents, authentication, BYOK, MCP, webhooks, streaming, and more. Each answer links to the full documentation for deeper detail.

What is Zihin?

Zihin is a centralized, multi-tenant platform for building AI agents and calling large language models through a single unified API. It provides one interface to OpenAI, Anthropic, Google, and Grok with intelligent auto-routing, plus an agentic execution engine for autonomous AI assistants with tools, triggers, and memory. The base URL for API calls is https://llm.zihin.ai and the console is at console.zihin.ai.

Which LLM providers and models does Zihin support?

Zihin supports four LLM providers through one unified API: OpenAI, Anthropic, Google, and Grok. The documentation also references OpenRouter as an additional provider. Model identifiers use the provider.model format, such as openai.gpt-4.1-nano, anthropic.claude-sonnet-4-6, google.gemini-2.5-flash, and grok.grok-3-mini. The catalog is managed dynamically, so you can switch between providers without changing your code. See Models & Providers for the full list.
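As a small illustration of the provider.model format, the sketch below splits an identifier into its two parts. The function and type names are hypothetical and not part of any Zihin SDK; the only facts taken from this page are the identifier format and the special value auto.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    provider: Optional[str]  # None when the caller asked for auto-routing
    model: str


def parse_model_id(model_id: str) -> ModelRef:
    """Split 'provider.model' on the first dot only.

    Model names such as 'gpt-4.1-nano' contain dots themselves, so only
    the first dot separates the provider from the model.
    """
    if model_id == "auto":
        return ModelRef(None, "auto")
    provider, sep, model = model_id.partition(".")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider.model', got {model_id!r}")
    return ModelRef(provider, model)
```

Splitting on the first dot only is the design choice that matters here: a plain split on every dot would break identifiers like google.gemini-2.5-flash.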

How do I authenticate to the Zihin API?

Zihin supports two authentication methods: API keys and JWT tokens. For public endpoints and simple integrations, send your API key in the X-Api-Key header (or as a Bearer token). For multi-tenant applications with user context, send a JWT in the Authorization header along with the x-tenant-id header. API keys are prefixed with zhn_live_ for production and zhn_test_ for sandbox. See Authentication for details.
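The two header sets described above might be built as follows. This is a sketch, not an official client: the helper names are hypothetical, and using the Bearer scheme for the JWT is an assumption, since this page only says the JWT goes in the Authorization header.

```python
from typing import Dict

KEY_PREFIXES = {"zhn_live_": "production", "zhn_test_": "sandbox"}


def key_environment(api_key: str) -> str:
    """Return 'production' or 'sandbox' based on the documented key prefix."""
    for prefix, env in KEY_PREFIXES.items():
        if api_key.startswith(prefix):
            return env
    raise ValueError("API key must start with zhn_live_ or zhn_test_")


def api_key_headers(api_key: str) -> Dict[str, str]:
    """Headers for public endpoints and simple integrations."""
    key_environment(api_key)  # fail early on a malformed key
    return {"X-Api-Key": api_key}


def jwt_headers(jwt: str, tenant_id: str) -> Dict[str, str]:
    """Headers for multi-tenant calls with user context.

    Assumption: the JWT is sent with the 'Bearer' scheme.
    """
    return {"Authorization": f"Bearer {jwt}", "x-tenant-id": tenant_id}
```

Checking the prefix locally is optional, but it catches a sandbox key being used where a production key was intended before any request is sent.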

How do I make my first LLM call?

Sign up at console.zihin.ai, create an API key under Settings > API Keys, then POST to https://llm.zihin.ai/api/v3/llm/public/call with your X-Api-Key header and a JSON body containing a query and a model (use auto for intelligent routing). The response includes the answer, the model and provider used, token usage, cost, and latency. You can do this in under five minutes using curl, Node.js, or Python — see the Quickstart.
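A minimal Python sketch of that first call, using only the standard library, could look like this. The endpoint, header, and the query and model body fields come from the answer above; the function names, the Content-Type header, and the timeout are assumptions. The response is returned as a plain dictionary because its exact field names are not listed on this page.

```python
import json
import urllib.request

ZIHIN_CALL_URL = "https://llm.zihin.ai/api/v3/llm/public/call"


def build_call_request(api_key: str, query: str,
                       model: str = "auto") -> urllib.request.Request:
    """Build (but do not send) the POST request for a single LLM call."""
    body = json.dumps({"query": query, "model": model}).encode("utf-8")
    return urllib.request.Request(
        ZIHIN_CALL_URL,
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def call_llm(api_key: str, query: str, model: str = "auto",
             timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_call_request(api_key, query, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling call_llm("zhn_test_...", "Summarize this ticket") with a real sandbox key would perform the request; leaving model at its default of auto delegates model selection to Zihin.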

What is auto-routing and how does it work?

Auto-routing lets Zihin automatically select the best model for your task when you set model to auto. The system analyzes your prompt and routes it to the optimal model based on task complexity, cost efficiency, and response quality requirements. This keeps costs down by choosing the most efficient model for each request, so you do not need to hardcode a specific model. Learn more in the LLM API Overview.

Can I bring my own OpenAI or Anthropic API key (BYOK)?

Yes. With Bring Your Own Key (BYOK) you register your own provider API key as an encrypted secret, and your agents use it instead of the Zihin pool, so those requests do not consume your token quota. BYOK is supported for OpenAI, Anthropic, Google, and Grok, and it unlocks all models from that provider regardless of your plan tier. If your key fails or expires, Zihin automatically falls back to the Zihin pool, which respects your plan tier and consumes quota. BYOK is available on all plans, including the free Starter plan. See Secrets & Provider Keys.

What are Zihin agents?

Agents are AI-powered assistants configured with personas, tools, workflows, and security policies. They run autonomously on an orchestration engine that handles automatic tool selection and multi-step reasoning. Agent types include assistant, chatbot, workflow, and classifier. Agents move through a lifecycle of draft, published, and archived; publishing validates all schemas and creates an automatic version snapshot for rollback. See the Agents Overview.

What types of tools can agents use?

Zihin agents can use database tools that run SQL queries against PostgreSQL or Supabase connections, API tools that make HTTP calls to external services, and MCP tools from external Model Context Protocol servers. Tool credentials are stored in an encrypted vault and referenced by secret_ref or vault_secret_id, never sent in plaintext. The agent selects and chains tools automatically during multi-step reasoning — see Tools.

Does Zihin support MCP (Model Context Protocol)?

Yes. Zihin provides an MCP Server (the @zihin/mcp-server npm package) that lets you manage agents, schemas, triggers, connections, and secrets directly from AI IDEs such as Claude Desktop, Claude Code, Cursor, Windsurf, and Codex. It exposes 76 tools across consumer, builder-read, and builder-write categories, plus resources and guided prompts. It supports stdio transport via the npm package and HTTP Streamable transport for server-to-server access, and uses the same API Key system as the REST API. See the MCP Server guide.

How do webhooks trigger agents?

Webhook triggers let external systems such as n8n, Zapier, Twilio, or custom apps invoke an agent via HTTP POST. You create a webhook trigger that defines query extraction (which field holds the message), context mapping, response formatting with optional message splitting, and a session strategy that can derive a stable session from fields like userId and companyId. Execution endpoints can require HMAC or API Key authentication, or no authentication at all, and they support either synchronous execution or asynchronous execution with a callback. See Webhook Triggers.
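To make the HMAC and session-strategy ideas concrete, here is a generic sketch. It is not Zihin's actual scheme: HMAC-SHA256 over the raw request body, hex encoding, the field separator, and all function names are assumptions chosen for illustration. Only the use of HMAC and the userId and companyId fields come from the text above.

```python
import hashlib
import hmac
from typing import Mapping, Sequence


def sign_payload(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed signing scheme)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid leaking timing information."""
    return hmac.compare_digest(sign_payload(secret, body), signature)


def derive_session_id(payload: Mapping[str, object],
                      fields: Sequence[str] = ("userId", "companyId")) -> str:
    """Derive a stable session ID from selected payload fields.

    The same field values always map to the same ID, so repeated webhook
    calls from one user land in one session.
    """
    # A separator that is unlikely to appear in the values keeps
    # ("ab", "c") and ("a", "bc") from colliding.
    joined = "\x1f".join(str(payload[name]) for name in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:32]
```

Hashing the fields rather than concatenating them directly keeps the session ID a fixed length and avoids exposing raw identifiers in logs.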

How does streaming work in Zihin?

Zihin uses Server-Sent Events (SSE) for real-time streaming. The LLM API streams token-by-token responses, and agent execution via POST /api/v2/agents/:agent_id/stream emits typed events such as metadata, status, phase, tool, thinking, response, metrics, and error. A separate live session stream lets you monitor executions in real time, and that stream closes automatically after 5 minutes or when the session finishes. See Streaming.
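On the client side, an SSE stream is a sequence of text lines in which event: and data: fields are grouped into events by blank lines. The parser below is a minimal sketch of that generic wire format, not a Zihin SDK function. It yields (event name, data) pairs and treats the data as an opaque string, since the payload shape of each Zihin event type is not described on this page.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the lines of an SSE stream."""
    event = "message"  # default event name when no 'event:' field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip it if no data arrived.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
    # An event not terminated by a blank line is discarded.
```

A caller would typically feed this the decoded lines of the HTTP response and dispatch on the event name, for example handling response events as output text and error events as failures.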

Is Zihin multi-tenant?

Yes. A tenant is a fully isolated workspace with its own users and roles, API keys, agents and configurations, usage quotas and billing, and telemetry data. Every API call is scoped to a tenant via the API key or the JWT x-tenant-id header, ensuring complete isolation between workspaces. See Core Concepts and the Multi-Tenant Setup guide.

What are the rate limits and how do I handle them?

Rate limits depend on your plan: Free allows 10 requests per minute and 100 per day, Basic allows 60 per minute and 5,000 per day, Pro allows 300 per minute and 50,000 per day, and Enterprise is custom. Every response includes X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. When you receive a 429 Too Many Requests, read the Retry-After header, wait that many seconds, and retry the request. See Rate Limits and Error Codes.
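The 429 handling described above can be sketched as a small retry loop. The send callable, the attempt cap, and the fallback wait are assumptions introduced for illustration; the only behavior taken from this page is reading Retry-After as a number of seconds, waiting, and retrying.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(headers: Mapping[str, str],
                        default: float = 1.0) -> float:
    """Read Retry-After as seconds; fall back to `default` if absent or invalid."""
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        return default


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns something other than 429.

    Gives up after `max_attempts` and returns the last response as is.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response[0] != 429 or attempt == max_attempts:
            return response
        sleep(retry_after_seconds(response[1]))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter makes the loop easy to test without real delays, and capping the attempts prevents a client from retrying forever once a daily limit has been exhausted.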