# LLM Calls

Make LLM calls through the unified Zihin API.
## Public Endpoint (API Key)

```
POST https://llm.zihin.ai/api/v3/llm/public/call
```

```shell
curl -X POST https://llm.zihin.ai/api/v3/llm/public/call \
  -H "X-Api-Key: zhn_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain quantum computing in simple terms",
    "model": "auto"
  }'
```
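The same request can be made from Python. A minimal sketch using only the standard library (`urllib`); the URL and header names come from the curl example above, and the API key is a placeholder:

```python
import json
import urllib.request

PUBLIC_URL = "https://llm.zihin.ai/api/v3/llm/public/call"

def call_public(api_key: str, query: str, model: str = "auto") -> dict:
    """POST a query to the public endpoint and return the parsed JSON response."""
    body = json.dumps({"query": query, "model": model}).encode("utf-8")
    req = urllib.request.Request(
        PUBLIC_URL,
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```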
## Authenticated Endpoint (JWT)

```
POST https://llm.zihin.ai/api/v3/llm/call
```

```shell
curl -X POST https://llm.zihin.ai/api/v3/llm/call \
  -H "Authorization: Bearer <jwt-token>" \
  -H "x-tenant-id: <uuid>" \
  -H "x-agent-id: <uuid>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain quantum computing",
    "model": "auto"
  }'
```
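The authenticated endpoint expects three headers in addition to the content type. A small helper sketch that assembles them, with the token and IDs as placeholders, matching the curl example above:

```python
def auth_headers(jwt_token: str, tenant_id: str, agent_id: str) -> dict:
    """Build the headers the authenticated endpoint expects."""
    return {
        "Authorization": f"Bearer {jwt_token}",
        "x-tenant-id": tenant_id,
        "x-agent-id": agent_id,
        "Content-Type": "application/json",
    }
```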
## Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | Yes | — | The prompt text to send |
| `model` | string | No | `"auto"` | Model ID, or `"auto"` for automatic selection |
| `provider` | string | No | auto-detected | Provider name |
| `messages` | array | No | — | Prior conversation history |
| `temperature` | number | No | 0.7 | Sampling temperature (0-2); higher values are more creative |
| `max_tokens` | number | No | 4096 | Maximum response tokens |
| `stream` | boolean | No | false | Enable streaming responses |
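Putting the optional parameters together, a full request body might look like this (values are illustrative; the model and provider names are taken from the sample response below):

```json
{
  "query": "Explain quantum computing in simple terms",
  "model": "gpt-4.1-mini",
  "provider": "openai",
  "temperature": 0.7,
  "max_tokens": 4096,
  "stream": false
}
```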
## Conversation History

Send previous messages for multi-turn conversations:

```json
{
  "query": "What about its applications?",
  "model": "auto",
  "messages": [
    {"role": "user", "content": "What is quantum computing?"},
    {"role": "assistant", "content": "Quantum computing uses quantum bits..."}
  ]
}
```
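One way to keep the history current on the client side is to append the latest user query and model reply after each call, then pass the accumulated list as `messages` in the next request. A sketch:

```python
def extend_history(history: list, query: str, response_text: str) -> list:
    """Return history extended with the latest user/assistant turn."""
    return history + [
        {"role": "user", "content": query},
        {"role": "assistant", "content": response_text},
    ]
```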
## Response

```json
{
  "success": true,
  "response": "Quantum computing uses quantum bits (qubits)...",
  "model": "gpt-4.1-mini",
  "provider": "openai",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 120,
    "total_tokens": 135
  },
  "cost": 0.00045,
  "latency_ms": 850
}
```
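The `usage`, `cost`, and `latency_ms` fields are useful for logging and budget tracking. A sketch that summarizes them, using the sample response above as test data:

```python
def summarize(resp: dict) -> str:
    """One-line summary of model, provider, token usage, cost, and latency."""
    usage = resp["usage"]
    return (
        f"{resp['model']} via {resp['provider']}: "
        f"{usage['total_tokens']} tokens, ${resp['cost']:.5f}, {resp['latency_ms']} ms"
    )
```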