LLM Calls

Make LLM calls through the unified Zihin API.

Public Endpoint (API Key)

POST https://llm.zihin.ai/api/v3/llm/public/call
curl -X POST https://llm.zihin.ai/api/v3/llm/public/call \
  -H "X-Api-Key: zhn_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain quantum computing in simple terms",
    "model": "auto"
  }'
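The same call can be sketched from Python. This is a minimal illustration built from the endpoint and header names shown above; the API key value is a placeholder, and `requests` is one common choice of HTTP client.

```python
import json

# Endpoint and header names taken from the curl example above.
PUBLIC_URL = "https://llm.zihin.ai/api/v3/llm/public/call"

def build_public_request(api_key, query, model="auto"):
    """Return (headers, body) for a public LLM call."""
    headers = {
        "X-Api-Key": api_key,            # public endpoint authenticates via API key
        "Content-Type": "application/json",
    }
    body = json.dumps({"query": query, "model": model})
    return headers, body

headers, body = build_public_request(
    "zhn_live_xxxxx", "Explain quantum computing in simple terms"
)
```

With the `requests` library installed, the request would then be sent as `requests.post(PUBLIC_URL, headers=headers, data=body)`.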

Authenticated Endpoint (JWT)

POST https://llm.zihin.ai/api/v3/llm/call
curl -X POST https://llm.zihin.ai/api/v3/llm/call \
  -H "Authorization: Bearer <jwt-token>" \
  -H "x-tenant-id: <uuid>" \
  -H "x-agent-id: <uuid>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain quantum computing",
    "model": "auto"
  }'
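The authenticated endpoint differs only in its headers. A small helper, assuming the header names shown above (the token and UUIDs are placeholders):

```python
# Header names taken from the authenticated curl example above.
AUTH_URL = "https://llm.zihin.ai/api/v3/llm/call"

def build_auth_headers(jwt_token, tenant_id, agent_id):
    """Headers for the JWT-authenticated endpoint."""
    return {
        "Authorization": f"Bearer {jwt_token}",  # JWT instead of API key
        "x-tenant-id": tenant_id,                # tenant UUID
        "x-agent-id": agent_id,                  # agent UUID
        "Content-Type": "application/json",
    }

auth_headers = build_auth_headers("my-jwt-token", "tenant-uuid", "agent-uuid")
```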

Request Parameters

Parameter    Type     Required  Default        Description
query        string   Yes       —              The prompt text
model        string   No        "auto"         Model ID, or "auto" to let the router choose
provider     string   No        auto-detected  Provider name (e.g. "openai")
messages     array    No        —              Conversation history (prior turns)
temperature  number   No        0.7            Sampling temperature (0-2); higher is more random
max_tokens   number   No        4096           Maximum response tokens
stream       boolean  No        false          Enable streaming responses
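A request body exercising the optional parameters from the table above; the query and the specific values here are arbitrary examples, not recommendations.

```python
import json

# Illustrative request body using the optional parameters documented above.
body = {
    "query": "Explain quantum computing in simple terms",
    "model": "auto",
    "temperature": 0.3,   # 0-2; lower values give more deterministic output
    "max_tokens": 512,    # cap the length of the response
    "stream": False,      # set True to receive a streamed response
}
payload = json.dumps(body)
```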

Conversation History

Send previous messages for multi-turn conversations:

{
  "query": "What about its applications?",
  "model": "auto",
  "messages": [
    {"role": "user", "content": "What is quantum computing?"},
    {"role": "assistant", "content": "Quantum computing uses quantum bits..."}
  ]
}
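Since the API is stateless, the client is responsible for carrying the history forward on each call. A minimal conversation-state helper, matching the message format above (the class and method names are illustrative):

```python
# Keeps prior turns so each new request carries the full history.
class Conversation:
    def __init__(self, model="auto"):
        self.model = model
        self.messages = []  # list of {"role": ..., "content": ...} dicts

    def build_request(self, query):
        """Request body for the next turn, including accumulated history."""
        return {"query": query, "model": self.model, "messages": list(self.messages)}

    def record(self, query, response_text):
        """Append a completed user/assistant exchange to the history."""
        self.messages.append({"role": "user", "content": query})
        self.messages.append({"role": "assistant", "content": response_text})

conv = Conversation()
conv.record("What is quantum computing?", "Quantum computing uses quantum bits...")
request = conv.build_request("What about its applications?")
```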

Response

{
  "success": true,
  "response": "Quantum computing uses quantum bits (qubits)...",
  "model": "gpt-4.1-mini",
  "provider": "openai",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 120,
    "total_tokens": 135
  },
  "cost": 0.00045,
  "latency_ms": 850
}
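A sketch of handling this response on the client side; the field names are taken directly from the sample above.

```python
import json

# Sample response body, copied from the documentation above.
sample = """{
  "success": true,
  "response": "Quantum computing uses quantum bits (qubits)...",
  "model": "gpt-4.1-mini",
  "provider": "openai",
  "usage": {"input_tokens": 15, "output_tokens": 120, "total_tokens": 135},
  "cost": 0.00045,
  "latency_ms": 850
}"""

data = json.loads(sample)
if data["success"]:
    text = data["response"]                    # the model's answer
    total = data["usage"]["total_tokens"]      # input + output tokens
    cost = data["cost"]                        # cost of this call
```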