Streaming
Stream LLM responses in real time using Server-Sent Events (SSE).
Usage
Set stream: true in your request:
curl -X POST https://llm.zihin.ai/api/v3/llm/public/call \
  -H "X-Api-Key: zhn_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Write a short story",
    "model": "auto",
    "stream": true
  }'
Response Format
The response uses SSE format. Each event contains a chunk of the response:
data: {"chunk": "Once", "done": false}
data: {"chunk": " upon", "done": false}
data: {"chunk": " a time", "done": false}
...
data: {"chunk": "", "done": true, "usage": {"input_tokens": 10, "output_tokens": 150, "total_tokens": 160}}
The final event includes done: true and usage statistics.
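A client can assemble the full response by stripping the data: prefix from each line, decoding the JSON payload, and accumulating chunks until the final event arrives. A sketch, assuming lines arrive as shown above (the field names chunk, done, and usage are taken from the example payloads):

import json

def parse_events(lines):
    """Accumulate text chunks from SSE lines; report usage from the final event."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines between events
        event = json.loads(line[len("data: "):])
        if event["done"]:
            print("usage:", event["usage"])  # e.g. {"input_tokens": 10, ...}
            break
        text.append(event["chunk"])
    return "".join(text)

Feeding the iter_lines() iterator from the previous sketch into parse_events() yields the complete story as one string.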
Live Sessions
For interactive agent conversations with persistent state:
POST https://llm.zihin.ai/api/sessions
Sessions maintain conversation context server-side, eliminating the need to resend the full message history with each request.
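As a sketch, creating a session might look like the following. Only the creation endpoint is documented above; the authentication header is assumed to match the streaming endpoint, and the shape of the response body (e.g. a session identifier) is an assumption for illustration.

import requests

resp = requests.post(
    "https://llm.zihin.ai/api/sessions",
    headers={"X-Api-Key": "zhn_live_xxxxx"},  # assumed: same key as the streaming call
)
resp.raise_for_status()
session = resp.json()
print(session)  # assumed to include an id used to address subsequent requests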