Streaming

Stream LLM responses in real time using Server-Sent Events (SSE).

Usage

Set stream: true in your request:

curl -X POST https://llm.zihin.ai/api/v3/llm/public/call \
  -H "X-Api-Key: zhn_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Write a short story",
    "model": "auto",
    "stream": true
  }'
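
The same request can be made from code. Below is a minimal sketch in Python using the requests library; the endpoint, header, and body fields are taken from the curl example above, and stream=True tells requests to keep the connection open so the body can be read incrementally:

import requests

resp = requests.post(
    "https://llm.zihin.ai/api/v3/llm/public/call",
    headers={
        "X-Api-Key": "zhn_live_xxxxx",
        "Content-Type": "application/json",
    },
    json={"query": "Write a short story", "model": "auto", "stream": True},
    stream=True,  # read the response body incrementally as it arrives
)
resp.raise_for_status()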

Response Format

The response uses the SSE format: each event is a data: line followed by a blank line, and carries a chunk of the response:

data: {"chunk": "Once", "done": false}
data: {"chunk": " upon", "done": false}
data: {"chunk": " a time", "done": false}
...
data: {"chunk": "", "done": true, "usage": {"input_tokens": 10, "output_tokens": 150, "total_tokens": 160}}

The final event includes done: true and usage statistics.
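
Continuing the Python sketch above, here is one way to parse the stream, assuming the event format shown here: strip the data: prefix from each line, decode the JSON, accumulate chunks, and stop when done is true:

import json

story = []
for line in resp.iter_lines():  # resp from the request sketch above
    if not line or not line.startswith(b"data: "):
        continue  # skip the blank lines that separate SSE events
    event = json.loads(line[len(b"data: "):])
    if event["done"]:
        print("usage:", event["usage"])  # final event carries token counts
        break
    story.append(event["chunk"])
print("".join(story))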

Live Sessions

For interactive agent conversations with persistent state:

POST https://llm.zihin.ai/api/sessions

Sessions maintain conversation context server-side, eliminating the need to send message history with each request.
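
A hedged sketch of creating a session follows. Only the endpoint itself is documented here, so the request body and the shape of the response are illustrative assumptions; check the sessions reference for the actual fields:

import requests

resp = requests.post(
    "https://llm.zihin.ai/api/sessions",
    headers={"X-Api-Key": "zhn_live_xxxxx"},
    json={"model": "auto"},  # assumed body; not confirmed by this page
)
session = resp.json()
print(session)  # assumed to include a session identifier for later calls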