Streaming

Stream LLM responses in real time using Server-Sent Events (SSE).

Usage

Set stream: true in your request:

curl -X POST https://llm.zihin.ai/api/v3/llm/public/call \
  -H "X-Api-Key: zhn_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Write a short story",
    "model": "auto",
    "stream": true
  }'
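
The same request can be made from code. Below is a minimal sketch in Python using the requests library; the endpoint, header, and body fields are taken from the curl example above, and stream=True tells requests to keep the connection open so the body can be read incrementally:

import requests

resp = requests.post(
    "https://llm.zihin.ai/api/v3/llm/public/call",
    headers={
        "X-Api-Key": "zhn_live_xxxxx",
        "Content-Type": "application/json",
    },
    json={"query": "Write a short story", "model": "auto", "stream": True},
    stream=True,  # read the response body incrementally as it arrives
)
resp.raise_for_status()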

Response Format

The response uses the SSE format: each event is a data: line followed by a blank line, and carries a chunk of the response:

data: {"chunk": "Once", "done": false}
data: {"chunk": " upon", "done": false}
data: {"chunk": " a time", "done": false}
...
data: {"chunk": "", "done": true, "usage": {"input_tokens": 10, "output_tokens": 150, "total_tokens": 160}}

The final event includes done: true and usage statistics.
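
Continuing the Python sketch above, here is one way to parse the stream, assuming the event format shown here: strip the data: prefix from each line, decode the JSON, accumulate chunks, and stop when done is true:

import json

story = []
for line in resp.iter_lines():  # resp from the request sketch above
    if not line or not line.startswith(b"data: "):
        continue  # skip the blank lines that separate SSE events
    event = json.loads(line[len(b"data: "):])
    if event["done"]:
        print("usage:", event["usage"])  # final event carries token counts
        break
    story.append(event["chunk"])
print("".join(story))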

Live Sessions

For interactive agent conversations with persistent state:

POST https://llm.zihin.ai/api/sessions

Sessions maintain conversation context server-side, eliminating the need to send message history with each request.
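
A hedged sketch of creating a session follows. Only the endpoint itself is documented here, so the request body and the shape of the response are illustrative assumptions; check the sessions reference for the actual fields:

import requests

resp = requests.post(
    "https://llm.zihin.ai/api/sessions",
    headers={"X-Api-Key": "zhn_live_xxxxx"},
    json={"model": "auto"},  # assumed body; not confirmed by this page
)
session = resp.json()
print(session)  # assumed to include a session identifier for later calls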