LLM Calls
Endpoints for making LLM calls with multi-provider support, auto-routing, and structured output.
Authentication
Two authentication methods are available:
| Method | Header | Use Case |
|---|---|---|
| JWT | Authorization: Bearer <jwt> + x-tenant-id | Frontend / Dashboard |
| API Key | X-Api-Key: YOUR_API_KEY | External integrations (n8n, webhooks, MCP) |
POST /api/v3/llm/call
Execute an LLM call with full multi-tenant context.
Authentication: JWT + Multi-tenant headers required
Headers:
Authorization: Bearer <jwt-token>
x-tenant-id: <uuid>
x-agent-id: <uuid>
Content-Type: application/json
Request Body:
{
"model": "auto",
"prompt": "What is the capital of France?",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Hello"}
],
"options": {
"temperature": 0.7,
"maxTokens": 4096,
"stop": ["END"]
},
"context": {
"task": "chat_general"
},
"response_format": { "type": "json_object" }
}
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g. openai.gpt-4o) or auto for auto-routing |
| prompt | string | Yes* | The prompt/question (*or messages) |
| messages | array | Yes* | Conversation history (*or prompt) |
| options.temperature | number | No | Response randomness (0-1). Default: 0.7 |
| options.maxTokens | number | No | Maximum response length in tokens. Default: 1000 |
| options.stop | string[] | No | Stop sequences; generation stops when any of these strings is produced (see Stop Sequences below) |
| context.task | string | No | Task type for auto-routing optimization (see Auto-Routing) |
| response_format | object | No | Structured output format (see Structured Output below) |
Note:
- Use options.maxTokens (camelCase), not max_tokens at the root level
- Use context.task, not context.task_type
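These field rules can be checked client-side before sending. A minimal sketch, assuming the rules as documented in the table above (validate_payload is an illustrative helper, not part of the API):

```python
def validate_payload(p: dict) -> bool:
    """Client-side sanity check mirroring the documented field rules."""
    if "model" not in p:
        raise ValueError("model is required ('auto' or e.g. 'openai.gpt-4o')")
    if "prompt" not in p and "messages" not in p:
        raise ValueError("either prompt or messages is required")
    if "max_tokens" in p:
        raise ValueError("use options.maxTokens, not root-level max_tokens")
    return True

payload = {
    "model": "auto",
    "prompt": "What is the capital of France?",
    # maxTokens is camelCase and lives under "options", not at the root.
    "options": {"temperature": 0.7, "maxTokens": 4096},
    "context": {"task": "chat_general"},
}
```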
Response:
{
"success": true,
"response": "The capital of France is Paris.",
"model": "gpt-4.1-nano",
"provider": "openai",
"usage": {
"input_tokens": 15,
"output_tokens": 8,
"total_tokens": 23
},
"cost": 0.00015,
"latency_ms": 450,
"routing": {
"is_auto_routed": true,
"model_chosen": "openai.gpt-4.1-nano",
"confidence": 0.85
}
}
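The usage, cost, and routing fields are useful for logging. A small helper sketch, assuming the response shape shown above:

```python
def summarize_result(resp: dict) -> str:
    """Format the usage, cost, and routing fields of a call response."""
    usage = resp.get("usage", {})
    parts = [
        f"model={resp.get('model')}",
        f"tokens={usage.get('total_tokens')}",
        f"cost=${resp.get('cost', 0):.5f}",
        f"latency={resp.get('latency_ms')}ms",
    ]
    routing = resp.get("routing", {})
    if routing.get("is_auto_routed"):
        parts.append(f"auto-routed to {routing.get('model_chosen')}")
    return " ".join(parts)
```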
POST /api/v3/llm/public/call
Execute an LLM call using API Key authentication.
Authentication: API Key required
Headers:
X-Api-Key: YOUR_API_KEY
Content-Type: application/json
Request/Response: Same as POST /api/v3/llm/call
Quick Examples
curl:
curl -X POST https://llm.zihin.ai/api/v3/llm/public/call \
-H "X-Api-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Summarize this contract", "model": "auto"}'
Node.js:
const response = await fetch('https://llm.zihin.ai/api/v3/llm/public/call', {
method: 'POST',
headers: {
'X-Api-Key': 'YOUR_API_KEY',
'Content-Type': 'application/json',
},
body: JSON.stringify({
prompt: 'Summarize this contract',
model: 'auto',
options: { temperature: 0.7, maxTokens: 4096 },
}),
});
const data = await response.json();
console.log(data.response);
Python:
import requests
response = requests.post(
"https://llm.zihin.ai/api/v3/llm/public/call",
headers={
"X-Api-Key": "YOUR_API_KEY",
"Content-Type": "application/json",
},
json={
"prompt": "Summarize this contract",
"model": "auto",
"options": {"temperature": 0.7, "maxTokens": 4096},
},
)
data = response.json()
print(data["response"])
Structured Output (response_format)
Force the LLM to return a specific JSON structure.
Simple JSON Object:
{
"model": "openai.gpt-4o",
"prompt": "List 3 countries and their capitals",
"options": { "maxTokens": 500 },
"response_format": { "type": "json_object" }
}
JSON Schema (Strict):
{
"model": "openai.gpt-4o",
"prompt": "Extract person data from: John Doe, 30 years old, john@email.com",
"options": { "temperature": 0.1, "maxTokens": 200 },
"context": { "task": "extract_structure" },
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "person_data",
"strict": true,
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" },
"email": { "type": "string" }
},
"required": ["name", "age", "email"],
"additionalProperties": false
}
}
}
}
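Even with strict mode, it is worth validating the parsed output client-side. A minimal sketch using only the standard library; the response string here is illustrative, not captured API output:

```python
import json

# The schema from the request above, reused for a lightweight client-side check.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age", "email"],
    "additionalProperties": False,
}

def parse_structured(raw: str, schema: dict) -> dict:
    """Parse a structured response and check the schema's required keys."""
    data = json.loads(raw)
    missing = [k for k in schema["required"] if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

raw = '{"name": "John Doe", "age": 30, "email": "john@email.com"}'
person = parse_structured(raw, schema)
```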
Provider Support:
| Provider | json_object | json_schema | Notes |
|---|---|---|---|
| OpenAI | Yes | Yes | Full support (strict mode) |
| Grok | Yes | Yes | OpenAI-compatible API |
| Gemini | Yes | Yes | Translated to responseMimeType/responseSchema |
| Anthropic | No | No | Uses tool_use for structured output |
response_format is not supported in Agent Mode (/api/v2/agents/:id/stream).
Agent Mode requires the LLM to call tools freely, ask clarifying questions, and decide next steps; forcing response_format would break the agentic cycle. For structured responses, call /api/v3/llm/call directly with response_format.
Stop Sequences
Stop sequences cause the model to stop generating text when it encounters any of the specified strings. The stop sequence itself is not included in the response, and finish_reason returns "stop".
Example:
{
"model": "openai.gpt-4.1-nano",
"prompt": "List numbers 1 to 20, one per line. Write DONE after number 5.",
"options": {
"maxTokens": 500,
"stop": ["DONE"]
}
}
The model will generate 1\n2\n3\n4\n5\n and stop before emitting DONE.
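The truncation semantics can be emulated locally. A sketch of what providers do with the stop parameter (the stop sequence is excluded from the output):

```python
def apply_stop(text: str, stop: list[str]) -> str:
    """Cut the text at the earliest occurrence of any stop sequence;
    the stop sequence itself is excluded from the result."""
    cut = len(text)
    for s in stop:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]
```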
Provider mapping:
The API normalizes the stop parameter to each provider's native format automatically:
| Provider | Native Field | Notes |
|---|---|---|
| OpenAI | stop | Up to 4 sequences |
| Anthropic | stop_sequences | Array of strings |
| Gemini | generationConfig.stopSequences | Array of strings |
| Grok | stop | OpenAI-compatible |
Reasoning models (OpenAI o-series, GPT-5, Grok reasoning variants) do not support stop sequences. The API silently skips the parameter for these models to avoid errors. If you need stop sequences, use a standard (non-reasoning) model.
Multimodal Content (Images)
Send images alongside text using the OpenAI image_url format. The API automatically converts to each provider's native format.
Example with URL:
{
"model": "openai.gpt-4o",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]
}],
"options": { "maxTokens": 500 }
}
Example with Base64:
{
"model": "anthropic.claude-haiku-4-5-20251001",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What do you see in this image?"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
]
}],
"options": { "maxTokens": 500 }
}
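Building the base64 payload from a local file can be sketched as follows; to_data_uri and image_message are illustrative helpers, not part of the API:

```python
import base64

def to_data_uri(path: str, mime: str = "image/jpeg") -> str:
    """Encode a local image file as a data URI for the image_url field."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def image_message(text: str, image_url: str) -> dict:
    """Build a multimodal user message in the OpenAI image_url format;
    image_url may be an https:// URL or a data URI."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```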
Provider Support:
| Provider | Support | Native Format | Auto Conversion |
|---|---|---|---|
| OpenAI | Yes | image_url (native) | Not needed |
| Anthropic | Yes | image with source.type: url/base64 | Yes |
| Gemini | Yes | inlineData (base64 only) | Yes (downloads URL) |
| Grok | Yes* | image_url (OpenAI-compatible) | Not needed |
* Support depends on the specific model (e.g. grok-2-vision-1212)
When using model: "auto" with multimodal content, the router automatically selects a vision-capable model.
Supported formats: JPEG, PNG, GIF, WebP, via public URLs (https://) or data URIs (base64).
POST /api/v3/llm/test-connection
Test connection to a specific provider.
Authentication: JWT + Multi-tenant headers required
Request Body:
{
"provider": "openai",
"model": "gpt-4o"
}
Response:
{
"success": true,
"provider": "openai",
"latency_ms": 250,
"status": "connected"
}
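A sketch of interpreting this response in a client-side health check; check_connection and the latency threshold are illustrative, not part of the API:

```python
def check_connection(result: dict, max_latency_ms: int = 2000) -> bool:
    """Healthy if the call succeeded, the provider reports 'connected',
    and the measured latency is under the threshold."""
    return (
        result.get("success") is True
        and result.get("status") == "connected"
        and result.get("latency_ms", max_latency_ms + 1) <= max_latency_ms
    )
```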