
LLM Calls

Endpoints for making LLM calls with multi-provider support, auto-routing, and structured output.

Authentication

Two authentication methods are available:

| Method | Header | Use Case |
|---|---|---|
| JWT | Authorization: Bearer <jwt> + x-tenant-id | Frontend / Dashboard |
| API Key | X-Api-Key: YOUR_API_KEY | External integrations (n8n, webhooks, MCP) |
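As a minimal sketch, the header set for each method could be assembled as follows. Only the header names come from this page (including x-agent-id, which is listed for POST /api/v3/llm/call); the helper `build_auth_headers` is hypothetical.

```python
from typing import Optional


def build_auth_headers(
    jwt: Optional[str] = None,
    tenant_id: Optional[str] = None,
    api_key: Optional[str] = None,
    agent_id: Optional[str] = None,
) -> dict[str, str]:
    """Return headers for one of the two auth methods (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # API Key method: external integrations such as n8n, webhooks, MCP
        headers["X-Api-Key"] = api_key
    elif jwt is not None and tenant_id is not None:
        # JWT method: the bearer token must travel with the tenant header
        headers["Authorization"] = f"Bearer {jwt}"
        headers["x-tenant-id"] = tenant_id
        if agent_id is not None:
            # x-agent-id is listed for POST /api/v3/llm/call
            headers["x-agent-id"] = agent_id
    else:
        raise ValueError("provide api_key, or both jwt and tenant_id")
    return headers
```

The API key branch is checked first, so passing both credentials produces API key headers only.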

POST /api/v3/llm/call

Execute an LLM call with full multi-tenant context.

Authentication: JWT + Multi-tenant headers required

Headers:

Authorization: Bearer <jwt-token>
x-tenant-id: <uuid>
x-agent-id: <uuid>
Content-Type: application/json

Request Body (prompt and messages are shown together for illustration; at least one of them is required):

{
  "model": "auto",
  "prompt": "What is the capital of France?",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"}
  ],
  "options": {
    "temperature": 0.7,
    "maxTokens": 4096,
    "stop": ["END"]
  },
  "context": {
    "task": "chat_general"
  },
  "response_format": { "type": "json_object" }
}

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g. openai.gpt-4o), or auto for auto-routing |
| prompt | string | Yes* | The prompt or question |
| messages | array | Yes* | Conversation history |
| options.temperature | number | No | Response randomness (0-1). Default: 0.7 |
| options.maxTokens | number | No | Maximum response length in tokens. Default: 1000 |
| options.stop | string[] | No | Stop sequences: the model stops generating when it encounters any of these strings (see Stop Sequences below) |
| context.task | string | No | Task type used to optimize auto-routing (see Auto-Routing) |
| response_format | object | No | Structured output format (see Structured Output below) |

* Either prompt or messages is required.

Parameter Format
  • Use options.maxTokens (camelCase), not max_tokens at the root level
  • Use context.task, not context.task_type
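A minimal sketch of a body builder that follows these rules: tuning options nested under options in camelCase, the routing hint under context.task, and at least one of prompt or messages. The helper name `build_call_body` is hypothetical; field names come from the table above.

```python
from typing import Any, Optional


def build_call_body(
    model: str = "auto",
    prompt: Optional[str] = None,
    messages: Optional[list[dict[str, Any]]] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stop: Optional[list[str]] = None,
    task: Optional[str] = None,
    response_format: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a /api/v3/llm/call request body (hypothetical helper)."""
    if prompt is None and messages is None:
        raise ValueError("either prompt or messages is required")
    body: dict[str, Any] = {"model": model}
    if prompt is not None:
        body["prompt"] = prompt
    if messages is not None:
        body["messages"] = messages
    options: dict[str, Any] = {}
    if temperature is not None:
        options["temperature"] = temperature
    if max_tokens is not None:
        # camelCase inside options, never max_tokens at the root
        options["maxTokens"] = max_tokens
    if stop:
        options["stop"] = stop
    if options:
        body["options"] = options
    if task is not None:
        # context.task, not context.task_type
        body["context"] = {"task": task}
    if response_format is not None:
        body["response_format"] = response_format
    return body
```

Options left unset are simply omitted, so the server-side defaults from the table (temperature 0.7, maxTokens 1000) apply.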

Response:

{
  "success": true,
  "response": "The capital of France is Paris.",
  "model": "gpt-4.1-nano",
  "provider": "openai",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8,
    "total_tokens": 23
  },
  "cost": 0.00015,
  "latency_ms": 450,
  "routing": {
    "is_auto_routed": true,
    "model_chosen": "openai.gpt-4.1-nano",
    "confidence": 0.85
  }
}
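A short sketch of reading the fields shown above. It uses only keys from the example; the helper `summarize_response` is hypothetical, and reading routing defensively is an assumption, since the docs show a single auto-routed response.

```python
import json
from typing import Any


def summarize_response(raw: str) -> dict[str, Any]:
    """Extract commonly used fields from a /llm/call response body."""
    data = json.loads(raw)
    if not data.get("success"):
        raise RuntimeError(f"LLM call failed: {data}")
    usage = data.get("usage", {})
    routing = data.get("routing") or {}  # assumed optional
    return {
        "text": data["response"],
        # the response reports provider and model separately
        "model": f'{data["provider"]}.{data["model"]}',
        "tokens": usage.get("total_tokens"),
        "cost": data.get("cost"),
        "auto_routed": routing.get("is_auto_routed", False),
    }
```

Joining provider and model reproduces the prefixed ID format used in requests (here openai.gpt-4.1-nano, matching routing.model_chosen).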

POST /api/v3/llm/public/call

Execute an LLM call using API Key authentication.

Authentication: API Key required

Headers:

X-Api-Key: YOUR_API_KEY
Content-Type: application/json

Request/Response: Same as POST /api/v3/llm/call

Quick Examples

curl -X POST https://llm.zihin.ai/api/v3/llm/public/call \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Summarize this contract", "model": "auto"}'
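The same request can be built with Python's standard library. This sketch only constructs the `urllib.request.Request`; sending it is left to `urllib.request.urlopen`, and `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.request

URL = "https://llm.zihin.ai/api/v3/llm/public/call"


def build_public_call(api_key: str, prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build (but do not send) a POST to the public call endpoint."""
    body = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_public_call("YOUR_API_KEY", "Summarize this contract")
# To send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```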

Structured Output (response_format)

Force the LLM to return a specific JSON structure.

Simple JSON Object:

{
  "model": "openai.gpt-4o",
  "prompt": "List 3 countries and their capitals",
  "options": { "maxTokens": 500 },
  "response_format": { "type": "json_object" }
}

JSON Schema (Strict):

{
  "model": "openai.gpt-4o",
  "prompt": "Extract person data from: John Doe, 30 years old, john@email.com",
  "options": { "temperature": 0.1, "maxTokens": 200 },
  "context": { "task": "extract_structure" },
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "person_data",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "integer" },
          "email": { "type": "string" }
        },
        "required": ["name", "age", "email"],
        "additionalProperties": false
      }
    }
  }
}
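On the client side it is still worth checking that the returned text matches the requested shape before using it. A minimal sketch, assuming the structured result arrives as a JSON string in the response field (this page does not show a structured-output response). It checks only required keys, top-level property types, and additionalProperties; it is not a full JSON Schema validator.

```python
import json
from typing import Any

# JSON Schema primitive types mapped to Python types (top level only)
_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}


def check_structured(response_text: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parse response_text and check it against a simple object schema."""
    data = json.loads(response_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError(f"missing required key: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in data:
            continue
        kind = spec.get("type")
        expected = _TYPES.get(kind)
        # bool is a subclass of int in Python, so reject it for numeric types
        is_bool_number = kind in ("integer", "number") and isinstance(data[key], bool)
        if expected and (not isinstance(data[key], expected) or is_bool_number):
            raise ValueError(f"{key} is not of type {kind}")
    if schema.get("additionalProperties") is False:
        extra = set(data) - set(schema.get("properties", {}))
        if extra:
            raise ValueError(f"unexpected keys: {sorted(extra)}")
    return data
```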

Provider Support:

| Provider | json_object | json_schema | Notes |
|---|---|---|---|
| OpenAI | Yes | Yes | Full support (strict mode) |
| Grok | Yes | Yes | OpenAI-compatible API |
| Google | Yes | Yes | Translated to responseMimeType/responseSchema |
| Anthropic | No | No | Uses tool_use for structured output |

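Because support differs by provider, a client may want to catch unsupported combinations before sending a request. A minimal sketch that encodes the Provider Support table. It assumes the provider is the prefix of the model ID (as in openai.gpt-4o), which the examples suggest but this page does not state, and the prefix spellings used as dictionary keys are assumptions. With auto, the router picks the provider, so the answer cannot be known locally.

```python
from typing import Optional

# Support matrix from the Provider Support table (keys are assumed prefixes)
SUPPORT = {
    "openai": {"json_object", "json_schema"},
    "grok": {"json_object", "json_schema"},
    "google": {"json_object", "json_schema"},
    "anthropic": set(),  # uses tool_use instead of response_format
}


def response_format_supported(model_id: str, fmt_type: str) -> Optional[bool]:
    """Return True/False per the table, or None when it cannot be decided locally."""
    if model_id == "auto":
        return None  # provider is chosen by the router
    provider = model_id.split(".", 1)[0]
    if provider not in SUPPORT:
        return None
    return fmt_type in SUPPORT[provider]
```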
Structured Output and Agent Mode

response_format is not supported in Agent Mode (/api/v2/agents/:id/stream).

Agent Mode requires the LLM to call tools freely, ask clarifying questions, and decide its next steps. Forcing a response_format would break this agentic cycle. For structured responses, call /api/v3/llm/call directly with response_format.


Stop Sequences

Stop sequences make the model stop generating text as soon as it produces any of the specified strings. The stop sequence itself is not included in the response, and finish_reason is set to "stop".

Example:

{
  "model": "openai.gpt-4.1-nano",
  "prompt": "List numbers 1 to 20, one per line. Write DONE after number 5.",
  "options": {
    "maxTokens": 500,
    "stop": ["DONE"]
  }
}

If the model follows the instruction, the response is 1\n2\n3\n4\n5\n: generation stops at DONE, and DONE itself is omitted.
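To make these semantics concrete, the sketch below simulates locally what a stop sequence does to a piece of text: the output is cut at the earliest match, and the match is dropped. It only illustrates the documented behavior. Providers apply stop sequences during generation, not afterwards, and `apply_stop` is a hypothetical name.

```python
from typing import Optional


def apply_stop(text: str, stop: list[str]) -> tuple[str, Optional[str]]:
    """Cut text at the earliest stop sequence; return (text, matched_sequence)."""
    cut, matched = len(text), None
    for seq in stop:
        idx = text.find(seq)
        # ignore empty sequences, which would match at position 0
        if seq and idx != -1 and idx < cut:
            cut, matched = idx, seq
    return text[:cut], matched


print(apply_stop("1\n2\n3\n4\n5\nDONE\n6\n7", ["DONE"]))  # ('1\n2\n3\n4\n5\n', 'DONE')
```

When several sequences are given, the one that appears earliest in the text wins, regardless of its position in the list.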

Provider mapping:

The API normalizes the stop parameter to each provider's native format automatically:

| Provider | Native Field | Notes |
|---|---|---|
| OpenAI | stop | Up to 4 sequences |
| Anthropic | stop_sequences | Array of strings |
| Google | generationConfig.stopSequences | Array of strings |
| Grok | stop | OpenAI-compatible |

Reasoning Models

Reasoning models (OpenAI o-series, GPT-5, Grok reasoning variants) do not support stop sequences. The API silently skips the parameter for these models to avoid errors. If you need stop sequences, use a standard (non-reasoning) model.
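The normalization described in this section can be sketched as a function from the unified options.stop list to each provider's native request fragment. This illustrates the mapping table and is not the API's actual implementation. The is_reasoning flag stands in for however the API identifies reasoning models. For OpenAI, this sketch rejects more than four sequences; the real API's behavior in that case is not documented here.

```python
from typing import Any


def native_stop_params(provider: str, stop: list[str], is_reasoning: bool = False) -> dict[str, Any]:
    """Map a unified stop list to a provider-native request fragment (illustrative)."""
    if not stop or is_reasoning:
        # reasoning models do not support stop sequences, so the field is omitted
        return {}
    if provider == "openai":
        if len(stop) > 4:
            raise ValueError("OpenAI accepts at most 4 stop sequences")
        return {"stop": stop}
    if provider == "grok":
        return {"stop": stop}  # OpenAI-compatible field
    if provider == "anthropic":
        return {"stop_sequences": stop}
    if provider == "google":
        return {"generationConfig": {"stopSequences": stop}}
    raise ValueError(f"unknown provider: {provider}")
```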


Multimodal Content (Images)

Send images alongside text using the OpenAI image_url format. The API automatically converts to each provider's native format.

Example with URL:

{
  "model": "openai.gpt-4o",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this image."},
      {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
    ]
  }],
  "options": { "maxTokens": 500 }
}

Example with Base64:

{
  "model": "anthropic.claude-haiku-4-5-20251001",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What do you see in this image?"},
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
    ]
  }],
  "options": { "maxTokens": 500 }
}

Provider Support:

| Provider | Support | Native Format | Auto Conversion |
|---|---|---|---|
| OpenAI | Yes | image_url (native) | Not needed |
| Anthropic | Yes | image with source.type: url/base64 | Yes |
| Google | Yes | inlineData (base64 only) | Yes (URLs are downloaded and sent as base64) |
| Grok | Yes* | image_url (OpenAI-compatible) | Not needed |

* Support depends on the specific model (e.g. grok-2-vision-1212)
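As an illustration of the automatic conversion, the sketch below turns one OpenAI-style image_url part into the Anthropic shape named in the table (image with source.type of url or base64). The exact Anthropic field layout (media_type, data, url) is taken from Anthropic's Messages API documentation rather than from this page, and `to_anthropic_image` is a hypothetical function, not the API's code.

```python
import re
from typing import Any

# data:<mime>;base64,<payload>
_DATA_URI = re.compile(r"^data:(?P<mime>[\w/+.-]+);base64,(?P<data>.+)$", re.DOTALL)


def to_anthropic_image(part: dict[str, Any]) -> dict[str, Any]:
    """Convert an OpenAI image_url content part into an Anthropic image block."""
    url = part["image_url"]["url"]
    m = _DATA_URI.match(url)
    if m:
        # data URI: split into media type and raw base64 payload
        source = {"type": "base64", "media_type": m["mime"], "data": m["data"]}
    else:
        # public https:// URL: pass it through as a URL source
        source = {"type": "url", "url": url}
    return {"type": "image", "source": source}
```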

When using model: "auto" with multimodal content, the router automatically selects a vision-capable model.

Supported Image Formats

JPEG, PNG, GIF, WebP — via public URLs (https://) or data URIs (base64).
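To send a local image file, encode it as a data URI inside an image_url part. A minimal sketch using only the standard library; `image_message` is a hypothetical helper, and its MIME allow-list mirrors the four formats listed above.

```python
import base64
from typing import Any

# Formats listed under Supported Image Formats
ALLOWED = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_message(image: bytes, mime: str, question: str) -> dict[str, Any]:
    """Build a user message with a text part and a base64 image_url part."""
    if mime not in ALLOWED:
        raise ValueError(f"unsupported image type: {mime}")
    uri = f"data:{mime};base64,{base64.b64encode(image).decode('ascii')}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": uri}},
        ],
    }


# Usage (the file path is a placeholder):
# with open("photo.jpg", "rb") as f:
#     msg = image_message(f.read(), "image/jpeg", "What do you see in this image?")
```

A JPEG file starts with the bytes FF D8 FF, which encode to /9j/, the same prefix seen in the Base64 example above.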


POST /api/v3/llm/test-connection

Test connection to a specific provider.

Authentication: JWT + Multi-tenant headers required

Request Body:

{
  "provider": "openai",
  "model": "gpt-4o"
}

Response:

{
  "success": true,
  "provider": "openai",
  "latency_ms": 250,
  "status": "connected"
}
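A sketch of a pre-flight check that builds the test-connection request with the JWT headers and interprets the response shown above. It only constructs the request. The token and tenant ID are placeholders. The base URL is an assumption, reused from the public call example. Treating any status other than "connected" as a failure is also an assumption, because only the success case is documented.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "https://llm.zihin.ai"  # assumed to match the public call example


def build_test_connection(jwt: str, tenant_id: str, provider: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v3/llm/test-connection request."""
    body = json.dumps({"provider": provider, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v3/llm/test-connection",
        data=body,
        headers={
            "Authorization": f"Bearer {jwt}",
            "x-tenant-id": tenant_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def is_connected(response: dict[str, Any]) -> bool:
    """True only when the call succeeded and the provider reports connected."""
    return bool(response.get("success")) and response.get("status") == "connected"
```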