LLM Endpoints
Endpoints for making LLM calls and managing models.
POST /api/v3/llm/call
Execute an LLM call scoped to a tenant and an agent, identified by the x-tenant-id and x-agent-id headers.
Authentication: JWT required
Headers:
Authorization: Bearer <jwt-token>
x-tenant-id: <uuid>
x-agent-id: <uuid>
Content-Type: application/json
Request:
{
  "query": "What is the capital of France?",
  "model": "auto",
  "provider": "openai",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The prompt or question to send |
| model | string | No | Model ID, or "auto" to let the router choose |
| provider | string | No | Provider name (e.g. "openai") |
| messages | array | No | Conversation history as objects with role and content |
| temperature | number | No | Sampling temperature (0-1); higher values give more varied output |
| max_tokens | number | No | Maximum number of tokens in the response |
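A client can catch malformed bodies before sending them. The sketch below mirrors the field table; the function name `validate_call_body` is hypothetical, and requiring `max_tokens` to be a positive integer is an assumption (the table only says "number").

```python
from typing import Any


def validate_call_body(body: dict[str, Any]) -> list[str]:
    """Return a list of problems in a /api/v3/llm/call body (empty if none).

    Hypothetical client-side check; the server remains the authority.
    """
    errors: list[str] = []

    # query is the only required field.
    query = body.get("query")
    if not isinstance(query, str) or not query:
        errors.append("query is required and must be a non-empty string")

    for key in ("model", "provider"):
        if key in body and not isinstance(body[key], str):
            errors.append(f"{key} must be a string")

    # messages: an array of {"role": ..., "content": ...} objects.
    messages = body.get("messages")
    if messages is not None:
        if not isinstance(messages, list):
            errors.append("messages must be an array")
        else:
            for i, msg in enumerate(messages):
                if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
                    errors.append(f"messages[{i}] needs 'role' and 'content'")

    # temperature: 0-1 per the table (bool is excluded because it subclasses int).
    temperature = body.get("temperature")
    if temperature is not None:
        if (isinstance(temperature, bool)
                or not isinstance(temperature, (int, float))
                or not 0 <= temperature <= 1):
            errors.append("temperature must be a number between 0 and 1")

    # max_tokens: assumed to be a positive integer.
    max_tokens = body.get("max_tokens")
    if max_tokens is not None:
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
            errors.append("max_tokens must be a positive integer")

    return errors
```

The example request body above passes this check; a body without `query`, or with `temperature` set to 1.5, produces one error message per problem.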
Response:
{
  "success": true,
  "response": "The capital of France is Paris.",
  "model": "gpt-4o-mini",
  "provider": "openai",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8,
    "total_tokens": 23
  },
  "cost": 0.00015,
  "latency_ms": 450,
  "routing": {
    "is_auto_routed": true,
    "model_chosen": "openai.gpt-4o-mini",
    "confidence": 0.85
  }
}
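The following sketch shows one way to build this request with the four documented headers and to read the response fields, using only the Python standard library. `BASE_URL` is a hypothetical host, and the function names are illustrative; treating a falsy `success` as an error is an assumption, since the error response shape is not documented here.

```python
import json
import urllib.request
import uuid

# Hypothetical host; substitute the base URL of your deployment.
BASE_URL = "https://api.example.com"


def build_call_request(jwt: str, tenant_id: str, agent_id: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v3/llm/call request."""
    # Both IDs are documented as UUIDs; uuid.UUID raises ValueError on
    # malformed input, so bad IDs fail before any network call.
    uuid.UUID(tenant_id)
    uuid.UUID(agent_id)
    return urllib.request.Request(
        f"{BASE_URL}/api/v3/llm/call",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt}",
            "x-tenant-id": tenant_id,
            "x-agent-id": agent_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_call_response(raw: bytes) -> dict:
    """Extract commonly used fields from a call response."""
    payload = json.loads(raw)
    # Assumption: a missing or false "success" flag means the call failed.
    if not payload.get("success"):
        raise RuntimeError(f"LLM call failed: {payload!r}")
    routing = payload.get("routing", {})
    return {
        "text": payload["response"],
        "model": payload["model"],
        "total_tokens": payload["usage"]["total_tokens"],
        "cost": payload.get("cost"),  # units are not stated, so passed through as-is
        "auto_routed": routing.get("is_auto_routed", False),
        "model_chosen": routing.get("model_chosen"),
    }
```

To send the request, pass it to `urllib.request.urlopen` and hand the response body to `parse_call_response`. urllib normalizes header names (for example to `X-tenant-id`); HTTP header names are case-insensitive, so this does not change their meaning.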
POST /api/v3/llm/public/call
Execute an LLM call using API Key authentication.
Authentication: API Key required
Headers:
X-Api-Key: zhn_live_xxxxx
Content-Type: application/json
Request and response: Same as POST /api/v3/llm/call
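Because only the authentication differs, a request to the public endpoint can reuse the same JSON body. A minimal sketch, assuming a hypothetical `BASE_URL` and function name; only the two headers listed for this endpoint are set.

```python
import json
import urllib.request

# Hypothetical host; substitute the base URL of your deployment.
BASE_URL = "https://api.example.com"


def build_public_call_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v3/llm/public/call request."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v3/llm/public/call",
        data=json.dumps(body).encode("utf-8"),
        # API key authentication: no Bearer token is sent.
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_public_call_request("zhn_live_xxxxx", {"query": "Hello"})` produces a request that can be sent with `urllib.request.urlopen`.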
GET /api/llm/models
List all available models.
Authentication: Not required
Cache: 5 minutes
Response (abbreviated example, one model shown):
{
  "success": true,
  "count": 30,
  "models": [
    {
      "id": "anthropic.claude-3-haiku-20240307",
      "name": "Claude 3 Haiku",
      "provider": "anthropic",
      "description": "Fastest and most economical Claude",
      "tier": "economical",
      "context": "200000",
      "pricing": {
        "input": 0.25,
        "output": 1.25
      },
      "capabilities": ["summarization", "classification"],
      "auto_routing_enabled": true
    }
  ]
}
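Since the model list is cached for 5 minutes on the server, a client can presumably keep its own copy for the same period instead of refetching on every lookup. The sketch below is a hypothetical helper, not part of the API: `fetch` is any function returning the raw JSON body of GET /api/llm/models, and the clock is injectable so the cache can be tested without waiting. Note that `context` arrives as a string and is converted to an integer.

```python
import json
import time
from typing import Callable, Optional


class ModelCatalog:
    """Hypothetical client-side cache for GET /api/llm/models."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # returns the raw JSON response body
        self._ttl = ttl_seconds  # 300 s mirrors the documented 5-minute cache
        self._clock = clock
        self._models: list[dict] = []
        self._fetched_at: Optional[float] = None

    def models(self) -> list[dict]:
        """Return the cached model list, refetching once the TTL has expired."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            payload = json.loads(self._fetch())
            self._models = payload.get("models", [])
            self._fetched_at = now
        return self._models

    def auto_routable(self, provider: Optional[str] = None) -> list[str]:
        """IDs of models with auto_routing_enabled, optionally for one provider."""
        return [
            m["id"] for m in self.models()
            if m.get("auto_routing_enabled")
            and (provider is None or m.get("provider") == provider)
        ]

    def context_window(self, model_id: str) -> int:
        """Context size as an int ("context" is returned as a string)."""
        for m in self.models():
            if m["id"] == model_id:
                return int(m["context"])
        raise KeyError(model_id)
```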
GET /api/llm/provider/:provider
Get information about a specific provider.
Authentication: Not required
Cache: 15 minutes
Parameters:
| Param | Type | Values |
|---|---|---|
| provider | string | openai, anthropic, google, grok, openrouter |
Response (abbreviated example):
{
  "success": true,
  "provider": "openai",
  "supportedModels": [
    "gpt-5",
    "gpt-5-mini",
    "gpt-4.1"
  ],
  "modelCount": 5
}
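Validating the path parameter on the client gives a clearer error than a failed request. A minimal sketch with a hypothetical `BASE_URL` and helper names; the allowed values come from the table above, and lower-casing the input is an assumption (the table lists only lower-case names).

```python
import json
from urllib.parse import quote

# Hypothetical host; substitute the base URL of your deployment.
BASE_URL = "https://api.example.com"

# Values accepted by the :provider path parameter, per the table above.
SUPPORTED_PROVIDERS = frozenset({"openai", "anthropic", "google", "grok", "openrouter"})


def provider_info_url(provider: str) -> str:
    """Return the GET /api/llm/provider/:provider URL for a known provider."""
    name = provider.strip().lower()  # assumption: names are matched in lower case
    if name not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"unknown provider {provider!r}; expected one of {sorted(SUPPORTED_PROVIDERS)}"
        )
    return f"{BASE_URL}/api/llm/provider/{quote(name)}"


def supported_models(raw: str) -> list[str]:
    """Return the supportedModels list from a provider-info response."""
    payload = json.loads(raw)
    return list(payload.get("supportedModels", []))
```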
GET /api/llm/recommendations/:task
Get model recommendations for a task type.
Authentication: Not required
Cache: 10 minutes
Parameters:
| Param | Type | Values |
|---|---|---|
| task | string | code_generation, summarization, translation, analysis, creative, chat |
Response (abbreviated example, one recommendation shown):
{
  "success": true,
  "task": "code_generation",
  "recommendations": [
    {
      "id": "grok.grok-4-1-fast-reasoning",
      "name": "Grok 4.1 Fast Reasoning",
      "provider": "grok",
      "tier": "flagship"
    }
  ],
  "count": 5
}
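A client might use this endpoint to pick a model ID for a task. In the sketch below, the helper names are hypothetical, and treating the list as ordered by preference is an assumption the reference does not state. Whether POST /api/v3/llm/call accepts these provider-prefixed IDs in its model field is also not stated here.

```python
import json
from typing import Optional

# Task types accepted by GET /api/llm/recommendations/:task, per the table above.
TASKS = frozenset({"code_generation", "summarization", "translation",
                   "analysis", "creative", "chat"})


def recommendations_path(task: str) -> str:
    """Return the request path for a known task type."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASKS)}")
    return f"/api/llm/recommendations/{task}"


def pick_model(raw: str, preferred_tier: Optional[str] = None) -> Optional[str]:
    """Pick a model ID from a recommendations response.

    Returns the first recommendation in preferred_tier if one exists,
    otherwise the first recommendation overall (assumed to be the best),
    or None if the list is empty.
    """
    recs = json.loads(raw).get("recommendations", [])
    if preferred_tier is not None:
        for rec in recs:
            if rec.get("tier") == preferred_tier:
                return rec["id"]
    return recs[0]["id"] if recs else None
```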
POST /api/v3/llm/test-connection
Test connection to a provider.
Authentication: JWT required
Request:
{
  "provider": "openai",
  "model": "gpt-4o"
}
Response:
{
  "success": true,
  "provider": "openai",
  "latency_ms": 250,
  "status": "connected"
}
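A connectivity check can be built the same way as the call request. In this minimal sketch, `BASE_URL` and the function names are hypothetical, and only the JWT header is set: whether this endpoint also expects x-tenant-id and x-agent-id is not documented. Status values other than "connected" are not listed, so anything else is treated as not connected.

```python
import json
import urllib.request

# Hypothetical host; substitute the base URL of your deployment.
BASE_URL = "https://api.example.com"


def build_test_connection_request(jwt: str, provider: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v3/llm/test-connection request."""
    # Assumption: the JWT alone is sufficient; add tenant/agent headers if required.
    return urllib.request.Request(
        f"{BASE_URL}/api/v3/llm/test-connection",
        data=json.dumps({"provider": provider, "model": model}).encode("utf-8"),
        headers={"Authorization": f"Bearer {jwt}", "Content-Type": "application/json"},
        method="POST",
    )


def is_connected(raw: bytes) -> bool:
    """True when the response reports success and a "connected" status."""
    payload = json.loads(raw)
    return bool(payload.get("success")) and payload.get("status") == "connected"
```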