LLM Endpoints

Endpoints for making LLM calls and managing models.

POST /api/v3/llm/call

Execute an LLM call scoped to a specific tenant and agent, identified by the x-tenant-id and x-agent-id headers.

Authentication: JWT required

Headers:

Authorization: Bearer <jwt-token>
x-tenant-id: <uuid>
x-agent-id: <uuid>
Content-Type: application/json

Request:

{
  "query": "What is the capital of France?",
  "model": "auto",
  "provider": "openai",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `query` | string | Yes | The prompt or question |
| `model` | string | No | Model ID or "auto" |
| `provider` | string | No | Provider name |
| `messages` | array | No | Conversation history |
| `temperature` | number | No | Response randomness (0-1) |
| `max_tokens` | number | No | Maximum response length, in tokens |
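
The field rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the helper name `build_call_payload` is hypothetical, and rejecting an empty `query` or a non-positive `max_tokens` is an assumption beyond what the table states.

```python
import json


def build_call_payload(query, model=None, provider=None, messages=None,
                       temperature=None, max_tokens=None):
    """Assemble a request body for the call endpoint (hypothetical helper)."""
    # `query` is the only required field in the table.
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")
    # The table documents temperature as a number in the range 0-1.
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    # Assumption: a non-positive token limit is never meaningful.
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    payload = {"query": query}
    optional = {
        "model": model,
        "provider": provider,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # Optional fields are left out entirely when not set.
    for name, value in optional.items():
        if value is not None:
            payload[name] = value
    return payload


payload = build_call_payload(
    "What is the capital of France?",
    model="auto",
    temperature=0.7,
    max_tokens=1000,
)
print(json.dumps(payload, indent=2))
```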

Response:

{
  "success": true,
  "response": "The capital of France is Paris.",
  "model": "gpt-4o-mini",
  "provider": "openai",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8,
    "total_tokens": 23
  },
  "cost": 0.00015,
  "latency_ms": 450,
  "routing": {
    "is_auto_routed": true,
    "model_chosen": "openai.gpt-4o-mini",
    "confidence": 0.85
  }
}
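
A response like the one above might be unpacked on the client as follows. This is a sketch under assumptions: the `CallResult` type and `parse_call_response` name are hypothetical, and because the shape of a failed response is not documented here, the code only checks the `success` flag.

```python
import json
from dataclasses import dataclass


@dataclass
class CallResult:
    text: str
    model: str
    provider: str
    total_tokens: int
    cost: float
    auto_routed: bool


def parse_call_response(raw):
    """Turn the JSON body of a call response into a CallResult."""
    data = json.loads(raw)
    if not data.get("success"):
        # Assumption: error details, if any, live elsewhere in the body.
        raise RuntimeError("LLM call did not succeed")
    usage = data.get("usage", {})
    routing = data.get("routing", {})
    # Fall back to input + output if total_tokens is missing.
    total = usage.get(
        "total_tokens",
        usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
    )
    return CallResult(
        text=data["response"],
        model=data["model"],
        provider=data["provider"],
        total_tokens=total,
        cost=data.get("cost", 0.0),
        auto_routed=bool(routing.get("is_auto_routed", False)),
    )


sample = """{
  "success": true,
  "response": "The capital of France is Paris.",
  "model": "gpt-4o-mini",
  "provider": "openai",
  "usage": {"input_tokens": 15, "output_tokens": 8, "total_tokens": 23},
  "cost": 0.00015,
  "latency_ms": 450,
  "routing": {"is_auto_routed": true, "model_chosen": "openai.gpt-4o-mini", "confidence": 0.85}
}"""
result = parse_call_response(sample)
print(result.text, result.total_tokens)
```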

POST /api/v3/llm/public/call

Execute an LLM call using API Key authentication.

Authentication: API Key required

Headers:

X-Api-Key: zhn_live_xxxxx
Content-Type: application/json

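Request/Response: The request and response bodies are identical to those of POST /api/v3/llm/call. Only the path and the authentication headers differ.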
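
Since the two call endpoints share a body and differ only in path and authentication headers, one helper can build a request for either. The sketch below uses Python's standard `urllib.request` and only constructs the request without sending it. `BASE_URL` is a placeholder host and `build_llm_request` is a hypothetical name; neither comes from the API.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


def build_llm_request(payload, *, api_key=None, jwt=None,
                      tenant_id=None, agent_id=None):
    """Build a POST request for the API-key or the JWT call endpoint."""
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        path = "/api/v3/llm/public/call"
        headers["X-Api-Key"] = api_key
    elif jwt is not None:
        if not tenant_id or not agent_id:
            raise ValueError("tenant_id and agent_id are required with a JWT")
        path = "/api/v3/llm/call"
        headers["Authorization"] = f"Bearer {jwt}"
        headers["x-tenant-id"] = tenant_id
        headers["x-agent-id"] = agent_id
    else:
        raise ValueError("provide either api_key or jwt")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_llm_request(
    {"query": "What is the capital of France?"},
    api_key="zhn_live_xxxxx",
)
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req)
```

Note that `urllib.request` stores header names in capitalized form (for example `X-api-key`). HTTP header names are case-insensitive, so this does not change what the server sees.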

GET /api/llm/models

List all available models.

Authentication: Not required

Cache: 5 minutes

Response:

{
  "success": true,
  "count": 30,
  "models": [
    {
      "id": "anthropic.claude-3-haiku-20240307",
      "name": "Claude 3 Haiku",
      "provider": "anthropic",
      "description": "Fastest and most economical Claude",
      "tier": "economical",
      "context": "200000",
      "pricing": {
        "input": 0.25,
        "output": 1.25
      },
      "capabilities": ["summarization", "classification"],
      "auto_routing_enabled": true
    }
  ]
}
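
Once the model list is fetched, it can be filtered locally. The sketch below works on an already-decoded response and uses only fields shown in the sample. The function names are hypothetical. Note that `context` appears as a string in the sample, so it is converted before any numeric comparison.

```python
def context_window(model):
    """Return the model's context size as an integer."""
    # "context" is a string in the sample response, e.g. "200000".
    return int(model["context"])


def find_models(models, capability, tier=None, min_context=0):
    """Return IDs of auto-routable models that list the given capability."""
    matches = []
    for m in models:
        if capability not in m.get("capabilities", []):
            continue
        if not m.get("auto_routing_enabled", False):
            continue
        if tier is not None and m.get("tier") != tier:
            continue
        if context_window(m) < min_context:
            continue
        matches.append(m["id"])
    return matches


response = {
    "success": True,
    "count": 30,
    "models": [
        {
            "id": "anthropic.claude-3-haiku-20240307",
            "name": "Claude 3 Haiku",
            "provider": "anthropic",
            "tier": "economical",
            "context": "200000",
            "pricing": {"input": 0.25, "output": 1.25},
            "capabilities": ["summarization", "classification"],
            "auto_routing_enabled": True,
        }
    ],
}
ids = find_models(response["models"], "summarization", tier="economical")
print(ids)
```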

GET /api/llm/provider/:provider

Get information about a specific provider.

Authentication: Not required

Cache: 15 minutes

Parameters:

| Param | Type | Values |
|-------|------|--------|
| `provider` | string | openai, anthropic, google, grok, openrouter |

Response:

{
  "success": true,
  "provider": "openai",
  "supportedModels": [
    "gpt-5",
    "gpt-5-mini",
    "gpt-4.1"
  ],
  "modelCount": 5
}
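
Because `:provider` accepts a fixed set of values, a client can validate the name before building the URL. This is a minimal sketch: the host in `base_url` is a placeholder and `provider_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Values listed in the parameter table above.
PROVIDERS = ("openai", "anthropic", "google", "grok", "openrouter")


def provider_url(provider, base_url="https://api.example.com"):
    """Build the provider info URL, rejecting unknown provider names."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    # quote() keeps the path segment safe even if the list changes later.
    return f"{base_url}/api/llm/provider/{quote(provider, safe='')}"


url = provider_url("openai")
print(url)
```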

GET /api/llm/recommendations/:task

Get model recommendations for a task type.

Authentication: Not required

Cache: 10 minutes

Parameters:

| Param | Type | Values |
|-------|------|--------|
| `task` | string | code_generation, summarization, translation, analysis, creative, chat |

Response:

{
  "success": true,
  "task": "code_generation",
  "recommendations": [
    {
      "id": "grok.grok-4-1-fast-reasoning",
      "name": "Grok 4.1 Fast Reasoning",
      "provider": "grok",
      "tier": "flagship"
    }
  ],
  "count": 5
}
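
A recommendation can be turned into `provider` and `model` values for a later call. The sketch below rests on two assumptions that this page does not confirm: that the first entry in `recommendations` is the preferred one, and that every `id` has the form `provider.model` (as `routing.model_chosen` does in the call response sample). All function names are hypothetical.

```python
TASKS = ("code_generation", "summarization", "translation",
         "analysis", "creative", "chat")


def recommendations_path(task):
    """Return the request path for a task, rejecting unknown task types."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"/api/llm/recommendations/{task}"


def split_model_id(model_id):
    """Split 'provider.model' on the first dot only, so 'gpt-4.1' survives."""
    provider, sep, name = model_id.partition(".")
    if not sep or not provider or not name:
        raise ValueError(f"unexpected model id: {model_id!r}")
    return provider, name


def top_recommendation(response):
    """Return (provider, model) for the first recommendation, or None."""
    recs = response.get("recommendations", [])
    if not recs:
        return None
    return split_model_id(recs[0]["id"])


response = {
    "success": True,
    "task": "code_generation",
    "recommendations": [
        {
            "id": "grok.grok-4-1-fast-reasoning",
            "name": "Grok 4.1 Fast Reasoning",
            "provider": "grok",
            "tier": "flagship",
        }
    ],
    "count": 5,
}
choice = top_recommendation(response)
print(recommendations_path("code_generation"), choice)
```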

POST /api/v3/llm/test-connection

Test connection to a provider.

Authentication: JWT required

Request:

{
  "provider": "openai",
  "model": "gpt-4o"
}

Response:

{
  "success": true,
  "provider": "openai",
  "latency_ms": 250,
  "status": "connected"
}
