Media Processing
Standalone endpoints for image analysis, audio transcription, and document parsing. They are separate from /api/v3/llm/call, which accepts multimodal content inline; use these endpoints when you need to process media on its own.
Authentication
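Each endpoint is available in two versions. Paths are relative to /api/v3/media; the API Key version inserts /public/ before the action:
| Version | Path pattern (example) | Auth header | Use case |
|---|---|---|---|
| JWT | /image/analyze | Authorization: Bearer <jwt> | Frontend applications |
| API Key | /image/public/analyze | X-Api-Key: YOUR_API_KEY | n8n, webhooks, integrations |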
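Because the two versions differ in path and auth header, a client can derive both from a single switch. A minimal Python sketch, assuming a placeholder host (BASE_URL is hypothetical) and the path pattern implied by the endpoint listings in this section; endpoint_url and auth_headers are illustrative names, not part of the API:

```python
BASE_URL = "https://api.example.com"  # hypothetical host; replace with your deployment
MEDIA_PREFIX = "/api/v3/media"


def endpoint_url(media_type: str, action: str, use_api_key: bool) -> str:
    """Build the full URL; the API Key version inserts /public/ before the action."""
    if use_api_key:
        return f"{BASE_URL}{MEDIA_PREFIX}/{media_type}/public/{action}"
    return f"{BASE_URL}{MEDIA_PREFIX}/{media_type}/{action}"


def auth_headers(credential: str, use_api_key: bool) -> dict:
    """Return the auth header for the chosen version, plus a JSON content type."""
    headers = {"Content-Type": "application/json"}
    if use_api_key:
        headers["X-Api-Key"] = credential
    else:
        headers["Authorization"] = f"Bearer {credential}"
    return headers
```

For example, endpoint_url("audio", "transcribe", use_api_key=True) yields https://api.example.com/api/v3/media/audio/public/transcribe.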
Image Analysis
POST /api/v3/media/image/analyze
POST /api/v3/media/image/public/analyze
Analyze an image with a vision model (GPT-4o or Claude).
Request Body:
```json
{
  "image": "https://example.com/photo.jpg",
  "prompt": "Describe this image",
  "extract_text": false,
  "detail": "auto",
  "provider": "auto"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| image | string | Yes | Public URL or base64 data URI |
| prompt | string | No | Analysis instruction. Default: "Describe this image in detail." |
| extract_text | boolean | No | Enable OCR mode to extract text from the image. Default: false |
| detail | string | No | Detail level: low, high, or auto. Default: auto |
| provider | string | No | Provider: auto, openai, or anthropic. Default: auto |
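When the image is a local file rather than a public URL, it has to be sent as a base64 data URI. The sketch below (helper names are hypothetical) encodes raw bytes that way and checks detail and provider against the values listed in the table; leaving prompt unset lets the server apply its default.

```python
import base64
from typing import Optional

DETAIL_LEVELS = {"low", "high", "auto"}
PROVIDERS = {"auto", "openai", "anthropic"}


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as data:<mime>;base64,<payload>."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def image_analyze_body(image: str, prompt: Optional[str] = None,
                       extract_text: bool = False, detail: str = "auto",
                       provider: str = "auto") -> dict:
    """Build an /image/analyze body, rejecting values outside the documented enums."""
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {sorted(DETAIL_LEVELS)}")
    if provider not in PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(PROVIDERS)}")
    body = {"image": image, "extract_text": extract_text,
            "detail": detail, "provider": provider}
    if prompt is not None:
        body["prompt"] = prompt  # omitted otherwise so the server default applies
    return body


# Example: OCR on a local screenshot (the file name is hypothetical).
# with open("screenshot.png", "rb") as f:
#     body = image_analyze_body(to_data_uri(f.read(), "image/png"), extract_text=True)
```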
Response:
```json
{
  "success": true,
  "data": {
    "analysis": "The image shows an orange cat lying on a blue couch...",
    "extracted_text": null,
    "provider": "openai",
    "model": "gpt-4o"
  },
  "execution_time_ms": 2450
}
```
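A sketch of sending a request and reading the fields shown in the sample response, using only the Python standard library. The 130-second client timeout is an assumption (a little above the 120-second processing timeout listed under Limits); post_json and summarize_image_response are illustrative names.

```python
import json
import urllib.request


def post_json(url: str, headers: dict, body: dict, timeout: float = 130.0) -> dict:
    """POST body as JSON and decode the JSON reply.

    Non-2xx statuses raise urllib.error.HTTPError. The 130 s timeout is an
    assumption: a little above the 120 s server-side processing limit.
    """
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_image_response(payload: dict) -> str:
    """One-line summary built from the fields in the sample response."""
    data = payload["data"]
    seconds = payload["execution_time_ms"] / 1000
    return f'{data["provider"]}/{data["model"]} in {seconds:.2f} s: {data["analysis"]}'
```

Applied to the sample above, the summary starts with "openai/gpt-4o in 2.45 s:".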
Audio Transcription
POST /api/v3/media/audio/transcribe
POST /api/v3/media/audio/public/transcribe
Transcribe audio using Whisper.
Request Body:
```json
{
  "audio": "https://example.com/recording.mp3",
  "language": "pt",
  "include_timestamps": false,
  "prompt_hint": "Meeting about project budget"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| audio | string | Yes | Public URL or base64 data URI |
| language | string | No | ISO language code of the audio (e.g. pt, en). Default: pt |
| include_timestamps | boolean | No | Return per-segment timestamps in segments. Default: false |
| prompt_hint | string | No | Context, such as the topic of the recording, that helps improve transcription accuracy |
Supported formats: mp3, wav, ogg, flac, m4a, webm, mp4, mpeg
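A sketch that rejects unsupported extensions before upload and builds the request body from a local file. The extension-to-MIME mapping used in the data URI is an assumption; the documentation lists only the extensions. transcribe_body is an illustrative name.

```python
import base64
from pathlib import Path
from typing import Optional

# Documented extensions; the MIME types paired with them are assumptions.
AUDIO_MIME = {
    "mp3": "audio/mpeg", "wav": "audio/wav", "ogg": "audio/ogg",
    "flac": "audio/flac", "m4a": "audio/mp4", "webm": "audio/webm",
    "mp4": "audio/mp4", "mpeg": "audio/mpeg",
}


def transcribe_body(filename: str, data: bytes, language: str = "pt",
                    include_timestamps: bool = False,
                    prompt_hint: Optional[str] = None) -> dict:
    """Build an /audio/transcribe body from a local file's name and bytes."""
    ext = Path(filename).suffix.lower().lstrip(".")  # "Call.MP3" -> "mp3"
    if ext not in AUDIO_MIME:
        raise ValueError(f"unsupported audio format: .{ext}")
    encoded = base64.b64encode(data).decode("ascii")
    body = {"audio": f"data:{AUDIO_MIME[ext]};base64,{encoded}",
            "language": language, "include_timestamps": include_timestamps}
    if prompt_hint:
        body["prompt_hint"] = prompt_hint  # optional context for the model
    return body
```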
Response:
```json
{
  "success": true,
  "data": {
    "transcription": "Good morning everyone, let's start the meeting...",
    "language": "pt",
    "duration_seconds": 125.5,
    "duration_formatted": "2:05",
    "segments": null
  },
  "execution_time_ms": 8200
}
```
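duration_formatted renders duration_seconds as m:ss: 125.5 seconds is 2 minutes and 5.5 seconds, shown as 2:05. A sketch reproducing it, assuming the fractional second is truncated (consistent with this sample, though the exact rule and the format for recordings longer than an hour are not documented):

```python
def format_duration(seconds: float) -> str:
    """Render seconds as m:ss, truncating the fraction (assumed rule)."""
    minutes, secs = divmod(int(seconds), 60)  # int() drops the .5 in 125.5
    return f"{minutes}:{secs:02d}"
```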
Response with include_timestamps: true (abridged):
```json
{
  "success": true,
  "data": {
    "transcription": "Hello, welcome to the podcast...",
    "segments": [
      { "start": 0.0, "end": 2.5, "text": "Hello, welcome to the podcast." },
      { "start": 2.5, "end": 5.8, "text": "Today we'll talk about..." }
    ]
  }
}
```
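The start and end values in segments are in seconds, so they map directly onto subtitle formats. A sketch converting segments to SubRip (SRT), where each cue is an index, a HH:MM:SS,mmm --> HH:MM:SS,mmm line, and the text; the function names are illustrative:

```python
def srt_timestamp(t: float) -> str:
    """Seconds -> HH:MM:SS,mmm, rounded to the nearest millisecond."""
    ms = round(t * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Number the segments from 1 and separate cues with a blank line."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                    f"{srt_timestamp(seg['end'])}\n{seg['text']}\n")
    return "\n".join(cues)
```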
Document Parsing
POST /api/v3/media/document/parse
POST /api/v3/media/document/public/parse
Extract text and tables from documents (PDF, Excel, CSV, DOCX, and plain text).
Request Body:
```json
{
  "document": "https://example.com/report.pdf",
  "file_type": "auto",
  "extract_mode": "all",
  "max_pages": 10
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| document | string | Yes | Public URL or base64 data URI |
| file_type | string | No | auto, pdf, xlsx, csv, or docx. Default: auto |
| extract_mode | string | No | What to extract: text, tables, or all. Default: text |
| sheet_name | string | No | Sheet to read (Excel only). Default: first sheet |
| max_pages | number | No | Maximum number of pages to process (PDF only). Default: no limit |
Supported formats: pdf, xlsx, xls, csv, docx, txt
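xls and txt appear under supported formats but not among the explicit file_type values. A sketch that derives file_type from the file extension, falling back to auto for those two (an assumption about how to handle them), and sends sheet_name and max_pages only for the formats they apply to; parse_body is an illustrative name.

```python
from pathlib import Path
from typing import Optional

EXPLICIT_TYPES = {"pdf", "xlsx", "csv", "docx"}  # values accepted by file_type
SUPPORTED = EXPLICIT_TYPES | {"xls", "txt"}      # supported, but sent as "auto"
EXTRACT_MODES = {"text", "tables", "all"}


def parse_body(document: str, filename: str, extract_mode: str = "text",
               sheet_name: Optional[str] = None,
               max_pages: Optional[int] = None) -> dict:
    """Build a /document/parse body, keeping format-specific fields only where they apply."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED:
        raise ValueError(f"unsupported document format: .{ext}")
    if extract_mode not in EXTRACT_MODES:
        raise ValueError(f"extract_mode must be one of {sorted(EXTRACT_MODES)}")
    body = {"document": document,
            "file_type": ext if ext in EXPLICIT_TYPES else "auto",
            "extract_mode": extract_mode}
    if sheet_name is not None and ext in {"xlsx", "xls"}:
        body["sheet_name"] = sheet_name  # ignored for non-Excel files
    if max_pages is not None and ext == "pdf":
        body["max_pages"] = max_pages  # ignored for non-PDF files
    return body
```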
Response (PDF):
```json
{
  "success": true,
  "data": {
    "content": "MONTHLY REPORT\n\n1. Introduction\nThis report presents...",
    "file_type": "pdf",
    "tables": null,
    "page_count": 5,
    "info": {
      "title": "Monthly Report",
      "author": "John Doe"
    }
  },
  "execution_time_ms": 1200
}
```
Response (Excel with tables):
```json
{
  "success": true,
  "data": {
    "content": null,
    "file_type": "xlsx",
    "tables": [
      {
        "name": "January",
        "headers": ["Product", "Quantity", "Value"],
        "rows": [
          ["Product A", 100, 1500.00],
          ["Product B", 50, 750.00]
        ]
      }
    ],
    "page_count": null
  }
}
```
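Each entry in tables pairs a headers list with positional rows. A sketch turning a table into one dict per row, keyed by header, which makes columns easy to look up and aggregate; the function names are illustrative:

```python
def table_records(table: dict) -> list:
    """Pair each row with the headers; zip() stops at the shorter of the two."""
    return [dict(zip(table["headers"], row)) for row in table["rows"]]


def tables_by_name(payload: dict) -> dict:
    """Map table name -> records; tables is null when none were extracted."""
    return {t["name"]: table_records(t) for t in payload["data"]["tables"] or []}
```

With the sample above, summing Value over the January records gives 1500.00 + 750.00 = 2250.00.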
Limits
| Resource | Limit |
|---|---|
| Max image size | 20 MB |
| Max audio size | 25 MB |
| Max document size | 50 MB |
| Processing timeout | 120 seconds |
| Rate limit (API Key) | Configurable per key |
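Checking sizes on the client avoids uploading a file only to receive a 413. A sketch assuming "MB" means 10^6 bytes (the stricter of the two common readings) and that the limits apply to the decoded file; which of these holds is not documented. Sending a file as a base64 data URI makes the payload about 4/3 the file size, which base64_length computes exactly.

```python
# Documented limits; reading "MB" as 10**6 bytes is an assumption (the stricter reading).
LIMITS_MB = {"image": 20, "audio": 25, "document": 50}


def within_limit(media_type: str, num_bytes: int) -> bool:
    """True if a file of num_bytes fits the documented limit for media_type."""
    return num_bytes <= LIMITS_MB[media_type] * 10**6


def base64_length(num_bytes: int) -> int:
    """Characters of base64 text for num_bytes of input: 4 per 3-byte group, padded."""
    return 4 * ((num_bytes + 2) // 3)
```

For example, a 20,000,000-byte image becomes 26,666,668 characters of base64.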
Error Codes
| HTTP status | Description |
|---|---|
| 400 | Missing required field or invalid format |
| 401 | Invalid API Key or expired JWT |
| 403 | Quota exceeded or access denied |
| 413 | File too large |
| 415 | Unsupported file format |
| 500 | Internal processing error |
| 504 | Processing timeout |
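These codes split into ones that need a changed request or credentials (the 4xx codes) and ones that may be transient (500, 504). A sketch of retrying with exponential backoff, assuming only 500 and 504 are worth retrying; a 504 caused by a file that genuinely needs more than 120 seconds will likely fail again, so keep the attempt count low. send is any zero-argument function that performs one request and raises urllib.error.HTTPError on an error status; all names here are illustrative.

```python
import time
import urllib.error

RETRYABLE = {500, 504}  # assumption: 4xx errors will not succeed on a plain retry


def should_retry(status: int) -> bool:
    return status in RETRYABLE


def backoff_delays(count: int, base: float = 1.0) -> list:
    """Exponential delays in seconds: base, 2*base, 4*base, ..."""
    return [base * 2**i for i in range(count)]


def call_with_retry(send, attempts: int = 3):
    """Call send(), retrying retryable HTTP errors up to attempts times in total."""
    delays = backoff_delays(attempts - 1)
    for attempt in range(attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if not should_retry(err.code) or attempt == attempts - 1:
                raise  # non-retryable status, or out of attempts
            time.sleep(delays[attempt])
```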