Skip to main content

Media Processing

Standalone endpoints for image analysis, audio transcription, and document parsing. These are separate from /api/v3/llm/call (which accepts inline multimodal content) — use these when you need to process media in isolation.

Authentication

Each endpoint has two versions:

VersionPath PatternAuthUse Case
JWT/image/analyzeAuthorization: Bearer <jwt>Frontend
API Key/image/public/analyzeX-Api-Key: YOUR_API_KEYn8n, webhooks, integrations

Image Analysis

POST /api/v3/media/image/analyze

POST /api/v3/media/image/public/analyze

Analyze images using Vision AI (GPT-4o or Claude).

Request Body:

{
"image": "https://example.com/photo.jpg",
"prompt": "Describe this image",
"extract_text": false,
"detail": "auto",
"provider": "auto"
}
FieldTypeRequiredDescription
imagestringYesPublic URL or base64 data URI
promptstringNoAnalysis instruction. Default: "Describe this image in detail."
extract_textbooleanNoOCR mode for text extraction. Default: false
detailstringNoDetail level: low, high, auto. Default: auto
providerstringNoProvider: auto, openai, anthropic. Default: auto

Response:

{
"success": true,
"data": {
"analysis": "The image shows an orange cat lying on a blue couch...",
"extracted_text": null,
"provider": "openai",
"model": "gpt-4o"
},
"execution_time_ms": 2450
}

Audio Transcription

POST /api/v3/media/audio/transcribe

POST /api/v3/media/audio/public/transcribe

Transcribe audio using Whisper.

Request Body:

{
"audio": "https://example.com/recording.mp3",
"language": "pt",
"include_timestamps": false,
"prompt_hint": "Meeting about project budget"
}
FieldTypeRequiredDescription
audiostringYesPublic URL or base64 data URI
languagestringNoISO language code. Default: pt
include_timestampsbooleanNoInclude timestamps per segment. Default: false
prompt_hintstringNoContext to improve transcription accuracy

Supported formats: mp3, wav, ogg, flac, m4a, webm, mp4, mpeg

Response:

{
"success": true,
"data": {
"transcription": "Good morning everyone, let's start the meeting...",
"language": "pt",
"duration_seconds": 125.5,
"duration_formatted": "2:05",
"segments": null
},
"execution_time_ms": 8200
}

With timestamps:

{
"success": true,
"data": {
"transcription": "Hello, welcome to the podcast...",
"segments": [
{ "start": 0.0, "end": 2.5, "text": "Hello, welcome to the podcast." },
{ "start": 2.5, "end": 5.8, "text": "Today we'll talk about..." }
]
}
}

Document Parsing

POST /api/v3/media/document/parse

POST /api/v3/media/document/public/parse

Extract content from documents (PDF, Excel, CSV, DOCX).

Request Body:

{
"document": "https://example.com/report.pdf",
"file_type": "auto",
"extract_mode": "all",
"max_pages": 10
}
FieldTypeRequiredDescription
documentstringYesPublic URL or base64 data URI
file_typestringNoauto, pdf, xlsx, csv, docx. Default: auto
extract_modestringNotext, tables, all. Default: text
sheet_namestringNoSheet name (Excel). Default: first sheet
max_pagesnumberNoPage limit (PDF). Default: no limit

Supported formats: pdf, xlsx, xls, csv, docx, txt

Response (PDF):

{
"success": true,
"data": {
"content": "MONTHLY REPORT\n\n1. Introduction\nThis report presents...",
"file_type": "pdf",
"tables": null,
"page_count": 5,
"info": {
"title": "Monthly Report",
"author": "John Doe"
}
},
"execution_time_ms": 1200
}

Response (Excel with tables):

{
"success": true,
"data": {
"content": null,
"file_type": "xlsx",
"tables": [
{
"name": "January",
"headers": ["Product", "Quantity", "Value"],
"rows": [
["Product A", 100, 1500.00],
["Product B", 50, 750.00]
]
}
],
"page_count": null
}
}

Limits

ResourceLimit
Max image size20 MB
Max audio size25 MB
Max document size50 MB
Processing timeout120 seconds
Rate limit (API Key)Configurable per key

Error Codes

CodeDescription
400Missing required field or invalid format
401Invalid API Key or expired JWT
403Quota exceeded or access denied
413File too large
415Unsupported file format
500Internal processing error
504Processing timeout