Media Processing

Standalone endpoints for image analysis, audio transcription, and document parsing. They are separate from `/api/v3/llm/call`, which accepts multimodal content inline; use these endpoints when you need to process media in isolation.

Authentication

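Each endpoint is available in two versions that differ only in path and authentication (paths shown relative to `/api/v3/media`):

| Version | Path pattern | Auth header | Use case |
|---|---|---|---|
| JWT | `/image/analyze` | `Authorization: Bearer <jwt>` | Frontend |
| API key | `/image/public/analyze` | `X-Api-Key: YOUR_API_KEY` | n8n, webhooks, integrations |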
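A minimal sketch of how a client might choose the path and header for each version, using only the Python standard library. `BASE_URL` and `build_media_request` are hypothetical names (substitute your own host); the request is built but not sent, so it can be inspected or passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical host: replace with your deployment's base URL.
BASE_URL = "https://api.example.com/api/v3/media"


def build_media_request(path: str, body: dict, *, jwt: Optional[str] = None,
                        api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for a media endpoint.

    `path` is the JWT-style path, e.g. "/image/analyze". With an API key,
    "/public" is inserted after the resource name, per the path pattern table.
    """
    if (jwt is None) == (api_key is None):
        raise ValueError("pass exactly one of jwt or api_key")
    headers = {"Content-Type": "application/json"}
    if jwt is not None:
        headers["Authorization"] = f"Bearer {jwt}"
    else:
        resource, action = path.strip("/").split("/")
        path = f"/{resource}/public/{action}"
        headers["X-Api-Key"] = api_key
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

For example, `build_media_request("/image/analyze", {"image": url}, api_key=key)` targets `/image/public/analyze` with an `X-Api-Key` header, which suits server-side integrations where no user JWT exists.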

Image Analysis

POST /api/v3/media/image/analyze

POST /api/v3/media/image/public/analyze

Analyzes an image with a vision model: GPT-4o (OpenAI) or Claude (Anthropic), selected with `provider`.

Request Body:

```json
{
  "image": "https://example.com/photo.jpg",
  "prompt": "Describe this image",
  "extract_text": false,
  "detail": "auto",
  "provider": "auto"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `image` | string | Yes | Public URL or base64 data URI |
| `prompt` | string | No | Analysis instruction. Default: "Describe this image in detail." |
| `extract_text` | boolean | No | OCR mode: extract the text in the image, returned in `extracted_text`. Default: `false` |
| `detail` | string | No | Detail level: `low`, `high`, or `auto`. Default: `auto` |
| `provider` | string | No | Provider: `auto`, `openai`, or `anthropic`. Default: `auto` |
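The sketch below assembles an image analysis body and, as an illustration, inlines a local file as a base64 data URI. The helper names `to_data_uri` and `image_analyze_body` are hypothetical, and treating any value that is not an `http(s)://` or `data:` string as a local file path is an assumption of this sketch, not documented behavior.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional

# Allowed values, taken from the field table.
DETAIL_LEVELS = {"low", "high", "auto"}
PROVIDERS = {"auto", "openai", "anthropic"}


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI (RFC 2397)."""
    return f"data:{mime_type};base64," + base64.b64encode(data).decode("ascii")


def image_analyze_body(image: str, prompt: Optional[str] = None, *,
                       extract_text: bool = False, detail: str = "auto",
                       provider: str = "auto") -> dict:
    """Build a body for the image analysis endpoints.

    Assumption: anything that is not a URL or data URI is a local path,
    which is read and inlined as a data URI.
    """
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {sorted(DETAIL_LEVELS)}")
    if provider not in PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(PROVIDERS)}")
    if not image.startswith(("http://", "https://", "data:")):
        path = Path(image)
        mime, _ = mimetypes.guess_type(path.name)
        image = to_data_uri(path.read_bytes(), mime or "application/octet-stream")
    body = {"image": image, "extract_text": extract_text,
            "detail": detail, "provider": provider}
    if prompt is not None:
        body["prompt"] = prompt  # omitted -> the server's default prompt applies
    return body
```

Leaving `prompt` out of the body when it is not set lets the documented default ("Describe this image in detail.") apply on the server side.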

Response:

```json
{
  "success": true,
  "data": {
    "analysis": "The image shows an orange cat lying on a blue couch...",
    "extracted_text": null,
    "provider": "openai",
    "model": "gpt-4o"
  },
  "execution_time_ms": 2450
}
```
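One way to read this response envelope, assuming the `{"success": ..., "data": {...}}` shape shown. Preferring `extracted_text` when it is present is a choice of this sketch; the documentation only shows it as `null` outside OCR mode and does not describe the error body. `read_image_result` is a hypothetical helper.

```python
import json


def read_image_result(raw: str) -> str:
    """Return the useful text from an image analysis response body."""
    payload = json.loads(raw)
    if not payload.get("success"):
        # The error body format is not documented; surface it unchanged.
        raise RuntimeError(f"image analysis failed: {payload}")
    data = payload["data"]
    # OCR output wins when present; otherwise fall back to the description.
    return data.get("extracted_text") or data["analysis"]
```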

Audio Transcription

POST /api/v3/media/audio/transcribe

POST /api/v3/media/audio/public/transcribe

Transcribe audio using Whisper.

Request Body:

```json
{
  "audio": "https://example.com/recording.mp3",
  "language": "pt",
  "include_timestamps": false,
  "prompt_hint": "Meeting about project budget"
}
```

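| Field | Type | Required | Description |
|---|---|---|---|
| `audio` | string | Yes | Public URL or base64 data URI |
| `language` | string | No | ISO 639-1 language code of the audio, e.g. `pt` or `en`. Default: `pt` |
| `include_timestamps` | boolean | No | Return per-segment start and end times in `segments`. Default: `false` |
| `prompt_hint` | string | No | Context (topic, names, jargon) that helps improve transcription accuracy |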

Supported formats: mp3, wav, ogg, flac, m4a, webm, mp4, mpeg
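A client-side sketch that builds a transcription body and optionally checks a file's extension against the supported formats, so an unsupported file fails locally instead of with a 415. The `transcribe_body` helper and its `filename` parameter are hypothetical; the server still performs its own validation.

```python
from pathlib import PurePath
from typing import Optional

# Extensions listed under "Supported formats" for transcription.
AUDIO_FORMATS = {"mp3", "wav", "ogg", "flac", "m4a", "webm", "mp4", "mpeg"}


def transcribe_body(audio: str, *, filename: Optional[str] = None,
                    language: str = "pt", include_timestamps: bool = False,
                    prompt_hint: Optional[str] = None) -> dict:
    """Build a transcription request body.

    `filename` (e.g. the original name of a file sent as a data URI) is only
    used for the local extension check and is not sent to the API.
    """
    if filename is not None:
        ext = PurePath(filename).suffix.lower().lstrip(".")
        if ext not in AUDIO_FORMATS:
            raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    body = {"audio": audio, "language": language,
            "include_timestamps": include_timestamps}
    if prompt_hint:
        body["prompt_hint"] = prompt_hint
    return body
```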

Response:

```json
{
  "success": true,
  "data": {
    "transcription": "Good morning everyone, let's start the meeting...",
    "language": "pt",
    "duration_seconds": 125.5,
    "duration_formatted": "2:05",
    "segments": null
  },
  "execution_time_ms": 8200
}
```
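The example pairs `duration_seconds: 125.5` with `duration_formatted: "2:05"`, which is consistent with truncating to whole seconds (125 s = 2 min 5 s); rounding would give `2:06`. The function below reproduces that under this assumption. The `h:mm:ss` form for durations of an hour or more is a guess, since no such example is documented, and `format_duration` is a hypothetical name.

```python
def format_duration(seconds: float) -> str:
    """Format a duration as m:ss (or h:mm:ss), truncating fractional seconds."""
    total = int(seconds)  # 125.5 -> 125
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"
```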

Response with `include_timestamps: true`:

```json
{
  "success": true,
  "data": {
    "transcription": "Hello, welcome to the podcast...",
    "segments": [
      { "start": 0.0, "end": 2.5, "text": "Hello, welcome to the podcast." },
      { "start": 2.5, "end": 5.8, "text": "Today we'll talk about..." }
    ]
  }
}
```
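Segments map directly onto subtitle cues. As an illustration (SRT output is not a feature of the endpoint), this sketch converts the `segments` array into SubRip text, assuming `start` and `end` are in seconds as the example suggests. `srt_timestamp` and `to_srt` are hypothetical helpers.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Convert transcription segments into SubRip (.srt) text."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT separates consecutive cues with a blank line.
    return "\n".join(cues)
```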

Document Parsing

POST /api/v3/media/document/parse

POST /api/v3/media/document/public/parse

Extracts text and tables from documents (PDF, Excel, CSV, DOCX).

Request Body:

```json
{
  "document": "https://example.com/report.pdf",
  "file_type": "auto",
  "extract_mode": "all",
  "max_pages": 10
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `document` | string | Yes | Public URL or base64 data URI |
| `file_type` | string | No | `auto`, `pdf`, `xlsx`, `csv`, or `docx`. Default: `auto` |
| `extract_mode` | string | No | `text`, `tables`, or `all`. Default: `text` |
| `sheet_name` | string | No | Sheet to read (Excel only). Default: first sheet |
| `max_pages` | number | No | Maximum number of pages to process (PDF only). Default: no limit |

Supported formats: pdf, xlsx, xls, csv, docx, txt
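A sketch of a body builder that mirrors the field table, for clients that want to validate before uploading. Only the `file_type` values listed in the table are accepted; `xls` and `txt` appear under supported formats but not as `file_type` values, so such files presumably rely on `auto` detection. `document_parse_body` is a hypothetical name.

```python
from typing import Optional

FILE_TYPES = {"auto", "pdf", "xlsx", "csv", "docx"}
EXTRACT_MODES = {"text", "tables", "all"}


def document_parse_body(document: str, *, file_type: str = "auto",
                        extract_mode: str = "text",
                        sheet_name: Optional[str] = None,
                        max_pages: Optional[int] = None) -> dict:
    """Build a document parsing request body with basic local validation."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"file_type must be one of {sorted(FILE_TYPES)}")
    if extract_mode not in EXTRACT_MODES:
        raise ValueError(f"extract_mode must be one of {sorted(EXTRACT_MODES)}")
    if max_pages is not None and max_pages < 1:
        raise ValueError("max_pages must be a positive integer")
    body = {"document": document, "file_type": file_type,
            "extract_mode": extract_mode}
    # Optional fields are sent only when set, so server defaults apply otherwise.
    if sheet_name is not None:
        body["sheet_name"] = sheet_name
    if max_pages is not None:
        body["max_pages"] = max_pages
    return body
```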

Response (PDF):

```json
{
  "success": true,
  "data": {
    "content": "MONTHLY REPORT\n\n1. Introduction\nThis report presents...",
    "file_type": "pdf",
    "tables": null,
    "page_count": 5,
    "info": {
      "title": "Monthly Report",
      "author": "John Doe"
    }
  },
  "execution_time_ms": 1200
}
```

Response (Excel with tables):

```json
{
  "success": true,
  "data": {
    "content": null,
    "file_type": "xlsx",
    "tables": [
      {
        "name": "January",
        "headers": ["Product", "Quantity", "Value"],
        "rows": [
          ["Product A", 100, 1500.00],
          ["Product B", 50, 750.00]
        ]
      }
    ],
    "page_count": null
  }
}
```
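Each entry in `tables` carries a `headers` list and positional `rows`. A common next step is to zip them into per-row records, as in this sketch. `table_to_records` is a hypothetical helper; note that `zip` truncates a ragged row to the shorter of the row and the header list.

```python
from typing import Any, Dict, List


def table_to_records(table: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {"headers": [...], "rows": [[...], ...]} into a list of dicts."""
    headers = table["headers"]
    return [dict(zip(headers, row)) for row in table["rows"]]
```

With the "January" table above, the first record is `{"Product": "Product A", "Quantity": 100, "Value": 1500.0}`, and summing `Quantity` across records gives 150.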

Limits

| Resource | Limit |
|---|---|
| Max image size | 20 MB |
| Max audio size | 25 MB |
| Max document size | 50 MB |
| Processing timeout | 120 seconds |
| Rate limit (API key) | Configurable per key |

Error Codes

| HTTP status | Description |
|---|---|
| 400 | Missing required field or invalid format |
| 401 | Invalid API key or expired JWT |
| 403 | Quota exceeded or access denied |
| 413 | File too large (see Limits) |
| 415 | Unsupported file format |
| 500 | Internal processing error |
| 504 | Processing timeout (exceeded the 120-second limit) |
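The table does not say which errors are safe to retry. A reasonable client-side policy, and the assumption behind this sketch, is to fail fast on 4xx codes, which point to a problem with the request, credentials, or quota, and to retry 500 and 504 a few times with exponential backoff. Because 504 means the 120-second processing limit was hit, retrying the same input may time out again; reducing the input (for example with `max_pages`) is often the better fix. `call_with_retry` and its `send` callback are hypothetical.

```python
import time
from typing import Callable, Tuple

# Assumption: only these server-side statuses are treated as transient.
RETRYABLE_STATUSES = {500, 504}


def call_with_retry(send: Callable[[], Tuple[int, dict]], *, attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `send()` until it returns a 2xx status or attempts run out.

    `send` performs one HTTP request and returns (status_code, json_body).
    Delays between attempts double: base_delay, 2 * base_delay, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, payload = send()
        if 200 <= status < 300:
            return payload
        if status not in RETRYABLE_STATUSES or attempt == attempts - 1:
            raise RuntimeError(f"media request failed with HTTP {status}: {payload}")
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # the last attempt always returns or raises
```

Injecting `sleep` keeps the backoff testable without real delays.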
