Audio Transcriptions

POST /v1/audio/transcriptions


Synchronous endpoint. Upload an audio file, get the transcript back in one call. Compatible with the OpenAI Whisper API — drop-in replacement for https://api.openai.com/v1/audio/transcriptions.
curl -X POST https://kymaapi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -F "file=@meeting.mp3" \
  -F "model=transcribe"

Request

The request body is a multipart/form-data upload with the following fields.

file (file, required)
Audio file. Supports mp3, wav, m4a, ogg, webm, flac. Max 25 MB. ~30 minutes of mono 16kHz mp3 fits comfortably.

model (string, default: "transcribe")
Either the alias transcribe (recommended; it auto-tracks the current best ASR model) or a pinned SKU like whisper-v3-turbo. See Audio models.

language (string, optional)
ISO-639-1 code (e.g. en, vi, ja). Whisper auto-detects the language when this is omitted; supplying it improves accuracy on short clips.

response_format (string, default: "verbose_json")
One of: json, verbose_json, text, srt, vtt. The JSON formats embed a billing block in the response body. text returns the bare transcript, and srt / vtt return subtitle files; for those three, billing rides on X-Kyma-* response headers so the body stays a clean transcript or subtitle file.

temperature (number, default: 0)
Sampling temperature, 0–1. The default of 0 is deterministic.

prompt (string, optional)
Priming text. Use it to nudge the model toward known proper nouns, acronyms, or domain vocabulary in your audio.
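Assembled in code, the fields above map onto a plain form dict; a minimal sketch (the `build_form` helper and the file name in the comment are illustrative, not part of the API):

```python
def build_form(model="transcribe", language=None, response_format=None,
               temperature=None, prompt=None):
    """Assemble the multipart form fields documented above; None means omit."""
    fields = {"model": model}
    optional = {"language": language, "response_format": response_format,
                "temperature": temperature, "prompt": prompt}
    # curl sends every -F value as a string, so coerce numbers the same way
    fields.update({k: str(v) for k, v in optional.items() if v is not None})
    return fields

# With e.g. the third-party requests library (hypothetical file name):
#   requests.post("https://kymaapi.com/v1/audio/transcriptions",
#                 headers={"Authorization": f"Bearer {KYMA_API_KEY}"},
#                 files={"file": open("meeting.mp3", "rb")},
#                 data=build_form(language="en", temperature=0))
print(build_form(language="en", temperature=0))
```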

Response

200 OK with the transcript and a Kyma billing block.
{
  "task": "transcribe",
  "language": "English",
  "duration": 5.03,
  "text": "For too long, I have watched mortals suffer.",
  "segments": [
    {
      "id": 0,
      "start": 0,
      "end": 4.74,
      "text": "For too long, I have watched mortals suffer.",
      "tokens": [50365, 1171, 886, 938, 11, 286, 362, 6337, 6599, 1124, 9753, 13, 50602],
      "temperature": 0,
      "avg_logprob": -0.20,
      "compression_ratio": 0.85,
      "no_speech_prob": 0.0
    }
  ],
  "model": "whisper-v3-turbo",
  "billing": {
    "duration_sec": 5.03,
    "billable_minutes": 1,
    "cost_usd": 0.0009,
    "balance_usd": 41.469
  }
}
text (string)
The full transcript.

language (string)
Detected language (full name, e.g. "English").

duration (number)
Audio duration in seconds (decoded from the file, not estimated).

segments (array)
Per-segment timestamps and text. Only present when response_format is verbose_json.

model (string)
The Kyma model SKU that served the request.

billing.billable_minutes (number)
Minutes charged. Audio is billed in 1-minute increments, rounded up.

billing.cost_usd (number)
Final cost charged for this request.

billing.balance_usd (number)
Remaining balance after this charge.
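Because segments carries start/end times in seconds, you can derive subtitle output from verbose_json yourself; a sketch using only the field names shown in the sample response above:

```python
def srt_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list) -> str:
    """Build a SubRip file from verbose_json segments (start/end/text)."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                      f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)

print(segments_to_srt([{"start": 0, "end": 4.74,
                        "text": "For too long, I have watched mortals suffer."}]))
```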

Non-JSON formats

When response_format is text, srt, or vtt, the body is a plain transcript or subtitle file (no JSON envelope) and billing comes back on response headers:
Header                     Meaning
X-Kyma-Model               The model SKU that served the request
X-Kyma-Duration-Sec        Detected audio duration in seconds
X-Kyma-Billable-Minutes    Minutes charged
X-Kyma-Cost-USD            Final cost in USD
X-Kyma-Balance-USD         Remaining account balance
srt returns a SubRip subtitle file (application/x-subrip; charset=utf-8); vtt returns a WebVTT file (text/vtt; charset=utf-8). Both are built from the same per-segment timestamps verbose_json exposes, so the timing matches across formats.
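A small helper can lift the billing info out of those headers; this sketch assumes only the header names listed above (the `billing_from_headers` name is illustrative):

```python
from typing import Mapping

def billing_from_headers(headers: Mapping[str, str]) -> dict:
    """Parse the X-Kyma-* billing headers that non-JSON formats return."""
    return {
        "model": headers.get("X-Kyma-Model"),
        "duration_sec": float(headers["X-Kyma-Duration-Sec"]),
        "billable_minutes": int(headers["X-Kyma-Billable-Minutes"]),
        "cost_usd": float(headers["X-Kyma-Cost-USD"]),
        "balance_usd": float(headers["X-Kyma-Balance-USD"]),
    }

# Sample values taken from the verbose_json example above:
print(billing_from_headers({
    "X-Kyma-Model": "whisper-v3-turbo",
    "X-Kyma-Duration-Sec": "5.03",
    "X-Kyma-Billable-Minutes": "1",
    "X-Kyma-Cost-USD": "0.0009",
    "X-Kyma-Balance-USD": "41.469",
}))
```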

Pricing

Model               Per minute
whisper-v3-turbo    $0.0009

Billed per minute, rounded up: a 5-second clip bills as 1 minute ($0.0009), and a 1-hour file costs $0.054.
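The rounding rule is easy to sanity-check in code; a sketch of the arithmetic only (the rate is copied from the table above, not fetched from the API):

```python
import math

PRICE_PER_MINUTE_USD = 0.0009  # whisper-v3-turbo, from the pricing table

def estimate_cost_usd(duration_sec: float) -> float:
    """Per-minute billing, rounded up to the next whole minute."""
    billable_minutes = math.ceil(duration_sec / 60)
    return round(billable_minutes * PRICE_PER_MINUTE_USD, 6)

print(estimate_cost_usd(5))     # 5-second clip bills as 1 minute
print(estimate_cost_usd(3600))  # 1-hour file
```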

Errors

Status  error.code                 When
400     invalid_request            Missing file field, or not multipart/form-data
400     not_a_transcription_model  model is not a transcription SKU
401     auth_error                 Missing or invalid API key
402     billing_error              Insufficient credits
404     not_enabled                Audio gate not enabled on this account
413     invalid_request            File > 25 MB
502     provider_error             Upstream transcription failed
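One reasonable way to branch on these statuses client-side; the grouping below is a suggestion, not a policy the API mandates:

```python
def next_action(status: int) -> str:
    """Map a Kyma error status to a rough client-side action."""
    if status == 502:
        return "retry"        # provider_error: upstream hiccup, safe to retry
    if status in (401, 402, 404):
        return "fix_account"  # key, credits, or audio gate problem
    if status in (400, 413):
        return "fix_request"  # malformed form, wrong model, or file too large
    return "unexpected"

print(next_action(502), next_action(402), next_action(413))
```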

Examples

Pin a specific model

curl -X POST https://kymaapi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -F "file=@interview.mp3" \
  -F "model=whisper-v3-turbo" \
  -F "response_format=verbose_json" \
  -F "language=en"

Just the transcript text

curl -X POST https://kymaapi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -F "file=@clip.mp3" \
  -F "model=transcribe" \
  -F "response_format=text"
Returns the bare transcript without segments or metadata.

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://kymaapi.com/v1",
    api_key="kyma-...",
)

with open("meeting.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="transcribe",
        file=f,
    )

print(result.text)

See also

  • Audio Understand - the rest of the audio scene (tone, music, mood)
  • Audio models - SKUs behind the transcribe alias
  • watch-cli - an open-source CLI that uses these endpoints to give any agent eyes and ears for social video