## Overview
whisper-v3-turbo is OpenAI's Whisper Large v3 Turbo, the fast variant of Whisper Large v3 with the decoder pruned from 32 layers down to 4 without losing meaningful accuracy. On Kyma it serves both the `transcribe` alias and the `whisper-v3-turbo` SKU at 228x-realtime inference speed.
It's the right pick for transcripts, voice-agent input, podcast captions, and any pipeline where speed and cost matter more than the last 1% of WER.
## Specs
| Field | Value |
|---|---|
| Model ID | whisper-v3-turbo |
| Creator | OpenAI |
| License | MIT (model weights) |
| Best for | Transcription, captions, voice agents |
| Max file size | 25 MB |
| Max duration | ~30 min (mono 16 kHz MP3) |
| Input modalities | Audio (mp3, wav, m4a, ogg, webm, flac) |
| Output modalities | Text |
| Pricing mode | Per minute |
| Min billable | 1 minute (rounded up) |
## Pricing
| Usage | Cost |
|---|---|
| Per minute | $0.0009 |
| 1-hour file | $0.054 |
| 5-second clip | $0.0009 (rounds up to 1 min) |
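The rounding rule above can be sketched in a few lines; the helper name is illustrative, and the rate is taken from the table:

```python
import math

PRICE_PER_MINUTE = 0.0009  # per-minute rate from the pricing table

def billed_cost(duration_seconds: float) -> float:
    """Cost in USD: duration rounds up to whole minutes, minimum 1 minute."""
    minutes = max(1, math.ceil(duration_seconds / 60))
    return minutes * PRICE_PER_MINUTE

print(billed_cost(5))     # 5-second clip bills as 1 minute
print(billed_cost(3600))  # 1-hour file bills as 60 minutes
```

Note that a 61-second file bills the same as a 120-second file: both round up to 2 minutes.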
## Use this when
- You need accurate transcripts at ~50x cheaper than full multimodal LLM analysis.
- You’re feeding a voice agent or building real-time captions where end-to-end latency matters.
- You want the OpenAI Whisper API shape with no code changes: just swap the `base_url`.
## Pick something else when
- You need to know the mood or background sound, not just the words: use `gemini-3-flash-audio`.
- Your file is longer than ~30 minutes: split the audio first, or wait for the upcoming Files API path.
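For files over the limit, one common approach is ffmpeg's segment muxer, which cuts audio into chunks without re-encoding. A sketch that builds the command (the chunk length and output pattern are arbitrary choices):

```python
def ffmpeg_split_args(src: str, chunk_seconds: int = 1500,
                      pattern: str = "chunk_%03d.mp3") -> list[str]:
    # 1500 s = 25-minute chunks, safely under the ~30-minute cap.
    # "-c copy" splits at existing frames instead of re-encoding.
    return ["ffmpeg", "-i", src, "-f", "segment",
            "-segment_time", str(chunk_seconds), "-c", "copy", pattern]

# Run with e.g. subprocess.run(ffmpeg_split_args("podcast.mp3"), check=True),
# then transcribe each chunk and concatenate the results.
```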
## Example

### Python (OpenAI SDK)
## Aliases that resolve here
- `transcribe` - auto-tracks the current best ASR model on Kyma. Today that's this SKU.

If you need this exact model, pin `whisper-v3-turbo` directly. If you want to ride future upgrades automatically, use `transcribe`.
## See also
- Audio - full audio family overview
- `POST /v1/audio/transcriptions` - endpoint reference
- `watch-cli` - open-source CLI built on this endpoint