Overview
gemini-3-flash-audio is Google’s Gemini 3 Flash Preview tuned for audio understanding, not transcription. It listens to a clip and answers free-form questions about how it sounds — speaker emotion, music style, ambient SFX, language, mood, pacing.
It captures the part of audio that a transcript loses. Pair it with whisper-v3-turbo and you have full audio context for any clip: what was said, plus how it sounded.
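A minimal sketch of that pairing, assuming you already have one function that calls this model and another that calls whisper-v3-turbo. `describe_scene`, `transcribe`, `SCENE_QUESTION` and `AudioContext` are hypothetical names, and the "answer Yes or No first" prompt is an illustrative convention rather than documented behavior. The sketch asks for the scene first and fetches a transcript only when the model reports speech.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical prompt: asking for "Yes"/"No" first makes the reply easy to branch on.
SCENE_QUESTION = (
    "Does this clip contain speech? Answer Yes or No first, "
    "then describe how it sounds in one sentence."
)

@dataclass
class AudioContext:
    scene: str                 # how it sounds (gemini-3-flash-audio)
    transcript: Optional[str]  # what was said (whisper-v3-turbo), if fetched

def full_audio_context(
    clip: bytes,
    describe_scene: Callable[[bytes, str], str],  # hypothetical wrapper around this model
    transcribe: Callable[[bytes], str],           # hypothetical wrapper around whisper-v3-turbo
) -> AudioContext:
    # Step 1: one-sentence scene description from the audio-understanding model.
    scene = describe_scene(clip, SCENE_QUESTION)
    # Step 2: only pay for a transcript when the model says there is speech.
    has_speech = scene.strip().lower().startswith("yes")
    transcript = transcribe(clip) if has_speech else None
    return AudioContext(scene=scene, transcript=transcript)
```

Passing the two calls in as functions keeps the control flow testable without network access and lets you swap either model.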
Specs
| Field | Value |
|---|---|
| Model ID | gemini-3-flash-audio |
| Creator | Google |
| Best for | Audio scene Q&A, mood/emotion, music recognition |
| Max file size | 25 MB |
| Max duration | ~30 min inline payload |
| Input modalities | Audio |
| Output modalities | Text |
| Pricing mode | Per minute |
| Min billable | 1 minute (rounded up) |
| Release stage | Preview |
Pricing
| Usage | Cost |
|---|---|
| Per minute | $0.000648 |
| 1-hour file | $0.039 |
| 30-second clip | $0.000648 (rounds up to 1 min) |
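The arithmetic behind the table, as a small sketch. It assumes every partial minute rounds up to the next whole minute and that the 1-minute minimum applies per request. The table confirms only the minimum and the two worked rows, so treat the general rounding rule as an assumption. The function names are hypothetical.

```python
import math

PRICE_PER_MINUTE_USD = 0.000648  # from the pricing table above
MIN_BILLABLE_MINUTES = 1         # from the Specs table

def billable_minutes(duration_sec: float) -> int:
    # Assumption: partial minutes round up; anything under a minute bills as one minute.
    return max(MIN_BILLABLE_MINUTES, math.ceil(duration_sec / 60))

def estimated_cost_usd(duration_sec: float) -> float:
    # Cost = billable minutes x per-minute price.
    return billable_minutes(duration_sec) * PRICE_PER_MINUTE_USD
```

A 30-second clip bills as 1 minute ($0.000648), and a 3,600-second file bills as 60 minutes, about $0.0389, which the table rounds to $0.039.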
Use this when
- The question is about how something sounds, not what was said: mood, emotion, music style, ambient SFX.
- You need to know what language is being spoken (and want a reasoned answer, not just a language code).
- You’re triaging audio for a video pipeline and want a one-sentence scene description per clip.
Pick something else when
- You only need the words: use whisper-v3-turbo, which is cheaper and faster.
- You need real-time audio (sub-100 ms latency): Gemini 3 Flash audio is fast, but not real-time.
Example
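A sketch of a request to POST /v1/audio/understand using only the Python standard library. The endpoint path, the model ID, and the duration_sec field come from this page. Everything else is an assumption: the base URL placeholder, bearer-token auth, the "question" and "audio" field names, and inline base64 encoding. Check the endpoint reference for the real schema before relying on it.

```python
import base64
import json
import urllib.request
from typing import Optional

MAX_BYTES = 25_000_000  # conservative reading of the 25 MB limit (assumes decimal MB)

def build_understand_request(
    api_key: str,
    audio: bytes,
    question: str,
    duration_sec: Optional[float] = None,
    model: str = "gemini-3-flash-audio",
    base_url: str = "https://api.example.com",  # hypothetical placeholder
) -> urllib.request.Request:
    if len(audio) > MAX_BYTES:
        raise ValueError("audio exceeds the 25 MB limit")
    payload = {
        "model": model,
        "question": question,                              # hypothetical field name
        "audio": base64.b64encode(audio).decode("ascii"),  # hypothetical: inline base64 audio
    }
    if duration_sec is not None:
        payload["duration_sec"] = duration_sec  # lets billing skip size-based estimation
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/understand",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )

def ask(req: urllib.request.Request) -> dict:
    # Sends the request and decodes the JSON reply; the response shape is not documented here.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `ask(build_understand_request(key, clip_bytes, "Is the speaker frustrated, neutral, or pleased? One word, then a one-sentence justification", duration_sec=42.0))` asks the specific kind of question recommended in the tips.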
Tips
- Be specific in the question. “What’s the mood?” works, but “Is the speaker frustrated, neutral, or pleased? One word, then a one-sentence justification” gets a more useful answer.
- Pass duration_sec when you have it (ffprobe gives it to you in one line). This avoids over-charging on high-bitrate inputs, where estimating duration from file size runs long.
- Audio scene first, transcript second. A one-sentence audio summary often gives an agent enough context to decide whether the transcript is even worth fetching.
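One way to get duration_sec before calling the API. The ffprobe flags used here are standard ffprobe options (print only the container duration as a bare number); the Python helper names are hypothetical. The parsing is split out so it can be tested without ffprobe installed.

```python
import subprocess
from typing import List

def ffprobe_duration_cmd(path: str) -> List[str]:
    # Ask ffprobe for the container duration only, printed without section wrappers or keys.
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]

def parse_duration_sec(ffprobe_stdout: str) -> float:
    # ffprobe prints something like "183.456000"; "N/A" means no duration was found.
    text = ffprobe_stdout.strip()
    if not text or text == "N/A":
        raise ValueError("ffprobe reported no duration")
    return float(text)

def probe_duration_sec(path: str) -> float:
    # Runs ffprobe (must be on PATH) and returns the duration in seconds.
    out = subprocess.run(ffprobe_duration_cmd(path), capture_output=True, text=True, check=True)
    return parse_duration_sec(out.stdout)
```

The same command works directly in a shell: `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 clip.mp3`.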
Aliases that resolve here
- audio-understand: the canonical alias for “ask a question about audio”. Resolves to this SKU today.
To pin this exact model, call gemini-3-flash-audio directly.
See also
- Audio - full audio family overview
- POST /v1/audio/understand - endpoint reference
- whisper-v3-turbo - the transcription pair
- watch-cli - open-source CLI that pairs this with transcribe