Overview
kling-3-pro-audio is kling-3-pro with native audio: the same photoreal humans, smooth motion, and sharp output, plus synchronized ambient sound and dialogue. It costs about 50% more per second than the silent variant.
Specs
| Field | Value |
|---|---|
| Model ID | kling-3-pro-audio |
| Creator | Kuaishou |
| Best for | Cinematic clips needing diegetic sound, talking-head shots, ambient atmosphere |
| Default duration | 5 seconds |
| Max duration | 10 seconds |
| Input modalities | Text, image (I2V) |
| Output modalities | Video, audio |
| Resolution | 1080p |
| Audio | Native (ambient + dialogue, baked into the video) |
| Pricing mode | Per second |
Pricing
| Item | Cost (USD) |
|---|---|
| Per second | $0.2268 |
| Default 5s clip | $1.1340 |
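The table above amounts to a simple multiplication. The sketch below estimates the cost of a clip of a given length, assuming billing is strictly linear in duration and that only whole-second durations up to the 10-second maximum are accepted; neither assumption is confirmed on this page, and `clip_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.2268")  # USD, from the pricing table
DEFAULT_DURATION_S = 5                # from the specs table
MAX_DURATION_S = 10                   # from the specs table


def clip_cost(duration_s: int = DEFAULT_DURATION_S) -> Decimal:
    """Estimate the cost in USD of one clip.

    Assumption: price scales linearly with duration, with no minimum
    charge or rounding beyond what the pricing table shows.
    """
    if not 0 < duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration must be between 1 and {MAX_DURATION_S} seconds")
    return PRICE_PER_SECOND * duration_s


print(clip_cost())    # 1.1340 (matches the default 5s row)
print(clip_cost(10))  # 2.2680
```

`Decimal` is used rather than `float` so that the result matches the four-decimal figures in the table exactly.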
Use this when
- The clip needs diegetic sound — espresso machine hiss, footsteps, dialogue, ambient cafe murmur.
- You’re producing talking-head shots or atmospheric scenes.
- You’d otherwise have to add Foley in post.
Pick something else when
- The clip is silent or you’ll add audio in post: use kling-3-pro and save ~33% per second.
- You want fast-cut social with bundled audio: use seedance-2-pro or seedance-2-fast.
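The ~33% saving and the roughly 50% premium quoted in the overview describe the same price gap from opposite sides. As a quick check, write p for the silent per-second price (not stated on this page, so it is left as an unknown) and take the premium as exactly 50% for illustration:

```latex
\[
p_{\text{audio}} = 1.5\,p
\quad\Longrightarrow\quad
\frac{p_{\text{audio}} - p}{p_{\text{audio}}}
  = 1 - \frac{1}{1.5}
  = \frac{1}{3} \approx 33\%
\]
```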
Example
A generation request (POST /v1/videos/generations) returns 202 with a job_id; poll GET /v1/jobs/{id} until status is succeeded.
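The polling step could be sketched as follows. This is a minimal illustration, not the official client: `wait_for_job` and `fetch_job` are hypothetical names, the `failed` status and the timing defaults are assumptions, and only the `succeeded` status and the `GET /v1/jobs/{id}` route come from this page. The HTTP call is injected as a function so that the base URL, authentication, and HTTP client (none of which are specified here) stay out of the sketch.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,   # assumed polling interval
    timeout_s: float = 300.0,  # assumed upper bound on generation time
) -> dict:
    """Poll a job until its status is "succeeded" and return the job body.

    fetch_job is expected to perform GET /v1/jobs/{id} and return the
    parsed JSON response as a dict containing a "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "succeeded":
            return job
        if status == "failed":  # assumption: the API reports failures this way
            raise RuntimeError(f"job {job_id} failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s}s")
        time.sleep(interval_s)
```

In practice `fetch_job` would wrap whichever HTTP client you already use, sending the request with your credentials and returning the decoded JSON.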
See also
- Video Generation: full family overview
- kling-3-pro: same visuals without audio
- POST /v1/videos/generations: endpoint reference