Overview

kling-3-pro-audio is kling-3-pro with native audio. Same photoreal humans, same smooth motion, same sharp output — plus synchronized ambient sound and dialogue. About 50% more expensive per second than the silent variant.

Specs

| Field | Value |
|---|---|
| Model ID | kling-3-pro-audio |
| Creator | Kuaishou |
| Best for | Cinematic clips needing diegetic sound, talking-head shots, ambient atmosphere |
| Default duration | 5 seconds |
| Max duration | 10 seconds |
| Input modalities | Text, image (I2V) |
| Output modalities | Video, audio |
| Resolution | 1080p |
| Audio | Native (ambient + dialogue, baked into the video) |
| Pricing mode | Per second |

Pricing

| Unit | Cost |
|---|---|
| Per second | $0.2268 |
| Default 5s clip | $1.1340 |
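
Because pricing is per second, a clip's cost is the per-second rate multiplied by its duration. The sketch below shows that arithmetic. It assumes cost scales linearly with duration, with no minimum charge or rounding rule, which the page does not confirm beyond the 5-second figure. `clip_cost` is a hypothetical helper name, not part of any API.

```shell
# Per-second rate copied from the pricing figures above.
RATE_PER_SECOND=0.2268

# clip_cost SECONDS: print the estimated cost in USD for a clip of that length.
# Assumption: cost is simply rate * seconds (linear, no minimum charge).
clip_cost() {
  awk -v rate="$RATE_PER_SECOND" -v secs="$1" 'BEGIN { printf "%.4f\n", rate * secs }'
}

clip_cost 5    # default duration, prints 1.1340
clip_cost 10   # maximum duration, prints 2.2680
```

The 5-second result matches the listed default-clip price, so a maximum-length 10-second clip would come to roughly double that under the same assumption.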

Use this when

  • The clip needs diegetic sound — espresso machine hiss, footsteps, dialogue, ambient cafe murmur.
  • You’re producing talking-head shots or atmospheric scenes.
  • You’d otherwise have to add Foley in post.

Pick something else when

Example

curl -X POST https://kymaapi.com/v1/videos/generations \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-3-pro-audio",
    "prompt": "a barista pulling an espresso shot, the machine hisses, ambient cafe murmur",
    "duration": 5
  }'

The endpoint is asynchronous: the POST returns 202 with a job_id. Poll GET /v1/jobs/{id} until status is succeeded.
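
The polling step could be sketched as a small shell loop. This is an illustration under several assumptions, not the documented client. It assumes the job endpoint lives on the same host as the POST example, that the response is JSON with a top-level string field named `status`, and that a fixed polling interval is acceptable. `KYMA_BASE_URL`, `fetch_job`, `job_status`, and `poll_job` are hypothetical names. The `sed` extraction is a stand-in for a real JSON parser and will misread nested or repeated `status` keys.

```shell
# Assumption: job lookups use the same host as the POST example.
KYMA_BASE_URL="${KYMA_BASE_URL:-https://kymaapi.com}"

# fetch_job JOB_ID: print the raw response body of GET /v1/jobs/{id}.
fetch_job() {
  curl -sS -H "Authorization: Bearer $KYMA_API_KEY" "$KYMA_BASE_URL/v1/jobs/$1"
}

# job_status: read JSON on stdin and print the value of a "status" string field.
# Assumption: the field is a top-level string; a real JSON parser is more robust.
job_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# poll_job JOB_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Prints the final response body and returns 0 once status is "succeeded".
# Returns 1 if a request fails, 2 if the attempt limit is reached.
# Other terminal statuses are not documented in the source, so a job that
# ends any other way is only caught by the attempt limit.
poll_job() {
  job_id=$1
  interval=${2:-5}
  max_attempts=${3:-60}
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    body=$(fetch_job "$job_id") || return 1
    status=$(printf '%s' "$body" | job_status)
    if [ "$status" = "succeeded" ]; then
      printf '%s\n' "$body"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "gave up waiting for job $job_id" >&2
  return 2
}
```

With the job_id from the 202 response stored in a variable, `poll_job "$JOB_ID"` would check every 5 seconds for up to 60 attempts. The attempt limit keeps the loop from running forever if the job never reaches succeeded.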