Kyma serves five video-generation models behind a single async endpoint, billed per second of generated footage (not per token). Pick the right model for the look you need; the API shape is identical across all five.
curl -X POST https://kymaapi.com/v1/videos/generations \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-3-pro",
    "prompt": "a cinematic close-up of an hourglass on a wooden desk, golden hour, slow camera push-in",
    "duration": 5
  }'
The endpoint is asynchronous. POST returns 202 with a job_id immediately; poll GET /v1/jobs/{job_id} until status is succeeded, or until it reaches failed or expired.
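The submit-then-poll flow can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names (build_submit_request, fetch_job, poll_job), the polling interval, and the timeout are assumptions, and the only status values it relies on are the ones named on this page (succeeded, failed, expired).

```python
import json
import time
import urllib.request

BASE_URL = "https://kymaapi.com/v1"  # host taken from the curl example above

# Statuses named on this page; treating exactly these as terminal is an assumption.
TERMINAL_STATUSES = {"succeeded", "failed", "expired"}


def build_submit_request(api_key, payload):
    """Build (but do not send) the POST from the curl example."""
    return urllib.request.Request(
        f"{BASE_URL}/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_job(api_key, job_id):
    """GET /v1/jobs/{job_id} and decode the JSON body into a dict."""
    req = urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_job(fetch, job_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch(job_id) until the job reaches a terminal status.

    `fetch` is any callable that returns the job as a dict with a "status" key.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r}")
        sleep(interval)
```

To wire it together, send the request with urllib.request.urlopen(build_submit_request(key, payload)), read job_id from the 202 response body, then call poll_job(lambda jid: fetch_job(key, jid), job_id). Passing fetch as a callable keeps the loop testable without network access.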

Pick a model

Model | Best for | Cost / sec | Default 5s clip | Audio | Input
kling-2.5-pro | Budget cinematic clips, b-roll | $0.0945 | $0.4725 | - | text + image
kling-3-pro | Premium cinematic, hero brand video | $0.1512 | $0.7560 | - | text + image
kling-3-pro-audio | Cinematic w/ diegetic sound, talking heads | $0.2268 | $1.1340 | native | text + image
seedance-2-pro | Action, multi-shot, social w/ synced audio | $0.40959 | $2.04795 | bundled | text + image
seedance-2-fast | Social shorts, rapid iteration, UI motion | $0.326565 | $1.63283 | bundled | text + image
Prices reflect Kyma’s 1.35× markup on the underlying provider cost. Default duration is 5 seconds; max is 10 seconds. All five models accept image_url to switch from text-to-video into image-to-video.
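The per-clip prices follow from rate × duration, which a short Python sketch makes concrete. The rates below are copied from the table; the function name estimate_cost and the check that duration be positive are assumptions (this page states only the 5-second default and the 10-second maximum).

```python
from decimal import Decimal

# Per-second prices from the table above (USD, markup already included).
RATE_PER_SEC = {
    "kling-2.5-pro": Decimal("0.0945"),
    "kling-3-pro": Decimal("0.1512"),
    "kling-3-pro-audio": Decimal("0.2268"),
    "seedance-2-pro": Decimal("0.40959"),
    "seedance-2-fast": Decimal("0.326565"),
}

MAX_DURATION = 10  # seconds, the documented maximum


def estimate_cost(model, duration=5):
    """Return per-second rate x duration as an exact Decimal amount in USD."""
    if not 0 < duration <= MAX_DURATION:
        raise ValueError(
            f"duration must be positive and at most {MAX_DURATION} seconds"
        )
    return RATE_PER_SEC[model] * duration
```

For example, estimate_cost("kling-3-pro") gives 0.7560 and estimate_cost("seedance-2-fast") gives 1.632825, which the table shows rounded to $1.63283. Decimal is used instead of float so the six-decimal rates multiply without binary rounding noise.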

kling-2.5-pro

Cheapest cinematic tier. Photoreal humans, smooth motion, 5–10s clips. The right pick when you need a lot of cinematic b-roll without paying flagship prices.
{
  "model": "kling-2.5-pro",
  "prompt": "a slow dolly shot through an empty modernist library at dawn",
  "duration": 5
}
  • Cost: $0.0945 / sec ($0.4725 for 5s)
  • Modes: text-to-video (default), image-to-video (pass image_url)
  • Best for: brand b-roll, character shots on a budget

kling-3-pro

Flagship Kling. Sharper than 2.5 Pro, photoreal humans, smooth motion. Use this for hero shots and premium brand video where the quality needs to stand up at full screen.
{
  "model": "kling-3-pro",
  "prompt": "an architectural fly-through of a glass-and-steel tower at sunset, cinematic anamorphic look",
  "duration": 5
}
  • Cost: $0.1512 / sec ($0.7560 for 5s)
  • Modes: text-to-video, image-to-video
  • Best for: hero brand video, character/face shots, premium cinematic
For native audio (ambient + dialogue), use kling-3-pro-audio instead.

kling-3-pro-audio

Kling 3 Pro with native audio. Same visuals as kling-3-pro plus synchronized ambient sound and dialogue. About 50% more expensive per second for the audio track.
{
  "model": "kling-3-pro-audio",
  "prompt": "a barista pulling an espresso shot, the machine hisses, ambient cafe murmur",
  "duration": 5
}
  • Cost: $0.2268 / sec ($1.1340 for 5s)
  • Audio: native (ambient + dialogue baked into the video)
  • Best for: talking-head shots, atmospheric scenes, anything that needs diegetic sound

seedance-2-pro

ByteDance flagship. Multi-shot composition, dynamic camera moves, native audio bundled. 720p output. Best when motion and energy matter — action, product demos, fast-cut social.
{
  "model": "seedance-2-pro",
  "prompt": "a runner sprints on a rooftop at golden hour, tracking shot, camera matches their stride",
  "duration": 5
}
  • Cost: $0.40959 / sec ($2.04795 for 5s)
  • Resolution: 720p
  • Audio: bundled
  • Best for: action, multi-shot scenes, social with synced audio, product motion

seedance-2-fast

Seedance 2 fast tier. Quicker generation and about 20% cheaper per second than seedance-2-pro ($0.326565 vs $0.40959). Same family behavior, native audio bundled. Right for rapid iteration and short social clips where turnaround beats absolute fidelity.
{
  "model": "seedance-2-fast",
  "prompt": "a phone notification slides in from the right, soft chime, clean UI background",
  "duration": 5
}
  • Cost: $0.326565 / sec ($1.63283 for 5s)
  • Resolution: 720p
  • Audio: bundled
  • Best for: social shorts, UI motion, rapid iteration, product demos

Image-to-video (I2V)

Every model above accepts an image_url. When present, Kyma routes the request to the model’s image-to-video variant — the image becomes the first frame and the prompt drives the motion.
{
  "model": "kling-3-pro",
  "prompt": "the camera slowly pushes in, the subject blinks once",
  "image_url": "https://example.com/portrait.jpg",
  "duration": 5
}
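Because the only difference between the two modes is the presence of image_url, a single request builder can serve both. The sketch below is illustrative; the function name build_payload is an assumption, and the field names are the ones used in the JSON examples on this page.

```python
def build_payload(model, prompt, duration=5, image_url=None):
    """Build a request body; adding image_url selects image-to-video."""
    payload = {"model": model, "prompt": prompt, "duration": duration}
    if image_url is not None:
        # The image becomes the first frame; the prompt drives the motion.
        payload["image_url"] = image_url
    return payload
```

Omitting the key entirely, rather than sending a null value, keeps the text-to-video body identical to the earlier examples.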

Billing flow

  1. POST creates a job and places a hold for estimated_cost (per-second rate × duration, markup applied).
  2. On succeeded, the hold is finalized as a usage transaction at the actual cost.
  3. On failed or expired, the hold is fully refunded — you only pay for clips you receive.
You can verify the charge on GET /v1/jobs/{job_id}: charged_amount is the final billed amount, estimated_cost is what was held up front.
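The hold, finalize, and refund steps can be checked client-side from a finished job record. The sketch below is a hedged illustration: reconcile is a hypothetical helper, and it assumes the job JSON exposes status, estimated_cost, and charged_amount as numbers or numeric strings (the field names come from this page; their types are not stated here).

```python
from decimal import Decimal


def reconcile(job):
    """Summarize what a finished job cost, based on the job record's fields."""
    held = Decimal(str(job["estimated_cost"]))
    status = job["status"]
    if status == "succeeded":
        # The hold is finalized at the actual cost.
        charged = Decimal(str(job["charged_amount"]))
    elif status in ("failed", "expired"):
        # The hold is fully refunded, so nothing is billed.
        charged = Decimal("0")
    else:
        raise ValueError(f"job is not finished yet: {status!r}")
    return {"held": held, "charged": charged, "released": held - charged}
```

A succeeded 5-second kling-3-pro job with estimated_cost and charged_amount both 0.756 yields released equal to 0; a failed job releases the whole hold.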

Idempotency

Pass idempotency_key to make POST safe to retry. The same (api_key, idempotency_key) pair always returns the same job — no duplicate charges, no duplicate generations.
{
  "model": "seedance-2-fast",
  "prompt": "...",
  "duration": 5,
  "idempotency_key": "campaign-spring-asset-7"
}
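One way to get a stable key on retries is to derive it from the request content. This is a client-side sketch, not something the API requires: the helper names, the "video" prefix, and the 32-character truncation are assumptions, and any limits Kyma places on key length or characters are not stated on this page.

```python
import hashlib
import json


def derive_idempotency_key(payload, namespace="video"):
    """Hash a canonical JSON form so equal payloads always give the same key."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{namespace}-{digest[:32]}"


def with_idempotency_key(payload, key=None):
    """Return a copy of payload with idempotency_key set (derived if not given)."""
    body = dict(payload)
    body["idempotency_key"] = key or derive_idempotency_key(payload)
    return body
```

The trade-off is that two intentionally identical requests collapse into one job. When you want several generations from the same prompt, pass an explicit key per asset, as in the "campaign-spring-asset-7" example above.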
