Video Generation

Kyma serves ten video-generation models behind a single async endpoint, billed per second of generated footage (or flat per-call for the Hailuo family). Pick the right model for the look you need; the API shape is identical across all ten.

curl -X POST https://kymaapi.com/v1/videos/generations \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-3-pro",
    "prompt": "a cinematic close-up of an hourglass on a wooden desk, golden hour, slow camera push-in",
    "duration": 5
  }'
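Clients that do not shell out to curl can build the same POST with Python's standard library. This is a minimal sketch. build_generation_request and submit are hypothetical helper names, and submit assumes the 202 response body is JSON with a top-level job_id field.

```python
import json
import urllib.request

API_URL = "https://kymaapi.com/v1/videos/generations"


def build_generation_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST shown in the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit(payload: dict, api_key: str) -> str:
    """Send the request and return the job_id from the 202 response.

    Assumption: the response body is JSON with a top-level "job_id" field.
    """
    req = build_generation_request(payload, api_key)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["job_id"]
```

To get a job_id to poll, call submit(payload, os.environ["KYMA_API_KEY"]) with the same JSON body as the curl example.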

The endpoint is asynchronous. POST immediately returns 202 with a job_id. Poll GET /v1/jobs/{job_id} until status is succeeded, or until it ends as failed or expired.
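A polling loop only has to stop on a terminal status. The sketch below treats succeeded, failed, and expired (the outcomes named under Billing flow) as terminal. get_job is a hypothetical caller-supplied function that performs GET /v1/jobs/{job_id} and returns the decoded JSON. The 5s interval and 600s timeout are arbitrary defaults, not documented limits.

```python
import time
from typing import Callable

# Terminal statuses taken from the billing rules on this page; any other
# value is treated as "still in progress".
TERMINAL_STATUSES = {"succeeded", "failed", "expired"}


def wait_for_job(
    get_job: Callable[[str], dict],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_job(job_id) until the job reaches a terminal status."""
    waited = 0.0
    while True:
        job = get_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {timeout}s"
            )
        sleep(interval)
        waited += interval
```

Because get_job and sleep are arguments, the loop is independent of any HTTP library and can be exercised without network access.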

Pick a model

Model	Best for	Price	Default clip	Audio	Input
`kling-2.5-pro`	Budget cinematic clips, b-roll	$0.0945/s	$0.4725 (5s)	none	text + image
`kling-3-pro`	Premium cinematic, hero brand video	$0.1512/s	$0.7560 (5s)	none	text + image
`kling-3-pro-audio`	Cinematic w/ diegetic sound, talking heads	$0.2268/s	$1.1340 (5s)	native	text + image
`seedance-2-pro`	Action, multi-shot, social w/ synced audio	$0.40959/s	$2.04795 (5s)	bundled	text + image
`seedance-2-fast`	Social shorts, rapid iteration, UI motion	$0.326565/s	$1.63283 (5s)	bundled	text + image
`veo-3-fast`	Google Veo budget tier; 720p, no audio	$0.135/s	$0.54 (4s)	none	text + image
`veo-3`	Google Veo flagship; 1080p, native audio	$0.540/s	$2.16 (4s)	native	text + image
`hailuo-02-512p`	Cheapest Hailuo, I2V only	$0.140 flat per call	n/a (6s or 10s only)	none	image
`hailuo-02-768p`	Hailuo balanced	$0.420 flat per call	n/a (6s or 10s only)	none	text + image
`hailuo-02-1080p`	Hailuo top tier	$0.780 flat per call	n/a (6s only)	none	text + image

Prices include Kyma’s 1.35× markup on the underlying provider cost. Most per-second models default to 5s; the Veo rows above are priced for a 4s clip. Maximum duration is 10–15s depending on the model. Hailuo bills a flat price per call regardless of duration. All models accept image_url for image-to-video, and hailuo-02-512p supports image-to-video only. The live, canonical price list is GET /v1/pricing.
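To budget a batch before submitting it, you can turn the table into a client-side estimator. This is a sketch, not the server's estimated_cost calculation. The rates are copied from the table above and can go stale (GET /v1/pricing is canonical). The allowed Hailuo durations are read from the "n/a" notes, and per-model maximum durations are not enforced. Decimal is used to avoid float rounding on money.

```python
from decimal import Decimal

# Per-second rates in USD, markup included, copied from the table above.
PER_SECOND = {
    "kling-2.5-pro": Decimal("0.0945"),
    "kling-3-pro": Decimal("0.1512"),
    "kling-3-pro-audio": Decimal("0.2268"),
    "seedance-2-pro": Decimal("0.40959"),
    "seedance-2-fast": Decimal("0.326565"),
    "veo-3-fast": Decimal("0.135"),
    "veo-3": Decimal("0.540"),
}

# Flat per-call prices and the durations the table lists for each Hailuo tier.
FLAT = {
    "hailuo-02-512p": (Decimal("0.140"), {6, 10}),
    "hailuo-02-768p": (Decimal("0.420"), {6, 10}),
    "hailuo-02-1080p": (Decimal("0.780"), {6}),
}

MARKUP = Decimal("1.35")


def estimate_cost(model: str, duration: int) -> Decimal:
    """Rate x duration for per-second models, flat price for Hailuo."""
    if model in FLAT:
        price, allowed = FLAT[model]
        if duration not in allowed:
            raise ValueError(f"{model} supports durations {sorted(allowed)}")
        return price
    if model in PER_SECOND:
        return PER_SECOND[model] * duration
    raise ValueError(f"unknown model: {model}")


def provider_cost(price: Decimal) -> Decimal:
    """Back out the underlying provider cost from a marked-up price."""
    return price / MARKUP
```

For example, estimate_cost("kling-3-pro", 5) gives the $0.7560 shown in the table, and provider_cost shows that veo-3-fast's $0.135/s corresponds to $0.10/s before markup.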

kling-2.5-pro

Cheapest cinematic tier. Photoreal humans, smooth motion, 5–10s clips. The right pick when you need a lot of cinematic b-roll without paying flagship prices.

{
  "model": "kling-2.5-pro",
  "prompt": "a slow dolly shot through an empty modernist library at dawn",
  "duration": 5
}

Cost: $0.0945 / sec ($0.4725 for 5s)
Modes: text-to-video (default), image-to-video (pass image_url)
Best for: brand b-roll, character shots on a budget

kling-3-pro

Flagship Kling. Sharper than 2.5 Pro, photoreal humans, smooth motion. Use this for hero shots and premium brand video where the quality needs to stand up at full screen.

{
  "model": "kling-3-pro",
  "prompt": "an architectural fly-through of a glass-and-steel tower at sunset, cinematic anamorphic look",
  "duration": 5
}

Cost: $0.1512 / sec ($0.7560 for 5s)
Modes: text-to-video, image-to-video
Best for: hero brand video, character/face shots, premium cinematic

For native audio (ambient + dialogue), use kling-3-pro-audio instead.

kling-3-pro-audio

Kling 3 Pro with native audio. It produces the same visuals as kling-3-pro plus synchronized ambient sound and dialogue. The audio track raises the per-second rate by 50% ($0.2268 vs. $0.1512).

{
  "model": "kling-3-pro-audio",
  "prompt": "a barista pulling an espresso shot, the machine hisses, ambient cafe murmur",
  "duration": 5
}

Cost: $0.2268 / sec ($1.1340 for 5s)
Audio: native (ambient + dialogue baked into the video)
Best for: talking-head shots, atmospheric scenes, anything that needs diegetic sound

seedance-2-pro

ByteDance flagship. Multi-shot composition, dynamic camera moves, native audio bundled. 720p output. Best when motion and energy matter — action, product demos, fast-cut social.

{
  "model": "seedance-2-pro",
  "prompt": "a runner sprints on a rooftop at golden hour, tracking shot, camera matches their stride",
  "duration": 5
}

Cost: $0.40959 / sec ($2.04795 for 5s)
Resolution: 720p
Audio: bundled
Best for: action, multi-shot scenes, social with synced audio, product motion

seedance-2-fast

Seedance 2 fast tier. It generates faster and costs about 20% less per second than seedance-2-pro, with the same model-family behavior and bundled native audio. It is the right pick for rapid iteration and short social clips where turnaround matters more than absolute fidelity.

{
  "model": "seedance-2-fast",
  "prompt": "a phone notification slides in from the right, soft chime, clean UI background",
  "duration": 5
}

Cost: $0.326565 / sec ($1.63283 for 5s)
Resolution: 720p
Audio: bundled
Best for: social shorts, UI motion, rapid iteration, product demos

Image-to-video (I2V)

Every model accepts an image_url, and hailuo-02-512p requires one. When image_url is present, Kyma routes the request to the model’s image-to-video variant. The image becomes the first frame, and the prompt drives the motion.

{
  "model": "kling-3-pro",
  "prompt": "the camera slowly pushes in, the subject blinks once",
  "image_url": "https://example.com/portrait.jpg",
  "duration": 5
}
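The presence of image_url is what selects the image-to-video variant. A client can therefore build payloads with one helper and reject the one combination the table rules out. build_payload and I2V_ONLY are hypothetical names. Only hailuo-02-512p is listed as image-only.

```python
from typing import Optional

# Models the table lists as image-to-video only.
I2V_ONLY = {"hailuo-02-512p"}


def build_payload(model: str, prompt: str, duration: int,
                  image_url: Optional[str] = None) -> dict:
    """Build a generation body; adding image_url switches to image-to-video."""
    if model in I2V_ONLY and image_url is None:
        raise ValueError(f"{model} is image-to-video only; pass image_url")
    payload = {"model": model, "prompt": prompt, "duration": duration}
    if image_url is not None:
        # The image becomes the first frame; the prompt describes the motion.
        payload["image_url"] = image_url
    return payload
```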

Billing flow

POST creates a job and places a hold for estimated_cost (per-second rate × duration, or the flat per-call price for Hailuo, markup included).
On succeeded, the hold is finalized as a usage transaction at the actual cost.
On failed or expired, the hold is fully refunded — you only pay for clips you receive.

You can verify the charge with GET /v1/jobs/{job_id}: charged_amount is the final billed amount, and estimated_cost is the amount held up front.
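The three billing rules can be modelled as a state change on a hold. This sketch uses a hypothetical Hold class. It reports 0 for failed or expired jobs, although the page does not say whether the API returns 0 or null for charged_amount in that case.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Hold:
    """Hypothetical client-side model of the hold placed when a job is created."""
    estimated_cost: Decimal
    charged_amount: Optional[Decimal] = None
    refunded: bool = False

    def settle(self, status: str, actual_cost: Optional[Decimal] = None) -> Decimal:
        """Apply the billing rule for a terminal status and return what is paid."""
        if status == "succeeded":
            # The hold is finalized at the actual cost.
            if actual_cost is None:
                raise ValueError("a succeeded job needs an actual_cost")
            self.charged_amount = actual_cost
        elif status in ("failed", "expired"):
            # The whole hold is refunded; nothing is paid for a missing clip.
            self.charged_amount = Decimal("0")
            self.refunded = True
        else:
            raise ValueError(f"not a terminal status: {status!r}")
        return self.charged_amount

    def difference(self) -> Decimal:
        """Held minus billed: positive means less was billed than held."""
        if self.charged_amount is None:
            raise ValueError("hold not settled yet")
        return self.estimated_cost - self.charged_amount
```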

Idempotency

Pass idempotency_key to make POST safe to retry. The same (api_key, idempotency_key) pair always returns the same job — no duplicate charges, no duplicate generations.

{
  "model": "seedance-2-fast",
  "prompt": "...",
  "duration": 5,
  "idempotency_key": "campaign-spring-asset-7"
}
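An idempotency key only protects a retry if the retry reuses it. In this minimal sketch, send is a hypothetical function that POSTs the body and raises ConnectionError on network failure. The helper generates a UUID key once if the caller did not supply one, then sends the identical body on every attempt so the server's (api_key, idempotency_key) deduplication applies.

```python
import uuid
from typing import Callable, Optional


def submit_with_retries(send: Callable[[dict], dict], payload: dict,
                        attempts: int = 3) -> dict:
    """POST payload via send(), retrying on ConnectionError with one idempotency_key."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Copy so the caller's dict is not mutated, then fix the key once,
    # outside the loop, so every retry carries the same value.
    body = dict(payload)
    body.setdefault("idempotency_key", str(uuid.uuid4()))
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return send(body)
        except ConnectionError as exc:
            # The earlier request may have reached the server; with the same
            # key, the retry returns that job instead of creating a new one.
            last_error = exc
    raise last_error
```

A caller-supplied key such as "campaign-spring-asset-7" is kept as-is, which also makes retries safe across process restarts.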
