Veo 3 - Kyma API

Overview

veo-3 is the flagship tier of Google’s Veo 3: 1080p output with native audio (synchronized dialogue, ambient sound, and lip-sync). It is best suited to hero brand video, talking-head shots, and premium cinematic content where audio matters. For a faster budget tier without audio, use veo-3-fast.

Specs

Field	Value
Model ID	`veo-3`
Creator	Google
Best for	Hero brand video, talking-head, premium cinematic with native audio
Resolution	1080p
Audio	Yes — native dialogue + ambient + lip-sync
Aspect ratios	`16:9` (default), `9:16`
Duration	`4`, `6`, or `8` seconds
First-frame I2V	Yes — pass `image_url`
Pricing mode	Per second × duration
Default latency	~60–180s end-to-end (heavier inference than 720p)
Output	Blob-hosted MP4 (Vercel CDN, durable URL)
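
The constraints in the table above can be checked client-side before a request is sent. The following is a minimal Python sketch, not part of the Kyma API: `build_veo3_payload` is a hypothetical helper, and the `aspect_ratio` field name is an assumption (the table lists the allowed values but not the parameter name). The `model`, `prompt`, `duration`, and `image_url` fields are taken from the request examples on this page.

```python
import json

ALLOWED_DURATIONS = (4, 6, 8)              # seconds, per the Specs table
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")   # 16:9 is the documented default


def build_veo3_payload(prompt, duration, aspect_ratio=None, image_url=None):
    """Build a veo-3 request body, rejecting values the Specs table does not list."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}, got {duration!r}")

    payload = {"model": "veo-3", "prompt": prompt, "duration": duration}

    if aspect_ratio is not None:
        if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
        # Assumption: the request field is named "aspect_ratio".
        payload["aspect_ratio"] = aspect_ratio

    if image_url is not None:
        # First-frame image-to-video: pass the image via image_url.
        payload["image_url"] = image_url

    return payload


print(json.dumps(build_veo3_payload("A barista hands a customer a coffee", 6), indent=2))
```

Rejecting an unsupported duration locally (for example `5`) avoids spending a round trip on a request the table suggests would not be accepted.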

Pricing

Billed per second of generated video. Kyma list price = provider cost × 1.35.

Variant	Provider $/s	Kyma list $/s	8s clip
`veo-3`	$0.40	$0.540	$4.32

Live source: GET https://kymaapi.com/v1/pricing.
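
As a worked check of the table above, the list price and clip cost follow directly from the stated markup. This is a minimal sketch using only the figures on this page; the function names are illustrative, and `Decimal` is used so currency values are not affected by binary floating-point rounding.

```python
from decimal import Decimal

MARKUP = Decimal("1.35")  # list price = provider cost x 1.35, per this page
CENT = Decimal("0.01")


def list_price_per_second(provider_price):
    """Kyma list price per second for a given provider price per second."""
    return Decimal(provider_price) * MARKUP


def clip_cost(provider_price, duration_seconds):
    """Total list cost of one clip, rounded to whole cents."""
    total = list_price_per_second(provider_price) * duration_seconds
    return total.quantize(CENT)


for seconds in (4, 6, 8):
    print(f"{seconds}s clip: ${clip_cost('0.40', seconds)}")
# 4s clip: $2.16
# 6s clip: $3.24
# 8s clip: $4.32
```

The live pricing endpoint remains the authority; hard-coded figures like the `0.40` above can go stale.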

Compared to other video models on Kyma

Strength	`veo-3`	`veo-3-fast`	`kling-3-pro-audio`	`seedance-2-pro`
Native audio	★★★★★	—	★★★★★	★★★★★
Lip-sync quality	★★★★★	n/a	★★★★	★★★★
Cost $/8s	$4.32	$1.08	$1.81	$2.43
Resolution	1080p	720p	configurable	720p
Best for	hero / talking-head	drafts	cinematic + audio	action + audio

Use this when

You need native audio + 1080p + photoreal lip-sync.
The brief is talking-head, hero brand video, premium cinematic.
Per-clip cost is acceptable for the deliverable.

Pick something else when

You don’t need audio → veo-3-fast at 25% of the cost.
You need multi-shot action sequences with audio → seedance-2-pro.
You need long-form (10s+) generation → kling-3-pro-audio, which accepts up to 15s.
You need a budget tier with audio → hailuo-02-*, which uses flat per-call pricing.

Example — text-to-video with audio

curl -X POST https://kymaapi.com/v1/videos/generations \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3",
    "prompt": "A barista hands a customer a coffee saying \"have a great day\", soft cafe ambience",
    "duration": 8
  }'

Example — image-to-video with audio

curl -X POST https://kymaapi.com/v1/videos/generations \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo-3",
    "prompt": "The subject turns to camera and waves, friendly background music",
    "image_url": "https://example.com/portrait.jpg",
    "duration": 6
  }'

Generation is asynchronous: poll GET /v1/jobs/{id} until the job reaches succeeded (typically ~60–180s). The output is a durable Vercel blob URL for the MP4.
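
The polling step might look like the following Python sketch. It is illustrative rather than an official client: the `status` field name, the `failed` terminal value, the 5-second interval, and serving the jobs path from the same host as the generation endpoint are all assumptions not confirmed on this page. Only the `GET /v1/jobs/{id}` path, the `succeeded` state, and the bearer-token header come from the text and examples above. The fetch function is passed in so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

BASE_URL = "https://kymaapi.com"  # assumed to also serve /v1/jobs/{id}


def fetch_job(job_id, api_key):
    """GET /v1/jobs/{id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job succeeds; raise on failure or when the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        status = job.get("status")  # assumed field name
        if status == "succeeded":
            return job
        if status == "failed":      # assumed terminal state
            raise RuntimeError(f"job {job_id} failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)
```

A call could then be written as `wait_for_job(job_id, lambda jid: fetch_job(jid, api_key))`, with `api_key` read from the `KYMA_API_KEY` environment variable used in the curl examples. The 300-second default leaves headroom over the ~60–180s latency quoted above.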
