Pricing

How pricing works

Kyma charges per token. Every request has:

input tokens for your prompt and context
output tokens for the model’s response

No monthly fee, no seat fee, no contract. Prices below are per 1 million tokens. For the live canonical table, use GET /v1/models or GET /v1/credits/pricing.
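As a rough illustration of the per-token model described above, a request's cost is the input and output token counts, each scaled by its per-1M price. The function name `token_cost` is an assumption made for this sketch and is not part of any Kyma SDK; the prices are the `glm-4.7-flash` figures from the table on this page.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request, given per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# glm-4.7-flash: $0.08 input and $0.54 output per 1M tokens.
cost = token_cost(400, 100, 0.08, 0.54)
print(f"${cost:.6f}")
```

A 400-token prompt with a 100-token reply comes to well under a hundredth of a cent, which is why the cheapest models suit bulk work.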

Pricing per model

Cheapest useful models

Use these for bulk automation, extraction, and simple workloads.

Model	Input / 1M	Output / 1M	Best for
`glm-4.7-flash`	$0.08	$0.54	Cheap long-context throughput
`gemma-4-31b`	$0.19	$0.54	Cheap multimodal / vision
`deepseek-v4-flash`	$0.19 · $0.019 cached	$0.38	Best value V4 — 1M context, native reasoning
`gpt-oss-120b`	$0.20	$0.81	Cheap writing + general tasks
`glm-4.5-air`	$0.18	$1.15	Cheap agentic bulk work
`step-3.7-flash`	$0.27	$1.55	Cheap multimodal flash — text+image+video in, tool calling

Balanced value

Use these when you want strong quality without flagship pricing.

Model	Input / 1M	Output / 1M	Best for
`qwen-3-32b`	$0.39	$0.81	Fast coding loops
`minimax-m2.5`	$0.41	$1.62	Agentic coding
`minimax-m2.7`	$0.41	$1.62	Productivity + debugging
`minimax-m3`	$0.41	$1.62	Newest MiniMax — agentic coding, 1M context, multimodal
`deepseek-v3`	$0.81	$2.30	Best value frontier-class model
`llama-3.3-70b`	$1.19	$1.19	Balanced open model

Flagship / premium

Use these when quality or capability matters more than cost. Rows showing · cached print the cached input price inline (10% of normal input — applies to repeated system prompts and tool definitions on caching-enabled models; see prompt caching).

Model	Input / 1M	Output / 1M	Best for
`qwen-3-coder`	$0.68	$2.16	Code-specialized output
`deepseek-r1`	$0.74	$2.90	Deep reasoning
`kimi-k2.6`	$1.28	$5.40	Agentic flagship (newest)
`kimi-k2.7-code`	$1.28	$5.40	Coding specialist — fewer reasoning tokens, 262K context
`kimi-k2.5`	$0.68	$3.78	Agentic flagship (previous-gen)
`qwen-3.6-plus`	$0.68	$4.05	Best default overall
`qwen-3.7-plus`	$0.54	$2.16	Newest Qwen flagship — vision + 1M context
`deepseek-v4-pro`	$2.35 · $0.235 cached	$4.70	Top reasoning, complex coding (1M context)
`claude-opus-4-7`	$6.75 · $0.675 cached	$33.75	Anthropic flagship — top engineering quality
`claude-sonnet-4-6`	$4.05 · $0.405 cached	$20.25	Anthropic balanced — long-running agentic
`claude-haiku-4-5`	$1.35 · $0.135 cached	$6.75	Anthropic fast tier — 200K context
`glm-5.2`	$1.89	$5.94	Newest flagship — 1M context, #1 open-weight coding
`glm-5.1`	$1.89	$5.94	Long-running coding agents (previous-gen)
`nemotron-3-ultra-550b`	$0.68	$3.38	Largest US open-weight — 1M context, fast reasoning

Long-context & multimodal

Use these when context length or image input is the bottleneck.

Model	Input / 1M	Output / 1M	Best for
`gemini-2.5-flash`	$0.41	$3.38	1M context
`gemini-3-flash`	$0.68	$4.05	Newer long-context preview
`gemma-4-31b`	$0.19	$0.54	Vision + text workflows

Web search

Perplexity’s Sonar models run a live web search on every request and return cited answers. They bill two components on the same request: normal token pricing plus a flat web-search fee for the live search.

Model	Input / 1M	Output / 1M	Web-search fee	Best for
`sonar`	$1.35	$1.35	$0.00675 / request	Quick cited answers, current events
`sonar-pro`	$4.05	$20.25	$0.00675 / request	Deep research, longer cited reports

The web-search fee is flat per request — it does not scale with tokens — so a short sonar question is dominated by the search fee, not the tokens. Every response’s usage.cost already includes it. (The GET /v1/pricing catalog lists the per-token prices above; the flat search fee is applied per request at billing time, not as a catalog field.) Standard models have no per-request fee; use Sonar only when you actually need fresh web data.
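To make the "dominated by the search fee" point concrete, here is a minimal sketch that adds the flat fee to the token cost for `sonar`. The helper name `sonar_cost` and the token counts are illustrative assumptions; the prices and fee are taken from the table above.

```python
SONAR_PRICE_PER_M = 1.35       # USD per 1M tokens, input and output alike
WEB_SEARCH_FEE = 0.00675       # USD, flat per request


def sonar_cost(input_tokens: int, output_tokens: int) -> float:
    """Token cost plus the flat web-search fee for one sonar request."""
    token_part = (input_tokens + output_tokens) * SONAR_PRICE_PER_M / 1_000_000
    return token_part + WEB_SEARCH_FEE


total = sonar_cost(50, 200)
fee_share = WEB_SEARCH_FEE / total
print(f"total ${total:.7f}, search fee is {fee_share:.0%} of it")
```

For a 250-token exchange the tokens cost a fraction of a cent, so nearly all of the charge is the search fee.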

Image generation

Image models bill per image, not per token. Cost shown is what you pay; the underlying provider price is marked up 1.35×. See the Image Generation guide and the API reference.

Model	Cost / image	Best for
`gpt-image-2`	$0.014 low · $0.081 medium · $0.297 high	OpenAI flagship — text-in-image, multilingual typography
`imagen-4-ultra`	$0.081	Google Imagen 4 premium — print-ready hero, 4K-scale assets
`imagen-4`	$0.054	Google Imagen 4 standard — photoreal portraits + scenes
`imagen-4-fast`	$0.027	Google Imagen 4 fast tier — drafts, social previews
`nano-banana`	$0.046	Google Gemini image-gen — native edit mode (image-in + prompt → image-out)
`nano-banana-3-flash`	$0.046	Same as above, Gemini 3.1 preview tier
`flux-2-pro`	$0.041 (1MP) · $0.101 (4MP)	Photoreal, multi-reference blend (up to 10 sources)
`recraft-v4-pro`	$0.338	4MP print-ready design
`recraft-v4-vector-pro`	$0.405	4MP native SVG, print-ready signage
`recraft-v4-vector`	$0.108	Native SVG output — editable paths and layers
`ideogram-v3`	$0.108	Typography, logos, packaging
`flux-1.1-ultra`	$0.081	Cinematic photo, hero shots, editorial (legacy — prefer flux-2-pro)
`recraft-v4`	$0.054	Design-quality default — #1 HF Arena
`recraft-v3`	$0.054	Legacy — prefer recraft-v4
`flux-kontext-pro`	$0.054	Image edit, inpaint (requires `image_url`)
`minimax-image-01`	$0.005	Sub-cent budget tier, bulk

A request with n: 4 is billed as 4 images. gpt-image-2 requests can set quality to low, medium, or high per call; the hold reserves the matching tier price up front, so a high-quality request reserves $0.297 and no refund-and-rebill adjustment is needed. Holds are refunded in full if generation fails.
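The per-image rule can be sketched as a simple multiplication. The helper name `gpt_image_2_cost` is an assumption for illustration; the tier prices come from the `gpt-image-2` row above.

```python
# gpt-image-2 price per image by quality tier, from the table above.
GPT_IMAGE_2_TIERS = {"low": 0.014, "medium": 0.081, "high": 0.297}


def gpt_image_2_cost(n: int, quality: str) -> float:
    """Cost of one request that generates n images at the given quality."""
    return n * GPT_IMAGE_2_TIERS[quality]


print(f"${gpt_image_2_cost(4, 'high'):.3f}")
print(f"${gpt_image_2_cost(4, 'low'):.3f}")
```

The same four-image request costs about 21 times more at high quality than at low, so quality is worth setting deliberately.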

See "Which model should I use?" to pick by task rather than by price.

What does a typical request cost?

Use case	Model	Tokens (in+out)	Cost
Quick extraction	`glm-4.7-flash`	~500	~$0.0001
Screenshot understanding	`gemma-4-31b`	~1,000	~$0.0003
Code review	`qwen-3-32b`	~3,000	~$0.001
Long document summary	`gemini-2.5-flash`	~10,000	~$0.005
Deep reasoning task	`deepseek-r1`	~5,000	~$0.005
Repo-scale agent step	`glm-5.1`	~8,000	~$0.013

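With the $0.50 signup credit, that works out to roughly 38 repo-scale agent steps on glm-5.1 or about 5,000 quick extractions on glm-4.7-flash, so model choice largely determines how far the credit goes.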

How billing works

Before each request, Kyma holds an estimate from your balance
After the response completes, Kyma calculates the real token cost
If the real cost is lower, the difference is refunded automatically

You pay for actual usage, not the initial estimate.
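The hold-then-settle flow above can be sketched as follows. This is an illustration only: the function name `settle` is hypothetical, and because this page does not say what happens when the real cost exceeds the hold, the sketch simply refuses that case.

```python
def settle(balance: float, hold_estimate: float, actual_cost: float) -> float:
    """Return the balance after a hold is placed and the unused part refunded."""
    if hold_estimate > balance:
        raise ValueError("insufficient credits for the hold")
    if actual_cost > hold_estimate:
        # Not described on this page, so the sketch does not guess.
        raise ValueError("actual cost above the hold is not modelled here")
    balance_after_hold = balance - hold_estimate
    refund = hold_estimate - actual_cost
    return balance_after_hold + refund


# Hold $0.01, real cost $0.004: the $0.006 difference is refunded.
print(f"${settle(0.50, 0.01, 0.004):.3f}")
```

The net effect is that the balance drops by the actual cost only, which matches the statement that you pay for usage rather than the estimate.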

Credits

Action	Amount
Signup bonus	+$0.50
Referral reward	+$0.50
Purchase packages	$5 / $20 / $100 / $500

Credits never expire. Auto top-up is supported in the dashboard.

# Check your balance
curl https://kymaapi.com/v1/credits/balance \
  -H "Authorization: Bearer ky-your-api-key"

# Full pricing catalog (text + image + video + audio)
curl https://kymaapi.com/v1/pricing

Video generation

Per-second models scale with the duration parameter (default 5s, max 10–15s by model). Hailuo bills flat per call.

Model	Cost	Audio	Notes
`kling-2.5-pro`	$0.0945/s	—	Budget cinematic
`veo-3-fast`	$0.135/s	—	Google Veo budget (720p)
`hailuo-02-512p`	$0.140 flat	—	Cheapest video (I2V only)
`kling-3-pro`	$0.1512/s	—	Premium cinematic
`kling-3-pro-audio`	$0.2268/s	native	Cinematic + audio
`seedance-2-fast`	$0.326565/s	bundled	Social shorts
`seedance-2-pro`	$0.40959/s	bundled	Multi-shot action
`hailuo-02-768p`	$0.420 flat	—	Mid-tier Hailuo
`veo-3`	$0.540/s	native	Veo flagship (1080p + audio)
`hailuo-02-1080p`	$0.780 flat	—	Hailuo top tier

See Video Generation. Live source: GET /v1/pricing (video rows).
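A small sketch of the two video billing modes, per-second and flat. The helper name `video_cost` is an assumption; the prices and the 5-second default come from the text and table above. Per-model maximum durations are not enforced because they are not listed individually here.

```python
# Prices copied from the table above (a subset, for illustration).
PER_SECOND_USD = {"kling-2.5-pro": 0.0945, "veo-3-fast": 0.135, "veo-3": 0.540}
FLAT_USD = {"hailuo-02-512p": 0.140, "hailuo-02-768p": 0.420}


def video_cost(model: str, duration_s: float = 5) -> float:
    """Flat models ignore duration; per-second models scale with it."""
    if model in FLAT_USD:
        return FLAT_USD[model]
    return PER_SECOND_USD[model] * duration_s


for model, seconds in [("kling-2.5-pro", 5), ("veo-3", 8), ("hailuo-02-512p", 10)]:
    print(f"{model} {seconds}s: ${video_cost(model, seconds):.4f}")
```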

Audio

Audio splits by capability: transcription, understanding, realtime, TTS, music, voice, SFX.

Speech-to-text (per minute, 1-min minimum billable)

Model	Cost	Notes
`whisper-v3-turbo`	$0.0009/min	Whisper v3 Turbo — 228× realtime. Alias `transcribe`
`gpt-4o-mini-transcribe-2025-12-15`	$0.00405/min	OpenAI premium STT — code-switching, conversational. Alias `transcribe-quality`
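The 1-minute minimum can be sketched like this. The helper name `stt_cost` is hypothetical, and the sketch assumes audio longer than a minute is billed proportionally rather than rounded up to whole minutes; the page states only the minimum, so treat that part as a guess.

```python
def stt_cost(audio_seconds: float, price_per_min: float) -> float:
    """Speech-to-text cost with a 1-minute billable minimum.

    Assumption: beyond the minimum, billing is proportional to duration.
    """
    billable_minutes = max(audio_seconds / 60, 1.0)
    return billable_minutes * price_per_min


WHISPER_V3_TURBO = 0.0009  # USD per minute, from the table above

print(f"20 s clip:  ${stt_cost(20, WHISPER_V3_TURBO):.5f}")
print(f"150 s clip: ${stt_cost(150, WHISPER_V3_TURBO):.5f}")
```

Under this assumption a 20-second clip costs the same as a full minute, so batching very short clips into one file may be cheaper.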

Audio understanding (per minute)

Model	Cost	Notes
`gemini-3-flash-audio`	$0.0006/min	Tone, music, scene understanding

Realtime translation (per minute)

Model	Cost	Notes
`gpt-realtime-translate`	$0.034/min	OpenAI
`gemini-2.5-flash-native-audio-preview-12-2025`	varies	Live API session
`gemini-3.5-live-translate-preview`	$0.0634/min	Google — speech-to-speech translation, Live API session

TTS (per 1k characters)

Model	Cost	Notes
`minimax-speech-turbo`	$0.090	Lowest-latency voice
`minimax-speech-hd`	$0.140	Production multilingual
`eleven-flash-v2-5`	$0.2025	ElevenLabs fast
`eleven-turbo-v2-5`	$0.2025	ElevenLabs turbo
`eleven-multilingual-v2`	$0.405	ElevenLabs quality
`eleven-v3`	$0.405	ElevenLabs most expressive — audio tags, 70+ languages
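For TTS, a rough estimate scales the per-1k-character price by the length of the text. The helper name `tts_cost` is an assumption, as is the idea that billing is prorated by exact character count with no rounding to whole thousands.

```python
def tts_cost(text: str, price_per_1k_chars: float) -> float:
    """Estimated TTS cost, assuming billing is prorated by character count."""
    return len(text) / 1000 * price_per_1k_chars


script = "a" * 2500  # stand-in for a 2,500-character script

print(f"minimax-speech-turbo: ${tts_cost(script, 0.090):.4f}")
print(f"eleven-v3:            ${tts_cost(script, 0.405):.4f}")
```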

Music

Model	Cost	Notes
`minimax-music`	$0.045/song	MiniMax flat per song
`minimax-music-pro`	$0.210/song	Music-2.6 family
`elevenlabs-music`	$0.135/s	Pay per second

Voice services & SFX (flat per call)

Model	Cost	Notes
`minimax-voice-clone`	$2.10/voice	Clone from 10s–5min reference
`minimax-voice-design`	$4.20/voice	Generate voice from text description
`elevenlabs-sfx`	~$0.027/gen	Sound effect (auto duration)

Live source: GET /v1/pricing (audio rows).

Prompt caching

Cached input tokens are charged at 10% of the normal input price. That means:

repeated system prompts get much cheaper
tool definitions become cheaper over repeated runs
long agent sessions benefit the most

Learn more about prompt caching →
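To see the size of the saving, here is a minimal sketch that prices cached input tokens at 10% of the normal input rate. The function name `cost_with_cache` and the token counts are illustrative assumptions; the prices are the `deepseek-v4-pro` figures from this page.

```python
CACHED_INPUT_RATE = 0.10  # cached input tokens cost 10% of the normal input price


def cost_with_cache(cached_in: int, fresh_in: int, out: int,
                    in_price_per_m: float, out_price_per_m: float) -> float:
    """USD cost of one request where part of the input is served from cache."""
    cached = cached_in * in_price_per_m * CACHED_INPUT_RATE
    fresh = fresh_in * in_price_per_m
    output = out * out_price_per_m
    return (cached + fresh + output) / 1_000_000


# deepseek-v4-pro: $2.35 input, $4.70 output per 1M tokens.
# 8,000 tokens of system prompt and tool definitions, 2,000 new tokens, 1,000 out.
with_cache = cost_with_cache(8_000, 2_000, 1_000, 2.35, 4.70)
without_cache = cost_with_cache(0, 10_000, 1_000, 2.35, 4.70)
print(f"with cache ${with_cache:.5f}, without ${without_cache:.5f}")
```

In this example the cached request costs well under half of the uncached one, and the gap widens as the repeated prefix grows.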

Rate limits

Kyma uses a tier-based system, similar to the paid access tiers at xAI and Anthropic: your tier rises with total credits purchased, not with lifetime usage.

Tier	Min Purchase	RPM	Per-model RPM	TPM
0 (Free)	$0	30	25	200K
1 (Starter)	$10	60	40	500K
2 (Builder)	$50	120	80	2M
3 (Pro)	$250	200	150	5M
4 (Enterprise)	$1,000	300	200	10M

RPM = requests per minute
Per-model RPM = how many requests per minute a single model can take
TPM = tokens per minute across your account

There are no daily or monthly caps. Your balance and tier are the real limits.
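The tier table can be read as a lookup from total credits purchased to limits. The sketch below assumes each threshold is inclusive (reaching exactly the minimum purchase qualifies); the name `tier_for` is hypothetical, and the live values should come from the limits endpoint shown below.

```python
# (tier, name, min_purchase_usd, rpm, per_model_rpm, tpm), copied from the table above.
TIERS = [
    (0, "Free", 0, 30, 25, 200_000),
    (1, "Starter", 10, 60, 40, 500_000),
    (2, "Builder", 50, 120, 80, 2_000_000),
    (3, "Pro", 250, 200, 150, 5_000_000),
    (4, "Enterprise", 1_000, 300, 200, 10_000_000),
]


def tier_for(total_purchased_usd: float) -> tuple:
    """Return the highest tier whose minimum purchase has been reached."""
    current = TIERS[0]
    for tier in TIERS:
        if total_purchased_usd >= tier[2]:
            current = tier
    return current


tier, name, _, rpm, per_model_rpm, tpm = tier_for(60)
print(f"Tier {tier} ({name}): {rpm} RPM, {per_model_rpm} per-model RPM, {tpm:,} TPM")
```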

# Check your current limits
curl https://kymaapi.com/v1/auth/limits \
  -H "Authorization: Bearer ky-your-api-key"

What happens when you hit a limit?

Out of credits

{
  "error": {
    "message": "Insufficient credits. Add credits at https://kymaapi.com/dashboard.",
    "type": "insufficient_credits"
  }
}

Too many requests

{
  "error": {
    "message": "Rate limit exceeded (30 RPM for Tier 0). Try again in a few seconds.",
    "type": "rate_limit"
  }
}

The Retry-After header tells you when to retry.
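One way a client might honor that header is sketched below. Several things here are assumptions rather than documented behavior: that rate-limited responses use HTTP status 429, that Retry-After is given in seconds, and the `send` callable, which stands in for whatever HTTP client you use and returns `(status_code, headers, body)`.

```python
import time


def call_with_retry(send, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status_code, headers, body),
    where headers is a dict with the exact key "Retry-After" when present.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            # Header was not a plain number of seconds; fall back.
            wait = default_wait
        sleep(wait)


# Usage with a fake transport: one rate-limited reply, then success.
replies = iter([(429, {"Retry-After": "2"}, "limited"), (200, {}, "ok")])
print(call_with_retry(lambda: next(replies), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies the wait.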

How to spend less

Use cheaper models for extraction, routing, and repetitive automation
Use qwen-3-32b instead of a flagship if you mainly need fast coding help
Use gemini-2.5-flash only when long context is the real need
Use deepseek-r1 only when the task truly needs deeper reasoning
Purchase credits if you need higher rate limits, not just more balance
