How pricing works
Kyma charges per token. Every request has:
- input tokens for your prompt and context
- output tokens for the model’s response
No monthly fee, no seat fee, no contract. Prices below are per 1 million tokens.
For the live canonical table, use GET /v1/models or GET /v1/credits/pricing.
Pricing per model
Cheapest useful models
Use these for bulk automation, extraction, and simple workloads.
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| glm-4.7-flash | $0.08 | $0.54 | Cheap long-context throughput |
| gemma-4-31b | $0.19 | $0.54 | Cheap multimodal / vision |
| deepseek-v4-flash | $0.19 · $0.019 cached | $0.38 | Best value V4: 1M context, native reasoning |
| gpt-oss-120b | $0.20 | $0.81 | Cheap writing + general tasks |
| glm-4.5-air | $0.18 | $1.15 | Cheap agentic bulk work |
Balanced value
Use these when you want strong quality without flagship pricing.
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| qwen-3-32b | $0.39 | $0.81 | Fast coding loops |
| minimax-m2.5 | $0.41 | $1.62 | Agentic coding |
| minimax-m2.7 | $0.41 | $1.62 | Productivity + debugging |
| deepseek-v3 | $0.81 | $2.30 | Best value frontier-class model |
| llama-3.3-70b | $1.19 | $1.19 | Balanced open model |
Flagship / premium
Use these when quality or capability matters more than cost. Where a row shows "· cached" in the input column, the second figure is the cached input price. It is 10% of the normal input price and applies to repeated system prompts and tool definitions on caching-enabled models (see Prompt caching).
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| qwen-3-coder | $0.68 | $2.16 | Code-specialized output |
| deepseek-r1 | $0.68 | $2.90 | Deep reasoning |
| kimi-k2.6 | $1.28 | $5.40 | Agentic flagship (newest) |
| kimi-k2.5 | $0.68 | $3.78 | Agentic flagship (previous-gen) |
| qwen-3.6-plus | $0.68 | $4.05 | Best default overall |
| deepseek-v4-pro | $2.35 · $0.235 cached | $4.70 | Top reasoning, complex coding (1M context) |
| claude-opus-4-7 | $6.75 · $0.675 cached | $33.75 | Anthropic flagship: top engineering quality |
| claude-sonnet-4-6 | $4.05 · $0.405 cached | $20.25 | Anthropic balanced: long-running agentic work |
| claude-haiku-4-5 | $1.35 · $0.135 cached | $6.75 | Anthropic fast tier: 200K context |
| glm-5.1 | $1.89 | $5.94 | Long-running coding agents |
Long-context & multimodal
Use these when context length or image input is the bottleneck.
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| gemini-2.5-flash | $0.41 | $3.38 | 1M context |
| gemini-3-flash | $0.68 | $4.05 | Newer long-context preview |
| gemma-4-31b | $0.19 | $0.54 | Vision + text workflows |
Web search
Perplexity’s Sonar models run a live web search on every request and return cited answers. They bill two components on the same request: normal token pricing plus a flat web-search fee for the live search.
| Model | Input / 1M | Output / 1M | Web-search fee | Best for |
|---|---|---|---|---|
| sonar | $1.35 | $1.35 | $0.00675 / request | Quick cited answers, current events |
| sonar-pro | $4.05 | $20.25 | $0.00675 / request | Deep research, longer cited reports |
The web-search fee is flat per request — it does not scale with tokens — so a short sonar question is dominated by the search fee, not the tokens. Every response’s usage.cost already includes it. (The GET /v1/pricing catalog lists the per-token prices above; the flat search fee is applied per request at billing time, not as a catalog field.) Standard models have no per-request fee; use Sonar only when you actually need fresh web data.
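To see why a short Sonar question is dominated by the search fee, here is a minimal Python sketch that adds the flat fee to the token cost. The prices come from the table above. `SONAR_PRICES` and `sonar_request_cost` are illustrative names, not part of the Kyma API, and the authoritative figure is always `usage.cost` in the response.

```python
# Sketch: estimated cost of one Sonar request = token cost + flat search fee.
# Prices (USD) are copied from the Web search table; names are illustrative.
SONAR_PRICES = {
    # model: (input $/1M tokens, output $/1M tokens, web-search fee $/request)
    "sonar": (1.35, 1.35, 0.00675),
    "sonar-pro": (4.05, 20.25, 0.00675),
}


def sonar_request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Per-token charges plus the flat per-request web-search fee."""
    in_price, out_price, search_fee = SONAR_PRICES[model]
    token_cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return token_cost + search_fee


if __name__ == "__main__":
    cost = sonar_request_cost("sonar", input_tokens=200, output_tokens=300)
    fee_share = SONAR_PRICES["sonar"][2] / cost
    print(f"total ${cost:.6f}, search fee is {fee_share:.0%} of it")
```

For a 200-token question with a 300-token answer on sonar, the tokens cost $0.000675 and the search fee $0.00675. The fee is therefore roughly 91% of the $0.007425 total.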
Image generation
Image models bill per image, not per token. Cost shown is what you pay; the underlying provider price is marked up 1.35×. See the Image Generation guide and the API reference.
| Model | Cost / image | Best for |
|---|---|---|
| gpt-image-2 | $0.014 low · $0.081 medium · $0.297 high | OpenAI flagship: text-in-image, multilingual typography |
| imagen-4-ultra | $0.081 | Google Imagen 4 premium: print-ready hero, 4K-scale assets |
| imagen-4 | $0.054 | Google Imagen 4 standard: photoreal portraits + scenes |
| imagen-4-fast | $0.027 | Google Imagen 4 fast tier: drafts, social previews |
| nano-banana | $0.046 | Google Gemini image-gen: native edit mode (image-in + prompt → image-out) |
| nano-banana-3-flash | $0.046 | Same as above, Gemini 3.1 preview tier |
| flux-2-pro | $0.041 (1MP) · $0.101 (4MP) | Photoreal, multi-reference blend (up to 10 sources) |
| recraft-v4-pro | $0.338 | 4MP print-ready design |
| recraft-v4-vector-pro | $0.405 | 4MP native SVG, print-ready signage |
| recraft-v4-vector | $0.108 | Native SVG output: editable paths and layers |
| ideogram-v3 | $0.108 | Typography, logos, packaging |
| flux-1.1-ultra | $0.081 | Cinematic photo, hero shots, editorial (legacy; prefer flux-2-pro) |
| recraft-v4 | $0.054 | Design-quality default, #1 HF Arena |
| recraft-v3 | $0.054 | Legacy; prefer recraft-v4 |
| flux-kontext-pro | $0.054 | Image edit, inpaint (requires image_url) |
| minimax-image-01 | $0.005 | Sub-cent budget tier, bulk |
A request with n: 4 is billed as four images. For gpt-image-2, you can set quality to low, medium, or high on each call. The hold books the matching tier price up front, so a high-quality request reserves $0.297 instead of being refunded and rebilled afterward. Holds are refunded in full if generation fails.
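Here is a minimal sketch of per-image billing as described above. The cost is the per-image price times n, and gpt-image-2 takes its price from the quality tier. Prices are copied from the table, and the dictionaries and function names are illustrative. This page does not state a default quality for gpt-image-2, so the sketch requires one explicitly.

```python
# Sketch: per-image cost. Prices (USD) come from the Image generation table and
# already include the 1.35x markup; all names here are illustrative.
from typing import Optional

MARKUP = 1.35  # listed price = provider price x 1.35

PER_IMAGE = {
    "imagen-4-ultra": 0.081,
    "imagen-4": 0.054,
    "imagen-4-fast": 0.027,
    "minimax-image-01": 0.005,
}
GPT_IMAGE_2 = {"low": 0.014, "medium": 0.081, "high": 0.297}


def image_request_cost(model: str, n: int = 1, quality: Optional[str] = None) -> float:
    """A request for n images is billed as n images."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if model == "gpt-image-2":
        # No default quality is documented here, so require it explicitly.
        if quality is None:
            raise ValueError("pass quality='low', 'medium' or 'high' for gpt-image-2")
        per_image = GPT_IMAGE_2[quality]
    else:
        per_image = PER_IMAGE[model]
    return n * per_image


def provider_price(listed_price: float) -> float:
    """Back out the underlying provider price from a listed price."""
    return listed_price / MARKUP


if __name__ == "__main__":
    print(image_request_cost("gpt-image-2", n=4, quality="high"))
    print(round(provider_price(0.054), 4))
```

Four high-quality gpt-image-2 images cost 4 × $0.297 = $1.188. The $0.054 imagen-4 price corresponds to a $0.04 provider price before markup.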
What does a typical request cost?
| Use case | Model | Tokens (in+out) | Cost |
|---|---|---|---|
| Quick extraction | glm-4.7-flash | ~500 | ~$0.0001 |
| Screenshot understanding | gemma-4-31b | ~1,000 | ~$0.0003 |
| Code review | qwen-3-32b | ~3,000 | ~$0.001 |
| Long document summary | gemini-2.5-flash | ~10,000 | ~$0.005 |
| Deep reasoning task | deepseek-r1 | ~5,000 | ~$0.005 |
| Repo-scale agent step | glm-5.1 | ~8,000 | ~$0.013 |
At the costs above, the $0.50 signup credit covers roughly 38 repo-scale agent steps on glm-5.1, about 100 deep reasoning tasks on deepseek-r1, or about 5,000 quick extractions on glm-4.7-flash.
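Estimates like these are per-token arithmetic: input tokens times the input price plus output tokens times the output price, both per 1M tokens. Here is a minimal Python sketch. It assumes a 2,000-input / 1,000-output split, since the table above gives only totals, and it ignores prompt caching. `PRICES`, `request_cost`, and `requests_covered` are illustrative names.

```python
# Sketch: cost of one text request from its token counts (prompt caching ignored).
# Prices (USD per 1M tokens) are copied from the tables above; names are illustrative.
import math
from typing import NamedTuple


class Price(NamedTuple):
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


PRICES = {
    "glm-4.7-flash": Price(0.08, 0.54),
    "qwen-3-32b": Price(0.39, 0.81),
    "deepseek-r1": Price(0.68, 2.90),
    "glm-5.1": Price(1.89, 5.94),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """(input_tokens x input price + output_tokens x output price) / 1M."""
    p = PRICES[model]
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def requests_covered(budget: float, cost_per_request: float) -> int:
    """How many identical requests a budget pays for."""
    return math.floor(budget / cost_per_request)


if __name__ == "__main__":
    cost = request_cost("qwen-3-32b", input_tokens=2_000, output_tokens=1_000)
    print(f"${cost:.5f} per request, {requests_covered(0.50, cost)} requests on $0.50")
```

With that split, a qwen-3-32b request costs $0.00159, so $0.50 covers 314 of them.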
How billing works
- Before each request, Kyma holds an estimate from your balance
- After the response completes, Kyma calculates the real token cost
- If the real cost is lower, the difference is refunded automatically
You pay for actual usage, not the initial estimate.
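The hold-and-settle flow can be modelled with a small in-memory ledger. This is a hypothetical Python sketch of the accounting only: Kyma does this server side, and `Ledger`, `hold`, and `settle` are not Kyma APIs. It uses `Decimal` so that small amounts do not pick up floating-point drift.

```python
# Sketch of hold -> settle -> refund using a hypothetical in-memory ledger.
# Kyma performs this server side; none of these names are Kyma APIs.
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict


class InsufficientCredits(Exception):
    pass


@dataclass
class Ledger:
    balance: Decimal
    holds: Dict[str, Decimal] = field(default_factory=dict)

    def hold(self, request_id: str, estimate: Decimal) -> None:
        """Before the request: reserve the estimated cost from the balance."""
        if estimate > self.balance:
            raise InsufficientCredits(request_id)
        self.balance -= estimate
        self.holds[request_id] = estimate

    def settle(self, request_id: str, actual_cost: Decimal) -> Decimal:
        """After the response: charge the real cost and return the unused hold.

        This page only describes the case where the real cost is lower. If it
        were higher, this sketch simply deducts the difference (assumption).
        """
        held = self.holds.pop(request_id)
        refund = held - actual_cost
        self.balance += refund
        return refund


if __name__ == "__main__":
    ledger = Ledger(balance=Decimal("0.50"))
    ledger.hold("req-1", Decimal("0.0100"))             # estimate held up front
    refund = ledger.settle("req-1", Decimal("0.0016"))  # real token cost
    print(refund, ledger.balance)
```

Settling with an actual cost of zero returns the whole hold. That matches the full refund for failed image generations described earlier.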
Credits
| Action | Amount |
|---|---|
| Signup bonus | +$0.50 |
| Referral reward | +$0.50 |
| Purchase packages | $5 / $20 / $100 / $500 |
Credits never expire. Auto top-up is supported in the dashboard.
# Check your balance
curl https://kymaapi.com/v1/credits/balance \
-H "Authorization: Bearer ky-your-api-key"
# Full pricing catalog (text + image + video + audio)
curl https://kymaapi.com/v1/pricing
Video generation
Per-second models scale with the duration parameter (default 5s, max 10–15s by model). Hailuo bills flat per call.
| Model | Cost | Audio | Notes |
|---|---|---|---|
| kling-2.5-pro | $0.0945/s | None | Budget cinematic |
| veo-3-fast | $0.135/s | None | Google Veo budget (720p) |
| hailuo-02-512p | $0.140 flat | None | Cheapest video (I2V only) |
| kling-3-pro | $0.1512/s | None | Premium cinematic |
| kling-3-pro-audio | $0.2268/s | native | Cinematic + audio |
| seedance-2-fast | $0.326565/s | bundled | Social shorts |
| seedance-2-pro | $0.40959/s | bundled | Multi-shot action |
| hailuo-02-768p | $0.420 flat | None | Mid-tier Hailuo |
| veo-3 | $0.540/s | native | Veo flagship (1080p + audio) |
| hailuo-02-1080p | $0.780 flat | None | Hailuo top tier |
See Video Generation. Live source: GET /v1/pricing (video rows).
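Here is a minimal sketch of the two video billing modes, using a subset of the prices from the table. `PER_SECOND`, `FLAT`, and `video_cost` are illustrative names. The sketch assumes `duration` is given in whole seconds and does not enforce the per-model maximum.

```python
# Sketch: video cost. Per-second models scale with `duration` (default 5 s);
# Hailuo rows are flat per call. Only a few rows from the table are copied,
# and per-model maximum durations (10 to 15 s) are not enforced here.
PER_SECOND = {
    "kling-2.5-pro": 0.0945,
    "veo-3-fast": 0.135,
    "kling-3-pro": 0.1512,
    "veo-3": 0.540,
}
FLAT = {
    "hailuo-02-512p": 0.140,
    "hailuo-02-768p": 0.420,
    "hailuo-02-1080p": 0.780,
}
DEFAULT_DURATION_S = 5


def video_cost(model: str, duration: int = DEFAULT_DURATION_S) -> float:
    if model in FLAT:
        return FLAT[model]  # flat per call: duration does not change the price
    return PER_SECOND[model] * duration


if __name__ == "__main__":
    for model, secs in [("kling-2.5-pro", 5), ("veo-3", 8), ("hailuo-02-1080p", 10)]:
        print(f"{model:16} {secs:>2}s  ${video_cost(model, secs):.4f}")
```

At these prices, a default 5-second kling-2.5-pro clip costs about $0.47 and an 8-second veo-3 clip $4.32. A hailuo-02-1080p call costs $0.78 regardless of length.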
Audio
Audio pricing is split by capability: transcription, understanding, realtime translation, TTS, music, voice, and SFX.
Speech-to-text (per minute, 1-minute billable minimum)
| Model | Cost | Notes |
|---|---|---|
| whisper-v3-turbo | $0.0009/min | Groq Whisper, 228× realtime. Alias: transcribe |
| gpt-4o-mini-transcribe-2025-12-15 | $0.00405/min | OpenAI premium STT: code-switching, conversational. Alias: transcribe-quality |
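Here is a sketch of the 1-minute billable minimum in Python. This page does not say whether audio longer than a minute is rounded up to whole minutes, so the sketch assumes proportional billing above the minimum. `transcription_cost` is an illustrative name.

```python
# Sketch: speech-to-text cost with the 1-minute billable minimum.
# Assumption: above the minimum, billing is proportional to the audio length
# (this page does not say whether longer files round up to whole minutes).
STT_PER_MIN = {
    "whisper-v3-turbo": 0.0009,
    "gpt-4o-mini-transcribe-2025-12-15": 0.00405,
}
MIN_BILLABLE_MINUTES = 1.0


def transcription_cost(model: str, audio_seconds: float) -> float:
    billable_minutes = max(audio_seconds / 60, MIN_BILLABLE_MINUTES)
    return billable_minutes * STT_PER_MIN[model]


if __name__ == "__main__":
    print(transcription_cost("whisper-v3-turbo", 20))       # billed as 1 minute
    print(transcription_cost("whisper-v3-turbo", 90 * 60))  # 90 minutes
```

Under this assumption, a 20-second clip on whisper-v3-turbo is billed as one minute ($0.0009), and a 90-minute recording costs $0.081.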
Audio understanding (per minute)
| Model | Cost | Notes |
|---|---|---|
| gemini-3-flash-audio | $0.0006/min | Tone, music, scene understanding |
Realtime translation (per minute)
| Model | Cost | Notes |
|---|---|---|
| gpt-realtime-translate | $0.034/min | OpenAI |
| gemini-2.5-flash-native-audio-preview-12-2025 | varies | Live API session |
TTS (per 1k characters)
| Model | Cost | Notes |
|---|---|---|
| minimax-speech-turbo | $0.090 | Lowest-latency voice |
| minimax-speech-hd | $0.140 | Production multilingual |
| eleven-flash-v2-5 | ~$0.0945 | ElevenLabs fast |
| eleven-turbo-v2-5 | ~$0.0945 | ElevenLabs turbo |
| eleven-multilingual-v2 | ~$0.189 | ElevenLabs quality |
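TTS is priced per 1,000 characters of input text. Here is a minimal sketch that assumes characters are counted as Unicode code points (Python's `len`) and that partial thousands are billed proportionally; this page states neither detail. Only the MiniMax rows are included, because the ElevenLabs prices are approximate.

```python
# Sketch: text-to-speech cost, priced per 1,000 characters.
# Assumptions: characters are counted as Python string length (Unicode code
# points) and partial thousands are billed proportionally.
TTS_PER_1K_CHARS = {
    "minimax-speech-turbo": 0.090,
    "minimax-speech-hd": 0.140,
}


def tts_cost(model: str, text: str) -> float:
    return len(text) / 1000 * TTS_PER_1K_CHARS[model]


if __name__ == "__main__":
    script = "Welcome back to the show. " * 200  # 5,200 characters
    print(f"${tts_cost('minimax-speech-hd', script):.3f}")
```

The 5,200-character script comes to 5.2 × $0.140 ≈ $0.728 on minimax-speech-hd.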
Music
| Model | Cost | Notes |
|---|---|---|
| minimax-music | $0.045/song | MiniMax flat per song |
| minimax-music-pro | $0.210/song | Music-2.6 family |
| elevenlabs-music | $0.135/s | Pay per second |
Voice services & SFX (flat per call)
| Model | Cost | Notes |
|---|---|---|
| minimax-voice-clone | $2.10/voice | Clone from 10s–5min reference |
| minimax-voice-design | $4.20/voice | Generate voice from text description |
| elevenlabs-sfx | ~$0.027/gen | Sound effect (auto duration) |
Live source: GET /v1/pricing (audio rows).
Prompt caching
Cached input tokens are charged at 10% of the normal input price.
That means:
- repeated system prompts get much cheaper
- tool definitions become cheaper over repeated runs
- long agent sessions benefit the most
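To see the effect, here is a minimal sketch of input cost when part of the prompt is cached at 10% of the normal input price. It assumes you know how many prompt tokens were served from cache. `input_cost` and the price dictionary are illustrative, with prices from the Flagship table.

```python
# Sketch: input cost with prompt caching (cached tokens at 10% of input price).
# Assumption: cached_tokens is the part of prompt_tokens served from cache.
# Prices (USD per 1M input tokens) are from the Flagship table; names are illustrative.
INPUT_PER_M = {"deepseek-v4-pro": 2.35, "claude-sonnet-4-6": 4.05}
CACHED_FRACTION = 0.10


def input_cost(model: str, prompt_tokens: int, cached_tokens: int = 0) -> float:
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    price = INPUT_PER_M[model]
    uncached = prompt_tokens - cached_tokens
    return (uncached * price + cached_tokens * price * CACHED_FRACTION) / 1_000_000


if __name__ == "__main__":
    # An agent turn: an 18,000-token system prompt + tool definitions, 2,000 new tokens.
    cold = input_cost("deepseek-v4-pro", 20_000)
    warm = input_cost("deepseek-v4-pro", 20_000, cached_tokens=18_000)
    print(f"uncached ${cold:.4f}  cached ${warm:.5f}")
```

Serving an 18,000-token prefix from cache drops the input cost of a 20,000-token deepseek-v4-pro turn from $0.047 to about $0.0089.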
Learn more about prompt caching →
Rate limits
Kyma uses tier-based rate limits, similar to the paid-access tiers at xAI and Anthropic. Your tier increases with the total credits you have purchased, not with lifetime usage.
| Tier | Min Purchase | RPM | Per-model RPM | TPM |
|---|---|---|---|---|
| 0 (Free) | $0 | 30 | 25 | 200K |
| 1 (Starter) | $10 | 60 | 40 | 500K |
| 2 (Builder) | $50 | 120 | 80 | 2M |
| 3 (Pro) | $250 | 200 | 150 | 5M |
| 4 (Enterprise) | $1,000 | 300 | 200 | 10M |
- RPM = requests per minute
- Per-model RPM = the maximum requests per minute you can send to any single model
- TPM = tokens per minute across your account
There are no daily or monthly caps. Your balance and tier are the real limits.
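Here is a sketch of how the tier table maps to a purchase total. It treats Min Purchase as cumulative credits bought and counts only purchased credits, not signup or referral bonuses. That reading follows the sentence at the top of this section, but it is an assumption. The names are illustrative, not Kyma APIs; the /v1/auth/limits endpoint remains the source of truth for your actual limits.

```python
# Sketch: look up a rate-limit tier from total credits purchased.
# Values are copied from the tier table; Tier, tier_for and spend_to_next_tier
# are illustrative names, not Kyma APIs.
from typing import NamedTuple, Optional, Tuple


class Tier(NamedTuple):
    level: int
    name: str
    min_purchase: float  # USD of credits purchased, cumulative (assumption)
    rpm: int
    per_model_rpm: int
    tpm: int


TIERS = [  # sorted by min_purchase, ascending
    Tier(0, "Free", 0, 30, 25, 200_000),
    Tier(1, "Starter", 10, 60, 40, 500_000),
    Tier(2, "Builder", 50, 120, 80, 2_000_000),
    Tier(3, "Pro", 250, 200, 150, 5_000_000),
    Tier(4, "Enterprise", 1_000, 300, 200, 10_000_000),
]


def tier_for(total_purchased: float) -> Tier:
    """Highest tier whose minimum purchase is covered."""
    return [t for t in TIERS if total_purchased >= t.min_purchase][-1]


def spend_to_next_tier(total_purchased: float) -> Optional[Tuple[Tier, float]]:
    """Next tier and how much more you would need to buy, or None at the top."""
    higher = [t for t in TIERS if t.min_purchase > total_purchased]
    if not higher:
        return None
    return higher[0], higher[0].min_purchase - total_purchased


if __name__ == "__main__":
    current = tier_for(60)
    print(current.name, current.rpm, current.tpm)  # Builder 120 2000000
    print(spend_to_next_tier(60))
```

Under this reading, a $60 purchase total lands in Builder (120 RPM, 2M TPM), and another $190 reaches Pro.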
# Check your current limits
curl https://kymaapi.com/v1/auth/limits \
-H "Authorization: Bearer ky-your-api-key"
What happens when you hit a limit?
Out of credits
{
"error": {
"message": "Insufficient credits. Add credits at https://kymaapi.com/dashboard.",
"type": "insufficient_credits"
}
}
Too many requests
{
"error": {
"message": "Rate limit exceeded (30 RPM for Tier 0). Try again in a few seconds.",
"type": "rate_limit"
}
}
The Retry-After header tells you when to retry.
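Here is a minimal Python sketch of a client-side retry loop that honors Retry-After. HTTP allows the header to carry either a number of seconds or an HTTP-date, so both forms are parsed. `RateLimited` and `call_with_retry` are hypothetical: your HTTP client would raise `RateLimited` when it receives an error of type `rate_limit`, passing the header value along. The exponential fallback when the header is missing is an assumption, not documented Kyma behavior.

```python
# Sketch: honoring Retry-After on rate-limit errors.
# RateLimited and call_with_retry are hypothetical helpers; the exponential
# fallback used when Retry-After is absent is an assumption.
import email.utils
import time
from datetime import datetime, timezone
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    def __init__(self, retry_after: Optional[str] = None):
        super().__init__("rate_limit")
        self.retry_after = retry_after


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Parse Retry-After, which HTTP allows as delay-seconds or an HTTP-date."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max((when - now).total_seconds(), 0.0)


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping between attempts whenever it raises RateLimited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise
            delay = retry_after_seconds(exc.retry_after)
            if delay is None:
                delay = base_delay * 2 ** (attempt - 1)  # fallback backoff
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` in as a parameter keeps the loop easy to test without real waiting.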
How to spend less
- Use cheaper models for extraction, routing, and repetitive automation
- Use qwen-3-32b instead of a flagship if you mainly need fast coding help
- Use gemini-2.5-flash only when long context is the real need
- Use deepseek-r1 only when the task truly needs deeper reasoning
- Buy credits when you need higher rate limits, not just more balance: your tier is set by total credits purchased
Next steps