How pricing works

Kyma charges per token. Every request has:
  • input tokens for your prompt and context
  • output tokens for the model’s response
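No monthly fee, no seat fee, no contract. Text-model prices below are per 1 million tokens. Image, video, and audio models are billed per image, per second, per minute, per 1k characters, or per call, as noted in their sections. For the live canonical table, use GET /v1/models or GET /v1/credits/pricing.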
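As a rough sketch, the cost of one text request is (input tokens × input price + output tokens × output price) / 1,000,000. The snippet below assumes the per-1M prices from the tables on this page. Those can change, so check GET /v1/models for live values. It also ignores cached-input discounts and per-request fees. `PRICES` and `request_cost` are illustrative names, not part of any Kyma SDK.

```python
# Illustrative per-1M-token prices (USD) copied from the tables on this page.
# Assumption: these may be stale; fetch live values from GET /v1/models.
PRICES = {
    "glm-4.7-flash": {"input": 0.08, "output": 0.54},
    "qwen-3-32b": {"input": 0.39, "output": 0.81},
    "deepseek-r1": {"input": 0.68, "output": 2.90},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one text request (no caching, no per-request fees)."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Example: a code review on qwen-3-32b with 2,500 prompt and 500 response tokens.
print(f"${request_cost('qwen-3-32b', 2_500, 500):.6f}")  # → $0.001380
```

Output tokens usually cost more than input tokens, so long responses move the total faster than long prompts.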

Pricing per model

Cheapest useful models

Use these for bulk automation, extraction, and simple workloads.
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| glm-4.7-flash | $0.08 | $0.54 | Cheap long-context throughput |
| gemma-4-31b | $0.19 | $0.54 | Cheap multimodal / vision |
| deepseek-v4-flash | $0.19 ($0.019 cached) | $0.38 | Best value V4: 1M context, native reasoning |
| gpt-oss-120b | $0.20 | $0.81 | Cheap writing + general tasks |
| glm-4.5-air | $0.18 | $1.15 | Cheap agentic bulk work |

Balanced value

Use these when you want strong quality without flagship pricing.
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| qwen-3-32b | $0.39 | $0.81 | Fast coding loops |
| minimax-m2.5 | $0.41 | $1.62 | Agentic coding |
| minimax-m2.7 | $0.41 | $1.62 | Productivity + debugging |
| deepseek-v3 | $0.81 | $2.30 | Best value frontier-class model |
| llama-3.3-70b | $1.19 | $1.19 | Balanced open model |

Flagship / premium

Use these when quality or capability matters more than cost. Rows that list a "cached" price also show the cached input rate inline: 10% of the normal input price. It applies to repeated system prompts and tool definitions on caching-enabled models (see Prompt caching).
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| qwen-3-coder | $0.68 | $2.16 | Code-specialized output |
| deepseek-r1 | $0.68 | $2.90 | Deep reasoning |
| kimi-k2.6 | $1.28 | $5.40 | Agentic flagship (newest) |
| kimi-k2.5 | $0.68 | $3.78 | Agentic flagship (previous-gen) |
| qwen-3.6-plus | $0.68 | $4.05 | Best default overall |
| deepseek-v4-pro | $2.35 ($0.235 cached) | $4.70 | Top reasoning, complex coding (1M context) |
| claude-opus-4-7 | $6.75 ($0.675 cached) | $33.75 | Anthropic flagship: top engineering quality |
| claude-sonnet-4-6 | $4.05 ($0.405 cached) | $20.25 | Anthropic balanced: long-running agentic work |
| claude-haiku-4-5 | $1.35 ($0.135 cached) | $6.75 | Anthropic fast tier, 200K context |
| glm-5.1 | $1.89 | $5.94 | Long-running coding agents |

Long-context & multimodal

Use these when context length or image input is the bottleneck.
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| gemini-2.5-flash | $0.41 | $3.38 | 1M context |
| gemini-3-flash | $0.68 | $4.05 | Newer long-context preview |
| gemma-4-31b | $0.19 | $0.54 | Vision + text workflows |
Web search models

Perplexity’s Sonar models run a live web search on every request and return cited answers. Each request is billed in two parts: normal token pricing plus a flat web-search fee for the live search.
| Model | Input / 1M | Output / 1M | Web-search fee | Best for |
|---|---|---|---|---|
| sonar | $1.35 | $1.35 | $0.00675 / request | Quick cited answers, current events |
| sonar-pro | $4.05 | $20.25 | $0.00675 / request | Deep research, longer cited reports |
The web-search fee is a flat amount per request and does not scale with tokens. For a short sonar question, the search fee therefore dominates the cost, not the tokens. Every response’s usage.cost already includes it. (The GET /v1/pricing catalog lists the per-token prices above. The flat search fee is applied per request at billing time and is not a catalog field.) Standard models have no per-request fee, so use Sonar only when you actually need fresh web data.
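To see how the flat fee dominates a short question, here is a minimal sketch that splits a Sonar request into its token part and its search fee. It assumes the prices in the table above (copied, not live), and `sonar_cost` is a hypothetical helper.

```python
# Sonar prices from the table above (USD). Assumption: copied values, not live.
SONAR = {
    "sonar": {"input": 1.35, "output": 1.35, "search_fee": 0.00675},
    "sonar-pro": {"input": 4.05, "output": 20.25, "search_fee": 0.00675},
}


def sonar_cost(model: str, input_tokens: int, output_tokens: int) -> tuple[float, float, float]:
    """Return (token cost, flat search fee, total) for one Sonar request."""
    price = SONAR[model]
    token_cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    fee = price["search_fee"]  # flat per request, independent of token count
    return token_cost, fee, token_cost + fee


tokens, fee, total = sonar_cost("sonar", 200, 300)
print(f"tokens ${tokens:.6f} + search fee ${fee:.5f} = ${total:.6f}")
```

Take a 200-token question with a 300-token answer on sonar. The token part is $0.000675, and the search fee is ten times that, for a total of about $0.0074.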

Image generation

Image models bill per image, not per token. Cost shown is what you pay; the underlying provider price is marked up 1.35×. See the Image Generation guide and the API reference.
| Model | Cost / image | Best for |
|---|---|---|
| gpt-image-2 | $0.014 low · $0.081 medium · $0.297 high | OpenAI flagship: text-in-image, multilingual typography |
| imagen-4-ultra | $0.081 | Google Imagen 4 premium: print-ready hero, 4K-scale assets |
| imagen-4 | $0.054 | Google Imagen 4 standard: photoreal portraits + scenes |
| imagen-4-fast | $0.027 | Google Imagen 4 fast tier: drafts, social previews |
| nano-banana | $0.046 | Google Gemini image generation with native edit mode (image in + prompt → image out) |
| nano-banana-3-flash | $0.046 | Same as nano-banana, Gemini 3.1 preview tier |
| flux-2-pro | $0.041 (1MP) · $0.101 (4MP) | Photoreal, multi-reference blend (up to 10 sources) |
| recraft-v4-pro | $0.338 | 4MP print-ready design |
| recraft-v4-vector-pro | $0.405 | 4MP native SVG, print-ready signage |
| recraft-v4-vector | $0.108 | Native SVG output: editable paths and layers |
| ideogram-v3 | $0.108 | Typography, logos, packaging |
| flux-1.1-ultra | $0.081 | Cinematic photo, hero shots, editorial (legacy; prefer flux-2-pro) |
| recraft-v4 | $0.054 | Design-quality default, #1 HF Arena |
| recraft-v3 | $0.054 | Legacy; prefer recraft-v4 |
| flux-kontext-pro | $0.054 | Image edit, inpaint (requires image_url) |
| minimax-image-01 | $0.005 | Sub-cent budget tier, bulk |
A request with n: 4 is billed as four images. gpt-image-2 requests can set quality to low, medium, or high per call. The balance hold books the matching tier price up front, so a high-quality image reserves $0.297 and there is no refund-and-rebill drift. Holds are refunded in full if generation fails.
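The hold for a gpt-image-2 call can be estimated from the number of images and the requested quality tier. This sketch rests on two assumptions the text does not spell out. First, the hold for n images is simply n times the tier price. Second, a successful generation is charged exactly the hold. `image_hold` and `settle_image_request` are hypothetical helpers.

```python
# gpt-image-2 per-image prices from the table above (USD, markup included).
GPT_IMAGE_2 = {"low": 0.014, "medium": 0.081, "high": 0.297}


def image_hold(n: int, quality: str) -> float:
    """Amount reserved up front for an n-image request (assumed n × tier price)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * GPT_IMAGE_2[quality]


def settle_image_request(hold: float, succeeded: bool) -> float:
    """Amount finally charged: the full hold on success, nothing if generation fails."""
    return hold if succeeded else 0.0


hold = image_hold(4, "high")  # four high-quality images reserve 4 × $0.297
print(f"hold ${hold:.3f}, charged on failure ${settle_image_request(hold, False):.2f}")
```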
To pick a model by task rather than by price, see the "Which model should I use?" guide.

What does a typical request cost?

| Use case | Model | Tokens (in + out) | Cost |
|---|---|---|---|
| Quick extraction | glm-4.7-flash | ~500 | ~$0.0001 |
| Screenshot understanding | gemma-4-31b | ~1,000 | ~$0.0003 |
| Code review | qwen-3-32b | ~3,000 | ~$0.001 |
| Long document summary | gemini-2.5-flash | ~10,000 | ~$0.005 |
| Deep reasoning task | deepseek-r1 | ~5,000 | ~$0.005 |
| Repo-scale agent step | glm-5.1 | ~8,000 | ~$0.013 |
Using the estimates above, the $0.50 signup credit covers roughly 5,000 quick extractions on glm-4.7-flash, or about 38 repo-scale agent steps on glm-5.1.
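Dividing the credit by each per-request estimate in the table gives these rough counts. This minimal sketch treats the table's approximate costs as exact inputs. Real costs vary with prompt and response length. It uses `Decimal` so that 0.50 / 0.0001 comes out as exactly 5,000 rather than a float just under it.

```python
from decimal import Decimal

SIGNUP_CREDIT = Decimal("0.50")

# Approximate per-request costs from the "typical request" table above (USD).
TYPICAL_COST = {
    "quick extraction (glm-4.7-flash)": Decimal("0.0001"),
    "screenshot understanding (gemma-4-31b)": Decimal("0.0003"),
    "code review (qwen-3-32b)": Decimal("0.001"),
    "long document summary (gemini-2.5-flash)": Decimal("0.005"),
    "deep reasoning task (deepseek-r1)": Decimal("0.005"),
    "repo-scale agent step (glm-5.1)": Decimal("0.013"),
}


def requests_covered(budget: Decimal, cost_per_request: Decimal) -> int:
    """Whole requests a budget covers at a fixed per-request cost."""
    return int(budget // cost_per_request)


for use_case, cost in TYPICAL_COST.items():
    print(f"{use_case}: ~{requests_covered(SIGNUP_CREDIT, cost):,} requests")
```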

How billing works

  1. Before each request, Kyma holds an estimate from your balance
  2. After the response completes, Kyma calculates the real token cost
  3. If the real cost is lower, the difference is refunded automatically
You pay for actual usage, not the initial estimate.
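The three steps can be modeled as a hold-then-settle ledger. This is a client-side illustration, not Kyma's billing code. It makes two assumptions the steps above do not state. A failed request releases its whole hold. A real cost above the estimate is charged in full at settlement. `Ledger` and `InsufficientCredits` are hypothetical names.

```python
from decimal import Decimal
from typing import Callable


class InsufficientCredits(Exception):
    """Raised when the balance cannot cover the up-front hold."""


class Ledger:
    def __init__(self, balance: Decimal) -> None:
        self.balance = balance

    def run(self, estimate: Decimal, do_request: Callable[[], Decimal]) -> Decimal:
        # Step 1: hold the estimate before the request starts.
        if self.balance < estimate:
            raise InsufficientCredits(f"need {estimate}, have {self.balance}")
        self.balance -= estimate
        try:
            # Step 2: the request runs and reports its real token cost.
            actual = do_request()
        except Exception:
            self.balance += estimate  # assumption: a failed request releases the hold
            raise
        # Step 3: settle. A lower real cost refunds the difference; a higher one
        # (assumption) is charged here, so the net charge always equals `actual`.
        self.balance += estimate - actual
        return actual


ledger = Ledger(Decimal("0.50"))
ledger.run(Decimal("0.01"), lambda: Decimal("0.00138"))
print(ledger.balance)  # → 0.49862
```

The net effect matches the text: the balance drops by the actual cost, not by the estimate.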

Credits

| Action | Amount |
|---|---|
| Signup bonus | +$0.50 |
| Referral reward | +$0.50 |
| Purchase packages | $5 / $20 / $100 / $500 |
Credits never expire. Auto top-up is supported in the dashboard.
```bash
# Check your balance
curl https://kymaapi.com/v1/credits/balance \
  -H "Authorization: Bearer ky-your-api-key"

# Full pricing catalog (text + image + video + audio)
curl https://kymaapi.com/v1/pricing
```

Video generation

Per-second models scale with the duration parameter (default 5s, max 10–15s by model). Hailuo bills flat per call.
| Model | Cost | Audio | Notes |
|---|---|---|---|
| kling-2.5-pro | $0.0945/s | | Budget cinematic |
| veo-3-fast | $0.135/s | | Google Veo budget (720p) |
| hailuo-02-512p | $0.140 flat | | Cheapest video (I2V only) |
| kling-3-pro | $0.1512/s | | Premium cinematic |
| kling-3-pro-audio | $0.2268/s | native | Cinematic + audio |
| seedance-2-fast | $0.326565/s | bundled | Social shorts |
| seedance-2-pro | $0.40959/s | bundled | Multi-shot action |
| hailuo-02-768p | $0.420 flat | | Mid-tier Hailuo |
| veo-3 | $0.540/s | native | Veo flagship (1080p + audio) |
| hailuo-02-1080p | $0.780 flat | | Hailuo top tier |
See the Video Generation guide. Live source: GET /v1/pricing (video rows).
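Per-second models multiply their rate by the duration parameter, and Hailuo charges a flat amount per call. This minimal sketch assumes the prices in the table above (copied, not live) and the 5-second default. It does not enforce each model's maximum duration, which varies from 10 to 15 seconds. `video_cost` is a hypothetical helper.

```python
# Video prices from the table above (USD). Assumption: copied values, not live.
PER_SECOND = {
    "kling-2.5-pro": 0.0945,
    "veo-3-fast": 0.135,
    "kling-3-pro": 0.1512,
    "kling-3-pro-audio": 0.2268,
    "veo-3": 0.540,
}
FLAT_PER_CALL = {"hailuo-02-512p": 0.140, "hailuo-02-768p": 0.420, "hailuo-02-1080p": 0.780}

DEFAULT_DURATION_S = 5  # default duration per the text above


def video_cost(model: str, duration: int = DEFAULT_DURATION_S) -> float:
    """Estimate the cost of one video generation call."""
    if model in FLAT_PER_CALL:
        return FLAT_PER_CALL[model]  # Hailuo: flat per call, duration is ignored
    return PER_SECOND[model] * duration  # per-second models scale with duration


print(video_cost("kling-2.5-pro"))      # default 5 s clip
print(video_cost("veo-3", 10))          # 10 s with native audio
print(video_cost("hailuo-02-768p", 10))  # flat, same price at any duration
```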

Audio

Audio pricing splits by capability: transcription, understanding, realtime, TTS, music, voice, and SFX.

Speech-to-text (per minute, 1-minute billable minimum)
| Model | Cost | Notes |
|---|---|---|
| whisper-v3-turbo | $0.0009/min | Groq Whisper, 228× realtime. Alias: transcribe |
| gpt-4o-mini-transcribe-2025-12-15 | $0.00405/min | OpenAI premium STT for code-switching and conversational audio. Alias: transcribe-quality |
Audio understanding (per minute)
| Model | Cost | Notes |
|---|---|---|
| gemini-3-flash-audio | $0.0006/min | Tone, music, scene understanding |
Realtime translation (per minute)
| Model | Cost | Notes |
|---|---|---|
| gpt-realtime-translate | $0.034/min | OpenAI |
| gemini-2.5-flash-native-audio-preview-12-2025 | varies | Live API session |
TTS (per 1k characters)
| Model | Cost | Notes |
|---|---|---|
| minimax-speech-turbo | $0.090 | Lowest-latency voice |
| minimax-speech-hd | $0.140 | Production multilingual |
| eleven-flash-v2-5 | ~$0.0945 | ElevenLabs fast |
| eleven-turbo-v2-5 | ~$0.0945 | ElevenLabs turbo |
| eleven-multilingual-v2 | ~$0.189 | ElevenLabs quality |
Music
| Model | Cost | Notes |
|---|---|---|
| minimax-music | $0.045/song | MiniMax flat per song |
| minimax-music-pro | $0.210/song | Music-2.6 family |
| elevenlabs-music | $0.135/s | Pay per second |
Voice services & SFX (flat per call)
| Model | Cost | Notes |
|---|---|---|
| minimax-voice-clone | $2.10/voice | Clone from a 10s–5min reference |
| minimax-voice-design | $4.20/voice | Generate a voice from a text description |
| elevenlabs-sfx | ~$0.027/gen | Sound effect (auto duration) |
Live source: GET /v1/pricing (audio rows).
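Two of these units are easy to misjudge: the 1-minute minimum on speech-to-text and the per-1k-character TTS rate. This sketch estimates both. One assumption is not stated above: usage beyond the minimum, and partial thousands of characters, are billed pro rata rather than rounded up. `stt_cost` and `tts_cost` are hypothetical helpers.

```python
def stt_cost(audio_seconds: float, price_per_min: float) -> float:
    """Speech-to-text cost with the 1-minute billable minimum.

    Assumption: audio longer than a minute is billed pro rata, not rounded up
    to whole minutes; the table above only states the minimum.
    """
    billable_minutes = max(audio_seconds / 60, 1.0)
    return billable_minutes * price_per_min


def tts_cost(characters: int, price_per_1k_chars: float) -> float:
    """Text-to-speech cost, assuming characters are billed pro rata per 1k."""
    return characters / 1000 * price_per_1k_chars


print(stt_cost(20, 0.0009))    # 20 s clip on whisper-v3-turbo bills as 1 minute
print(stt_cost(600, 0.00405))  # 10-minute call on gpt-4o-mini-transcribe-2025-12-15
print(tts_cost(2_500, 0.140))  # 2,500 characters on minimax-speech-hd
```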

Prompt caching

Cached input tokens are charged at 10% of the normal input price. That means:
  • repeated system prompts get much cheaper
  • tool definitions become cheaper over repeated runs
  • long agent sessions benefit the most
Learn more about prompt caching →
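To see why long agent sessions benefit most, compare resending the same prefix with and without caching. This sketch assumes three things. The first call is a cache miss at full price. Every later call hits the cache at 10% of the input price. There is no cache-write surcharge or cache expiry. Check the prompt caching guide for the actual rules. `repeated_prefix_cost` is a hypothetical helper.

```python
def repeated_prefix_cost(prefix_tokens: int, calls: int, input_price_per_m: float,
                         cached: bool = True) -> float:
    """Input cost of sending the same prefix (system prompt, tool definitions) on every call."""
    full = prefix_tokens * input_price_per_m / 1_000_000
    if not cached or calls == 0:
        return full * calls
    # Assumption: first call is a cache miss at full price; later calls are
    # cache hits at 10% of the input price, with no cache-write surcharge.
    return full + (calls - 1) * full * 0.10


# A 10,000-token system prompt sent 100 times to claude-sonnet-4-6 ($4.05 / 1M input).
print(repeated_prefix_cost(10_000, 100, 4.05, cached=False))  # about $4.05
print(repeated_prefix_cost(10_000, 100, 4.05))                # about $0.44
```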

Rate limits

Kyma uses tier-based rate limits, similar to the paid-access tiers at xAI and Anthropic. Your tier rises with the total credits you have purchased, not with lifetime usage.
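| Tier | Min purchase | RPM | Per-model RPM | TPM |
|---|---|---|---|---|
| 0 (Free) | $0 | 30 | 25 | 200K |
| 1 (Starter) | $10 | 60 | 40 | 500K |
| 2 (Builder) | $50 | 120 | 80 | 2M |
| 3 (Pro) | $250 | 200 | 150 | 5M |
| 4 (Enterprise) | $1,000 | 300 | 200 | 10M |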
  • RPM = requests per minute
  • Per-model RPM = how many requests per minute a single model can take
  • TPM = tokens per minute across your account
There are no daily or monthly caps. Your balance and tier are the real limits.
```bash
# Check your current limits
curl https://kymaapi.com/v1/auth/limits \
  -H "Authorization: Bearer ky-your-api-key"
```
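The tier table is a threshold lookup on total purchases. This sketch copies the table's values. It assumes that only purchased credits count, so the signup and referral bonuses are left out, per "total credits purchased". GET /v1/auth/limits remains the source of truth. `Tier`, `TIERS`, and `tier_for` are hypothetical names.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    level: int
    name: str
    min_purchase: float  # USD of credits purchased
    rpm: int
    per_model_rpm: int
    tpm: int


# Values copied from the tier table above, lowest first.
TIERS = [
    Tier(0, "Free", 0, 30, 25, 200_000),
    Tier(1, "Starter", 10, 60, 40, 500_000),
    Tier(2, "Builder", 50, 120, 80, 2_000_000),
    Tier(3, "Pro", 250, 200, 150, 5_000_000),
    Tier(4, "Enterprise", 1_000, 300, 200, 10_000_000),
]


def tier_for(total_purchased: float) -> Tier:
    """Highest tier whose minimum purchase has been reached."""
    eligible = [t for t in TIERS if total_purchased >= t.min_purchase]
    return eligible[-1]


print(tier_for(49.99))  # Starter: $50 is needed for Builder
```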

What happens when you hit a limit?

Out of credits
```json
{
  "error": {
    "message": "Insufficient credits. Add credits at https://kymaapi.com/dashboard.",
    "type": "insufficient_credits"
  }
}
```
Too many requests
```json
{
  "error": {
    "message": "Rate limit exceeded (30 RPM for Tier 0). Try again in a few seconds.",
    "type": "rate_limit"
  }
}
```
The Retry-After header tells you when to retry.
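A client can branch on error.type: retry "rate_limit" after waiting for Retry-After, and return "insufficient_credits" immediately, because waiting does not add balance. This is a minimal, transport-agnostic sketch. `send` is any hypothetical callable that returns (status, headers, body). Headers are treated as a plain dict, although many HTTP clients expose a case-insensitive mapping. The code also assumes Retry-After arrives as seconds and falls back to exponential backoff when the header is missing or is an HTTP-date.

```python
import json
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def error_type(body: str) -> Optional[str]:
    """Return error.type from an error body like the ones above, or None."""
    try:
        return json.loads(body)["error"]["type"]
    except (ValueError, KeyError, TypeError):
        return None


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry only rate_limit errors, waiting for Retry-After between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status < 400 or error_type(body) != "rate_limit":
            return status, headers, body  # success, or an error that retrying won't fix
        if attempt == max_attempts - 1:
            break
        try:
            delay = float(headers.get("Retry-After", ""))
        except ValueError:
            delay = float(2 ** attempt)  # header missing or an HTTP-date: back off
        sleep(delay)
    return status, headers, body
```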

How to spend less

  • Use cheaper models for extraction, routing, and repetitive automation
  • Use qwen-3-32b instead of a flagship if you mainly need fast coding help
  • Use gemini-2.5-flash only when long context is the real need
  • Use deepseek-r1 only when the task truly needs deeper reasoning
  • Purchase credits when you need higher rate limits, not just more balance: your tier is based on total credits purchased
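The tips above can be encoded as a small router. This is a hypothetical sketch, not a Kyma feature. The task labels are illustrative, and `LONG_CONTEXT_TOKENS` is a placeholder threshold, not a documented limit. Set it from the context windows of the cheaper models you actually use.

```python
# Placeholder threshold (assumption): not a documented Kyma limit.
LONG_CONTEXT_TOKENS = 128_000

CHEAP_TASKS = {"extraction", "routing", "automation"}


def pick_model(task: str, context_tokens: int = 0) -> str:
    """Hypothetical router that applies the cost tips above."""
    if context_tokens > LONG_CONTEXT_TOKENS:
        return "gemini-2.5-flash"  # long context is the real need
    if task in CHEAP_TASKS:
        return "glm-4.7-flash"  # cheapest text model in the pricing tables
    if task == "coding":
        return "qwen-3-32b"  # fast coding help without flagship pricing
    if task == "reasoning":
        return "deepseek-r1"  # only when deeper reasoning is truly needed
    return "qwen-3.6-plus"  # listed as "Best default overall"


print(pick_model("extraction"))              # glm-4.7-flash
print(pick_model("summary", 500_000))        # gemini-2.5-flash
```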

Next steps