May 17, 2026 (later)
Vertex AI media — 7 new SKUs + public pricing catalog
Google Vertex AI infrastructure rolled out earlier this week now hosts a media bundle on Kyma. All seven SKUs ship behind the existing /v1/images/generations and /v1/videos/generations endpoints; no client change is needed beyond picking the new model value.
Added - image models
- [Model] imagen-4-fast - Google Imagen 4 fast tier, $0.027/image
- [Model] imagen-4 - Google Imagen 4 default, $0.054/image (recommended)
- [Model] imagen-4-ultra - Google Imagen 4 print-ready, $0.081/image
- [Model] nano-banana - Gemini 2.5 Flash Image with native edit-mode (image-in + prompt → image-out), $0.046/image
- [Model] nano-banana-3-flash - Gemini 3.1 image preview, $0.046/image
Added - video models
- [Model] veo-3-fast - Google Veo 3 fast tier, 720p no-audio, $0.135/sec (recommended budget)
- [Model] veo-3 - Google Veo 3 flagship, 1080p with native audio (dialogue + ambient + lip-sync), $0.540/sec
Added - public pricing catalog
- [API] GET /v1/pricing - full catalog: text + image + video + audio in one round-trip. Replaces the partial /v1/credits/pricing (kept for backward compat). Cache-Control 60s.
- [API] GET /v1/limits/tiers - full 5-tier matrix, public, Cache-Control 300s. Drives the Rate Limits guide and dashboard signup flow.
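As a rough illustration of how the per-image and per-second prices in this entry add up, the sketch below hard-codes the listed rates and estimates a charge. The table and function names are hypothetical (they are not part of any Kyma SDK), and it assumes video is billed linearly per second with no rounding, which this entry does not state.

```python
# Prices (USD) copied from the changelog entry above.
# Names below are illustrative only, not a Kyma SDK.
IMAGE_PRICE_PER_IMAGE = {
    "imagen-4-fast": 0.027,
    "imagen-4": 0.054,
    "imagen-4-ultra": 0.081,
    "nano-banana": 0.046,
    "nano-banana-3-flash": 0.046,
}
VIDEO_PRICE_PER_SECOND = {
    "veo-3-fast": 0.135,
    "veo-3": 0.540,
}


def estimate_cost(model: str, *, images: int = 0, seconds: float = 0.0) -> float:
    """Estimate the charge for one request (assumes linear billing)."""
    if model in IMAGE_PRICE_PER_IMAGE:
        return round(IMAGE_PRICE_PER_IMAGE[model] * images, 6)
    if model in VIDEO_PRICE_PER_SECOND:
        return round(VIDEO_PRICE_PER_SECOND[model] * seconds, 6)
    raise KeyError(f"unknown media model: {model}")


if __name__ == "__main__":
    print(estimate_cost("imagen-4", images=10))
    print(estimate_cost("veo-3", seconds=8))
```

In practice the live rates would come from GET /v1/pricing rather than a hard-coded table; the response shape of that endpoint is not described here, so it is not modeled.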
May 17, 2026
Audio infrastructure refresh
Eight changes shipped over two days. Headline: realtime audio now serves up to 5000 concurrent sessions per project (a 100× lift), and STT gains a URL-fetch mode plus automatic fallback.
Added
- [Feature] Vertex Live API WebSocket proxy - realtime audio scales to 5000 concurrent sessions (was 50). See Realtime Audio.
- [Feature] STT URL mode - POST /v1/audio/transcriptions accepts JSON {"audio_url": "https://..."} up to 100 MB. Multipart upload remains capped at 25 MB. See Audio Transcriptions.
- [Feature] STT automatic fallback - Groq Whisper 5xx / network failures transparently fall back to Vertex Gemini transcription for text and json response formats. Response carries X-Kyma-Fallback: vertex-gemini when it fires.
- [Feature] Tier override for heavy users - partners and enterprises can request Tier 4 limits without the $1000 lifetime deposit. See Rate Limits - Need higher limits?.
- [Update] Audio rate limits split into 4 per-provider sub-pools (groq, vertex, elevenlabs, minimax). Saturating one no longer blocks the others. See Rate Limits - Audio limits.
- [Update] Audio concurrency caps raised across all tiers - Tier 4 now 100 total audio slots (was lower).
- [Update] minimax-music-pro backed by music-2.6 (was music-2.5). API contract and pricing unchanged.
- [Ops] In-process audio load_factor monitor with Telegram alerts when sustained utilization exceeds 0.70 / 0.85 thresholds.
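A minimal sketch of how a client might pick between the two STT upload modes and detect the fallback header described in this entry. The helper names are hypothetical, and it assumes "MB" means 1024 × 1024 bytes, which the entry does not specify.

```python
import json
from typing import Mapping, Optional

# Size caps from the entry above; the byte interpretation of "MB" is an assumption.
MULTIPART_LIMIT_BYTES = 25 * 1024 * 1024
URL_MODE_LIMIT_BYTES = 100 * 1024 * 1024


def choose_stt_mode(size_bytes: int, audio_url: Optional[str] = None) -> str:
    """Return 'url' or 'multipart', or raise if neither mode can take the file."""
    if audio_url is not None and size_bytes <= URL_MODE_LIMIT_BYTES:
        return "url"
    if size_bytes <= MULTIPART_LIMIT_BYTES:
        return "multipart"
    raise ValueError("file too large for multipart and no usable audio_url")


def url_mode_body(audio_url: str) -> str:
    """JSON body for URL mode, using the audio_url field named in the entry."""
    return json.dumps({"audio_url": audio_url})


def used_vertex_fallback(headers: Mapping[str, str]) -> bool:
    """True when the response reports the Vertex Gemini fallback."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("x-kyma-fallback") == "vertex-gemini"
```

Checking the header is useful when transcripts are compared across runs, since a fallback response comes from a different transcription backend.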
May 1, 2026
- [Model] gpt-image-2 - OpenAI’s flagship image model is live on Kyma. Near-perfect text-in-image (multilingual: Japanese, Korean, Hindi, Bengali), reasoning-augmented composition, photoreal output. Quality dropdown low | medium | high: $0.014 / $0.081 / $0.297 per 1024² image. 4 sizes supported (1024², 1024×1536, 1536×1024, 2048²). Single SKU, no -pro derivative; keeps OpenAI’s exact model ID.
- [API] Unified picker taxonomy across image, video, and audio composers. Three top-level tiers (Quality / Fast / Cheap) replace the prior ad-hoc grouping; a capability sub-axis surfaces SVG (image) and Speech / Music / SFX / STT / Audio-understand (audio). Hailuo 02 1080p moved Quality → Fast; Quality is now reserved for SOTA-class output (Kling 3 Pro, Seedance 2 Pro, gpt-image-2, flux-2-pro, ideogram-v3, recraft-v4-pro). /v1/models exposes new tier and capability fields with ?tier=quality and ?capability=vector filter params. Backward-compat: cost_tier, quality_tier, latency_tier still returned.
- [Pricing] New per-quality pricing mode in IMAGE_COSTS. Hold and settle thread the quality param end-to-end so a quality=high request books the right amount on the hold (no refund-and-rebill drift on finalize).
- [Reliability] Recovery for non-fal multimodal jobs after worker death. Hailuo / Image-01 / gpt-image-2 jobs that exceeded their poll budget after the worker died were leaking forever in processing and holding credits hostage; they now refund cleanly within ~12 minutes for video and ~7 minutes for image. A heartbeat ticker (~1.5 s) keeps in-flight OpenAI calls visible to the sweep so multi-minute high-quality requests don’t get prematurely failed.
- [Fix] not_multimodal validator allowlist now reads from a single MULTIMODAL_PROVIDERS set (was a hardcoded fal | minimax chain). Adding a new provider is now one line; this is the same regression that bit MiniMax onboarding before.
- [Fix] Audio composer prompt counter NFC-normalizes Vietnamese (and other stacked-diacritic) text before measuring length. Telex-encoded Vietnamese was inflating UTF-16 code units 3× and tripping false prompt_too_long rejections at ~660 visible characters on 2000-char-cap models.
- [Fix] Per-model audio character limits matching upstream provider caps (MiniMax music: prompt ≤ 200 / lyrics ≤ 600; ElevenLabs music prompt ≤ 2000; SFX prompt ≤ 500). Errors now name the SKU and which field overflowed.
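To illustrate the NFC fix above: a Vietnamese letter typed as a base character plus two combining marks occupies three UTF-16 code units, while its precomposed NFC form occupies one. The sketch below uses Python's standard unicodedata module; the function names are illustrative and this is not Kyma's actual counter.

```python
import unicodedata


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units in text (what a JS-style .length reports)."""
    return len(text.encode("utf-16-le")) // 2


def counted_length(prompt: str) -> int:
    """Length after NFC normalization, as the fix above describes."""
    return utf16_units(unicodedata.normalize("NFC", prompt))


# "Tiếng Việt" written with decomposed (stacked) diacritics.
decomposed = "Tie\u0302\u0301ng Vie\u0323\u0302t"

if __name__ == "__main__":
    print(utf16_units(decomposed))     # raw count, inflated by combining marks
    print(counted_length(decomposed))  # count after NFC normalization
```

Normalizing before measuring means a length cap tracks visible characters more closely for stacked-diacritic scripts.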
April 30, 2026
MiniMax bundle — 9 new SKUs across audio, image, video
Audio, image, and video coverage all expanded under MiniMax’s PAYG pricing, typically 2× to 90× cheaper than ElevenLabs / fal-hosted equivalents at matching quality tiers.
Audio (TTS + music + voice services):
- New model minimax-speech-hd - $0.140/1K char - production multilingual voice, ~2.9× cheaper than eleven-multilingual-v2.
- New model minimax-speech-turbo - $0.090/1K char - lowest-latency voice on Kyma, ~2.2× cheaper than eleven-flash-v2-5.
- New model minimax-music - $0.045/song flat - Music-2.0 family, ~90× cheaper than elevenlabs-music for non-hero tracks.
- New model minimax-music-pro - $0.210/song flat - Music-2.5+ richer arrangements at production fidelity.
- New endpoint POST /v1/audio/voice-clone + model - $2.10/voice flat - clone from 10s-5min reference audio (multipart).
- New endpoint POST /v1/audio/voice-design + model - $4.20/voice flat - generate a voice from a text description, no reference needed.
Image:
- New model minimax-image-01 - $0.005/image flat - cheapest image SKU on Kyma, ~11× cheaper than recraft-v4.
Video:
- New model hailuo-02-512p - $0.140/clip - cheapest video tier on Kyma, ~4× cheaper than Kling 2.5 Pro at 6s.
- New model hailuo-02-768p - $0.420/clip - mid tier, balanced quality vs cost.
- New model hailuo-02-1080p - $0.780/clip - full HD hero output, less than half the cost of Kling 3 audio at 10s.
Pricing modes: per-song (music), per-call (voice services), and per-video (Hailuo) are all flat per request, with no duration metering. Image flat-mode SKUs gained an optional listPrice override for safety-buffer rounding.
Voice ID ownership: Cloned and designed voice IDs are gated per Kyma user (migration 064-minimax-voice-clones.sql). Sharing a voice_id with another account returns 403 voice_not_owned from /v1/audio/speech.
Image catalog refresh — 5 new SKUs
The image lineup grew from 4 → 9 active SKUs. Better defaults, cheaper hero shots, native SVG output.
- New model recraft-v4 - $0.054 - replaces recraft-v3 as the daily default. #1 on the HuggingFace Text-to-Image Arena, beats Midjourney V8 / DALL-E 3 / FLUX in human preference. Same price as V3.
- New model recraft-v4-pro - $0.338 - V4 quality at 4MP for print-ready / large-scale assets.
- New model recraft-v4-vector - $0.108 - native SVG output with editable paths and layers. The only generation models on the market shipping true vector files.
- New model recraft-v4-vector-pro - $0.405 - V4 vector at 4MP for print-ready logos and large-scale signage.
- New model flux-2-pro - $0.101 - BFL’s 32B flagship (3× larger than Flux 1.1). Photoreal, ~60% accurate text-in-image, unified gen+edit. Cheaper than flux-1.1-ultra at 1MP.
- New API param image_urls: string[] - multi-reference blending for FLUX.2 Pro, up to 10 source images merged into a single output.
- New per-megapixel pricing mode - FLUX.2 Pro bills $0.015 per extra MP, rounded to nearest whole MP, then × 1.35 markup. Hold uses the requested size; finalize uses the actual output dimensions.
recraft-v3 and flux-1.1-ultra are now marked legacy. Existing API contracts continue to work; new projects should use recraft-v4 and flux-2-pro.
April 29, 2026
Audio - 2 new endpoints + 2 SKUs
Kyma now hears. Two synchronous audio endpoints behind the same single-key gate as text, image, and video.
- New endpoint POST /v1/audio/transcriptions - speech-to-text, multipart upload, OpenAI Whisper API compatible
- New endpoint POST /v1/audio/understand - audio scene Q&A (tone, music, SFX, language, emotion), custom Kyma endpoint
- New models: whisper-v3-turbo at $0.000648/min
- Per-minute pricing - both endpoints bill in 1-minute increments, rounded up. 1-hour file: $0.039 transcribe + understand = $0.093 total
- New aliases - model: "transcribe" and model: "audio-understand" ride forward when underlying SKUs change
- Audio rows now flow through the same V2 ledger as text - visible on /logs, /rankings, and admin scorecards
- Companion CLI watch-cli - open-source orchestrator that gives any agent eyes and ears for any social video URL (~50x cheaper than full multimodal LLM analysis)
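The round-up rule for per-minute audio billing can be sketched as follows, using the whisper-v3-turbo rate quoted in this entry. The function names are illustrative, and the rate for the understand endpoint is not modeled because it does not appear intact in this entry.

```python
import math

# USD per minute for whisper-v3-turbo, as quoted in the entry above.
WHISPER_V3_TURBO_PER_MIN = 0.000648


def billable_minutes(duration_seconds: float) -> int:
    """Round the audio duration up to whole minutes."""
    return math.ceil(duration_seconds / 60)


def transcription_cost(duration_seconds: float,
                       rate_per_min: float = WHISPER_V3_TURBO_PER_MIN) -> float:
    """Charge for one transcription under 1-minute, round-up billing."""
    return billable_minutes(duration_seconds) * rate_per_min


if __name__ == "__main__":
    # A 61-second clip is billed as 2 minutes.
    print(billable_minutes(61), transcription_cost(61))
    # A 1-hour file is 60 billable minutes.
    print(billable_minutes(3600), transcription_cost(3600))
```

A 1-hour file at this rate works out to 60 × 0.000648 = 0.03888, which matches the roughly $0.039 transcription figure in the entry.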
April 26, 2026
Video Generation - 5 new models
Five video models now live behind a single async endpoint.
- New endpoint POST /v1/videos/generations - async, returns 202 with a job_id; poll GET /v1/jobs/{id} for the result
- New models: kling-2.5-pro, kling-3-pro, kling-3-pro-audio, seedance-2-pro, seedance-2-fast
- Per-second pricing - $0.410 per second of video, billed against actual clamped duration
- Per-SKU duration caps - Kling 3 family and Seedance support up to 15s; Kling 2.5 stays at 10s
- Hold-and-finalize billing - failures refund in full; idempotency keys supported end-to-end
- T2V or I2V from a single endpoint - pass image_url to switch any video model into image-to-video mode
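The submit-then-poll flow in this entry might look like the sketch below. It keeps HTTP out of the picture by taking a fetch_job callable, so it runs without network access. The prompt field, the status field, and the "completed" / "failed" values are assumptions, since this entry does not document the job schema; only model, image_url, and job_id come from the text.

```python
import time
from typing import Callable, Optional

# Assumed terminal states; the real job schema is not described in this entry.
TERMINAL_STATUSES = {"completed", "failed"}


def build_video_request(model: str, prompt: str,
                        image_url: Optional[str] = None) -> dict:
    """Request body for POST /v1/videos/generations (prompt field is assumed)."""
    body = {"model": model, "prompt": prompt}
    if image_url is not None:
        body["image_url"] = image_url  # per the entry, switches to image-to-video
    return body


def wait_for_job(fetch_job: Callable[[str], dict], job_id: str, *,
                 interval_s: float = 2.0, timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses.

    fetch_job stands in for a GET /v1/jobs/{id} call and returns parsed JSON.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish in {timeout_s}s")
        sleep(interval_s)
```

Injecting fetch_job and sleep keeps the loop testable; a real client would wrap an HTTP GET with its API key in fetch_job.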
April 25, 2026
DeepSeek V4 — Pro and Flash
DeepSeek’s V4 lineup is now live on Kyma. Both variants are MIT-licensed, MoE, with 1M context and native reasoning.
- deepseek-v4-pro - 1.6T (49B active) flagship for top reasoning and complex coding. $4.70 per 1M.
- deepseek-v4-flash - 284B (13B active) value tier. Same family behavior at the lowest V4 price. $0.38 per 1M.
- 1M context window, 65K max output, tool calling and structured outputs supported on both.
deepseek-v3 stays available as the previous-gen stable baseline; older workloads do not need to migrate.
Image Generation — Week 1
Four image-generation models now live behind a single async endpoint.
- New endpoint POST /v1/images/generations - async, returns 202 with a job_id; poll GET /v1/jobs/{id} for the result
- New models: flux-1.1-ultra, flux-kontext-pro (image-edit), ideogram-v3, recraft-v3
- Pay-per-image billing with hold-and-finalize semantics - failures are refunded in full
- Idempotency key supported end-to-end; safe to retry POSTs without double-charging
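One way to take advantage of idempotent POSTs is to generate a key once and reuse it on every retry of the same request, sketched below. The Idempotency-Key header name is an assumption (this entry does not name the header), and send is a hypothetical callable standing in for the HTTP call.

```python
import uuid
from typing import Callable


def post_with_retry(send: Callable[[dict, dict], dict], body: dict, *,
                    attempts: int = 3) -> dict:
    """Retry a POST on connection errors, reusing one idempotency key.

    send(body, headers) stands in for the HTTP request and returns parsed JSON.
    The header name below is assumed, not taken from the changelog.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(attempts):
        try:
            return send(body, dict(headers))
        except ConnectionError as exc:
            last_error = exc  # same key is sent again on the next attempt
    raise last_error
```

Because the key stays constant across attempts, a request that reached the server before the connection dropped should not be charged twice when it is retried.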
April 23, 2026
API Reliability and Platform Changes
This release bundles billing and rate-limit changes, model/provider updates, agent/runtime improvements, dashboard and product polish, and API behavior updates, with a focus on user-visible improvements.
- Per-model soft weight + latency tracking (#8)
- Ship Kimi K2.6 + fix kyma-agent auto-update EACCES (#9)
- /models nav link + response sanitization fix (#14)
- Routing — broadened fallback chain for deepseek-v3 (#7)
- Sync Kimi K2.6 across docs, landing pages, integrations (#11)
April 21, 2026
Product and Dashboard Updates
This release bundles billing and rate-limit changes, model/provider updates, agent/runtime improvements, dashboard and product polish, and API behavior updates, with a focus on user-visible improvements.
- AI discoverability + models SEO + billing fixes (#3)
- Public model detail — tabbed code blocks like dashboard
- Tuned deepseek-v3 fallback chain for better latency
- AI agent discoverability + public /models SEO pages (#2)
- Admin_top_users RPC — power user analytics without row limit
April 19, 2026
Product and Dashboard Updates
This release bundles billing and rate-limit changes, model/provider updates, agent/runtime improvements, dashboard and product polish, and API behavior updates, with a focus on user-visible improvements.
- Chat playground - searchable model picker, model ID copy, cost per message
- Billing page redesign — Anthropic-style single input, price breakdown, auto top-up inputs
- Billing page redesign — Anthropic-style layout
- Billing page — card editing, invoice links, tier display, unified layout
- Billing — inline card modal, invoice creation, estimated taxes, setup checkout
April 17, 2026
Agent, Install, and Runtime Improvements
This release bundles billing and rate-limit changes, model/provider updates, agent/runtime improvements, dashboard and product polish, and API behavior updates, with a focus on user-visible improvements.
- Switch tier system from spend-based to deposit-based
- Added a new fallback layer for qwen-3.6-plus
- Correct pricing link to docs.kymaapi.com/pricing
- Bump kyma-agent to 0.1.6 and ter bootstrap to 0.1.1
- Qwen-3.6-plus routing tuned to serve the real closed-weight model
April 16, 2026
Kyma Agent v0.1.12 — KYMA.md context + MCP servers
- KYMA.md body injection - project rules now prepend to every turn’s system prompt, not just the initial one. Precedence: ~/.kyma/agent/KYMA.md → ./KYMA.md → ./KYMA.local.md, with CLAUDE.md and AGENTS.md as project fallbacks. Edits apply on the next turn; no restart needed.
- /init - scaffold KYMA.md in one step. Detects JS/TS, Python, Rust, Go, Ruby, PHP and their frameworks, then proposes a starter file with frontmatter (model, thinking) + Stack / Conventions / Key files / Agent behavior sections. Preview before write.
- /mcp - Model Context Protocol servers - bring any MCP-compatible tool server into Kyma. Configure in ~/.kyma/agent/mcp.json (user) or ./.kyma/mcp.json (project). Subcommands: /mcp to list, /mcp enable <name>, /mcp disable <name>, /mcp test <name>. Tools register automatically at session start as mcp__<server>__<tool>.
- /status consolidation - merged /doctor, /balance, /usage into a single /status. It now shows account, credits, tier limits, lifetime spend, session totals, API latency, local diagnostics, and MCP health in one view.
- Update with npm install -g @kyma-api/agent@latest.
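The mcp__<server>__<tool> naming scheme mentioned above can be built and split with plain string handling, as in this sketch. The helper names are hypothetical, and it assumes server names never contain a double underscore; the agent's actual parsing rules are not documented in this entry.

```python
from typing import Tuple


def mcp_tool_name(server: str, tool: str) -> str:
    """Compose the registered name for a tool exposed by an MCP server."""
    return f"mcp__{server}__{tool}"


def parse_mcp_tool_name(name: str) -> Tuple[str, str]:
    """Split mcp__<server>__<tool> into (server, tool).

    Assumes the server name has no double underscore; any later double
    underscores are kept as part of the tool name.
    """
    prefix, sep, rest = name.partition("__")
    if prefix != "mcp" or not sep:
        raise ValueError(f"not an MCP tool name: {name!r}")
    server, sep, tool = rest.partition("__")
    if not sep or not server or not tool:
        raise ValueError(f"not an MCP tool name: {name!r}")
    return server, tool
```

For example, a hypothetical server called github exposing create_issue would register as mcp__github__create_issue.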
GLM family from Z.AI
- 3 new models added, bringing the active catalog to 16 models:
  - glm-5.1 - flagship long-running coding agent, repo-scale engineering, 203K context
  - glm-4.5-air - cheap agentic bulk tasks, 131K context
  - glm-4.7-flash - cheap long-context throughput, 203K context
- Implicit caching - 50% off on cache hits where the underlying infrastructure supports it.
- Kyma Agent v0.1.11 - the /models slash command now lists GLM 5.1, GLM 4.5 Air, and GLM 4.7 Flash alongside the existing 9 curated models. Update with npm install -g @kyma-api/agent@latest.
- Auto-failover - every GLM model has multi-layer fallbacks so requests keep flowing even if a backend is unavailable.
April 15, 2026
Kyma Agent v0.1.8
- Fixed /doctor diagnostics in kyma after the 0.1.7 bridge release. The install flow stays the same: npm install -g @kyma-api/agent.
- kyma-ter remains pinned to 0.1.7; this was a package-only hotfix for the bundled ESM diagnostics path.
April 11, 2026
Kyma CLI v0.3
- Historical note: this entry describes the April 11 launch state. The current package is @kyma-api/agent, which installs both kyma and kyma-ter.
- kyma command - At that time, install was described as npm install -g kyma-api, then just type kyma to start an interactive chat session from your terminal. The current install path is documented in Kyma Agent.
- Interactive model picker - Type /model in chat to browse active models with arrow keys. Or /model deepseek-r1 to switch instantly.
- Slash commands - /model, /models, /balance, /clear, /help, /exit - manage your session without leaving the chat.
- Pipe mode - cat error.log | kyma "fix this" or git diff | kyma "review this". Auto-detects non-TTY and outputs clean text.
- Device code login - kyma login opens your browser, auto-fills the code, copies to clipboard. Supports Google OAuth.
- JSON mode - kyma models --json for scripts and CI. Auto-quiet in non-TTY environments.
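Non-TTY detection of the kind pipe mode relies on is commonly done with isatty(). The sketch below shows the general technique in Python for the input side only; it is not the kyma CLI's implementation, and the way the piped text is joined to the instruction is an assumption.

```python
import sys
from typing import Optional, TextIO


def read_piped_input(stream: Optional[TextIO] = None) -> Optional[str]:
    """Return piped text, or None when the stream is an interactive terminal."""
    stream = sys.stdin if stream is None else stream
    if stream.isatty():
        return None
    return stream.read()


def build_prompt(instruction: str, piped: Optional[str]) -> str:
    """Combine the CLI argument with piped content (joining format is assumed)."""
    if piped is None:
        return instruction
    return f"{instruction}\n\n{piped}"
```

With a command such as cat error.log | kyma "fix this", the first function would return the log contents and the second would attach them to the instruction.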
April 10, 2026
Higher Limits, Better Emails
- 2.5x higher token limits — Free tier now gets 200K tokens/minute (was 80K) and 30 RPM (was 20). Your coding agents can run longer sessions without hitting walls. Tier 1+ also increased proportionally.
- Transactional emails — You’ll now receive a welcome email on signup, a receipt after every purchase, and a heads-up when your balance is running low. Auto top-up failures also notify you immediately.
- Model expansion — Added MiniMax M2.7, Nemotron 3 Super, Step 3.5 Flash, GLM 4.5 Air, Gemma 4 26B MoE at that time. Model grid reorganized into Recommended / Coding & Agents / Fast & Long Context categories.
- Compare page - New /compare page with honest Kyma vs other LLM gateways and direct APIs comparison, including benchmark data and migration snippets.
- Prompt caching as USP - 48% cache hit rate on heavy users, 23% average cost savings. Now highlighted on homepage.
April 8, 2026
Models & Pricing
- DeepSeek V3 + R1 — added as primary models with multi-layer fallbacks. DeepSeek V3 is GPT-5 class quality. DeepSeek R1 is a reasoning model 96% cheaper than o1.
- Expanded backend redundancy — additional infrastructure for DeepSeek, Llama, and Qwen models. Adds redundancy and lower latency.
- 4 models disabled - removed a low-yield Qwen 3 235B variant (high failure rate), gpt-oss-20b, llama-3.1-8b, gemma-3-27b (superseded by better models).
- Pricing audit - corrected 8 model prices. New principle: price at MAX(all infrastructure costs including fallbacks) × 1.35 markup. If a fallback is too expensive, we remove it rather than raise your price.
- 19 active models at that time with multi-layer infrastructure redundancy.
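The pricing principle from the audit above (take the most expensive backend, then apply the 1.35 markup) reduces to a one-line formula. The sketch below uses made-up backend costs purely for illustration; the function name is hypothetical.

```python
from typing import Iterable

MARKUP = 1.35  # markup factor stated in the pricing audit entry


def list_price(backend_costs: Iterable[float]) -> float:
    """Price at MAX(all backend costs, fallbacks included) times the markup."""
    costs = list(backend_costs)
    if not costs:
        raise ValueError("at least one backend cost is required")
    return max(costs) * MARKUP


if __name__ == "__main__":
    # Hypothetical per-1M-token costs for a primary backend and two fallbacks.
    print(list_price([0.20, 0.28, 0.25]))
```

Pricing against the maximum means a request served by the costliest fallback is still covered, which is why an overly expensive fallback is removed rather than priced in.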
April 7, 2026
⚡ Reliability & Performance
- Auto-failover — if a model’s primary infrastructure is down, requests automatically retry on backup providers. Most failures are invisible to you.
- Faster responses — reduced internal overhead by ~250ms per request through smarter caching.
- Better rate limiting — rate limits are now shared across our infrastructure (no more inconsistent counts).
- Fallback headers - responses include X-Kyma-Fallback: true when a backup was used, plus X-Kyma-Fallback-Layer: 1|2|3 indicating how deep the fallback went.
- More models available - expanded backup infrastructure means models stay available even during provider outages.
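A client that wants to log when a backup provider served a request could read the two fallback headers described above. This is a minimal sketch with a hypothetical helper name; it handles only the true value documented in this entry and treats header names as case-insensitive.

```python
from typing import Mapping, Optional, Tuple


def fallback_info(headers: Mapping[str, str]) -> Tuple[bool, Optional[int]]:
    """Return (used_fallback, layer) from response headers.

    layer is None when no fallback was used or the layer header is absent.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    used = lowered.get("x-kyma-fallback", "").strip().lower() == "true"
    if not used:
        return False, None
    raw_layer = lowered.get("x-kyma-fallback-layer")
    layer = int(raw_layer) if raw_layer is not None else None
    return True, layer
```

Tracking the layer over time gives a rough signal of how often the primary infrastructure is unavailable for a given model.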
April 4, 2026
🚀 Launch
- Launch model set — Llama 3.3 70B, Llama 4 Scout, Qwen 3 32B/235B, Gemma 4, Kimi K2, GPT-OSS, Gemini
- Dashboard — API keys, usage stats, playground, model browser
- Google Sign-In — one-click login
- OpenAI compatible — drop-in replacement for any OpenAI SDK
- Tier-based rate limits: Tier 0 (free) = 30 RPM, up to Tier 4 = 300 RPM
- Streaming support
- Supabase PostgreSQL backend