Changelog - Kyma API

June 19, 2026

2 new flagship models: GLM 5.2 and Kimi K2.7 Code

Two new open-weight coding models, both live-verified across their serving paths before launch:

glm-5.2 — Zhipu’s newest frontier-open flagship. 744B MoE, 1M-token context (up from ~200K on GLM 5.1), #1 open-weight on the Intelligence Index v4.1. $1.89/$5.94 per 1M tokens. The glm-flagship alias now resolves here.
kimi-k2.7-code — Moonshot’s coding specialist. +21.8% on Kimi Code Bench v2 vs K2.6 with ~30% fewer reasoning tokens, 262K context, text+image input, always-on thinking. $1.28/$5.40 per 1M tokens.

Both keep their predecessors available: glm-5.1 and kimi-k2.6 are still callable by model ID, and agent/best-agent still resolve to kimi-k2.6.

June 10, 2026

4 new models: Qwen 3.7 Plus, MiniMax M3, Nemotron 3 Ultra, Step 3.7 Flash

Four new text models, all with tool calling and live-verified before launch:

qwen-3.7-plus — successor to qwen-3.6-plus. 1M context, adds vision, $0.54/$2.16 per 1M tokens
minimax-m3 — agentic coding focus, 1M context, text+image+video input, $0.405/$1.62 (same price as M2.5)
nemotron-3-ultra-550b — largest US open-weight model (550B MoE), 1M context, $0.675/$3.375
step-3.7-flash — cheap multimodal flash tier, 256K context, tool calling, $0.27/$1.553

Also in this release:

Reliability: per-attempt upstream timeout — a hung provider now fails over instead of hanging your request
Billing hardening: settle idempotency, usage estimation fallback, and time-to-first-byte now measured on every request
deepseek-r1 input price adjusted to $0.74/1M (output unchanged)

June 4, 2026

ElevenLabs v3 — most expressive TTS + low-latency streaming

Added

Model eleven-v3 — the most expressive text-to-speech model on Kyma. Audio tags and emotional range for character voices and dialogue, lifelike delivery across 70+ languages. Same $0.405/1K char as Multilingual v2. Call it on POST /v1/audio/speech with any ElevenLabs voice.

Changed

Feature Opt-in low-latency TTS streaming — pass "stream": true on POST /v1/audio/speech to receive audio progressively as it’s synthesized. Time-to-first-audio drops to ~0.4s (from ~1.8s). The response is still a single complete audio/mpeg stream, so existing clients keep working unchanged. Currently applies to the MiniMax speech models.
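
A minimal sketch of the opt-in request body follows. Only the "stream": true flag and the endpoint path come from this entry; the field names model, input, and voice are assumed (OpenAI-style) and the voice value is a placeholder, so check the endpoint reference before relying on them.

```python
import json

def build_speech_request(text: str, model: str, voice: str, stream: bool = False) -> str:
    """Return a JSON body for POST /v1/audio/speech.

    Only the "stream" flag is taken from the changelog. The "model",
    "input", and "voice" field names are assumptions, not confirmed.
    """
    body = {"model": model, "input": text, "voice": voice}
    if stream:
        # Opt in to progressive delivery; leave the key out to keep the old behavior.
        body["stream"] = True
    return json.dumps(body)

# "example-voice" is a hypothetical voice name used only for illustration.
payload = build_speech_request("Hello", "minimax-speech-turbo", "example-voice", stream=True)
```

Because the response is still a single audio/mpeg stream, a client that writes the body to a file needs no change; one that wants the latency benefit would read the response in chunks as they arrive.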

May 17, 2026 (later)

Google media models — 7 new SKUs + public pricing catalog

Google’s media models (Imagen 4, Veo 3, Nano Banana) are now available on Kyma. All seven SKUs ship behind the existing /v1/images/generations and /v1/videos/generations endpoints — no client change needed beyond picking the new model value.

Added — image models

Model imagen-4-fast — Google Imagen 4 fast tier, $0.027/image
Model imagen-4 — Google Imagen 4 default, $0.054/image (recommended)
Model imagen-4-ultra — Google Imagen 4 print-ready, $0.081/image
Model nano-banana — Gemini 2.5 Flash Image with native edit-mode (image-in + prompt → image-out), $0.046/image
Model nano-banana-3-flash — Gemini 3.1 image preview, $0.046/image

Added — video models (async LRO, ~30–180s)

Model veo-3-fast — Google Veo 3 fast tier, 720p no-audio, $0.135/sec (recommended budget)
Model veo-3 — Google Veo 3 flagship, 1080p with native audio (dialogue + ambient + lip-sync), $0.540/sec

Added — public pricing & limits endpoints

API GET /v1/pricing — full catalog: text + image + video + audio in one round-trip. Replaces the partial /v1/credits/pricing (kept for backward compat). Cache-Control 60s.
API GET /v1/limits/tiers — full 5-tier matrix, public, Cache-Control 300s. Drives the Rate Limits guide and dashboard signup flow.

May 17, 2026

Audio infrastructure refresh

Eight changes shipped over two days. Headline: realtime audio now serves up to 5000 concurrent sessions per project (a 100× lift), and STT gains a URL-fetch mode + automatic fallback.

Added

Feature Realtime audio WebSocket proxy — scales to 5000 concurrent sessions (was 50). See Realtime Audio.
Feature STT URL mode — POST /v1/audio/transcriptions accepts JSON {"audio_url": "https://..."} up to 100 MB. Multipart upload remains capped at 25 MB. See Audio Transcriptions.
Feature STT never-die failover — when the primary whisper-v3-turbo hits a transient hiccup, Kyma transparently retries, then routes to a timestamp-preserving secondary (whisper-1), and for plain-text transcripts a final tertiary (gemini-3-flash-audio) — same request, same price. Response carries X-Kyma-Fallback (serving model) and X-Kyma-Fallback-Layer when it fires. See Audio Transcriptions.
Feature Tier override for heavy users — partners and enterprises can request Tier 4 limits without the $1000 lifetime deposit. See Rate Limits — Need higher limits?.
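
The two STT size caps above suggest a simple client-side choice between upload modes. The sketch below is illustrative only: it assumes 1 MB means 1024 × 1024 bytes, and the "model" field in the JSON body is an assumed name; only "audio_url" and the two caps come from this entry.

```python
import json

MULTIPART_MAX_BYTES = 25 * 1024 * 1024   # multipart cap (assumes binary megabytes)
URL_MODE_MAX_BYTES = 100 * 1024 * 1024   # URL-fetch cap (assumes binary megabytes)

def choose_stt_mode(size_bytes: int, has_public_url: bool) -> str:
    """Pick "url" or "multipart" for POST /v1/audio/transcriptions."""
    if has_public_url and size_bytes <= URL_MODE_MAX_BYTES:
        return "url"
    if size_bytes <= MULTIPART_MAX_BYTES:
        return "multipart"
    raise ValueError("file exceeds the cap for every available upload mode")

def url_mode_body(audio_url: str, model: str = "whisper-v3-turbo") -> str:
    """JSON body for URL mode. The "model" key is an assumption."""
    return json.dumps({"model": model, "audio_url": audio_url})
```

A 60 MB recording that is already hosted somewhere reachable would go through URL mode; the same file without a public URL would have to be split or compressed below 25 MB first.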

Changed

Update Audio rate limits split into per-capability sub-pools (transcription, understanding, speech). Saturating one no longer blocks the others. See Rate Limits — Audio limits.
Update Audio concurrency caps raised across all tiers — Tier 4 now 100 total audio slots (was lower).
Update minimax-music-pro backed by music-2.6 (was music-2.5). API contract and pricing unchanged.

Internal (not user-visible — for the curious)

Ops In-process audio load_factor monitor with Telegram alerts when sustained utilization exceeds 0.70 / 0.85 thresholds.

May 1, 2026

[Model] gpt-image-2 — OpenAI’s flagship image model live on Kyma. Near-perfect text-in-image (multilingual: Japanese, Korean, Hindi, Bengali), reasoning-augmented composition, photoreal output. Quality dropdown low | medium | high: $0.014 / $0.081 / $0.297 per 1024² image. 4 sizes supported (1024², 1024×1536, 1536×1024, 2048²). Single SKU, no -pro derivative — keeps OpenAI’s exact model ID.
[API] Unified picker taxonomy across image, video, and audio composers. Three top-level tiers (Quality / Fast / Cheap) replace the prior ad-hoc grouping; capability sub-axis surfaces SVG (image) and Speech / Music / SFX / STT / Audio-understand (audio). Hailuo 02 1080p moved Quality → Fast; Quality is now reserved for SOTA-class output (Kling 3 Pro, Seedance 2 Pro, gpt-image-2, flux-2-pro, ideogram-v3, recraft-v4-pro). /v1/models exposes new tier and capability fields with ?tier=quality and ?capability=vector filter params. Backward-compat: cost_tier, quality_tier, latency_tier still returned.
[Pricing] New per-quality pricing mode in IMAGE_COSTS. Hold and settle thread the quality param end-to-end so a quality=high request books the right amount on the hold (no refund-and-rebill drift on finalize).
[Reliability] Recovery for multimodal jobs after worker death. Hailuo / Image-01 / gpt-image-2 jobs that exceeded their poll budget after the worker died were leaking forever in processing and holding credits hostage; they now refund cleanly within ~12 minutes for video and ~7 minutes for image. Heartbeat ticker (~1.5 s) keeps in-flight OpenAI calls visible to the sweep so multi-minute high-quality requests don’t get prematurely failed.
[Fix] not_multimodal validator allowlist now reads from a single MULTIMODAL_PROVIDERS set (was a hardcoded chain). Adding a new model source is one line — same regression that bit MiniMax onboarding before.
[Fix] Audio composer prompt counter NFC-normalizes Vietnamese (and other stacked-diacritic) text before measuring length. Telex-encoded Vietnamese was inflating UTF-16 code units 3× and tripping false prompt_too_long rejections at ~660 visible characters on 2000-char-cap models.
[Fix] Per-model audio character limits matching upstream provider caps (MiniMax music: prompt ≤ 200 / lyrics ≤ 600; ElevenLabs music prompt ≤ 2000; SFX prompt ≤ 500). Errors now name the SKU and which field overflowed.
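
To illustrate the Vietnamese prompt-counter fix, here is a small sketch of how decomposed text inflates a UTF-16 length count and how NFC normalization collapses it. It is not the composer's actual code; the function names are hypothetical and it assumes the counter measures UTF-16 code units after normalization.

```python
import unicodedata

def utf16_units(text: str) -> int:
    """Count UTF-16 code units, the way a JavaScript string length would."""
    return len(text.encode("utf-16-le")) // 2

def prompt_length(text: str) -> int:
    """Hypothetical counter: normalize to NFC first, then measure."""
    return utf16_units(unicodedata.normalize("NFC", text))

# "Viet" with a circumflex and dot below on the e, in fully decomposed form:
# the single letter U+1EC7 becomes e + U+0323 + U+0302.
decomposed = unicodedata.normalize("NFD", "Vi\u1ec7t")

raw_units = utf16_units(decomposed)       # 6 code units for 4 visible characters
normalized_units = prompt_length(decomposed)  # 4 after NFC
```

The stacked-diacritic letter alone goes from three code units to one, which is the 3× inflation described in the fix.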

April 30, 2026

MiniMax bundle — 9 new SKUs across audio, image, video

Audio, image, and video coverage all expanded under MiniMax’s PAYG pricing — typically 2× to 90× cheaper than comparable hosted equivalents at matching quality tiers.

Audio (TTS + music + voice services):

New model minimax-speech-hd — $0.140/1K char — production multilingual voice, ~2.9× cheaper than eleven-multilingual-v2.
New model minimax-speech-turbo — $0.090/1K char — lowest-latency voice on Kyma, ~2.2× cheaper than eleven-flash-v2-5.
New model minimax-music — $0.045/song flat — Music-2.0 family, ~90× cheaper than elevenlabs-music for non-hero tracks.
New model minimax-music-pro — $0.210/song flat — Music-2.5+ richer arrangements at production fidelity.
New endpoint POST /v1/audio/voice-clone + model — $2.10/voice flat — clone from 10s-5min reference audio (multipart).
New endpoint POST /v1/audio/voice-design + model — $4.20/voice flat — generate a voice from a text description, no reference needed.

Image (sub-cent tier):

New model minimax-image-01 — $0.005/image flat — cheapest image SKU on Kyma, ~11× cheaper than recraft-v4.

Video (Hailuo 02 family, 6s or 10s clips):

New model hailuo-02-512p — $0.140/clip — cheapest video tier on Kyma, ~4× cheaper than Kling 2.5 Pro at 6s.
New model hailuo-02-768p — $0.420/clip — mid tier, balanced quality vs cost.
New model hailuo-02-1080p — $0.780/clip — full HD hero output, less than half the cost of Kling 3 audio at 10s.

New pricing modes: per-song (music), per-call (voice services), per-video (Hailuo) — all flat per request, no duration metering. Image flat-mode SKUs gained an optional listPrice override for safety-buffer rounding.

Voice ID ownership: Cloned and designed voice IDs are gated per Kyma user (migration 064-minimax-voice-clones.sql). Sharing a voice_id with another account returns 403 voice_not_owned from /v1/audio/speech.

Image catalog refresh — 5 new SKUs

The image lineup grew from 4 → 9 active SKUs. Better defaults, cheaper hero shots, native SVG output.

New model recraft-v4 — $0.054 — replaces recraft-v3 as the daily default. #1 on the HuggingFace Text-to-Image Arena, beats Midjourney V8 / DALL-E 3 / FLUX in human preference. Same price as V3.
New model recraft-v4-pro — $0.338 — V4 quality at 4MP for print-ready / large-scale assets.
New model recraft-v4-vector — $0.108 — native SVG output with editable paths and layers. The only generation models on the market shipping true vector files.
New model recraft-v4-vector-pro — $0.405 — V4 vector at 4MP for print-ready logos and large-scale signage.
New model flux-2-pro — $0.041–$0.101 — BFL’s 32B flagship (3× larger than Flux 1.1). Photoreal, ~60% accurate text-in-image, unified gen+edit. Cheaper than flux-1.1-ultra at 1MP.
New API param image_urls: string[] — multi-reference blending for FLUX.2 Pro, up to 10 source images merged into a single output.
New per-megapixel pricing mode — FLUX.2 Pro bills $0.03 base + $0.015 per extra MP, rounded to nearest whole MP, then × 1.35 markup. Hold uses the requested size; finalize uses the actual output dimensions.
recraft-v3 and flux-1.1-ultra are now marked legacy. Existing API contracts continue to work; new projects should use recraft-v4 and flux-2-pro.
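
The per-megapixel rule for FLUX.2 Pro can be worked through as a short calculation. This is a sketch under stated assumptions, not the billing code: it takes 1 MP as 1,000,000 pixels, rounds ties up, and bills a minimum of 1 MP, none of which the entry specifies.

```python
import math

BASE_USD = 0.03        # cost of the first megapixel (from the entry)
EXTRA_MP_USD = 0.015   # cost of each additional megapixel (from the entry)
MARKUP = 1.35          # markup multiplier (from the entry)

def flux2_pro_price(width: int, height: int) -> float:
    """Estimate the billed price in USD for one FLUX.2 Pro image."""
    # Assumptions: 1 MP = 1,000,000 pixels, half-up rounding, 1 MP minimum.
    megapixels = max(1, math.floor(width * height / 1_000_000 + 0.5))
    return (BASE_USD + EXTRA_MP_USD * (megapixels - 1)) * MARKUP

one_mp = flux2_pro_price(1024, 1024)    # 1 MP: 0.03 * 1.35
four_mp = flux2_pro_price(2048, 2048)   # 4 MP: (0.03 + 3 * 0.015) * 1.35
```

Under these assumptions a 1 MP image comes to about $0.0405 and a 4 MP image to about $0.1013, which lines up with the $0.041 to $0.101 range listed for flux-2-pro. Since the hold uses the requested size and finalize uses the actual output, the same function would be called twice with different dimensions.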

April 29, 2026

Audio - 2 new endpoints + 2 SKUs

Kyma now hears. Two synchronous audio endpoints behind the same single-key gate as text, image, and video.

New endpoint POST /v1/audio/transcriptions - speech-to-text, multipart upload, OpenAI Whisper API compatible
New endpoint POST /v1/audio/understand - audio scene Q&A (tone, music, SFX, language, emotion), custom Kyma endpoint
New models: whisper-v3-turbo at $0.0009/min, gemini-3-flash-audio at $0.000648/min
Per-minute pricing - both endpoints bill in 1-minute increments, rounded up. 1-hour file: $0.054 transcribe + $0.039 understand = $0.093 total
New aliases - model: "transcribe" and model: "audio-understand" ride forward when underlying SKUs change
Audio rows now flow through the same V2 ledger as text - visible on /logs, /rankings, and admin scorecards
Companion CLI watch-cli - open-source orchestrator that gives any agent eyes and ears for any social video URL (~50x cheaper than full multimodal LLM analysis)
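
The per-minute billing rule above reduces to a ceiling on the duration. The sketch below reproduces the 1-hour example; the function name is hypothetical and the rates are the two listed in this entry.

```python
import math

RATE_PER_MINUTE_USD = {
    "whisper-v3-turbo": 0.0009,
    "gemini-3-flash-audio": 0.000648,
}

def audio_cost(model: str, duration_seconds: float) -> float:
    """Bill in whole minutes, rounding any partial minute up."""
    billed_minutes = math.ceil(duration_seconds / 60)
    return billed_minutes * RATE_PER_MINUTE_USD[model]

transcribe = audio_cost("whisper-v3-turbo", 3600)      # 60 min * 0.0009
understand = audio_cost("gemini-3-flash-audio", 3600)  # 60 min * 0.000648
```

For a 1-hour file this gives roughly $0.054 plus $0.0389, matching the $0.093 total quoted above. A 61-second clip bills as 2 minutes.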

April 26, 2026

Video Generation - 5 new models

Five video models now live behind a single async endpoint.

New endpoint POST /v1/videos/generations - async, returns 202 with a job_id; poll GET /v1/jobs/{id} for the result
New models: kling-2.5-pro, kling-3-pro, kling-3-pro-audio, seedance-2-pro, seedance-2-fast
Per-second pricing - $0.0945 to $0.410 per second of video, billed against actual clamped duration
Per-SKU duration caps - Kling 3 family and Seedance support up to 15s; Kling 2.5 stays at 10s
Hold-and-finalize billing - failures refund in full; idempotency keys supported end-to-end
T2V or I2V from a single endpoint - pass image_url to switch any video model into image-to-video mode
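
As a rough illustration of "billed against actual clamped duration", the sketch below caps the requested length at the per-SKU maximum before multiplying by the rate. The caps come from this entry; the per-SKU rates are not listed here, so the rate is passed in and the value used in the example is illustrative only.

```python
# Per-SKU duration caps in seconds, as described in this entry.
MAX_SECONDS = {
    "kling-2.5-pro": 10,
    "kling-3-pro": 15,
    "kling-3-pro-audio": 15,
    "seedance-2-pro": 15,
    "seedance-2-fast": 15,
}

def billed_video_cost(model: str, requested_seconds: float, rate_per_second: float) -> float:
    """Clamp the duration to the SKU cap, then bill per second."""
    clamped = min(requested_seconds, MAX_SECONDS[model])
    return clamped * rate_per_second

# Illustrative rate only: 0.0945 is the low end of the published range,
# not a confirmed price for this SKU.
cost = billed_video_cost("kling-2.5-pro", 12, 0.0945)
```

A 12-second request against a 10-second cap is billed for 10 seconds, so the example comes to about $0.945 rather than $1.134.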

April 25, 2026

DeepSeek V4 — Pro and Flash

DeepSeek’s V4 lineup now live on Kyma. Both variants are MIT-licensed, MoE, with 1M context and native reasoning.

deepseek-v4-pro — 1.6T (49B active) flagship for top reasoning and complex coding. $2.35 / $4.70 per 1M.
deepseek-v4-flash — 284B (13B active) value tier. Same family behavior at the lowest V4 price. $0.19 / $0.38 per 1M.
1M context window, 65K max output, tool calling and structured outputs supported on both.
deepseek-v3 stays available as the previous-gen stable baseline; older workloads do not need to migrate.

Image Generation — Week 1

Four image-generation models now live behind a single async endpoint.

New endpoint POST /v1/images/generations — async, returns 202 with a job_id; poll GET /v1/jobs/{id} for the result
New models: flux-1.1-ultra, flux-kontext-pro (image-edit), ideogram-v3, recraft-v3
Pay-per-image billing with hold-and-finalize semantics — failures are refunded in full
Idempotency key supported end-to-end; safe to retry POSTs without double-charging
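
The submit-then-poll flow can be sketched as a small loop. This is one possible shape, not an official client: the HTTP call is injected as fetch_job (a hypothetical callable that wraps GET /v1/jobs/{id} and returns the parsed JSON), the "status" field name is an assumption, and the defaults for interval and timeout are arbitrary.

```python
import time
from typing import Callable, Dict

def wait_for_job(
    job_id: str,
    fetch_job: Callable[[str], Dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job leaves the "processing" state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        # Assumption: in-flight jobs report status "processing".
        if job.get("status") != "processing":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still processing after {timeout}s")
        sleep(interval)
```

Injecting fetch_job and sleep keeps the loop independent of any HTTP library and easy to exercise with canned responses. Sending the same idempotency key when retrying the initial POST avoids creating a second job for the same request.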

April 23, 2026

API Reliability and Platform Changes

This release bundles billing and rate-limit changes, model/provider updates, agent/runtime improvements, dashboard and product polish, and API behavior updates, with a focus on user-visible improvements.

Per-model soft weight + latency tracking (#8)
Ship Kimi K2.6 + fix kyma-agent auto-update EACCES (#9)
/models nav link + response sanitization fix (#14)
Routing — broadened fallback chain for deepseek-v3 (#7)
Sync Kimi K2.6 across docs, landing pages, integrations (#11)

April 21, 2026

Product and Dashboard Updates

This release bundles billing and rate-limit changes, model/provider updates, agent/runtime improvements, dashboard and product polish, and API behavior updates, with a focus on user-visible improvements.

AI discoverability + models SEO + billing fixes (#3)
Public model detail — tabbed code blocks like dashboard
Tuned deepseek-v3 fallback chain for better latency
AI agent discoverability + public /models SEO pages (#2)
admin_top_users RPC — power user analytics without row limit

April 19, 2026

Product and Dashboard Updates

This release bundles billing and rate-limit changes, model/provider updates, agent/runtime improvements, dashboard and product polish, and API behavior updates, with a focus on user-visible improvements.

Chat playground — searchable model picker, model ID copy, cost per message
Billing page redesign — Anthropic-style single input, price breakdown, auto top-up inputs
Billing page redesign — Anthropic-style layout
Billing page — card editing, invoice links, tier display, unified layout
Billing — inline card modal, invoice creation, estimated taxes, setup checkout

April 17, 2026

Agent, Install, and Runtime Improvements

This release bundles billing and rate-limit changes, model/provider updates, agent/runtime improvements, dashboard and product polish, and API behavior updates, with a focus on user-visible improvements.

Switch tier system from spend-based to deposit-based
Added a new fallback layer for qwen-3.6-plus
Correct pricing link to docs.kymaapi.com/pricing
Bump kyma-agent to 0.1.6 and ter bootstrap to 0.1.1
Qwen-3.6-plus routing tuned to serve the real closed-weight model

April 16, 2026

Kyma Agent v0.1.12 — KYMA.md context + MCP servers

KYMA.md body injection — project rules now prepend to every turn’s system prompt, not just the initial one. Precedence: ~/.kyma/agent/KYMA.md → ./KYMA.md → ./KYMA.local.md, with CLAUDE.md and AGENTS.md as project fallbacks. Edits apply on the next turn — no restart needed.
/init — scaffold KYMA.md in one step. Detects JS/TS, Python, Rust, Go, Ruby, PHP and their frameworks, then proposes a starter file with frontmatter (model, thinking) + Stack / Conventions / Key files / Agent behavior sections. Preview before write.
/mcp — Model Context Protocol servers — bring any MCP-compatible tool server into Kyma. Configure in ~/.kyma/agent/mcp.json (user) or ./.kyma/mcp.json (project). Subcommands: /mcp to list, /mcp enable <name>, /mcp disable <name>, /mcp test <name>. Tools register automatically at session start as mcp__<server>__<tool>.
/status consolidation — merged /doctor, /balance, /usage into a single /status. It now shows account, credits, tier limits, lifetime spend, session totals, API latency, local diagnostics, and MCP health in one view.
Update with npm install -g @kyma-api/agent@latest.

GLM family from Z.AI

3 new models added, bringing the active catalog to 16 models:
- glm-5.1 — flagship long-running coding agent, repo-scale engineering, 203K context
- glm-4.5-air — cheap agentic bulk tasks, 131K context
- glm-4.7-flash — cheap long-context throughput, 203K context
Implicit caching — 50% off on cache hits where the underlying infrastructure supports it.
Kyma Agent v0.1.11 — the /models slash command now lists GLM 5.1, GLM 4.5 Air, and GLM 4.7 Flash alongside the existing 9 curated models. Update with npm install -g @kyma-api/agent@latest.
Auto-failover — every GLM model has multi-layer fallbacks so requests keep flowing even if a backend is unavailable.

April 15, 2026

Kyma Agent v0.1.8

Fixed /doctor diagnostics in kyma after the 0.1.7 bridge release. The install flow stays the same: npm install -g @kyma-api/agent.
kyma-ter remains pinned to 0.1.7; this was a package-only hotfix for the bundled ESM diagnostics path.

April 11, 2026

Kyma CLI v0.3

Historical note: this entry describes the April 11 launch state. The current package is @kyma-api/agent, which installs both kyma and kyma-ter.
kyma command — At that time, install was described as npm install -g kyma-api, then just type kyma to start an interactive chat session from your terminal. The current install path is documented in Kyma Agent.
Interactive model picker — Type /model in chat to browse active models with arrow keys. Or /model deepseek-r1 to switch instantly.
Slash commands — /model, /models, /balance, /clear, /help, /exit — manage your session without leaving the chat.
Pipe mode — cat error.log | kyma "fix this" or git diff | kyma "review this". Auto-detects non-TTY and outputs clean text.
Device code login — kyma login opens your browser, auto-fills the code, copies to clipboard. Supports Google OAuth.
JSON mode — kyma models --json for scripts and CI. Auto-quiet in non-TTY environments.

April 10, 2026

Higher Limits, Better Emails

2.5x higher token limits — Free tier now gets 200K tokens/minute (was 80K) and 30 RPM (was 20). Your coding agents can run longer sessions without hitting walls. Tier 1+ also increased proportionally.
Transactional emails — You’ll now receive a welcome email on signup, a receipt after every purchase, and a heads-up when your balance is running low. Auto top-up failures also notify you immediately.
Model expansion — Added MiniMax M2.7, Nemotron 3 Super, Step 3.5 Flash, GLM 4.5 Air, Gemma 4 26B MoE at that time. Model grid reorganized into Recommended / Coding & Agents / Fast & Long Context categories.
Compare page — New /compare page with honest Kyma vs other LLM gateways and direct APIs comparison, including benchmark data and migration snippets.
Prompt caching as USP — 48% cache hit rate on heavy users, 23% average cost savings. Now highlighted on homepage.

April 8, 2026

Models & Pricing

DeepSeek V3 + R1 — added as primary models with multi-layer fallbacks. DeepSeek V3 is GPT-5 class quality. DeepSeek R1 is a reasoning model 96% cheaper than o1.
Expanded backend redundancy — additional infrastructure for DeepSeek, Llama, and Qwen models. Adds redundancy and lower latency.
4 models disabled — removed a low-yield Qwen 3 235B variant (high failure rate), gpt-oss-20b, llama-3.1-8b, gemma-3-27b (superseded by better models).
Pricing audit — corrected 8 model prices. New principle: price at MAX(all infrastructure costs including fallbacks) x 1.35 markup. If a fallback is too expensive, we remove it rather than raise your price.
19 active models at that time with multi-layer infrastructure redundancy.
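
The pricing principle from the audit can be written as a one-line rule. The sketch below uses made-up infrastructure costs purely to show the arithmetic; the function name is hypothetical.

```python
from typing import Iterable

def list_price(infrastructure_costs: Iterable[float], markup: float = 1.35) -> float:
    """Price at the most expensive path (primary or fallback) times the markup."""
    return max(infrastructure_costs) * markup

# Hypothetical per-1M-token costs for a primary and one fallback.
price = list_price([0.20, 0.30])
```

With these example costs the price is driven by the $0.30 fallback and comes to about $0.405. Dropping that fallback would lower the price to about $0.27, which is the trade-off the principle describes.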

April 7, 2026

⚡ Reliability & Performance

Auto-failover — if a model’s primary infrastructure is down, requests automatically retry on backup providers. Most failures are invisible to you.
Faster responses — reduced internal overhead by ~250ms per request through smarter caching.
Better rate limiting — rate limits are now shared across our infrastructure (no more inconsistent counts).
Fallback headers — responses include X-Kyma-Fallback: true when a backup was used, plus X-Kyma-Fallback-Layer: 1|2|3 indicating how deep the fallback went.
More models available — expanded backup infrastructure means models stay available even during provider outages.
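
A client can surface the fallback headers for logging or alerting. The sketch below is a hypothetical helper, assuming the headers are absent when no fallback fired. It treats any X-Kyma-Fallback value other than "false" as a fallback, since the May 17 entry describes the same header carrying the serving model name for STT.

```python
from typing import Mapping, Optional, Tuple

def parse_fallback(headers: Mapping[str, str]) -> Tuple[bool, Optional[int]]:
    """Return (fallback_used, layer) from a response's headers."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    flag = lowered.get("x-kyma-fallback")
    used = flag is not None and flag.strip().lower() != "false"
    layer = lowered.get("x-kyma-fallback-layer")
    return used, int(layer) if layer is not None else None
```

For example, headers of X-Kyma-Fallback: true and X-Kyma-Fallback-Layer: 2 yield (True, 2), and a response without either header yields (False, None).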

April 4, 2026

🚀 Launch

Launch model set — Llama 3.3 70B, Llama 4 Scout, Qwen 3 32B/235B, Gemma 4, Kimi K2, GPT-OSS, Gemini
Dashboard — API keys, usage stats, playground, model browser
Google Sign-In — one-click login
OpenAI compatible — drop-in replacement for any OpenAI SDK
Tier-based rate limits: Tier 0 (free) = 30 RPM, up to Tier 4 = 300 RPM
Streaming support
Supabase PostgreSQL backend
