May 17, 2026 (later)

Vertex AI media — 7 new SKUs + public pricing catalog

Google Vertex AI infrastructure rolled out earlier this week now hosts a media bundle on Kyma. All seven SKUs ship behind the existing /v1/images/generations and /v1/videos/generations endpoints, so no client change is needed beyond picking the new model value.
Added - image models
  • Model imagen-4-fast — Google Imagen 4 fast tier, $0.027/image
  • Model imagen-4 — Google Imagen 4 default, $0.054/image (recommended)
  • Model imagen-4-ultra — Google Imagen 4 print-ready, $0.081/image
  • Model nano-banana — Gemini 2.5 Flash Image with native edit-mode (image-in + prompt → image-out), $0.046/image
  • Model nano-banana-3-flash — Gemini 3.1 image preview, $0.046/image
Added — video models (async LRO, ~30–180s)
  • Model veo-3-fast — Google Veo 3 fast tier, 720p no-audio, $0.135/sec (recommended budget)
  • Model veo-3 — Google Veo 3 flagship, 1080p with native audio (dialogue + ambient + lip-sync), $0.540/sec
Added — public pricing & limits endpoints
  • API GET /v1/pricing — full catalog: text + image + video + audio in one round-trip. Replaces the partial /v1/credits/pricing (kept for backward compat). Cache-Control 60s.
  • API GET /v1/limits/tiers: full 5-tier matrix, public, Cache-Control 300s. Drives the Rate Limits guide and dashboard signup flow.
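
To show how the list prices in this entry translate into request costs, here is a minimal sketch. The price tables are copied from the bullets above. The function names are hypothetical, and the sketch assumes video is billed linearly per second with no rounding, which this entry does not confirm.

```python
from decimal import Decimal

# List prices in USD, copied from this changelog entry.
IMAGE_PRICE_PER_IMAGE = {
    "imagen-4-fast": Decimal("0.027"),
    "imagen-4": Decimal("0.054"),
    "imagen-4-ultra": Decimal("0.081"),
    "nano-banana": Decimal("0.046"),
    "nano-banana-3-flash": Decimal("0.046"),
}
VIDEO_PRICE_PER_SECOND = {
    "veo-3-fast": Decimal("0.135"),
    "veo-3": Decimal("0.540"),
}


def estimate_image_cost(model: str, count: int = 1) -> Decimal:
    """Estimated cost of `count` images (hypothetical helper)."""
    return IMAGE_PRICE_PER_IMAGE[model] * count


def estimate_video_cost(model: str, seconds: int) -> Decimal:
    """Estimated cost of one clip, assuming linear per-second billing."""
    return VIDEO_PRICE_PER_SECOND[model] * seconds
```

Under these assumptions, an 8-second clip comes to about $1.08 on veo-3-fast and $4.32 on veo-3. For live prices, GET /v1/pricing is the source of truth.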

May 17, 2026

Audio infrastructure refresh

Eight changes shipped over two days. Headline: realtime audio now serves up to 5000 concurrent sessions per project (a 100× lift), and STT gains a URL-fetch mode + automatic fallback.
Added
  • Feature Vertex Live API WebSocket proxy — realtime audio scales to 5000 concurrent sessions (was 50). See Realtime Audio.
  • Feature STT URL mode — POST /v1/audio/transcriptions accepts JSON {"audio_url": "https://..."} up to 100 MB. Multipart upload remains capped at 25 MB. See Audio Transcriptions.
  • Feature STT automatic fallback — Groq Whisper 5xx / network failures transparently fall back to Vertex Gemini transcription for text and json response formats. Response carries X-Kyma-Fallback: vertex-gemini when it fires.
  • Feature Tier override for heavy users — partners and enterprises can request Tier 4 limits without the $1000 lifetime deposit. See Rate Limits — Need higher limits?.
Changed
  • Update Audio rate limits split into 4 per-provider sub-pools (groq, vertex, elevenlabs, minimax). Saturating one no longer blocks the others. See Rate Limits — Audio limits.
  • Update Audio concurrency caps raised across all tiers — Tier 4 now 100 total audio slots (was lower).
  • Update minimax-music-pro backed by music-2.6 (was music-2.5). API contract and pricing unchanged.
Internal (not user-visible — for the curious)
  • Ops In-process audio load_factor monitor with Telegram alerts when sustained utilization exceeds 0.70 / 0.85 thresholds.
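
The two STT upload paths in this entry have different size caps, so a client may want to pick one before sending. This sketch shows one possible policy. The helper names are hypothetical, "MB" is assumed to mean 1024 × 1024 bytes, and preferring multipart for small files is a design choice rather than documented behavior.

```python
import json
from typing import Optional

# Assumption: the caps in the changelog use 1 MB = 1024 * 1024 bytes.
MULTIPART_LIMIT_BYTES = 25 * 1024 * 1024
URL_LIMIT_BYTES = 100 * 1024 * 1024


def choose_stt_mode(size_bytes: int, audio_url: Optional[str] = None) -> str:
    """Return "multipart" or "url"; raise ValueError if neither path fits."""
    if size_bytes <= MULTIPART_LIMIT_BYTES:
        return "multipart"
    if audio_url is not None and size_bytes <= URL_LIMIT_BYTES:
        return "url"
    raise ValueError("file exceeds the caps for the available upload modes")


def url_mode_body(audio_url: str) -> str:
    """JSON body for URL mode, using the audio_url field named in the entry."""
    return json.dumps({"audio_url": audio_url})
```

A 50 MB file hosted at a reachable URL would go through URL mode, while the same file without a URL is rejected client-side before any request is made.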

May 1, 2026

  • [Model] gpt-image-2 — OpenAI’s flagship image model live on Kyma. Near-perfect text-in-image (multilingual: Japanese, Korean, Hindi, Bengali), reasoning-augmented composition, photoreal output. Quality dropdown low | medium | high: $0.014 / $0.081 / $0.297 per 1024² image. 4 sizes supported (1024², 1024×1536, 1536×1024, 2048²). Single SKU, no -pro derivative — keeps OpenAI’s exact model ID.
  • [API] Unified picker taxonomy across image, video, and audio composers. Three top-level tiers (Quality / Fast / Cheap) replace the prior ad-hoc grouping; capability sub-axis surfaces SVG (image) and Speech / Music / SFX / STT / Audio-understand (audio). Hailuo 02 1080p moved Quality → Fast; Quality is now reserved for SOTA-class output (Kling 3 Pro, Seedance 2 Pro, gpt-image-2, flux-2-pro, ideogram-v3, recraft-v4-pro). /v1/models exposes new tier and capability fields with ?tier=quality and ?capability=vector filter params. Backward-compat: cost_tier, quality_tier, latency_tier still returned.
  • [Pricing] New per-quality pricing mode in IMAGE_COSTS. Hold and settle thread the quality param end-to-end so a quality=high request books the right amount on the hold (no refund-and-rebill drift on finalize).
  • [Reliability] Recovery for non-fal multimodal jobs after worker death. Hailuo / Image-01 / gpt-image-2 jobs that exceeded their poll budget after the worker died were leaking forever in processing and holding credits hostage; they now refund cleanly within ~12 minutes for video and ~7 minutes for image. Heartbeat ticker (~1.5 s) keeps in-flight OpenAI calls visible to the sweep so multi-minute high-quality requests don’t get prematurely failed.
  • [Fix] not_multimodal validator allowlist now reads from a single MULTIMODAL_PROVIDERS set (was a hardcoded fal | minimax chain). Adding a new provider is now a one-line change, which avoids repeating the regression that bit MiniMax onboarding.
  • [Fix] Audio composer prompt counter NFC-normalizes Vietnamese (and other stacked-diacritic) text before measuring length. Telex-encoded Vietnamese was inflating UTF-16 code units 3× and tripping false prompt_too_long rejections at ~660 visible characters on 2000-char-cap models.
  • [Fix] Per-model audio character limits matching upstream provider caps (MiniMax music: prompt ≤ 200 / lyrics ≤ 600; ElevenLabs music prompt ≤ 2000; SFX prompt ≤ 500). Errors now name the SKU and which field overflowed.
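
The NFC fix above can be illustrated with Python's standard unicodedata module. This is a sketch of the technique only, not the composer's actual code. It assumes the counter measures UTF-16 code units, as the entry suggests, and the function names are hypothetical.

```python
import unicodedata


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units in text."""
    return len(text.encode("utf-16-le")) // 2


def prompt_length(text: str) -> int:
    """Length after NFC normalization, so stacked diacritics compose first."""
    return utf16_units(unicodedata.normalize("NFC", text))


def check_prompt(text: str, cap: int) -> None:
    """Raise ValueError if the normalized prompt exceeds cap code units."""
    length = prompt_length(text)
    if length > cap:
        raise ValueError(f"prompt_too_long: {length} > {cap}")


# "Việt" typed as base letter + dot below (U+0323) + circumflex (U+0302).
decomposed = "Vie\u0323\u0302t"
print(utf16_units(decomposed), prompt_length(decomposed))  # prints: 6 4
```

The decomposed form spends three code units on a single visible letter, which is how long Vietnamese prompts could trip the cap well before reaching it in visible characters.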

April 30, 2026

MiniMax bundle — 9 new SKUs across audio, image, video

Audio, image, and video coverage all expanded under MiniMax’s PAYG pricing, typically 2× to 90× cheaper than ElevenLabs / fal-hosted equivalents at matching quality tiers.
Audio (TTS + music + voice services):
  • New model minimax-speech-hd — $0.140/1K char — production multilingual voice, ~2.9× cheaper than eleven-multilingual-v2.
  • New model minimax-speech-turbo — $0.090/1K char — lowest-latency voice on Kyma, ~2.2× cheaper than eleven-flash-v2-5.
  • New model minimax-music — $0.045/song flat — Music-2.0 family, ~90× cheaper than elevenlabs-music for non-hero tracks.
  • New model minimax-music-pro — $0.210/song flat — Music-2.5+ richer arrangements at production fidelity.
  • New endpoint POST /v1/audio/voice-clone + model — $2.10/voice flat — clone from 10s-5min reference audio (multipart).
  • New endpoint POST /v1/audio/voice-design + model — $4.20/voice flat — generate a voice from a text description, no reference needed.
Image (sub-cent tier):
  • New model minimax-image-01 — $0.005/image flat — cheapest image SKU on Kyma, ~11× cheaper than recraft-v4.
Video (Hailuo 02 family, 6s or 10s clips):
  • New model hailuo-02-512p — $0.140/clip — cheapest video tier on Kyma, ~4× cheaper than Kling 2.5 Pro at 6s.
  • New model hailuo-02-768p — $0.420/clip — mid tier, balanced quality vs cost.
  • New model hailuo-02-1080p — $0.780/clip — full HD hero output, less than half the cost of Kling 3 audio at 10s.
New pricing modes: per-song (music), per-call (voice services), per-video (Hailuo). All are flat per request, with no duration metering. Image flat-mode SKUs gained an optional listPrice override for safety-buffer rounding.
Voice ID ownership: Cloned and designed voice IDs are gated per Kyma user (migration 064-minimax-voice-clones.sql). Using a voice_id that belongs to another account returns 403 voice_not_owned from /v1/audio/speech.

Image catalog refresh — 5 new SKUs

The image lineup grew from 4 → 9 active SKUs. Better defaults, cheaper hero shots, native SVG output.
  • New model recraft-v4 — $0.054 — replaces recraft-v3 as the daily default. #1 on the HuggingFace Text-to-Image Arena, beats Midjourney V8 / DALL-E 3 / FLUX in human preference. Same price as V3.
  • New model recraft-v4-pro — $0.338 — V4 quality at 4MP for print-ready / large-scale assets.
  • New model recraft-v4-vector — $0.108 — native SVG output with editable paths and layers. The only generation models on the market shipping true vector files.
  • New model recraft-v4-vector-pro — $0.405 — V4 vector at 4MP for print-ready logos and large-scale signage.
  • New model flux-2-pro: $0.041–$0.101. BFL’s 32B flagship (3× larger than Flux 1.1). Photoreal, ~60% accurate text-in-image, unified gen+edit. Cheaper than flux-1.1-ultra at 1MP.
  • New API param image_urls: string[] for multi-reference blending on FLUX.2 Pro, up to 10 source images merged into a single output.
  • New per-megapixel pricing mode: FLUX.2 Pro bills $0.03 base + $0.015 per extra MP, rounded to nearest whole MP, then × 1.35 markup. Hold uses the requested size; finalize uses the actual output dimensions.
  • recraft-v3 and flux-1.1-ultra are now marked legacy. Existing API contracts continue to work; new projects should use recraft-v4 and flux-2-pro.
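
The per-megapixel rule for FLUX.2 Pro can be worked through in a few lines. This is a sketch under stated assumptions: 1 MP is taken as 1,000,000 pixels, ties round half up, and billing never drops below 1 MP. None of those details are confirmed by this entry, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

BASE_USD = Decimal("0.03")          # covers the first megapixel
EXTRA_MP_USD = Decimal("0.015")     # each additional megapixel
MARKUP = Decimal("1.35")
PIXELS_PER_MP = Decimal(1_000_000)  # assumption: decimal megapixels


def billed_megapixels(width: int, height: int) -> int:
    """Round the pixel count to the nearest whole MP (assumed minimum 1)."""
    exact = Decimal(width * height) / PIXELS_PER_MP
    rounded = int(exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return max(1, rounded)


def flux2_pro_price(width: int, height: int) -> Decimal:
    """(base + extra-MP charge) times markup, as described in the entry."""
    extra_mp = billed_megapixels(width, height) - 1
    return (BASE_USD + EXTRA_MP_USD * extra_mp) * MARKUP
```

With these assumptions, a 1024×1024 request bills 1 MP at $0.0405 and a 2048×2048 request bills 4 MP at $0.10125, which round to the $0.041 and $0.101 ends of the listed range.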

April 29, 2026

Audio - 2 new endpoints + 2 SKUs

Kyma now hears. Two synchronous audio endpoints behind the same single-key gate as text, image, and video.
  • New endpoint POST /v1/audio/transcriptions - speech-to-text, multipart upload, OpenAI Whisper API compatible
  • New endpoint POST /v1/audio/understand - audio scene Q&A (tone, music, SFX, language, emotion), custom Kyma endpoint
  • New models: whisper-v3-turbo at $0.0009/min, gemini-3-flash-audio at $0.000648/min
  • Per-minute pricing - both endpoints bill in 1-minute increments, rounded up. 1-hour file: $0.054 transcribe + $0.039 understand = $0.093 total
  • New aliases - model: "transcribe" and model: "audio-understand" ride forward when underlying SKUs change
  • Audio rows now flow through the same V2 ledger as text - visible on /logs, /rankings, and admin scorecards
  • Companion CLI watch-cli - open-source orchestrator that gives any agent eyes and ears for any social video URL (~50x cheaper than full multimodal LLM analysis)
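
The per-minute rounding described in this entry can be sketched as follows. The rates are the two per-minute prices from this entry; the function names are hypothetical and the real ledger may apply additional rounding that is not documented here.

```python
import math
from decimal import Decimal

# Per-minute rates in USD, from this changelog entry.
RATE_PER_MINUTE = {
    "whisper-v3-turbo": Decimal("0.0009"),
    "gemini-3-flash-audio": Decimal("0.000648"),
}


def billed_minutes(duration_seconds: float) -> int:
    """Bill in whole minutes, rounding any partial minute up."""
    return math.ceil(duration_seconds / 60)


def audio_cost(model: str, duration_seconds: float) -> Decimal:
    """Cost of one request under the rounded-up per-minute rule."""
    return RATE_PER_MINUTE[model] * billed_minutes(duration_seconds)
```

A 61-second clip bills as 2 minutes. A one-hour file comes to $0.054 for transcription and $0.03888 for understanding, matching the roughly $0.093 total quoted above.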

April 26, 2026

Video Generation - 5 new models

Five video models now live behind a single async endpoint.
  • New endpoint POST /v1/videos/generations - async, returns 202 with a job_id; poll GET /v1/jobs/{id} for the result
  • New models: kling-2.5-pro, kling-3-pro, kling-3-pro-audio, seedance-2-pro, seedance-2-fast
  • Per-second pricing - $0.0945 to $0.410 per second of video, billed against actual clamped duration
  • Per-SKU duration caps - Kling 3 family and Seedance support up to 15s; Kling 2.5 stays at 10s
  • Hold-and-finalize billing - failures refund in full; idempotency keys supported end-to-end
  • T2V or I2V from a single endpoint - pass image_url to switch any video model into image-to-video mode
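
The submit-then-poll flow above might be wrapped in a small loop like this one. It is a sketch only: the caller supplies fetch_job, a function that performs GET /v1/jobs/{id} and returns the parsed JSON, and the status values "completed" and "failed" are assumptions because this entry does not list the job states.

```python
import time
from typing import Any, Callable, Dict

# Assumption: terminal status names; the entry does not document them.
TERMINAL_STATUSES = ("completed", "failed")


def poll_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_job(job_id) until the job reaches a terminal status."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout_s}s")
        sleep(interval_s)
```

Injecting sleep and clock keeps the loop easy to test without real waiting. Because failures refund in full, a caller can treat a "failed" result as safe to retry with a fresh idempotency key.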

April 25, 2026

DeepSeek V4 — Pro and Flash

DeepSeek’s V4 lineup now live on Kyma. Both variants are MIT-licensed, MoE, with 1M context and native reasoning.
  • deepseek-v4-pro: 1.6T (49B active) flagship for top reasoning and complex coding. $2.35 / $4.70 per 1M.
  • deepseek-v4-flash: 284B (13B active) value tier. Same family behavior at the lowest V4 price. $0.19 / $0.38 per 1M.
  • 1M context window, 65K max output, tool calling and structured outputs supported on both.
  • deepseek-v3 stays available as the previous-gen stable baseline; older workloads do not need to migrate.

Image Generation — Week 1

Four image-generation models now live behind a single async endpoint.

April 23, 2026

API Reliability and Platform Changes

This release bundles billing and rate-limit changes, model/provider updates, agent/runtime improvements, dashboard and product polish, and API behavior updates, with a focus on user-visible improvements.
  • Per-model soft weight + latency tracking (#8)
  • Ship Kimi K2.6 + fix kyma-agent auto-update EACCES (#9)
  • /models nav link + response sanitization fix (#14)
  • Routing — broadened fallback chain for deepseek-v3 (#7)
  • Sync Kimi K2.6 across docs, landing pages, integrations (#11)

April 21, 2026

Product and Dashboard Updates

  • AI discoverability + models SEO + billing fixes (#3)
  • Public model detail — tabbed code blocks like dashboard
  • Tuned deepseek-v3 fallback chain for better latency
  • AI agent discoverability + public /models SEO pages (#2)
  • admin_top_users RPC: power user analytics without row limit

April 19, 2026

Product and Dashboard Updates

  • Chat playground — searchable model picker, model ID copy, cost per message
  • Billing page redesign — Anthropic-style single input, price breakdown, auto top-up inputs
  • Billing page redesign — Anthropic-style layout
  • Billing page — card editing, invoice links, tier display, unified layout
  • Billing — inline card modal, invoice creation, estimated taxes, setup checkout

April 17, 2026

Agent, Install, and Runtime Improvements

  • Switch tier system from spend-based to deposit-based
  • Added a new fallback layer for qwen-3.6-plus
  • Correct pricing link to docs.kymaapi.com/pricing
  • Bump kyma-agent to 0.1.6 and ter bootstrap to 0.1.1
  • Qwen-3.6-plus routing tuned to serve the real closed-weight model

April 16, 2026

Kyma Agent v0.1.12 — KYMA.md context + MCP servers

  • KYMA.md body injection: project rules now prepend to every turn’s system prompt, not just the initial one. Precedence: ~/.kyma/agent/KYMA.md → ./KYMA.md → ./KYMA.local.md, with CLAUDE.md and AGENTS.md as project fallbacks. Edits apply on the next turn, with no restart needed.
  • /init — scaffold KYMA.md in one step. Detects JS/TS, Python, Rust, Go, Ruby, PHP and their frameworks, then proposes a starter file with frontmatter (model, thinking) + Stack / Conventions / Key files / Agent behavior sections. Preview before write.
  • /mcp — Model Context Protocol servers — bring any MCP-compatible tool server into Kyma. Configure in ~/.kyma/agent/mcp.json (user) or ./.kyma/mcp.json (project). Subcommands: /mcp to list, /mcp enable <name>, /mcp disable <name>, /mcp test <name>. Tools register automatically at session start as mcp__<server>__<tool>.
  • /status consolidation — merged /doctor, /balance, /usage into a single /status. It now shows account, credits, tier limits, lifetime spend, session totals, API latency, local diagnostics, and MCP health in one view.
  • Update with npm install -g @kyma-api/agent@latest.

GLM family from Z.AI

  • 3 new models added, bringing the active catalog to 16 models:
    • glm-5.1 — flagship long-running coding agent, repo-scale engineering, 203K context
    • glm-4.5-air — cheap agentic bulk tasks, 131K context
    • glm-4.7-flash — cheap long-context throughput, 203K context
  • Implicit caching — 50% off on cache hits where the underlying infrastructure supports it.
  • Kyma Agent v0.1.11 — the /models slash command now lists GLM 5.1, GLM 4.5 Air, and GLM 4.7 Flash alongside the existing 9 curated models. Update with npm install -g @kyma-api/agent@latest.
  • Auto-failover — every GLM model has multi-layer fallbacks so requests keep flowing even if a backend is unavailable.

April 15, 2026

Kyma Agent v0.1.8

  • Fixed /doctor diagnostics in kyma after the 0.1.7 bridge release. The install flow stays the same: npm install -g @kyma-api/agent.
  • kyma-ter remains pinned to 0.1.7; this was a package-only hotfix for the bundled ESM diagnostics path.

April 11, 2026

Kyma CLI v0.3

  • Historical note: this entry describes the April 11 launch state. The current package is @kyma-api/agent, which installs both kyma and kyma-ter.
  • kyma command — At that time, install was described as npm install -g kyma-api, then just type kyma to start an interactive chat session from your terminal. The current install path is documented in Kyma Agent.
  • Interactive model picker — Type /model in chat to browse active models with arrow keys. Or /model deepseek-r1 to switch instantly.
  • Slash commands: /model, /models, /balance, /clear, /help, /exit. Manage your session without leaving the chat.
  • Pipe mode: cat error.log | kyma "fix this" or git diff | kyma "review this". Auto-detects non-TTY and outputs clean text.
  • Device code login: kyma login opens your browser, auto-fills the code, copies to clipboard. Supports Google OAuth.
  • JSON mode: kyma models --json for scripts and CI. Auto-quiet in non-TTY environments.

April 10, 2026

Higher Limits, Better Emails

  • 2.5x higher token limits — Free tier now gets 200K tokens/minute (was 80K) and 30 RPM (was 20). Your coding agents can run longer sessions without hitting walls. Tier 1+ also increased proportionally.
  • Transactional emails — You’ll now receive a welcome email on signup, a receipt after every purchase, and a heads-up when your balance is running low. Auto top-up failures also notify you immediately.
  • Model expansion — Added MiniMax M2.7, Nemotron 3 Super, Step 3.5 Flash, GLM 4.5 Air, Gemma 4 26B MoE at that time. Model grid reorganized into Recommended / Coding & Agents / Fast & Long Context categories.
  • Compare page: new /compare page with an honest comparison of Kyma against other LLM gateways and direct APIs, including benchmark data and migration snippets.
  • Prompt caching as USP — 48% cache hit rate on heavy users, 23% average cost savings. Now highlighted on homepage.

April 8, 2026

Models & Pricing

  • DeepSeek V3 + R1 — added as primary models with multi-layer fallbacks. DeepSeek V3 is GPT-5 class quality. DeepSeek R1 is a reasoning model 96% cheaper than o1.
  • Expanded backend redundancy — additional infrastructure for DeepSeek, Llama, and Qwen models. Adds redundancy and lower latency.
  • 4 models disabled — removed a low-yield Qwen 3 235B variant (high failure rate), gpt-oss-20b, llama-3.1-8b, gemma-3-27b (superseded by better models).
  • Pricing audit — corrected 8 model prices. New principle: price at MAX(all infrastructure costs including fallbacks) x 1.35 markup. If a fallback is too expensive, we remove it rather than raise your price.
  • 19 active models at that time with multi-layer infrastructure redundancy.
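
The pricing principle from the audit reduces to a one-line formula. The sketch below applies it to made-up per-unit infrastructure costs. The function name and the sample figures are hypothetical; only the MAX-then-markup rule and the 1.35 factor come from this entry.

```python
from decimal import Decimal
from typing import Sequence

MARKUP = Decimal("1.35")


def list_price(infra_costs: Sequence[Decimal]) -> Decimal:
    """Price at the most expensive backend (primary or fallback) times markup."""
    if not infra_costs:
        raise ValueError("at least one infrastructure cost is required")
    return max(infra_costs) * MARKUP


# Hypothetical costs for a primary backend and two fallbacks.
print(list_price([Decimal("0.10"), Decimal("0.20"), Decimal("0.15")]))  # prints: 0.2700
```

Pricing off the maximum means a request costs the same whichever layer serves it, which is why an overly expensive fallback gets removed rather than passed on.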

April 7, 2026

⚡ Reliability & Performance

  • Auto-failover — if a model’s primary infrastructure is down, requests automatically retry on backup providers. Most failures are invisible to you.
  • Faster responses — reduced internal overhead by ~250ms per request through smarter caching.
  • Better rate limiting — rate limits are now shared across our infrastructure (no more inconsistent counts).
  • Fallback headers — responses include X-Kyma-Fallback: true when a backup was used, plus X-Kyma-Fallback-Layer: 1|2|3 indicating how deep the fallback went.
  • More models available — expanded backup infrastructure means models stay available even during provider outages.
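
A client that wants to log when a backup served its request could read the two fallback headers like this. It is a sketch, not an official client: the function name is hypothetical, and treating any value other than "false" as "fallback used" is an assumption made so that values other than true are also recognized.

```python
from typing import Mapping, Optional, Tuple


def parse_fallback(headers: Mapping[str, str]) -> Tuple[bool, Optional[int]]:
    """Return (fallback_used, layer) from response headers.

    Header names are matched case-insensitively. layer is None when
    the layer header is absent or not a plain integer.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    marker = lowered.get("x-kyma-fallback", "")
    used = marker != "" and marker.lower() != "false"
    layer_text = lowered.get("x-kyma-fallback-layer", "")
    layer = int(layer_text) if layer_text.isdigit() else None
    return used, (layer if used else None)
```

For example, headers of X-Kyma-Fallback: true and X-Kyma-Fallback-Layer: 2 yield (True, 2), while a response without either header yields (False, None).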

April 4, 2026

🚀 Launch

  • Launch model set — Llama 3.3 70B, Llama 4 Scout, Qwen 3 32B/235B, Gemma 4, Kimi K2, GPT-OSS, Gemini
  • Dashboard — API keys, usage stats, playground, model browser
  • Google Sign-In — one-click login
  • OpenAI compatible — drop-in replacement for any OpenAI SDK
  • Tier-based rate limits: Tier 0 (free) = 30 RPM, up to Tier 4 = 300 RPM
  • Streaming support
  • Supabase PostgreSQL backend