Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.kymaapi.com/llms.txt

Use this file to discover all available pages before exploring further.

Pricing & access

Yes to start. You get 0.50freecreditsonsignup.Nocreditcardrequired.ThatsenoughtotryKymaonlowercostmodels.Forheavierusage,buycreditsstartingat0.50 free credits on signup. No credit card required. That's enough to try Kyma on lower-cost models. For heavier usage, buy credits starting at 5.
You’ll get a 429 response. Wait a few seconds and retry. Limits are tier-based: Tier 0 = 30 RPM, up to Tier 4 = 300 RPM. See Rate Limits.
Yes. Rate limits increase automatically as you purchase credits. Tier 1 unlocks at $10 purchased (60 RPM). See Rate Limits for all tiers.

Models & capabilities

Start with qwen-3.6-plus if you want the safest default. Use kimi-k2.6 for tool-heavy agents, deepseek-r1 for hard reasoning, and gemini-2.5-flash for long-context tasks. See Which model should I use? for the full decision guide.
Yes. All models served by Kyma are open-source or open-weight models from Meta, Alibaba, Google, OpenAI, and others. Kyma itself is a managed service — not open source.
Yes. gemma-4-31b is the cheapest strong multimodal option, and kimi-k2.6 is the better pick if you also need tool-heavy agent behavior.
Yes. Kyma serves four image-generation models — flux-1.1-ultra (cinematic photo), flux-kontext-pro (image edit), ideogram-v3 (typography / logos), and recraft-v3 (vector / illustration) — through async POST /v1/images/generations. Pricing is per-image, 0.054to0.054 to 0.108 depending on model. See the Image Generation guide for prompts and examples.
Up to 1M tokens with Gemini models. Many other active models support 128K-262K. Check GET /v1/models for the live catalog and context windows.

Technical limits

No. Kyma does not impose any output token cap on the gateway. The max_tokens parameter from your request is forwarded directly to the model. The max_output_tokens value shown in /v1/models reflects the model creator’s published specification, not a Kyma restriction. You can verify this via GET /v1/capabilities.
No. Kyma serves the exact same model weights published by the creator (Alibaba, Google, DeepSeek, Meta, etc.). The models run on high-performance inference infrastructure. Kyma does not fine-tune, quantize, or modify any model.
Three common causes:
  1. You didn’t set max_tokens — defaults to 4096 tokens. Set it higher for longer outputs (e.g., max_tokens: 8192).
  2. Model’s output limit — each model has a maximum output capacity (check max_output_tokens in /v1/models). For long generation, use models like glm-5.1 (65K) or deepseek-r1 (32K).
  3. finish_reason: length — the model hit its limit naturally. Increase max_tokens or use a model with higher output capacity.
Kyma never truncates output. If finish_reason is length, the model itself stopped.
4096 tokens. Always set max_tokens explicitly in your request if you need longer outputs. Kyma forwards your value directly to the model.
Call GET /v1/models — each model includes max_output_tokens, context_window, gateway_output_limit (always null), and max_tokens_passthrough (always true). For a gateway-level overview, call GET /v1/capabilities.

Compatibility

Yes. Just change base_url to https://kymaapi.com/v1 and use your ky- API key. All OpenAI SDK features work.
Yes. Kyma supports the Anthropic Messages API at /v1/messages. See our Anthropic guide.
Yes. Use the OpenAI provider in LangChain with Kyma’s base URL.
Yes. Configure Kyma as an OpenAI-compatible provider with your ky- API key and https://kymaapi.com/v1 as the base URL.
The official Google Antigravity IDE does not support custom API endpoints — it only works with Gemini, Claude Sonnet, and GPT-OSS via Google’s auth. However, you can use the community open-antigravity fork which supports custom OpenAI-compatible endpoints like Kyma. See our Antigravity guide for setup instructions.

Reliability

Kyma automatically retries your request on backup infrastructure. Most outages are invisible to you — your request succeeds with the same model from a different source. If the exact model is unavailable everywhere, Kyma can substitute an equivalent-quality model. Check the X-Kyma-Fallback response header to know if a fallback was used.
When a fallback occurs, these headers are included in the response:
  • X-Kyma-Fallback: true — a fallback was used
  • X-Kyma-Fallback-Layer: 1|2|3 — 1 = same model different source, 3 = different model
Use X-Kyma-Fallback-Layer to decide whether to retry. Layer 1 means you got the same model you asked for; Layer 3 means an equivalent-quality substitute.
Kyma runs on redundant infrastructure with auto-failover. Individual provider outages don’t affect you because requests are automatically rerouted. Check Status for real-time model availability.

API keys & account

Keys start with ky-. You can create multiple keys in the Dashboard. Each key shares the same account balance and rate limits.
Kyma does not store your prompts or responses. We only log metadata (model, latency, tokens) for usage tracking.
Yes. Higher tiers unlock automatically as you purchase credits. Tier 1+ gets 60–300 RPM, credits never expire, and you can switch models without changing your integration.