Available models

| Model ID | Parameters | Context | Speed | Best For |
|---|---|---|---|---|
| gemma-4-31b | 31B | 128K | Medium | Multimodal, vision |
| gemma-4-26b-moe | 26B (MoE) | 128K | Fast | Efficient inference |
| gemini-3-flash | | 1M | Fast | Ultra-long context |
| gemini-2.5-flash | | 1M | Fast | Long context |

Gemma vs Gemini

  • Gemma = open source, with free inference via Google. Best for multimodal and vision tasks.
  • Gemini = proprietary, paid per token. Best for ultra-long context (1M tokens).
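
The trade-off above can be sketched as a small selection helper. This is only an illustration: `pick_model` is a hypothetical function, not part of the Kyma API, and it assumes "128K" and "1M" mean 128,000 and 1,000,000 tokens. It also assumes that any input too large for Gemma goes to Gemini, whether or not it contains images.

```python
# Context limits as listed in the model table (assumed to be round numbers).
GEMMA_CONTEXT_TOKENS = 128_000
GEMINI_CONTEXT_TOKENS = 1_000_000


def pick_model(has_images: bool, estimated_tokens: int) -> str:
    """Hypothetical helper: choose a model ID from the shape of the task."""
    if estimated_tokens > GEMINI_CONTEXT_TOKENS:
        raise ValueError("input exceeds the largest listed context window")
    if estimated_tokens > GEMMA_CONTEXT_TOKENS:
        # Only the Gemini models list a 1M-token window.
        return "gemini-2.5-flash"
    if has_images:
        # Listed as best for multimodal and vision work.
        return "gemma-4-31b"
    # Listed as best for efficient inference.
    return "gemma-4-26b-moe"
```

The token estimate is left to the caller; how tokens are counted for each model is not covered here.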

Recommendation

Use gemma-4-31b for multimodal tasks: it supports image understanding and has free inference via Google. Use gemini-2.5-flash for long context: its 1M-token window can fit entire codebases and books.

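from openai import OpenAI

# Point the OpenAI client at the Kyma endpoint; replace "ky-..." with your API key.
client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-...")

# Vision / multimodal
# As written this request is text-only; the image itself must be included in the message content.
vision_response = client.chat.completions.create(
    model="gemma-4-31b",
    messages=[{"role": "user", "content": "Describe this image"}]
)
print(vision_response.choices[0].message.content)

# Long context (up to 1M tokens)
summary_response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Summarize this 200-page document: ..."}]
)
print(summary_response.choices[0].message.content)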
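
For a real vision request the image has to travel with the prompt. The sketch below builds such a message using the OpenAI chat format for image input (a list of `text` and `image_url` content parts, with the image as a base64 data URL). Whether Kyma accepts this exact shape for gemma-4-31b is an assumption, and `build_image_message` is a hypothetical helper name.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Hypothetical helper: wrap a prompt and raw image bytes in one user message."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # OpenAI-style image part; assumed, not confirmed, to work on Kyma.
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes stand in for a real image file read with open(path, "rb").
message = build_image_message("Describe this image", b"placeholder-image-bytes")
```

The resulting dictionary would be passed as `messages=[message]` with `model="gemma-4-31b"` in the same `client.chat.completions.create` call shown above.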

Model aliases

| Alias | Resolves to |
|---|---|
| vision | gemma-4-31b |
| long-context | gemini-2.5-flash |
model="vision"        # → gemma-4-31b
model="long-context"  # → gemini-2.5-flash
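
Aliases are resolved by the API, so client code can pass them directly. If you want to log or pin the concrete model behind an alias, a client-side copy of the table is one option. This is a minimal sketch: `MODEL_ALIASES` and `resolve_model` are hypothetical names, and the mapping reflects only the two aliases listed above, so it would need updating if the service changes them.

```python
# Local copy of the alias table; the service remains the source of truth.
MODEL_ALIASES = {
    "vision": "gemma-4-31b",
    "long-context": "gemini-2.5-flash",
}


def resolve_model(name: str) -> str:
    """Return the model ID for a known alias, or the name unchanged otherwise."""
    return MODEL_ALIASES.get(name, name)
```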