Overview

gemma-4-31b is the best low-cost multimodal option in Kyma’s active lineup. It is a practical choice when you need image input without moving to a more expensive agent model.

Specs

| Field | Value |
| --- | --- |
| Model ID | gemma-4-31b |
| Best for | Vision, multimodal tasks, cheap general work |
| Context window | 128K |
| Max output tokens | 8K |
| Input modalities | Text, image |
| Output modalities | Text |
| Tool calling | Yes |
| Structured outputs | Yes |
| Prompt caching | Yes |
| Speed | Medium |
| Cost band | Cheap |
| Release stage | Stable |
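
The context and output limits above can be turned into a simple pre-flight check. The sketch below is illustrative only: it assumes "128K" and "8K" mean 128,000 and 8,000 tokens (conservative round numbers, not confirmed here), and the helper names `clamp_max_tokens` and `fits_context` are hypothetical.

```python
# Limits taken from the spec table. The exact token counts behind "128K" and
# "8K" are assumptions; conservative round numbers are used here.
CONTEXT_WINDOW = 128_000
MAX_OUTPUT_TOKENS = 8_000


def clamp_max_tokens(requested: int) -> int:
    """Keep a requested output budget between 1 and the model's output cap."""
    return max(1, min(requested, MAX_OUTPUT_TOKENS))


def fits_context(prompt_tokens: int, max_tokens: int) -> bool:
    """Return True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens + clamp_max_tokens(max_tokens) <= CONTEXT_WINDOW
```

A caller would count prompt tokens with whatever tokenizer it trusts, then pass `clamp_max_tokens(...)` as `max_tokens` on the request.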

Use this when

  • You need image input at a lower cost.
  • You want a cheap model for extraction or analysis from screenshots/documents.
  • You need a general model with decent context and structured outputs.
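
As a rough illustration of the extraction use case, the sketch below builds a request that pairs an image with a JSON schema. It assumes Kyma accepts the OpenAI-style `response_format` and `image_url` fields, which this page does not confirm; the invoice schema and the helper names are hypothetical.

```python
import json


def build_extraction_request(image_url: str) -> dict:
    """Build chat-completion kwargs for schema-constrained extraction."""
    # Hypothetical schema, for illustration only.
    schema = {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["vendor", "total"],
        "additionalProperties": False,
    }
    return {
        "model": "gemma-4-31b",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the vendor and total from this invoice."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        # Assumption: OpenAI-style structured-output parameter.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "invoice", "strict": True, "schema": schema},
        },
    }


def parse_extraction(text: str) -> dict:
    """Parse the model's JSON reply and check that the required keys exist."""
    data = json.loads(text)
    missing = [key for key in ("vendor", "total") if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

The returned dict can be passed as `client.chat.completions.create(**request)`, with the reply text handed to `parse_extraction`.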

Pick something else when

  • You need stronger, flagship-level reasoning: use qwen-3.6-plus.
  • You need stronger multimodal agent behavior: use kimi-k2.5.
  • You need 1M context: use gemini-2.5-flash.
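
The guidance in the two lists above could be encoded as a small routing helper. This is one possible reading of the recommendations, not an official selection rule; `pick_model` and its parameters are hypothetical, and the 128,000-token threshold assumes that is what "128K" means.

```python
def pick_model(
    needs_images: bool = False,
    needs_agent: bool = False,
    needs_flagship_reasoning: bool = False,
    context_tokens: int = 0,
) -> str:
    """Choose a model ID following the recommendations on this page."""
    # Anything beyond gemma-4-31b's window goes to the long-context model.
    if context_tokens > 128_000:
        return "gemini-2.5-flash"
    # Multimodal agent workloads are pointed at kimi-k2.5.
    if needs_images and needs_agent:
        return "kimi-k2.5"
    # Harder reasoning is pointed at the flagship model.
    if needs_flagship_reasoning:
        return "qwen-3.6-plus"
    # Default: cheap general and vision work.
    return "gemma-4-31b"
```

The order of the checks is a design choice: context size is tested first because a prompt that does not fit cannot be served by any of the other models' strengths.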

Example

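from openai import OpenAI

client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-...")

response = client.chat.completions.create(
    model="gemma-4-31b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the key UI problems in this screenshot."},
            # Placeholder URL; assumes OpenAI-style image_url content parts are accepted.
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)
print(response.choices[0].message.content)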
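
To send a local screenshot with a request like the one above, a common approach with OpenAI-compatible APIs is to embed the file as a base64 data URL. The sketch below assumes Kyma accepts data URLs in `image_url` content parts, which this page does not state; the helper names are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_bytes_to_data_url(data: bytes, mime: str) -> str:
    """Encode raw image bytes as a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_file_to_data_url(path: str) -> str:
    """Read an image file and encode it, guessing the MIME type from its name."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    return image_bytes_to_data_url(Path(path).read_bytes(), mime)


def screenshot_message(data_url: str, prompt: str) -> dict:
    """Build a user message that pairs a text prompt with one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

The result of `screenshot_message(image_file_to_data_url("screenshot.png"), "...")` would go in the `messages` list of the request. Base64 inflates the payload by roughly a third, so large screenshots may be worth downscaling first.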