Overview
gemma-4-31b is the best low-cost multimodal option in Kyma’s active lineup. It is a practical choice when you need image input without moving to a more expensive agent model.
Specs
| Field | Value |
|---|---|
| Model ID | gemma-4-31b |
| Best for | Vision, multimodal tasks, cheap general work |
| Context window | 128K |
| Max output tokens | 8K |
| Input modalities | Text, image |
| Output modalities | Text |
| Tool calling | Yes |
| Structured outputs | Yes |
| Prompt caching | Yes |
| Speed | Medium |
| Cost band | Cheap |
| Release stage | Stable |
Use this when
- You need image input at a lower cost.
- You want a cheap model for extraction or analysis from screenshots/documents.
- You need a general model with decent context and structured outputs.
Pick something else when
- You want stronger flagship reasoning: use
qwen-3.6-plus. - You need stronger multimodal agent behavior: use
kimi-k2.5. - You need 1M context: use
gemini-2.5-flash.