Overview

gemini-2.5-flash is the best choice when context length is the main constraint. It offers a 1M-token context window and multimodal input without stepping up to a premium flagship model.

Specs

Field               Value
Model ID            gemini-2.5-flash
Best for            Long context, fast throughput, multimodal analysis
Context window      1M tokens
Max output tokens   8K
Input modalities    Text, image, audio, video
Output modalities   Text
Tool calling        Yes
Structured outputs  Yes
Prompt caching      Yes
Speed               Fast
Cost band           Cheap
Release stage       Stable

Use this when

  • You need extremely long context.
  • You want a cheap long-context model for analysis or extraction.
  • You need multimodal input across more than just images.

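For the multimodal case above, one way to attach an image is the OpenAI-style content-parts message format with a base64 data URL. This is a minimal sketch, assuming the endpoint accepts that format; the `build_image_part` helper is hypothetical, not part of any SDK.

```python
import base64

def build_image_part(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Encode raw image bytes as an OpenAI-style image_url content part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

# A mixed text + image user message for the chat completions endpoint:
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this chart show?"},
        build_image_part(b"\x89PNG...")  # substitute the raw bytes of your image file
    ],
}
```

The same pattern extends to audio and video parts where the API supports them; only the MIME type and part type change.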
Pick something else when

  • You want the best default quality: use qwen-3.6-plus.
  • You need stronger agentic coding behavior: use kimi-k2.5.
  • You need deeper reasoning more than raw throughput: use deepseek-r1.

Example

from openai import OpenAI

client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-...")

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Summarize this long document set and extract the key decisions."}
    ],
)

# The model's reply is in the first choice's message content.
print(response.choices[0].message.content)