Overview

gemini-3.5-flash is Google’s newest Flash-tier model on Kyma. It pairs a 1M-token context window with multimodal input (text, image, audio, and video) and fast reasoning, making it a strong pick for long-context and mixed-media workloads that still need low latency.

Specs

Field               Value
Model ID            gemini-3.5-flash
Best for            Long context, multimodal input, fast reasoning
Context window      1M tokens
Max output tokens   8K
Input modalities    Text, image, audio, video
Output modalities   Text
Tool calling        Yes
Structured outputs  Yes
Reasoning           Yes
Vision              Yes
Prompt caching      Yes
Speed               Fast
Cost band           Premium
Release stage       Stable

Use this when

  • You need a very large (1M) context window for long documents, transcripts, or codebases.
  • Your input mixes text with images, audio, or video.
  • You want fast responses without dropping to a weaker model.
  • You build agents that need tool calling and structured JSON outputs.
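As a minimal sketch of the agent case, the snippet below builds a tool-calling request and parses the arguments of a returned tool call. It assumes Kyma accepts the OpenAI-style `tools` parameter and returns tool-call arguments as a JSON string, which this page does not confirm. The `flag_clause` tool, its fields, and the sample arguments are hypothetical and exist only for illustration.

```python
import json

# Hypothetical tool definition for illustration; not part of Kyma's documentation.
FLAG_CLAUSE_TOOL = {
    "type": "function",
    "function": {
        "name": "flag_clause",
        "description": "Record a contract clause that looks unusual.",
        "parameters": {
            "type": "object",
            "properties": {
                "clause_id": {"type": "string"},
                "reason": {"type": "string"},
            },
            "required": ["clause_id", "reason"],
        },
    },
}


def build_agent_request(user_text):
    """Return keyword arguments for client.chat.completions.create()."""
    return {
        "model": "gemini-3.5-flash",
        "messages": [{"role": "user", "content": user_text}],
        # Assumption: the endpoint accepts OpenAI-style tool definitions.
        "tools": [FLAG_CLAUSE_TOOL],
    }


def parse_tool_arguments(arguments_json):
    """Decode the JSON string found in tool_call.function.arguments."""
    return json.loads(arguments_json)


request = build_agent_request("Flag unusual clauses in section 4.")
print(json.dumps(request, indent=2))

# Hypothetical arguments string, shaped like what a tool call might carry.
print(parse_tool_arguments('{"clause_id": "4.2", "reason": "auto-renewal"}'))
```

The request dictionary can be passed to an OpenAI-compatible client as `client.chat.completions.create(**request)`. Keeping request construction separate from the network call makes the payload easy to inspect and test.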

Pick something else when

Example

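from openai import OpenAI

# Point the standard OpenAI client at the Kyma endpoint.
client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-...")

# The prompt refers to a contract; include the contract text in the message content.
response = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[
        {"role": "user", "content": "Summarize this 200-page contract and flag unusual clauses."}
    ]
)

# Print the model's text reply.
print(response.choices[0].message.content)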
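
For mixed-media input, a message can carry several content parts instead of a single string. The sketch below assumes Kyma accepts the OpenAI-style `image_url` content part for this model, which this page does not state. The helper name `build_image_message` and the example URL are placeholders.

```python
def build_image_message(prompt, image_url):
    """Build one user message that combines text with an image reference."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: images are passed as OpenAI-style image_url parts.
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Placeholder URL; replace it with a reachable image.
message = build_image_message(
    "Describe the signature block on this scanned page.",
    "https://example.com/contract-page-1.png",
)
print(message)
```

The resulting dictionary goes in the `messages` list of the call shown in the example above, in place of the plain-text message.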

Agent query example

curl "https://kymaapi.com/v1/models?recommended_for=long-context&vision=true&quality_tier=frontier-open"
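
The same query can be assembled in Python. This sketch only builds the URL from the three filters shown in the curl command. The function name and its defaults are illustrative, and the response format is not described on this page, so nothing is parsed here.

```python
from urllib.parse import urlencode

MODELS_URL = "https://kymaapi.com/v1/models"


def build_models_query_url(recommended_for="long-context", vision=True,
                           quality_tier="frontier-open"):
    """Return the model-discovery URL with the given filters applied."""
    params = {
        "recommended_for": recommended_for,
        # The curl example spells booleans in lowercase.
        "vision": "true" if vision else "false",
        "quality_tier": quality_tier,
    }
    return f"{MODELS_URL}?{urlencode(params)}"


url = build_models_query_url()
print(url)
# prints https://kymaapi.com/v1/models?recommended_for=long-context&vision=true&quality_tier=frontier-open
```

The URL can then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`.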