Gemini 3.5 Flash - Kyma API

Overview

Gemini 3.5 Flash is the newest Flash-class model from Google, built for long context, multimodal input, and fast reasoning. It sits in Kyma’s frontier-open quality tier and takes text, images, audio, and video as input while returning text. On Kyma it runs through the same OpenAI-compatible endpoint as every other model, with automatic failover if a serving path degrades.

The 1M-token context window is the headline capability: whole codebases, long transcripts, or large document sets fit in a single request. Function calling, structured outputs, and reasoning support make it suitable for agent work, not just one-shot prompts.

Prompt caching is supported, so repeated prompt prefixes (long system prompts, large documents you query more than once) bill at 10% of the input rate. Every response reports its exact cost in usage.cost.

Specs

Field	Value
Model ID	`gemini-3.5-flash`
Best for	Long context, multimodal, fast reasoning
Context window	1,048,576 tokens (1M)
Max output tokens	8K
Input modalities	Text, Image, Audio, Video
Output modalities	Text
Tool calling	Yes
Structured outputs	Yes
Prompt caching	Yes
Speed	Fast
Cost band	Premium
Release stage	Stable

Pricing

Token type	Per 1M tokens
Input	$2.02
Output	$12.15
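
As a rough worked example, the sketch below combines these rates with the 10% cached-input rate to estimate the cost of a single request. The function name and the assumption that cached tokens are counted as part of the input tokens are illustrative and not taken from Kyma's billing documentation; the authoritative figure is whatever usage.cost reports.

```python
INPUT_USD_PER_M = 2.02      # input rate from the pricing table
OUTPUT_USD_PER_M = 12.15    # output rate from the pricing table
CACHED_INPUT_FACTOR = 0.10  # cached prefixes bill at 10% of the input rate


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate request cost. Assumes cached_tokens is a subset of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * INPUT_USD_PER_M
        + cached_tokens * INPUT_USD_PER_M * CACHED_INPUT_FACTOR
        + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return round(cost, 6)


# A 200K-token document plus a 500-token question, with a 1K-token answer.
cold = estimate_cost_usd(200_500, 1_000)                        # nothing cached yet
warm = estimate_cost_usd(200_500, 1_000, cached_tokens=200_000)  # document prefix cached
print(f"cold: ${cold:.4f}, warm: ${warm:.4f}")
```

Under these assumptions the first query comes to roughly $0.42 and a repeat query against the cached document to roughly $0.05.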

Use this when

Video and audio analysis: Feed it video or audio directly to summarize recordings or extract what was said and shown, with no separate transcription step.
Whole-corpus questions: The 1M context takes an entire codebase, contract set, or research archive in one request instead of a chunked retrieval pipeline.
Vision pipelines: Image understanding with structured outputs turns screenshots, documents, and photos into clean JSON your code consumes.
Fast reasoning at scale: It is a fast-tier model that also supports reasoning, so multi-step analysis does not require leaving the fast tier.
Multimodal agents: Tool calling plus four input modalities lets one agent handle text, screenshots, and recordings without switching models.
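
To make the vision-pipeline case concrete, here is a minimal sketch of a request that pairs an image with a JSON schema. It assumes Kyma accepts the OpenAI-style image_url content part and json_schema response format, which this page does not confirm. The build_vision_request helper and the invoice schema are hypothetical and exist only for illustration.

```python
import base64
import json


def build_vision_request(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build chat.completions keyword arguments that ask for invoice fields as JSON."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    # Hypothetical schema: swap in the fields your own pipeline needs.
    schema = {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["vendor", "total"],
        "additionalProperties": False,
    }
    return {
        "model": "gemini-3.5-flash",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Extract the vendor and total from this invoice."},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "invoice", "strict": True, "schema": schema},
        },
    }


# Placeholder bytes stand in for a real image file read from disk.
request = build_vision_request(b"\x89PNG placeholder bytes")
print(json.dumps(request["response_format"], indent=2))
```

With a configured client, the request could be sent as client.chat.completions.create(**request) and the reply parsed with json.loads.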

Not ideal for

Long single-shot generations, because output caps at 8K tokens. It is also a poor fit for cost-sensitive bulk text work, where the premium pricing pays for multimodal range you wouldn’t be using.

Pick something else when

You want the cheapest long-context option: use gemini-2.5-flash.
You need the strongest tool-heavy agent behavior: use kimi-k2.6.
You want top reasoning over speed: use deepseek-v4-pro.

Example

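from openai import OpenAI

# Point the standard OpenAI client at Kyma's endpoint; "ky-..." stands in for your API key.
client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-...")

response = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[{"role": "user", "content": "..."}],  # replace "..." with your prompt
    tools=[...],  # function calling supported; replace [...] with your tool definitions
)

# Reply text; this is None when the model chose to call a tool instead.
print(response.choices[0].message.content)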
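
The tools argument above is left elided. As one possible shape, the sketch below defines a single tool in the OpenAI function-calling format and a small dispatcher for the calls the model returns. The get_weather tool, its parameters, and the run_tool_call helper are all hypothetical.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current temperature for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would call a weather service."""
    return {"city": city, "temperature_c": 21}


# Maps tool names to the local functions that implement them.
HANDLERS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Run one tool call and return a JSON string suitable for a tool message."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(handler(**json.loads(arguments_json)))


# With a live response, each item in response.choices[0].message.tool_calls
# provides call.function.name and call.function.arguments for this helper.
result = run_tool_call("get_weather", '{"city": "Lisbon"}')
print(result)
```

Passing tools=TOOLS to the create call and sending each result back as a message with role "tool" follows the usual OpenAI-compatible loop.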

Agent query example

Ask the API which models fit, instead of hardcoding an id:

curl "https://kymaapi.com/v1/models?recommended_for=chat&tools=true&quality_tier=frontier-open"
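
The same query can be assembled in code. The sketch below builds the URL from the filters shown in the curl command and pulls model ids out of a response body. The response shape is an assumption (an OpenAI-style list with a data array); the helper names and the sample payload are hypothetical.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://kymaapi.com/v1"


def models_query_url(**filters: str) -> str:
    """Build the /models query URL from keyword filters."""
    return f"{BASE_URL}/models?{urlencode(filters)}"


def model_ids(payload: dict) -> list:
    """Collect ids, assuming an OpenAI-style body: {"data": [{"id": ...}, ...]}."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


url = models_query_url(recommended_for="chat", tools="true", quality_tier="frontier-open")
print(url)

# Sample payload for illustration only; a real response may carry more fields.
sample = json.loads('{"data": [{"id": "gemini-3.5-flash"}]}')
print(model_ids(sample))
```

Fetching the URL with any HTTP client and passing the decoded JSON to model_ids would let an agent choose a model at runtime.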

FAQ

Is Gemini 3.5 Flash good for video understanding?

Yes. Video is a first-class input alongside audio, images, and text. You can send a recording and ask questions about what happens in it directly, which collapses the usual transcribe-then-summarize pipeline into one call.

How does prompt caching change the economics here?

Gemini 3.5 Flash sits in Kyma’s premium cost tier, and caching is where that gets manageable: repeated prompt prefixes (a long system prompt, a document you query more than once) bill at 10% of the input rate, a 90% discount on the cached portion. Every response reports its exact cost in usage.cost, so you can watch the discount land per request.

Why run Gemini 3.5 Flash through Kyma?

One API key and one OpenAI-compatible endpoint (base_url https://kymaapi.com/v1) covers this and every other model on the platform. You get automatic failover when a serving path degrades, exact per-request cost in usage.cost, and $0.50 of free credit to test with, no card required.
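
To watch per-request cost in practice, a small helper can read usage.cost from each response body and keep a running total. This is a sketch under assumptions: the field is read from a plain dict (for example one produced by response.model_dump() in the OpenAI Python SDK), the cost is taken to be in US dollars, and the sample bodies are made up.

```python
from typing import Optional


def request_cost(response_body: dict) -> Optional[float]:
    """Return usage.cost from a response body, or None when it is absent."""
    cost = response_body.get("usage", {}).get("cost")
    return float(cost) if cost is not None else None


# Made-up response bodies for illustration; the second one has no cost field.
sample_bodies = [
    {"usage": {"prompt_tokens": 1200, "completion_tokens": 300, "cost": 0.006069}},
    {"usage": {}},
]

total = 0.0
for body in sample_bodies:
    cost = request_cost(body)
    if cost is not None:
        total += cost  # skip responses that did not report a cost

print(f"spent so far: ${total:.6f}")
```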