
Overview

deepseek-v4-flash is the fast variant in DeepSeek’s V4 lineup (April 2026): a 284B-total-parameter mixture-of-experts model with 13B active parameters per token, released under the MIT license. It offers the same V4-family behavior at a fraction of the cost, built for general work, coding, and long-context bulk tasks where you want value over flagship quality.

Specs

Model ID: deepseek-v4-flash
Best for: General work, coding, long context, value
Context window: 1,000,000 tokens
Max output tokens: 65,536
Input modalities: Text
Output modalities: Text
Tool calling: Yes
Structured outputs: Yes
Reasoning: Yes
Prompt caching: Yes
Speed: Fast
Cost band: Cheap
Release stage: Preview

Pricing

Per 1M tokens:
Input: $0.189
Output: $0.378
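
At these rates, per-request cost is simple arithmetic. A minimal sketch using the listed prices (`estimate_cost` is an illustrative helper, not part of any SDK):

```python
# Per-1M-token rates for deepseek-v4-flash, from the pricing table above.
INPUT_PER_M = 0.189   # USD per 1M input tokens
OUTPUT_PER_M = 0.378  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the request cost in USD at the listed rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: an 800k-token context with a 4k-token answer.
print(f"${estimate_cost(800_000, 4_000):.4f}")  # → $0.1527
```

Even a near-full context window costs well under a quarter per request, which is what makes the long-document and bulk-extraction cases below practical.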

Use this when

  • You want strong V4-family behavior at the lowest possible price.
  • You need 1M context for long documents or large repos but don’t want flagship cost.
  • You’re running high-volume coding or extraction workloads.
  • You want a default cheap model that still handles tools and reasoning.
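
For the long-context cases above, it helps to check whether a document will fit in the 1,000,000-token window before sending it. A rough sketch using the common ~4-characters-per-token heuristic (an approximation only; use a real tokenizer for exact counts):

```python
CONTEXT_WINDOW = 1_000_000  # tokens, per the specs above
MAX_OUTPUT = 65_536         # tokens, per the specs above

def fits_in_context(text: str, reserved_output: int = MAX_OUTPUT) -> bool:
    """Rough fit check: ~4 chars/token, leaving room for the max output."""
    approx_tokens = len(text) // 4
    return approx_tokens + reserved_output <= CONTEXT_WINDOW

doc = "x" * 3_000_000  # ~750k tokens by the heuristic
print(fits_in_context(doc))  # → True
```

The heuristic is deliberately conservative about nothing: it can be off by 2x on code or non-English text, so treat a near-limit result as a signal to count properly.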

Pick something else when

Example

from openai import OpenAI

client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-...")

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Summarize the key tradeoffs in this RFC, then list the open questions."}],
)
print(response.choices[0].message.content)