## How It Works
- First request: Full prompt is processed and cached by the provider
- Subsequent requests: Cached prefix is reused — up to 90% cheaper and 80% faster
## Automatic Caching
For OpenAI-compatible requests, caching is automatic when your prompt exceeds 1,024 tokens. Place static content (system prompt, tool definitions) at the beginning of the prompt.

## Best Practices
### Structure prompts for caching
Place stable content first, dynamic content last.

### For coding agents
Coding agents (OpenClaw, Cline, Roo Code) automatically benefit from caching because they send the same system prompt and tool definitions with every request. Typical savings for a 50-request coding session:
- Without caching: 400K tokens × full price
- With caching: 47K effective tokens (88% savings)
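A back-of-the-envelope sketch of that arithmetic. The 47K figure works out if roughly 98% of the session's tokens are cache hits; that fraction is an assumption for illustration, not a published number:

```python
def effective_tokens(total_tokens: int, cached_fraction: float,
                     cached_rate: float = 0.10) -> float:
    """Tokens you effectively pay for when cached tokens cost 10% of full price."""
    cached = total_tokens * cached_fraction
    uncached = total_tokens - cached
    return uncached + cached * cached_rate

# 50-request session, 400K prompt tokens, ~98% of them cache hits (assumed)
effective = effective_tokens(400_000, 0.98)   # → 47200.0 effective tokens
savings = 1 - effective / 400_000             # ≈ 0.88, i.e. the 88% above
```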
### What to avoid
- Don’t put timestamps or request IDs in system prompts — breaks cache
- Don’t reorder tool definitions between requests
- Keep system prompt identical across requests
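The practices above can be sketched as a request builder that keeps the cacheable prefix byte-identical across calls. The model ID, prompt text, and tool definition here are hypothetical placeholders:

```python
# Static content first: the system prompt and tool definitions are identical
# across requests, so the provider can reuse the cached prefix.
SYSTEM_PROMPT = "You are a coding assistant."  # hypothetical; keep stable, no timestamps
TOOLS = [{"type": "function",
          "function": {"name": "read_file", "parameters": {}}}]  # hypothetical tool

def build_request(user_message: str) -> dict:
    return {
        "model": "llama-3.3-70b",  # assumed model id
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},  # stable prefix
            {"role": "user", "content": user_message},     # dynamic suffix, last
        ],
        "tools": TOOLS,  # same order every request; reordering breaks the cache
    }
```

Because only the final user message changes, everything before it stays eligible for a cache hit.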
## Cache Stats in Response
Kyma normalizes cache statistics from all providers into a unified format:

| Field | Description |
|---|---|
| `cached_tokens` | Tokens read from cache (90% discounted) |
| `cache_write_tokens` | Tokens written to cache on the first request |
| `cost` | Total cost charged for this request (USD) |
| `cache_discount` | Amount saved from caching (USD) |
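Reading those fields from a response might look like this. The field names are the ones in the table above; the sample values and the `prompt_tokens` field (the usual OpenAI-style total) are assumptions for illustration:

```python
# Hypothetical usage block from a chat completion response
usage = {
    "prompt_tokens": 8_000,       # assumed OpenAI-style total prompt tokens
    "cached_tokens": 7_200,       # read from cache at the discounted rate
    "cache_write_tokens": 0,      # nothing written: the prefix was already cached
    "cost": 0.0012,               # sample value
    "cache_discount": 0.0052,     # sample value
}

cached = usage.get("cached_tokens", 0)
hit_rate = cached / usage["prompt_tokens"] if usage["prompt_tokens"] else 0.0
print(f"cache hit rate: {hit_rate:.0%}, saved ${usage['cache_discount']:.4f}")
# prints: cache hit rate: 90%, saved $0.0052
```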
## Supported Models
Check `supports_caching` in the models endpoint:
- Groq — automatic caching for prompts >1,024 tokens
- Google AI — automatic caching, reduced rates
- OpenRouter — pass-through provider caching
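To find cacheable models programmatically, filter the models endpoint payload on that flag. The response shape below assumes the usual OpenAI-style `{"data": [...]}` envelope, and the model entries are made up for the example:

```python
def caching_models(models_response: dict) -> list[str]:
    """IDs of models whose metadata advertises prompt caching."""
    return [m["id"] for m in models_response["data"] if m.get("supports_caching")]

# Hypothetical /models payload
sample = {"data": [
    {"id": "llama-3.3-70b", "supports_caching": True},
    {"id": "some-legacy-model", "supports_caching": False},
]}
print(caching_models(sample))  # → ['llama-3.3-70b']
```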
## Pricing
Cached tokens are charged at 10% of the normal input price (a 90% discount).

| Token Type | Rate | Example (Llama 3.3 70B) |
|---|---|---|
| Input (non-cached) | Full price | $0.797 / 1M tokens |
| Input (cached) | 10% of input price | $0.0797 / 1M tokens |
| Output | Full price | $1.067 / 1M tokens |
The `usage.cost` field in every response shows the actual amount charged, so you can track savings in real time.
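Using the Llama 3.3 70B rates from the table above, a request's cost can be estimated like this. This is a sketch of the billing arithmetic, not Kyma's exact rounding, and the token counts in the example are made up:

```python
INPUT_RATE = 0.797 / 1_000_000    # $ per non-cached input token
CACHED_RATE = 0.0797 / 1_000_000  # $ per cached input token (10% of input price)
OUTPUT_RATE = 1.067 / 1_000_000   # $ per output token

def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimated charge: non-cached input at full price, cached at 10%, output at full."""
    non_cached = input_tokens - cached_tokens
    return (non_cached * INPUT_RATE
            + cached_tokens * CACHED_RATE
            + output_tokens * OUTPUT_RATE)

# 10K-token prompt, 8K of it served from cache, 500 output tokens (example numbers)
cost = estimate_cost(10_000, 8_000, 500)  # ≈ $0.00277
```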