Documentation Index
Fetch the complete documentation index at: https://docs.kymaapi.com/llms.txt
Use this file to discover all available pages before exploring further.
Best Model for This
| Model | Why | Cost per query |
|---|---|---|
gemini-2.5-flash | 1M context — load entire codebases | ~$0.009 |
qwen-3-32b | Fast synthesis for short contexts | ~$0.004 |
deepseek-v3 | Best reasoning over complex documents | ~$0.007 |
Quick Start
Tips & Best Practices
- Use
gemini-2.5-flashfor large documents — its 1M token context window can hold an entire codebase or book. Skip chunking for smaller corpora. - Enable prompt caching for repeated context — if the same document is queried multiple times, caching cuts input cost by 90%. See Prompt Caching.
- Be explicit about citation style — asking the model to cite
[1],[2]reduces hallucination and makes answers verifiable. - Instruct the model to say “I don’t know” — without this, models will confabulate answers from training data even when context is insufficient.
Cost Estimate
| Volume | Context size | Model | Monthly cost |
|---|---|---|---|
| 1K queries/day | 2K tokens | qwen-3-32b | ~$4/month |
| 1K queries/day | 10K tokens | gemini-2.5-flash | ~$18/month |
| 1K queries/day | 50K tokens | gemini-2.5-flash | ~$65/month |
Next Steps
- Prompt Caching — up to 90% discount on repeated context
- Streaming — stream RAG answers for faster perceived latency
- Models — compare context windows across all active models