Best Model
Gemini 2.5 Flash (gemini-2.5-flash) — 1M context window for large knowledge bases. ~$0.88 per 1K requests.
For faster responses: Qwen 3.6 Plus (qwen-3.6-plus) at ~$0.75 per 1K requests.
Python — Knowledge Base Copilot
from openai import OpenAI

client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-your-key")

# Load your knowledge base (docs, wiki, runbooks)
knowledge_base = """
## Deployment Process
1. Push to main branch
2. CI runs tests (5 min)
3. Auto-deploy to staging
4. Manual approval for production
## Common Issues
- 503 errors: Check pod health with `kubectl get pods`
- Slow queries: Check DB connections with `pg_stat_activity`
- Auth failures: Verify JWT expiry in Redis
"""

def ask_copilot(question: str, context: str = knowledge_base) -> str:
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[
            {"role": "system", "content": f"You are an internal engineering copilot. Answer questions using this knowledge base:\n\n{context}\n\nIf the answer isn't in the knowledge base, say so."},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

print(ask_copilot("How do I deploy to production?"))
print(ask_copilot("We're getting 503 errors, what should I check?"))
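The example above hard-codes the knowledge base as a string. A minimal sketch of loading it from disk instead, assuming your docs live as Markdown files under one directory; the `load_knowledge_base` helper and the `## Source:` label are illustrative, not part of the Kyma API:

```python
from pathlib import Path


def load_knowledge_base(root: str, pattern: str = "*.md") -> str:
    """Concatenate every matching file under `root` into one context string."""
    sections = []
    # Sort so the resulting prompt is stable from run to run.
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip empty files
        # Label each section with its relative path so answers can point to a source.
        label = path.relative_to(root).as_posix()
        sections.append(f"## Source: {label}\n\n{text}")
    return "\n\n".join(sections)
```

The result can be passed as the `context` argument of `ask_copilot`, for example `ask_copilot("How do I deploy?", load_knowledge_base("docs"))`, where `docs` is a placeholder path.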
JavaScript — Slack Bot Copilot
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://kymaapi.com/v1",
  apiKey: process.env.KYMA_API_KEY,
});

// question: the Slack message text; docs: the knowledge base as a single string.
async function handleSlackMessage(question, docs) {
  const response = await client.chat.completions.create({
    model: "gemini-2.5-flash",
    messages: [
      {
        role: "system",
        content: `You are a helpful internal assistant. Answer based on these docs:\n\n${docs}\n\nBe concise. Use bullet points.`,
      },
      { role: "user", content: question },
    ],
  });
  // message.content is nullable, so fall back to an empty string.
  return response.choices[0].message.content ?? "";
}

// In your Slack bot handler:
// const answer = await handleSlackMessage(event.text, companyDocs);
// await slack.chat.postMessage({ channel: event.channel, text: answer });
With RAG (Retrieval-Augmented Generation)
For larger knowledge bases, retrieve relevant chunks first:
from openai import OpenAI

client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-your-key")

def copilot_with_rag(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n---\n\n".join(retrieved_chunks)
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[
            {"role": "system", "content": f"Answer based on these documents:\n\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Use with your vector DB (Pinecone, Weaviate, pgvector, etc.)
chunks = ["Relevant doc 1...", "Relevant doc 2..."]
print(copilot_with_rag("How does auth work?", chunks))
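The retrieval step itself is left to your vector DB. As a stand-in for trying the flow locally, here is a rough sketch that ranks chunks by word overlap with the question. This is a simple lexical heuristic, not embedding search, and `retrieve_chunks` is a hypothetical helper name:

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_chunks(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return up to `top_k` chunks that share the most words with the question."""
    question_words = tokenize(question)
    scored = []
    for index, chunk in enumerate(chunks):
        overlap = len(question_words & tokenize(chunk))
        if overlap:  # ignore chunks with no words in common
            scored.append((overlap, index, chunk))
    # Highest overlap first; ties keep the original chunk order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chunk for _, _, chunk in scored[:top_k]]
```

The output plugs straight into `copilot_with_rag(question, retrieve_chunks(question, all_chunks))`. For a real knowledge base, swap this function for a query against your vector store and keep the same return type.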
Tips
- Use `gemini-2.5-flash` for large contexts (1M tokens = entire codebases/wikis)
- Instruct the model to say “I don’t know” when info isn’t in the knowledge base
- Add conversation history for follow-up questions
- Cache frequent questions with Redis to reduce costs
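The last two tips can be sketched as follows. To keep the example runnable without network access, the model call is passed in as a `send` function, and a plain dict stands in for Redis; both are assumptions made for illustration, as are the `Conversation` and `cached_ask` names.

```python
from typing import Callable

Message = dict[str, str]
Send = Callable[[list[Message]], str]  # hypothetical wrapper around the chat call


class Conversation:
    """Keeps prior turns so follow-up questions have context."""

    def __init__(self, send: Send, system_prompt: str, max_turns: int = 10) -> None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.send = send
        self.system = {"role": "system", "content": system_prompt}
        self.max_turns = max_turns
        self.history: list[Message] = []

    def ask(self, question: str) -> str:
        self.history.append({"role": "user", "content": question})
        answer = self.send([self.system, *self.history])
        self.history.append({"role": "assistant", "content": answer})
        # Keep only the most recent question/answer pairs so the prompt stays bounded.
        self.history = self.history[-2 * self.max_turns:]
        return answer


def cached_ask(question: str, send: Send, system_prompt: str, cache: dict[str, str]) -> str:
    """Answer standalone questions from a cache (a dict stands in for Redis)."""
    key = " ".join(question.lower().split())  # normalize case and whitespace
    if key not in cache:
        cache[key] = send([
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ])
    return cache[key]


# Wiring `send` to the client from the examples above could look like:
# send = lambda messages: client.chat.completions.create(
#     model="gemini-2.5-flash", messages=messages
# ).choices[0].message.content
```

Caching is applied only to standalone questions here, because the answer to a follow-up depends on the turns before it. With Redis, the dict lookup and assignment would become a get and a set with an expiry.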
Cost Estimate
| Scenario | Tokens | Model | Cost |
|---|---|---|---|
| Short Q&A (small context) | 2K in / 200 out | qwen-3.6-plus | ~$0.001 |
| RAG query (5 chunks) | 5K in / 500 out | gemini-2.5-flash | ~$0.004 |
| Full wiki context | 50K in / 500 out | gemini-2.5-flash | ~$0.02 |
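To adapt these estimates to your own traffic, the arithmetic is tokens multiplied by the per-token price, summed over input and output. A small sketch follows; the rates in the usage line are placeholders chosen for easy arithmetic, not Kyma's actual prices, so substitute the current per-million-token rates for your model.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated cost in dollars for a single request."""
    input_cost = input_tokens * input_price_per_million
    output_cost = output_tokens * output_price_per_million
    return (input_cost + output_cost) / 1_000_000


# Placeholder rates (NOT real pricing): $1.00 per 1M input tokens, $2.00 per 1M output tokens.
per_request = estimate_cost(5_000, 500, 1.00, 2.00)
print(f"per request: ${per_request:.4f}, per 1K requests: ${per_request * 1000:.2f}")
```

Input tokens dominate in the full-context scenarios, which is why retrieving a few relevant chunks costs less than sending the whole wiki on every request.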