Best Model

Gemini 2.5 Flash (gemini-2.5-flash) — 1M context window for large knowledge bases. ~$0.88 per 1K requests. For faster responses: Qwen 3.6 Plus (qwen-3.6-plus) at ~$0.75 per 1K requests.

Python — Knowledge Base Copilot

from openai import OpenAI

client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-your-key")

# Load your knowledge base (docs, wiki, runbooks)
knowledge_base = """
## Deployment Process
1. Push to main branch
2. CI runs tests (5 min)
3. Auto-deploy to staging
4. Manual approval for production

## Common Issues
- 503 errors: Check pod health with `kubectl get pods`
- Slow queries: Check DB connections with `pg_stat_activity`
- Auth failures: Verify JWT expiry in Redis
"""

def ask_copilot(question: str, context: str = knowledge_base) -> str:
    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[
            {"role": "system", "content": f"You are an internal engineering copilot. Answer questions using this knowledge base:\n\n{context}\n\nIf the answer isn't in the knowledge base, say so."},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

print(ask_copilot("How do I deploy to production?"))
print(ask_copilot("We're getting 503 errors, what should I check?"))
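The second question above is the kind of follow-up that benefits from conversation history. A minimal sketch of one way to keep it while capping prompt size, assuming the same message format as the example above (`build_messages` is a hypothetical helper, not part of any SDK):

```python
# Sketch: keep follow-up context but cap the prompt by trimming old
# turns. Pass the result as `messages` to client.chat.completions.create.

def build_messages(system: str, history: list[dict], max_turns: int = 5) -> list[dict]:
    """System prompt plus the last `max_turns` user/assistant pairs."""
    recent = history[-(max_turns * 2):]  # each turn = one user + one assistant message
    return [{"role": "system", "content": system}] + recent

# Simulate an 8-turn conversation:
history = []
for i in range(8):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

messages = build_messages("You are an internal engineering copilot.", history, max_turns=3)
# messages now holds the system prompt plus only the 3 most recent turns
```

After each model call, append the user question and the assistant's reply to `history` so the next turn sees them.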

TypeScript — Slack Bot Copilot

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://kymaapi.com/v1",
  apiKey: process.env.KYMA_API_KEY,
});

async function handleSlackMessage(question: string, docs: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gemini-2.5-flash",
    messages: [
      {
        role: "system",
        content: `You are a helpful internal assistant. Answer based on these docs:\n\n${docs}\n\nBe concise. Use bullet points.`,
      },
      { role: "user", content: question },
    ],
  });
  return response.choices[0].message.content!;
}

// In your Slack bot handler:
// const answer = await handleSlackMessage(event.text, companyDocs);
// await slack.chat.postMessage({ channel: event.channel, text: answer });

With RAG (Retrieval-Augmented Generation)

For larger knowledge bases, retrieve relevant chunks first:

from openai import OpenAI

client = OpenAI(base_url="https://kymaapi.com/v1", api_key="ky-your-key")

def copilot_with_rag(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n---\n\n".join(retrieved_chunks)

    response = client.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[
            {"role": "system", "content": f"Answer based on these documents:\n\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Use with your vector DB (Pinecone, Weaviate, pgvector, etc.)
chunks = ["Relevant doc 1...", "Relevant doc 2..."]
print(copilot_with_rag("How does auth work?", chunks))
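How the chunks get retrieved is up to your vector DB. As a self-contained stand-in, here is a minimal keyword-overlap retriever; it is illustrative only (production systems rank by embedding similarity instead), and `retrieve_chunks` is a hypothetical helper:

```python
# Minimal retrieval sketch: score each chunk by how many words it shares
# with the question, then pass the top-k chunks to copilot_with_rag.

def retrieve_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question; return the top k."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

docs = [
    "Auth uses JWT tokens stored in Redis with a 24h expiry.",
    "Deploys go through staging before production approval.",
    "Slow queries are usually missing indexes; check pg_stat_activity.",
]
top = retrieve_chunks("How does auth work?", docs, k=1)
```

Swap this for your vector DB's similarity search; the `copilot_with_rag` call stays the same.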

Tips

  • Use gemini-2.5-flash for large contexts (1M tokens = entire codebases/wikis)
  • Instruct the model to say “I don’t know” when info isn’t in the knowledge base
  • Add conversation history for follow-up questions
  • Cache frequent questions with Redis to reduce costs
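The caching tip can be sketched with a plain dict standing in for Redis (in production, use redis-py with a TTL via SETEX). Both `ask_cached` and the injected `ask` callable are illustrative; wire `ask` to `ask_copilot` from the Python example above:

```python
# Sketch: answer repeated questions from a cache instead of re-calling
# the model. A dict stands in for Redis here.

cache: dict[str, str] = {}

def ask_cached(question: str, ask) -> str:
    key = question.strip().lower()  # normalize so trivial variants hit the cache
    if key not in cache:
        cache[key] = ask(question)  # cache miss: one model call, then remember
    return cache[key]

# Usage: ask_cached("How do I deploy?", ask_copilot)
# Subsequent identical questions return instantly with zero token cost.
```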

Cost Estimate

| Scenario | Tokens | Model | Cost |
| --- | --- | --- | --- |
| Short Q&A (small context) | 2K in / 200 out | qwen-3.6-plus | ~$0.001 |
| RAG query (5 chunks) | 5K in / 500 out | gemini-2.5-flash | ~$0.004 |
| Full wiki context | 50K in / 500 out | gemini-2.5-flash | ~$0.02 |

Next Steps