Best Model for This

Model            | Why                                 | Cost per 1K messages
-----------------|-------------------------------------|---------------------
qwen-3.6-plus    | Best quality, handles nuance well   | ~$0.30
llama-3.3-70b    | Fastest response, great all-rounder | ~$0.80
gemini-2.5-flash | Cheapest at scale, 1M context       | ~$0.20
Costs assume ~300 tokens input + ~200 tokens output per message exchange.

Quick Start

from openai import OpenAI

client = OpenAI(
    base_url="https://kymaapi.com/v1",
    api_key="ky-your-api-key"
)

# Conversation history, seeded with the system prompt.
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_input: str) -> str:
    messages.append({"role": "user", "content": user_input})
    stream = client.chat.completions.create(
        model="qwen-3.6-plus",
        messages=messages,
        stream=True
    )
    # Print tokens as they arrive and accumulate the full reply.
    response = ""
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            response += content
    print()
    # Store the reply so the next turn sees the full conversation.
    messages.append({"role": "assistant", "content": response})
    return response

chat("What is machine learning?")
chat("Can you give me a simple example?")

Tips & Best Practices

  • Stream always — users perceive streamed responses as 3-5x faster even at the same token speed.
  • Cap history length — trim messages to the last 10-20 turns or ~4K tokens to keep latency low and costs predictable.
  • System prompt sets personality — define tone, scope, and any constraints in the first system message.
  • Use qwen-3.6-plus for quality, llama-3.3-70b for speed — swap the model string; the code is identical.
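One way to cap history, as the second tip suggests, is to keep the system prompt and only the most recent turns. A minimal sketch (the `trim_history` helper and the 10-turn default are assumptions, not part of the API):

```python
def trim_history(messages, max_turns=10):
    """Keep the system prompt plus the last `max_turns` user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Each turn is one user message plus one assistant reply.
    return system + rest[-2 * max_turns:]

# Simulate a long conversation: 1 system message + 30 turns.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(30):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history)
print(len(trimmed))  # 21: the system prompt + the 10 most recent turns
```

Call this on `messages` before each request; for a ~4K-token budget you would swap the turn count for a token count from your tokenizer of choice.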

Cost Estimate

Volume           | Model            | Monthly cost
-----------------|------------------|-------------
1K messages/day  | qwen-3.6-plus    | ~$9/month
1K messages/day  | llama-3.3-70b    | ~$24/month
10K messages/day | gemini-2.5-flash | ~$60/month
Assumes 300 tokens input + 200 tokens output per exchange. Longer conversations cost more.
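The monthly figures follow directly from the per-1K-message rates, so you can sketch an estimate for your own volume (the helper below is illustrative; rates are the rough per-1K-message costs from the model table):

```python
# Approximate cost per 1K message exchanges, from the model table above.
RATE_PER_1K = {
    "qwen-3.6-plus": 0.30,
    "llama-3.3-70b": 0.80,
    "gemini-2.5-flash": 0.20,
}

def monthly_cost(messages_per_day: int, model: str, days: int = 30) -> float:
    """messages/day * days / 1000 * rate-per-1K, rounded to cents."""
    return round(messages_per_day * days / 1000 * RATE_PER_1K[model], 2)

print(monthly_cost(1_000, "qwen-3.6-plus"))     # 9.0
print(monthly_cost(10_000, "gemini-2.5-flash")) # 60.0
```

Remember these rates already bake in the ~500-token exchange size; longer conversations push the real number higher.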

Next Steps