Best Model for This

Model            | Why                                 | Cost per 1K messages
-----------------|-------------------------------------|---------------------
qwen-3.6-plus    | Best quality, handles nuance well   | ~$0.30
llama-3.3-70b    | Fastest response, great all-rounder | ~$0.80
gemini-2.5-flash | Cheapest at scale, 1M context       | ~$0.20
Costs assume ~300 tokens input + ~200 tokens output per message exchange.

Quick Start

from openai import OpenAI

client = OpenAI(
    base_url="https://kymaapi.com/v1",
    api_key="ky-your-api-key"
)

# Conversation history, seeded with the system prompt.
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_input: str) -> str:
    messages.append({"role": "user", "content": user_input})
    stream = client.chat.completions.create(
        model="qwen-3.6-plus",
        messages=messages,
        stream=True
    )
    # Print tokens as they arrive and accumulate the full reply.
    response = ""
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            response += content
    print()
    # Store the reply so the next turn sees the full conversation.
    messages.append({"role": "assistant", "content": response})
    return response

chat("What is machine learning?")
chat("Can you give me a simple example?")

Tips & Best Practices

  • Stream always — users perceive streamed responses as 3-5x faster even at the same token speed.
  • Cap history length — trim messages to the last 10-20 turns or ~4K tokens to keep latency low and costs predictable.
  • System prompt sets personality — define tone, scope, and any constraints in the first system message.
  • Use qwen-3.6-plus for quality, llama-3.3-70b for speed — swap the model string; the code is identical.
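One way to cap history, as the second tip suggests, is to keep the system prompt and only the most recent turns. A minimal sketch (the `trim_history` helper and the 10-turn default are assumptions, not part of the API):

```python
def trim_history(messages, max_turns=10):
    """Keep the system prompt plus the last `max_turns` user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Each turn is one user message plus one assistant reply.
    return system + rest[-2 * max_turns:]

# Simulate a long conversation: 1 system message + 30 turns.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(30):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history)
print(len(trimmed))  # 21: the system prompt + the 10 most recent turns
```

Call this on `messages` before each request; for a ~4K-token budget you would swap the turn count for a token count from your tokenizer of choice.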

Cost Estimate

Volume           | Model            | Monthly cost
-----------------|------------------|-------------
1K messages/day  | qwen-3.6-plus    | ~$9/month
1K messages/day  | llama-3.3-70b    | ~$24/month
10K messages/day | gemini-2.5-flash | ~$60/month
Assumes 300 tokens input + 200 tokens output per exchange. Longer conversations cost more.
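The monthly figures follow directly from the per-1K-message rates, so you can sketch an estimate for your own volume (the helper below is illustrative; rates are the rough per-1K-message costs from the model table):

```python
# Approximate cost per 1K message exchanges, from the model table above.
RATE_PER_1K = {
    "qwen-3.6-plus": 0.30,
    "llama-3.3-70b": 0.80,
    "gemini-2.5-flash": 0.20,
}

def monthly_cost(messages_per_day: int, model: str, days: int = 30) -> float:
    """messages/day * days / 1000 * rate-per-1K, rounded to cents."""
    return round(messages_per_day * days / 1000 * RATE_PER_1K[model], 2)

print(monthly_cost(1_000, "qwen-3.6-plus"))     # 9.0
print(monthly_cost(10_000, "gemini-2.5-flash")) # 60.0
```

Remember these rates already bake in the ~500-token exchange size; longer conversations push the real number higher.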

Next Steps