Best Model for This

| Model | Why | Cost per agent step |
|---|---|---|
| qwen-3-coder | Purpose-built for code, best accuracy | ~$0.03 |
| kimi-k2.5 | Best tool calling, 262K context | ~$0.05 |
| minimax-m2.5 | SWE-bench 80.2%, top agentic coding | ~$0.04 |
Costs assume ~1K tokens input + ~500 tokens output per step.
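
Because these figures are per step, a task's cost is roughly the per-step figure multiplied by the number of steps the agent takes. A minimal sketch of that arithmetic, using the approximate costs from the table; `COST_PER_STEP` and `estimate_task_cost` are hypothetical names introduced here for illustration, not part of any API:

```python
# Approximate per-step costs in dollars, copied from the table above.
COST_PER_STEP = {
    "qwen-3-coder": 0.03,
    "kimi-k2.5": 0.05,
    "minimax-m2.5": 0.04,
}


def estimate_task_cost(model: str, min_steps: int, max_steps: int) -> tuple[float, float]:
    """Return the (low, high) dollar estimate for a task of min_steps..max_steps."""
    per_step = COST_PER_STEP[model]
    return round(per_step * min_steps, 2), round(per_step * max_steps, 2)


low, high = estimate_task_cost("kimi-k2.5", 3, 5)
print(f"kimi-k2.5, 3-5 steps: ~${low:.2f}-{high:.2f}")
```

The estimate only holds while each step stays near the assumed ~1K input and ~500 output tokens.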

Quick Start

import json
from openai import OpenAI

client = OpenAI(
    base_url="https://kymaapi.com/v1",
    api_key="ky-your-api-key"
)

tools = [{"type": "function", "function": {
    "name": "run_python", "description": "Execute Python code and return the output",
    "parameters": {"type": "object", "properties": {
        "code": {"type": "string"}}, "required": ["code"]}}}]

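def run_python(code: str) -> str:
    """Run code in a fresh interpreter; return stdout, or stderr if stdout is empty."""
    import subprocess
    try:
        r = subprocess.run(["python3", "-c", code], capture_output=True, text=True, timeout=10)
    except subprocess.TimeoutExpired:
        # Report the timeout to the model instead of crashing the agent loop.
        return "Error: execution timed out after 10 seconds"
    return r.stdout or r.stderr or "(no output)"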

messages = [
    {"role": "system", "content": "You are a coding assistant. Use run_python to test your code."},
    {"role": "user", "content": "Write a function to find all prime numbers up to 50."}
]
# Loop until the model answers without requesting a tool; cap the steps to avoid a runaway loop.
for _ in range(10):
    response = client.chat.completions.create(
        model="qwen-3-coder", messages=messages, tools=tools, temperature=0)
    msg = response.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)
        break
    # Run each requested call and send its output back to the model.
    for call in msg.tool_calls:
        result = run_python(json.loads(call.function.arguments)["code"])
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
else:
    print("Stopped after 10 steps without a final answer.")

Tips & Best Practices

  • Set temperature=0: near-deterministic generation reduces syntax errors and off-script behavior.
  • Include file context in the system prompt: paste the relevant file contents so the model knows your codebase structure.
  • All active models support tool calling; if you add a new model, check supports_tools in /v1/models first.
  • Limit the execution environment: for production agents, sandbox run_python with a timeout and restricted imports.
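
The last tip can be sketched as follows. This is a minimal illustration, not a real sandbox: it rejects code whose import statements fall outside an allowlist, then runs the rest in an isolated interpreter with a timeout. The names `ALLOWED_IMPORTS`, `disallowed_imports`, and `run_python_restricted` are hypothetical, and the allowlist contents are an assumption. A static import check can be bypassed (for example through `__import__`), so production agents should still run code inside a container or similar OS-level isolation.

```python
import ast
import subprocess
import sys
import tempfile

# Hypothetical allowlist; adjust to what your agent actually needs.
ALLOWED_IMPORTS = {"math", "itertools", "collections", "json", "re"}


def disallowed_imports(code: str) -> set[str]:
    """Return the root module names imported by `code` that are not allowlisted."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found - ALLOWED_IMPORTS


def run_python_restricted(code: str, timeout: float = 10) -> str:
    """Reject non-allowlisted imports, then run the code with a timeout."""
    try:
        blocked = disallowed_imports(code)
    except SyntaxError as exc:
        return f"Error: {exc}"
    if blocked:
        return f"Error: imports not allowed: {sorted(blocked)}"
    try:
        # Run in a throwaway working directory that is deleted afterwards.
        with tempfile.TemporaryDirectory() as workdir:
            # -I starts Python in isolated mode (ignores env vars and user site-packages).
            r = subprocess.run([sys.executable, "-I", "-c", code], capture_output=True,
                               text=True, timeout=timeout, cwd=workdir)
    except subprocess.TimeoutExpired:
        return f"Error: execution timed out after {timeout} seconds"
    return r.stdout or r.stderr or "(no output)"
```

Returning errors as strings, rather than raising, lets the agent loop pass them back to the model as tool output so it can correct its own code.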

Cost Estimate

| Agent task | Steps | Model | Cost |
|---|---|---|---|
| Write + test a function | 2-3 steps | qwen-3-coder | ~$0.06–0.09 |
| Debug + fix a bug | 3-5 steps | kimi-k2.5 | ~$0.15–0.25 |
| Implement a feature | 5-10 steps | minimax-m2.5 | ~$0.20–0.40 |
Costs scale with context size — long files passed to the model increase input tokens significantly.
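
To see how input size moves the per-step cost, here is a minimal sketch that assumes pricing is quoted per million tokens. The function `step_cost` is a hypothetical helper, and the prices passed to it are placeholders chosen for easy arithmetic, not actual rates for any model listed here.

```python
def step_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one agent step.

    Prices are per million tokens and supplied by the caller; the values
    used below are placeholders, not real rates.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Baseline step: ~1K input + ~500 output tokens.
baseline = step_cost(1_000, 500, input_price_per_m=1.0, output_price_per_m=2.0)
# Same step with a 20K-token file added to the prompt.
with_file = step_cost(21_000, 500, input_price_per_m=1.0, output_price_per_m=2.0)
print(f"baseline: ${baseline:.4f}, with file: ${with_file:.4f}")
```

In an agent loop the full message history is resent on every step, so a large file included early is billed as input again on each later step.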

Next Steps