Best Model for This

Model          Why                                Cost per 1K extractions
qwen-3-32b     Fast, accurate JSON, low cost      ~$0.40
deepseek-v3    Best for complex nested schemas    ~$0.75
llama-3.3-70b  Good balance of speed + accuracy   ~$1.00
Costs assume ~400 tokens input + ~200 tokens output per extraction.

Quick Start

import json
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(
    base_url="https://kymaapi.com/v1",
    api_key="ky-your-api-key"
)

class Invoice(BaseModel):
    vendor: str
    amount: float
    currency: str
    date: str
    line_items: list[str]

SYSTEM = ('Extract invoice data as JSON. Schema: {"vendor": str, "amount": float, '
          '"currency": str, "date": "YYYY-MM-DD", "line_items": [str]}. JSON only.')

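def extract_invoice(text: str) -> Invoice:
    """Send raw invoice text to the model and parse the JSON reply into an Invoice."""
    resp = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[{"role": "system", "content": SYSTEM}, {"role": "user", "content": text}],
        response_format={"type": "json_object"},  # request a JSON object, not prose
        temperature=0,  # minimize variation between runs
    )
    # Pydantic raises ValidationError if a field is missing or has the wrong type.
    return Invoice(**json.loads(resp.choices[0].message.content))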

raw = "Invoice from Acme Corp, March 15 2025. 10x Widget A @ $5, 2x Widget B @ $12.50. Total $75 USD."
invoice = extract_invoice(raw)
print(f"Vendor: {invoice.vendor}, Amount: {invoice.amount:.2f} {invoice.currency}")
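
The Quick Start builds the Invoice straight from the parsed reply, so a malformed or incomplete response surfaces as an uncaught exception. The following is a minimal sketch of a more defensive parsing step, not part of the Kyma API. StrictInvoice and parse_invoice are hypothetical names, and typing the date field as datetime.date (so non-ISO dates are rejected) is an assumption about how strict you want to be.

```python
import datetime
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


class StrictInvoice(BaseModel):
    """Hypothetical stricter variant of the Quick Start's Invoice model."""
    vendor: str
    amount: float
    currency: str
    date: datetime.date  # accepts "YYYY-MM-DD", rejects free-form dates
    line_items: list[str]


def parse_invoice(raw_json: str) -> Optional[StrictInvoice]:
    """Return a validated invoice, or None if the model output is unusable."""
    try:
        return StrictInvoice(**json.loads(raw_json))
    except (json.JSONDecodeError, TypeError, ValidationError) as err:
        # JSONDecodeError: reply was not valid JSON.
        # TypeError: reply was valid JSON but not an object.
        # ValidationError: a field is missing or has the wrong type/format.
        print(f"Rejected model output: {err}")
        return None
```

In the Quick Start, this would take the place of the final line of extract_invoice, with resp.choices[0].message.content passed in as raw_json. Returning None leaves the caller free to retry or log the document for review.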

Tips & Best Practices

  • Always set temperature=0: extraction should be repeatable, not creative. Higher temperatures introduce variation in field names and values.
  • Always validate output with Pydantic (Python) or Zod (TypeScript). Models occasionally omit optional fields or format dates differently.
  • Provide examples in the system prompt: a single input/expected-output pair noticeably improves accuracy on complex schemas.
  • Use response_format={"type": "json_object"}: it constrains the reply to JSON-parseable output and prevents markdown wrapping or prose before the JSON.
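
The tip about examples can be sketched as follows. This is one possible way to embed a single input/output pair in the system prompt, not a prescribed format. BASE_SYSTEM repeats the Quick Start prompt; EXAMPLE_INPUT, EXAMPLE_OUTPUT, build_system_prompt, and build_messages are hypothetical names, and the line_items formatting in the example output is an illustrative choice.

```python
import json

# Base instruction, same wording as the Quick Start's SYSTEM prompt.
BASE_SYSTEM = (
    'Extract invoice data as JSON. Schema: {"vendor": str, "amount": float, '
    '"currency": str, "date": "YYYY-MM-DD", "line_items": [str]}. JSON only.'
)

# Hypothetical worked example shown to the model (illustrative values).
EXAMPLE_INPUT = (
    "Invoice from Acme Corp, March 15 2025. "
    "10x Widget A @ $5, 2x Widget B @ $12.50. Total $75 USD."
)
EXAMPLE_OUTPUT = {
    "vendor": "Acme Corp",
    "amount": 75.0,
    "currency": "USD",
    "date": "2025-03-15",
    "line_items": ["10x Widget A @ $5", "2x Widget B @ $12.50"],
}


def build_system_prompt() -> str:
    """Append one input/expected-output pair to the base instruction."""
    return (
        f"{BASE_SYSTEM}\n\n"
        f"Example input:\n{EXAMPLE_INPUT}\n\n"
        f"Example output:\n{json.dumps(EXAMPLE_OUTPUT)}"
    )


def build_messages(text: str) -> list[dict]:
    """Build the messages list for a chat completion request."""
    return [
        {"role": "system", "content": build_system_prompt()},
        {"role": "user", "content": text},
    ]
```

The returned list can be passed as the messages argument in the Quick Start's extract_invoice. Serializing the example with json.dumps keeps it in exactly the shape the validator expects.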

Cost Estimate

Volume                 Model        Monthly cost
10K extractions/day    qwen-3-32b   ~$120/month
10K extractions/day    deepseek-v3  ~$225/month
100K extractions/day   qwen-3-32b   ~$1,200/month
Assumes a 30-day month, the per-1K-extraction costs listed above, and ~400 tokens input + ~200 tokens output per extraction. Token count, and therefore cost, scales with document length.
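
A monthly estimate for other volumes or document sizes can be derived from a per-1K-extraction cost and a daily volume. The sketch below assumes a 30-day month and that cost grows linearly with total tokens per extraction. That is a rough approximation, since input and output tokens are usually priced differently. monthly_cost and its parameters are hypothetical names, not part of any API.

```python
# ~400 input + ~200 output tokens, the baseline assumed in the cost tables.
BASELINE_TOKENS = 600


def monthly_cost(
    extractions_per_day: float,
    cost_per_1k: float,
    tokens_per_extraction: float = BASELINE_TOKENS,
    days: int = 30,
) -> float:
    """Estimate monthly spend in dollars from a per-1K-extraction cost.

    Scales linearly with tokens per extraction relative to the baseline,
    which is an approximation rather than exact per-token pricing.
    """
    scale = tokens_per_extraction / BASELINE_TOKENS
    return extractions_per_day / 1000 * cost_per_1k * scale * days


# Example: 10K extractions/day at ~$0.40 per 1K extractions.
estimate = monthly_cost(10_000, 0.40)
print(f"~${estimate:,.0f}/month")
```

Passing a larger tokens_per_extraction gives a quick upper bound for longer documents, for example multi-page invoices.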

Next Steps