Available models

| Model ID | Parameters | Context | Best For |
| --- | --- | --- | --- |
| `llama-3.3-70b` | 70B | 128K | General, code, reasoning |
| `llama-4-scout` 🔥 | 17B (MoE) | 512K | Long documents |
| `llama-3.1-8b` | 8B | 8K | Fast, simple tasks |
| `llama-3.1-8b-cerebras` | 8B | 8K | Ultra-fast inference |

Recommendation

Start with `llama-3.3-70b`: it's the most popular and highest-quality Llama model listed here. Switch to `llama-4-scout` when you need its 512K context window. For example:
```python
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint; set base_url/api_key for your provider

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
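
The recommendation above can be sketched as a small helper that picks a model ID from the table based on how much context a request needs. This is an illustrative sketch, not part of any SDK; the function name and the approximate token counts (treating "K" as thousands) are assumptions.

```python
# Approximate context windows from the table above, in tokens.
CONTEXT_WINDOWS = {
    "llama-3.3-70b": 128_000,
    "llama-4-scout": 512_000,
    "llama-3.1-8b": 8_000,
    "llama-3.1-8b-cerebras": 8_000,
}

def pick_model(required_context: int) -> str:
    """Default to llama-3.3-70b; fall back to llama-4-scout for long inputs."""
    if required_context <= CONTEXT_WINDOWS["llama-3.3-70b"]:
        return "llama-3.3-70b"
    if required_context <= CONTEXT_WINDOWS["llama-4-scout"]:
        return "llama-4-scout"
    raise ValueError("input exceeds the largest available context window")
```

A short prompt lands on the default model, while a 300K-token document routes to the long-context model.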