## Quick picks

| Use case | Model | Why |
|---|---|---|
| Default / general | llama-3.3-70b | Best quality + speed balance |
| Coding | qwen-3-32b | Top coding benchmarks |
| Long documents | llama-4-scout | 512K context window |
| Fastest response | llama-3.1-8b | Lowest latency |
| Highest quality | qwen-3-235b-cerebras | 235B parameters |
## Switching models

Just change the `model` parameter — everything else stays the same:
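As a minimal sketch, assuming an OpenAI-compatible chat-completions request body (the helper below is illustrative; endpoint and auth details are omitted), only the `model` field varies between calls:

```python
import json

def chat_request(model: str, prompt: str) -> str:
    """Build a chat-completions request body; only `model` varies."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

# Same prompt, different models -- the rest of the request is unchanged.
fast = chat_request("llama-3.1-8b", "Summarize this report.")
best = chat_request("qwen-3-235b-cerebras", "Summarize this report.")
```

Swapping from the fastest to the highest-quality model is a one-string change, so you can pick from the table above without touching the rest of your integration.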