2026-07-02 · Thamodharan Ganesan
TextMate AI isn't locked to a single model. Every chat lets you pick from 8 models served via NVIDIA NIM, three of which are available on the Free plan.
| Model | Best for |
|---|---|
| GPT-OSS 120B (default) | Balanced, fast, general-purpose — the right default for most conversations. Verified ~131K context window. |
| GPT-OSS 20B | A lighter, faster variant of GPT-OSS for quick questions where you don't need the full 120B model. |
| Mistral Small 4 | The fastest response times in the picker — good for short back-and-forth chat. |
| Model | Best for |
|---|---|
| Mistral Large 3 | The largest model available — highest output quality for complex requests. |
| Llama 4 Maverick | Fast and accurate, a strong middle ground between speed and depth. |
| GLM 5.1 | Strong reasoning — a good pick for multi-step logic or analysis. |
| Qwen 3.5 122B | A balanced Qwen model, useful as an alternative "second opinion" model. |
| Kimi K2.6 | Built for long-context reasoning — pick this when you're working with a lot of prior conversation or a long document. |
Only the GPT-OSS models (120B and 20B) expose a thinking-depth control. When a question needs deeper multi-step reasoning, TextMate AI can raise GPT-OSS's reasoning effort rather than switching models entirely — this is why GPT-OSS 120B is the default: it's the one model in the picker that scales its own depth of thought on demand.
Every model has a per-response output cap. GPT-OSS 120B and 20B support up to 120,000 output tokens in a single response — enough for very long rewrites or documents. Llama 4 Maverick caps at 16,384 tokens. The remaining Pro models use a conservative 8,192-token cap. If you're generating long-form content in one shot, GPT-OSS is the model built for that.
See the Models page for the live picker with plan requirements, or Pricing for what Free vs Pro unlocks.