Did you set that up following a guide or anything you could share?

PeterStuer · 2026-02-06T20:36:46

Easiest way I know is to just use LMStudio. Just download and press play :). Optional but recommended: increase the context length to 262144 if you have the DRAM available. It will definitely get slower as the conversation gets longer, but (at least for me) the speed is still tolerable.
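
For a rough feel of why a 262144-token context wants a lot of memory, here is a back-of-the-envelope sketch. It assumes plain attention layers with a 16-bit KV cache, and the layer and head counts are made-up placeholders, not the real config of any particular model. Models that mix in linear-attention layers, or runtimes that quantize the cache, will need less.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Estimate KV-cache size in bytes for standard attention layers."""
    # The leading 2 counts one key tensor plus one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


# Hypothetical model shape, chosen only for illustration.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=262144)
print(f"{size / 2**30:.1f} GiB")  # prints "32.0 GiB"
```

The estimate scales linearly with context length, so halving the context halves the cache.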

mathrawka · 2026-02-06T20:00:45

Not OP, but I got it running on my 4090 (plus system RAM) by following this guide: https://unsloth.ai/docs/models/qwen3-coder-next

I see around 30 t/s