Hacker News

I've had the opposite experience with Mixtral on Ollama, on an Intel Linux box with a 4090: it's weirdly slow. But I suspect something is off with Ollama on this machine anyway, since any model I run through it shows higher latency than vLLM on the same box.


You have to specify the number of layers to offload to the GPU with Ollama. By default, Ollama offloads far fewer layers than the card can actually hold.
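
For reference, the layer count can be set via the `num_gpu` parameter. A minimal sketch using a custom Modelfile (the value 33 is just an illustrative guess; the right number depends on the model size and available VRAM):

```
# Modelfile — build with: ollama create mixtral-gpu -f Modelfile
FROM mixtral

# Offload this many layers to the GPU (tune for your VRAM; 33 is a placeholder)
PARAMETER num_gpu 33
```

The same option can also be passed per request in the API's `options` object, or with `/set parameter num_gpu 33` in an interactive `ollama run` session.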



