Hacker News

I've had the opposite experience with Mixtral on Ollama, on an Intel Linux box with a 4090: it's weirdly slow. But I suspect something is off with Ollama on this machine anyway, since any model I run through it shows higher latency than vLLM on the same box.


You have to specify the number of layers to offload to the GPU with Ollama. By default, Ollama offloads far fewer layers than the card can actually hold.
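
For reference, the layer count can be set via the `num_gpu` parameter. A minimal sketch using a custom Modelfile (the value 33 is just an illustrative guess; the right number depends on the model size and available VRAM):

```
# Modelfile — build with: ollama create mixtral-gpu -f Modelfile
FROM mixtral

# Offload this many layers to the GPU (tune for your VRAM; 33 is a placeholder)
PARAMETER num_gpu 33
```

The same option can also be passed per request in the API's `options` object, or with `/set parameter num_gpu 33` in an interactive `ollama run` session.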



