
That rule of thumb only applies to 8-bit quants at low context. The default for ollama is 4-bit, which puts it at roughly 14GB.

The vast majority of people run 4- to 6-bit quants, depending on system capability. The extra accuracy above 6-bit tends not to be worth the performance hit.
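For a rough sense of the arithmetic, here's a minimal sketch of how bit width scales the weight footprint. It assumes memory is just parameters times bits per weight, and it ignores KV cache (which grows with context), runtime overhead, and the mixed precision that real quant formats use. The function name and the 7B parameter count are hypothetical, picked only for illustration and not tied to the model discussed above.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at exactly `bits_per_weight` bits.
    Real quant formats carry extra scale metadata, so actual files run larger.
    """
    n_params = n_params_billion * 1e9
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical 7B-parameter model, chosen only for illustration.
    for bits in (4, 5, 6, 8):
        print(f"{bits}-bit: ~{weight_memory_gb(7, bits):.2f} GB of weights")
```

Going from 8-bit to 4-bit halves the weight footprint, which is why an estimate based on 8-bit weights overshoots what a 4-bit default needs.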


