
Fabrice Bellard has run a standard set of benchmarks with lm-eval on a large selection of open models here: https://bellard.org/ts_server/ - Flan T5 XXL and GPT-NeoX 20B both outperform Pythia 12B on average (LLaMA 13B+ tops the charts).



