This particular example is built on top of llama.cpp. It has a few benefits:
1. It's hopefully easier to install (though still not nearly easy enough)
2. All prompts you send through it - along with their responses - are automatically logged to a SQLite database. This is fantastic for running experiments and figuring out what kinds of things work.
3. The same LLM tool works for other models as well - you can run `llm -m $MODEL $PROMPT` against OpenAI models, Anthropic models, other self-hosted models, and models hosted on Replicate - all handled by plugins, which should also make it easy to add support for new models in the future.
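To illustrate the logging point: everything ends up in a plain SQLite database you can query directly. The database location and schema here are illustrative stand-ins, not LLM's actual schema (which may change between versions) - this is just a sketch of the kind of experiment analysis that becomes possible:

```shell
# Hypothetical stand-in for the LLM tool's log database; the real
# table and column names belong to the tool and may differ.
db=demo_logs.db
rm -f "$db"
sqlite3 "$db" "CREATE TABLE logs (model TEXT, prompt TEXT, response TEXT);"
sqlite3 "$db" "INSERT INTO logs VALUES ('llama-2-7b', 'Say hi', 'Hi!');"

# Pull back every prompt/response pair for a given model, e.g. to
# compare how different prompt styles performed:
sqlite3 "$db" "SELECT prompt, response FROM logs WHERE model = 'llama-2-7b';"
```

Because it's just SQLite, any tool that speaks SQL (or Datasette) works for digging through past experiments.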
My ultimate goal with LLM is that when someone releases a new model it will quickly be supported by an LLM plugin, which should make it MUCH easier to install and run these things without having to figure out a brand new way of doing it every single time.
Also, an interesting observation about prompts: I noticed that I got much better performance when I didn't use Facebook's standard prompt template.
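For context, Meta's suggested Llama 2 chat format wraps every prompt in `[INST]` markers with an optional `<<SYS>>` system block. A rough sketch of the two styles being compared (the system and user strings are made up for illustration):

```shell
system="You are a helpful assistant."
user="Ten fun names for a pet pelican"

# Meta's suggested chat wrapping (simplified):
printf '[INST] <<SYS>>\n%s\n<</SYS>>\n\n%s [/INST]\n' "$system" "$user"

# ...versus just sending the bare prompt text:
printf '%s\n' "$user"
```

The observation above is that the second, bare style often produced better output from the quantized model than the official template did.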