This is a really exciting development. They’re matching Qwen 2.5 32B on 1/3 the compute budget.
> Refined post-training and RLVR: Our models integrate our latest breakthrough in reinforcement learning with verifiable rewards (RLVR) as part of the Tülu 3.1 recipe by using Group Relative Policy Optimization (GRPO) and improved training infrastructure further enhancing their capabilities.
I only recently discovered all the work AI2 put out with Tülu 3, really laying out all of the components that make up a state-of-the-art post-training data mix. Very interesting stuff!
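For anyone curious what the GRPO-plus-verifiable-rewards combination looks like in practice, here is a minimal sketch. The function names and the toy exact-match reward are my own illustration, not AI2's actual code; the core idea is that each prompt gets a group of sampled completions, each completion gets a checkable reward, and advantages are computed relative to the group's own mean and standard deviation, so no learned critic network is needed.

```python
def verifiable_reward(answer: str, ground_truth: str) -> float:
    """Toy verifiable reward: 1.0 if the answer matches exactly, else 0.0.
    (Real RLVR rewards check things like math answers or passing unit tests.)"""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward against the group's
    mean and standard deviation, replacing a learned value function."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, a group of G=4 sampled completions:
completions = ["42", "41", "42", "7"]
rewards = [verifiable_reward(c, "42") for c in completions]
advantages = grpo_advantages(rewards)
# Correct completions end up with positive advantage, incorrect with negative.
```

The group-relative normalization is what makes GRPO cheaper than PPO-style setups: the baseline comes from the other samples in the group rather than from a separate critic model.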
Awesome to see great work from AI2 continuing. As far as I know, they're the only competitive fully open-source model family: they share the training data and code as well as the weights. They also recently released an open-source app that does on-device AI on your phone!
https://allenai.org/blog/tulu-3-technical