For the life of me, I can't understand the fetish with Apple machines. I mean, I get that they're built very well and it's all top tier, but the return per dollar spent is very dubious.
I have done my share of fine tuning on open source LLMs (e.g. Llama).
I'm surprised you're seeing such poor generalization.
I assume you're using standard techniques like LoRA/QLoRA, in which case the issue may lie with your data.
Can you share more details on the format of your data points? E.g., Q/A pairs, free text, ...
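For context, here's a minimal sketch of the LoRA idea in plain NumPy (hypothetical shapes and hyperparameters, not the peft API): instead of updating the full weight matrix W, you train a low-rank delta B @ A on top of the frozen base weights.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 8   # layer dims and LoRA rank (r << d); illustrative values
alpha = 16            # LoRA scaling hyperparameter

W = rng.standard_normal((d, k))         # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable, initialized small
B = np.zeros((d, r))                    # trainable, initialized to zero

def lora_forward(x):
    # Base projection plus the scaled low-rank correction.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, k))
# With B = 0 the adapter is a no-op, so fine-tuning starts exactly
# from the base model's behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```

The payoff is that only A and B (r * (d + k) parameters) get gradients, which is why it's so much cheaper than full fine-tuning.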
I thank Project Euler because whenever I face a coding challenge with any mathematical inclination, I will(!) impress the interviewers.
I spent a lot of time on it and learned a considerable amount of theory and hacks. What a privilege to be able to do this instead of tilling fields or doing other manual labor.
Thanks culture!
My post:
I used the Z3 SMT solver to test if two models are logically equivalent across the entire input space (not just in the sample data).
It either finds a counterexample or proves none exists.
Worth considering when simplifying complex models or when building retraining routines in MLOps.
Post includes code and discussion.
Author here.
This post started as an experiment to evaluate embedding models and their impact on retrieval from a geometric perspective.
I measured triangle inequality, local stability, and model compression effects.
Plots and code included.
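To give a flavor of the triangle-inequality check, here is a minimal self-contained sketch: count how often cosine distance violates d(a,c) <= d(a,b) + d(b,c) over sampled triples. The choice of cosine distance and the random stand-in "embeddings" are my assumptions for illustration; the post's actual metrics and data may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_dist(u, v):
    # Cosine distance in [0, 2]; not a true metric in general.
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Stand-in embeddings: random points; swap in real model outputs here.
emb = rng.standard_normal((200, 32))

violations = 0
trials = 1000
for _ in range(trials):
    a, b, c = emb[rng.choice(len(emb), size=3, replace=False)]
    # Small epsilon to ignore pure floating-point noise.
    if cosine_dist(a, c) > cosine_dist(a, b) + cosine_dist(b, c) + 1e-12:
        violations += 1

print(f"triangle-inequality violations: {violations}/{trials}")
```

The violation rate is one crude summary of how "metric-like" the embedding space is, which matters for retrieval structures (e.g. metric trees) that assume the triangle inequality holds.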