Apple's on-device ML at least proves it's possible, even if the FOSS world would take a while to catch up. A bigger problem may be getting access to training data.
It's only possible to a limited extent. A low-power smartphone will simply never be able to do as much ML as a high-power server, so there will always be a capability asymmetry here. And that's not even getting into the matter of FOSS drivers for such hardware.
And as you say, training these models is a different matter from merely using them. The data and power requirements involved are a huge problem.