96-bit and other non-power-of-two systems are cumbersome and error-prone to work with in C, which is often used when writing firmware for computer hardware.
The real risk of economic damage caused by bit-fiddling bugs is much greater than the risk of bringing the universe’s thermodynamic heat-death ever so slightly nearer...
I would think it's fairly easy to keep all calculations in C in 128-bit and just mask out the top few bytes when retrieving and storing. You could also argue that values wider than 64 bits will be rare enough to warrant their own code path when they are encountered, as an optimization (perhaps they already do this?).
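To make the masking idea concrete, here is a rough sketch of what I have in mind. It assumes GCC or Clang, since `unsigned __int128` is a compiler extension and not ISO C, and it assumes a 12-byte little-endian stored form. The names (`u128`, `U96_MASK`, `u96_load`, `u96_store`, `u96_add`) are made up for illustration and not taken from any real codebase.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC/Clang extension type, not part of ISO C. */
typedef unsigned __int128 u128;

/* Hypothetical stored width: 96 bits = 12 bytes, little-endian. */
enum { U96_BYTES = 12 };

/* Low 96 bits set, top 32 bits clear. */
#define U96_MASK ((((u128)1) << 96) - 1)

/* Read 12 stored bytes into a 128-bit working value (top 4 bytes are zero). */
static u128 u96_load(const unsigned char *src)
{
    assert(src != NULL);
    u128 v = 0;
    for (int i = U96_BYTES - 1; i >= 0; i--)
        v = (v << 8) | src[i];
    return v;
}

/* Mask off the top 32 bits, then write the low 12 bytes. */
static void u96_store(unsigned char *dst, u128 v)
{
    assert(dst != NULL);
    v &= U96_MASK;
    for (int i = 0; i < U96_BYTES; i++) {
        dst[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Do the arithmetic in 128 bits and wrap the result to 96 bits. */
static u128 u96_add(u128 a, u128 b)
{
    return (a + b) & U96_MASK;
}
```

All the arithmetic happens in the native 128-bit type and only the load/store boundary knows about the 96-bit width. Whether silently wrapping on overflow is acceptable, rather than reporting an error, is a separate design question.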
This is actually an argument against 128 bits, because it clearly shows that 128 bits are unreachable and thus a waste. What about 96 bits?