How do you quantify the confidence of your model? Do you use a Bayesian model or just the log-likelihood? I ask because the latter can behave strangely in some cases.
I know this is a digression from the current discussion on how well the devices work, but as a stats student who has just learned about estimation by maximizing the log-likelihood, could you give some more detail on why that is inferior to a Bayesian model? I've heard that the exact opposite is true.
The problem is that neural networks trained by maximizing the log-likelihood generally don't return calibrated probabilities. Using, e.g., the softmax output as the model's 'confidence' tends to produce overconfident predictions. For an extreme example, take a look at adversarial attacks on neural networks: https://blog.openai.com/adversarial-example-research/
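To make "calibrated" concrete, here is a minimal pure-Python sketch (my own illustration, not code from any particular model). It takes the largest softmax probability as the "confidence", shows that multiplying the logits by a constant leaves the predicted class unchanged while pushing that confidence toward 1, and measures miscalibration with expected calibration error (ECE) over equal-width bins, which is one common metric among several. The function names, the `temperature` parameter, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax. temperature > 1 softens the output;
    tuning it on held-out data is the idea behind 'temperature scaling'."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(logits, temperature=1.0):
    """The naive 'confidence': the largest softmax probability."""
    return max(softmax(logits, temperature))

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average |accuracy - mean confidence|
    over the bins, weighting each bin by its share of the samples."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # bins are (lo, hi]; a confidence of exactly 0 goes in the first bin
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - avg_conf)
    return ece

if __name__ == "__main__":
    # Same ranking of classes, very different 'confidence'.
    print(confidence([2.0, 1.0, 0.1]))     # roughly 0.66
    print(confidence([20.0, 10.0, 1.0]))   # very close to 1
    # Toy model that always claims 99% but is right only 3 times out of 5.
    print(expected_calibration_error([0.99] * 5, [1, 1, 1, 0, 0]))  # about 0.39
```

The point of the first two prints is that the softmax's maximum is driven by the scale of the logits, which maximum-likelihood training is free to inflate, so a high value is not by itself evidence that the prediction is right. ECE, by contrast, only rewards a confidence of p if the model is actually correct about a fraction p of the time.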