What I would like to understand better is how transferable these adversarial images are from one network to another. I mean, is the adversity "deep", i.e. do practically all models built on resnet18 (or even completely different models) suffer from it? Or is it "shallow", meaning just one more round of training will mess up the special "gradient paths" found for these particular network weights?
The adversity is not only "deep" in the sense that resnets are affected; it seems to be universal and intrinsically tied to the accuracy of current classifiers. For reference, look up "Empirically estimating concentration of measures: Fundamental limits to robustness", "Adversarial Examples Are Not Bugs, They Are Features", and the work on transferability by Papernot, Athalye, Madry and others.
Nicholas Papernot et al. [1] have shown how this can be turned into an attack on an unknown network, provided you have access to the inference results produced by the target network [2].
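To make the transferability question concrete, here is a minimal numpy sketch (a toy, not anyone's published setup: the two "networks" are logistic regressions with different random initialisations standing in for real models). An FGSM-style perturbation is crafted against model 1 only, then evaluated against model 2, which the attacker never touched:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary task: two Gaussian blobs in 20 dimensions.
d = 20
X = np.vstack([rng.normal(-1, 1, (200, d)), rng.normal(1, 1, (200, d))])
y = np.concatenate([np.zeros(200), np.ones(200)])

def train_logreg(X, y, lr=0.1, steps=500, seed=0):
    """Plain gradient descent on the logistic loss."""
    w = np.random.default_rng(seed).normal(0, 0.01, X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def acc(w, b, X_):
    return np.mean((X_ @ w + b > 0) == y)

# Two "networks" trained separately, from different initialisations.
w1, b1 = train_logreg(X, y, seed=1)
w2, b2 = train_logreg(X, y, seed=2)

# FGSM against model 1 only: step each input along the sign of the
# loss gradient with respect to the input.
eps = 1.0
p1 = 1 / (1 + np.exp(-(X @ w1 + b1)))
X_adv = X + eps * np.sign(np.outer(p1 - y, w1))

print(f"model1: clean {acc(w1, b1, X):.2f}, adversarial {acc(w1, b1, X_adv):.2f}")
print(f"model2: clean {acc(w2, b2, X):.2f}, transferred {acc(w2, b2, X_adv):.2f}")
```

In this toy setting the attack transfers almost perfectly, because both models converge to nearly the same decision boundary; for deep networks the overlap is weaker, but the papers above report that transfer is still substantial.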
That is a good question. I'm not aware of anyone exploring this avenue, but it would be interesting to see which categories of models are fooled by which types of noise.
"Noise" is the wrong way to think about these IMO, while robustness to noise implies robustness to a degree of perturbations, the perturbations have clearly outlined "directions" which can be visualised via so called "church plots". Seyed-Mohsen Moosavi-Dezfooli http://smoosavi.me/ and the lab of Prof. Frossard have done amazing work exploring this.
> Not worried enough? Imagine an attacker who manipulates road signs in a way such that self-driving cars will break traffic rules.
If someone was manipulating a bunch of road signs for bad purposes, human drivers would probably be in trouble too... I guess the point is that these attacks could be more subtle/scalable?
More on this point: computer vision models are (currently) far more reliant than human vision on the presence of identifying features in the high-frequency details of images.
They are looking for a significant subset of some identifying group of highly localised features:
think a large number of small things, rather than a small number of large things;
think colour gradients rather than colours.
These sorts of high-frequency pieces of information can be placed into images in a way that is imperceptible to humans, but screams at computer vision neural networks.
A sign could say no right turn to people and no left turn to machines.
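The high-frequency point can be illustrated without any model at all (a numpy sketch; the 2/255 amplitude is an arbitrary "imperceptible" budget, not a measured perceptual threshold): a checkerboard at the highest representable frequency changes every pixel by less than 1%, yet in the Fourier domain it is a single loud spike that a frequency-sensitive feature detector could key on.

```python
import numpy as np

img = np.full((64, 64), 0.5)              # a flat grey stand-in for a sign
yy, xx = np.mgrid[0:64, 0:64]
checker = (-1.0) ** (xx + yy)             # fastest-alternating pixel pattern
eps = 2 / 255                             # tiny amplitude in [0, 1] pixel units
img_adv = np.clip(img + eps * checker, 0.0, 1.0)

# Per-pixel, the change is tiny...
print("max per-pixel change:", np.abs(img_adv - img).max())  # 2/255, about 0.008

# ...but spectrally it is one dominant component at the Nyquist bin.
spec = np.abs(np.fft.fft2(img_adv - img))
print("energy at Nyquist bin:", spec[32, 32])        # eps * 64 * 64, about 32
print("energy everywhere else:", spec.sum() - spec[32, 32])  # ~0
```

A real attack would of course optimise the pattern against a specific model rather than use a fixed checkerboard; this just shows how much signal can hide below the threshold of human perception.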
This confirms my opinion that the only way for self-driving cars to work is for every car on the road to be self-driving. Stop signs would become like train semaphores: rather than being purely visual, there would be electronic indicators telling the car where to stop. The road network would then work more like a rail network. I really think this would mean more efficient use of space, since you can have shorter following distances (this was achieved on rail networks too, once automated systems took over from human drivers). Trying to blend self-driving and human-driven cars is just going to cause a huge mess. Accounting for both human mistakes and AI mistakes is too big a problem for AI to solve right now, and IMO, for a very long time.
And at this stage, at least in the context of cities, cars begin to make very little sense compared to tried and true alternatives like trains and subways. Combined with people-first city planning, there really is no need for so many cars in urban areas.
I find adversarial attacks super interesting. They open up a field in machine learning where people question the robustness of models. Adversarial examples exist for many reasons, the high dimensionality of data and weight spaces being one of them.
IMO, the world needs to pay more attention to this as we deploy more and more AI applications. What worries me more is that we still don't really have a good solution to this problem: pretty much every defended model can be attacked in some other way.
My guess for now is that we will have to go all the way back and question the assumptions we are making about high-dimensional spaces, and prove theorems and lemmas about AI systems rigorously, rather than saying experimentally that the accuracy is high enough, therefore it works.
It's hard for me to imagine how an adversarial machine learning attack would work in practice. Sure, you might be able to fool Shazam, but it's not like Tesla or Waymo are going to give you free access to their models.