TensorFlow Tutorial - Adversarial Noise for MNIST


Notes by Magnus Erik Hvass Pedersen: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/12_Advers...

The previous Tutorial #11 showed how to find so-called adversarial examples for a state-of-the-art neural network, which caused the network to mis-classify images even though they looked identical to the human eye. For example, an image of a parrot became mis-classified as a bookcase when adding the adversarial noise, but the image looked completely unchanged to the human eye.

The adversarial noise in Tutorial #11 was found through an optimization process for each individual image. Because the noise was specialized for each image, it may not generalize and have any effect on other images.

In this tutorial we will instead find adversarial noise that causes nearly all input images to become mis-classified as a desired target-class. The MNIST data-set of hand-written digits is used as an example. The adversarial noise is now clearly visible to the human eye, but the digits are still easily identified by a human, while the neural network mis-classifies nearly all the images.

In this tutorial we will also try and make the neural network immune to adversarial noise.

Tutorial #11 used NumPy for the adversarial optimization. In this tutorial we will show how to implement the optimization process directly in TensorFlow. This might be faster, especially when using a GPU, because it does not need to copy data to and from the GPU in each iteration.

It is recommended that you first study Tutorial #11. You should also be familiar with TensorFlow in general, see e.g. Tutorials #01 and #02.

Resource Type: