This is my first post here so any feedback would be very appreciated. Thank you :)
Dreams have always been a subject of fascination for me. Where do they come from? How is it that they always get interrupted at the juiciest bits? Why is it that sometimes they are so... distorted? Can we explore this through code?
Neural networks have long been used to try to find a representation of the world around us, as seen through the eyes of your favorite metal box. How is this related to dreaming? I personally believe that dreams are but a representation of the way we see the world: the brain's way of storing information in terms of visuals and connections we do not yet understand.
Do Androids Dream of Electric Sheep? Play around with this code, maybe you will find out.
This article is a tutorial for implementing an algorithm called Deep Dream. It was first published by Google in an attempt to understand the way neural networks learn from images and perceive the world. As a high-level overview, we take a pre-trained network that can already classify a large number of images and give it an image. The network then repeatedly modifies that image to amplify whatever patterns its layers respond to. In the process, we learn a bit more about what goes on inside the black box that is a neural network.
Note that this is just a simple implementation; the rest of the explanation and the code can be found in my repository here: DeepDream repo. I decided on this to keep this blog accessible to everyone who lands here.
I would advise you to follow along with the code and play around with it as you go. That's the best way to understand it. Feel free to ask me any questions or give me your feedback. Here we go.
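First, we go ahead and import what we need.

    import IPython.display as display
    import numpy as np
    import PIL.Image
    import tensorflow as tf

- We first download the image
- Resize it for faster computation

    url = 'https://nicolekessler.files.wordpress.com/2013/04/hellish_demons.jpg?w=1024'

    def download(url, max_dim=None):
        # Download the image (Keras caches it locally) and open it.
        name = "demons.jpg"
        image_path = tf.keras.utils.get_file(name, origin=url)
        img = PIL.Image.open(image_path)
        # Shrink the image so neither side exceeds max_dim.
        if max_dim:
            img.thumbnail((max_dim, max_dim))
        return np.array(img)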
    def deprocess(img):
        # Map values from the range [-1, 1] back to pixel values in [0, 255].
        img = 255 * (img + 1.0) / 2.0
        return tf.cast(img, tf.uint8)
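To see what deprocess does to concrete values, here is a small NumPy stand-in. It is only an illustration, not the TensorFlow function from this post: the name deprocess_np is made up, and it assumes the image was scaled to the range -1 to 1 beforehand, which is what the formula implies.

```python
import numpy as np

def deprocess_np(img):
    """NumPy stand-in for deprocess: map floats in [-1, 1] to uint8 in [0, 255]."""
    img = 255 * (img + 1.0) / 2.0
    # astype truncates the fractional part, so 127.5 becomes 127.
    return img.astype(np.uint8)

# The darkest, middle, and brightest values a pixel can take in [-1, 1].
pixels = np.array([-1.0, 0.0, 1.0])
print(deprocess_np(pixels))  # values 0, 127, 255
```

So -1 becomes black, 1 becomes full brightness, and everything in between is spread linearly across the usual 0 to 255 pixel range.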
- Just a wrapper to convert the tensor into an array and display it

    def show(img):
        display.display(PIL.Image.fromarray(np.array(img)))

    original_img = download(url, max_dim=500)
    show(original_img)
The output I get is the image that I have imported.
Next, we import the pre-trained network mentioned earlier.
- We use InceptionV3, a pretrained network that already has some idea of the world.
- We load the ImageNet weights, which lets us reuse what the network has already learned (transfer learning)
- Instead of training from scratch, we can just cherry-pick layers from it and work with their activations. Setting include_top=False leaves out the final classification layers, which we do not need here

    base_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
- We now choose two layers, mixed3 and mixed5, from the pretrained Inception network. The names list lets us look these layers up in the model and collect their outputs
- We then create a model that takes the same input as the base model (Inception) and returns the activations of the chosen layers as its output

    names = ['mixed3', 'mixed5']
    layers = [base_model.get_layer(name).output for name in names]

    dream_model = tf.keras.Model(inputs=base_model.input, outputs=layers)
The next step is to define the loss. Here the loss measures how strongly the chosen layers are activated by the image, and it is the quantity we will later try to increase.
- We take the image and the model as inputs
- expand_dims adds an extra batch dimension at the front (axis 0), because Inception expects a batch of images rather than a single image
- For every layer's activations, we calculate the mean as that layer's loss and append it to a list
- reduce_mean() and reduce_sum() are the tensor equivalents of mean and sum for plain arrays
- The sum of the per-layer means is the total loss we get

    def calc_loss(img, model):
        # Turn the single image into a batch of size 1.
        img_batch = tf.expand_dims(img, axis=0)
        layer_activations = model(img_batch)

        losses = []
        for act in layer_activations:
            loss = tf.math.reduce_mean(act)
            losses.append(loss)

        return tf.reduce_sum(losses)
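The arithmetic in calc_loss can be checked by hand on made-up numbers. The sketch below is a NumPy stand-in, not the TensorFlow code above: calc_loss_np is a hypothetical name, and the two "activation" arrays are invented constants rather than real Inception outputs.

```python
import numpy as np

def calc_loss_np(activations):
    """NumPy stand-in for calc_loss: sum of the mean activation of each layer."""
    losses = [np.mean(act) for act in activations]
    return float(np.sum(losses))

# A hypothetical image: height 4, width 5, 3 color channels.
img = np.zeros((4, 5, 3), dtype=np.float32)
img_batch = np.expand_dims(img, axis=0)  # add the batch axis at the front
print(img_batch.shape)  # (1, 4, 5, 3)

# Two invented "layer activations" with different shapes.
act_a = np.full((1, 2, 2, 8), 0.5, dtype=np.float32)
act_b = np.full((1, 1, 1, 16), 2.0, dtype=np.float32)
print(calc_loss_np([act_a, act_b]))  # 0.5 + 2.0 = 2.5
```

Because each layer contributes its mean rather than its sum, a layer with many units does not automatically dominate a smaller one.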
Now we have to define the main class of our code. This part is a little more involved, so make sure you read it carefully.
- The @tf.function decorator compiles the function into a TensorFlow graph. Since it is compiled, it runs faster
- TensorSpec lets us declare the shapes and types of the inputs up front, so the function is compiled once instead of being compiled again for every new image size

### __call__
- Here we are trying to find the gradients of the loss with respect to the image
- This method is called gradient ascent. It adds the gradients to the image, which increases the activations of the chosen layers, which is what we want
- GradientTape records the operations performed on the image, so TensorFlow can work out the gradient of the loss with respect to the image afterwards
- After we get the gradients, we normalize them by dividing by their standard deviation
- img = img + gradients * step_size is the main ascent step, which increases the loss
- clip_by_value keeps every value in the range -1 to 1. Any value less than -1 is set to -1 and any value greater than 1 is set to 1, so the image stays in the range the network expects
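The update described in the bullets above can be tried on a toy problem without any network. This is a minimal NumPy sketch, not the TensorFlow code from this post: toy_loss, toy_grad and ascent_step are hypothetical names, and the linear loss with its hand-written gradient stands in for the Inception activations and GradientTape.

```python
import numpy as np

def toy_loss(img, weights):
    """Stand-in for the network loss: mean of a weighted 'activation'."""
    return float(np.mean(img * weights))

def toy_grad(img, weights):
    """Gradient of toy_loss with respect to img, written out by hand."""
    return weights / img.size

def ascent_step(img, grad, step_size):
    """One gradient ascent step: normalize, move uphill, clip to [-1, 1]."""
    grad = grad / (np.std(grad) + 1e-8)
    img = img + grad * step_size
    return np.clip(img, -1.0, 1.0)

weights = np.array([1.0, -2.0, 3.0, -4.0])
img = np.zeros(4)
loss_before = toy_loss(img, weights)

for _ in range(50):
    img = ascent_step(img, toy_grad(img, weights), step_size=0.1)

loss_after = toy_loss(img, weights)
print(loss_before, loss_after)  # the loss goes up
print(img)                      # every value has been pushed to -1 or 1
```

Each value moves in the direction that raises the loss until clipping stops it at the edge of the allowed range. The real class does the same thing, except that the gradient comes from the network through GradientTape.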
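    class DeepDream(tf.Module):
        def __init__(self, model):
            self.model = model

        @tf.function(
            input_signature=(
                tf.TensorSpec(shape=[None, None, 3], dtype=tf.float32),  # image
                tf.TensorSpec(shape=[], dtype=tf.int32),                 # steps
                tf.TensorSpec(shape=[], dtype=tf.float32),               # step_size
            )
        )
        def __call__(self, img, steps, step_size):
            print("Tracing")
            loss = tf.constant(0.0)
            for n in tf.range(steps):
                with tf.GradientTape() as tape:
                    # img is a plain tensor, not a tf.Variable, so watch it explicitly.
                    tape.watch(img)
                    loss = calc_loss(img, self.model)

                # Gradient of the loss with respect to the image pixels.
                gradients = tape.gradient(loss, img)
                # Normalize the gradients.
                gradients /= tf.math.reduce_std(gradients) + 1e-8

                # Gradient ascent: move the image in the direction that raises the loss.
                img = img + gradients * step_size
                img = tf.clip_by_value(img, -1, 1)

            return loss, img

    deepdream = DeepDream(dream_model)

    # Assumed usage, not shown in the original post: scale the image to [-1, 1],
    # run the ascent, then convert back for display. The step count and step
    # size are illustrative values only.
    img = tf.keras.applications.inception_v3.preprocess_input(original_img)
    img = tf.convert_to_tensor(img)
    loss, img = deepdream(img, tf.constant(100), tf.constant(0.01))
    show(deprocess(img))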
I get the following image as an output.
See what has changed? Run this for more iterations and you'll see more changes and probably scarier outputs. Or you could choose a cheerful image. Up to you entirely.
If you are still interested in learning more, check out the references and my explanations in the Jupyter notebook in this repository.
This post is a simpler explanation and my understanding of the method described in the Google AI Research Blog: