CS360 Lab 8: Neural Networks

Due: ~~Sunday, November 17~~ Monday, November 18 at 11:59pm

Overview

The goals of this week’s lab:

Practice using high-level libraries to create neural network architectures
Become comfortable using documentation (e.g. TensorFlow and Keras)
Compare different neural network approaches for the task of image classification
See how neural network theory is implemented in practice

Note that there is a check-in for this lab. By lab on November 14 you should have completed at least the data pre-processing. Ideally you should have also started the fully connected neural network.

Acknowledgements: adapted from Stanford CS231n course materials

Introduction

To get started, find your git repo for this lab assignment:

$ cd cs360/labs08/

You will have or create the following files:

run_nn_tf.py - your main executable program for training and testing NNs
fc_nn.py - implementation of a fully connected neural network
cnn.py - implementation of a convolutional neural network
run_best.py - train and test your best neural network for the competition portion
README.md - for analysis questions, results, and data collection

For this lab, we will be investigating the CIFAR-10 (“see-far”) dataset, which contains small images from 10 classes (examples shown below):

See Learning Multiple Layers of Features from Tiny Images by Alex Krizhevsky (2009) for more information about this dataset.

To use tensorflow, you’ll need to put these lines at the end of your .bashrc file (which should be in your home directory):

export PATH=/packages/cs/python3.7.3/bin:/usr/local/cuda-10.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:/usr/local/cuda/extras/CUPTI/lib64:/packages/cs/python3.7.3/lib

We have several machines with GPUs. I will post this list on Piazza (so it is not public), but you do not need to do anything special to use the GPU once you have logged in to one of these machines. If you want to make sure your code is making use of the GPU, you can use:

tf.debugging.set_log_device_placement(True)
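As a complementary check, here is a minimal sketch that lists the GPUs TensorFlow can see. It assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available; the variable name gpus is just for illustration.

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow; an empty list means code will run on the CPU.
gpus = tf.config.list_physical_devices("GPU")
print("Number of GPUs visible:", len(gpus))
```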

As in the previous lab, we will be using high-level libraries, so consult the documentation frequently. Make sure you really understand each line of code you’re writing and each method you’re using - there are fewer lines of code for this lab, but each line is doing a lot!

When importing modules, avoid importing with the * - this imports everything in the library (and these functions can be confused with user-defined functions). Instead, import functions and classes directly. Imports should be sorted and grouped by type (for example, usually I group imports from python libraries vs. my own libraries).

For help with tensorflow, see the TensorFlow Documentation. Note that your final code will take a while to run, so please budget time for experimenting with different architectures.

Part 1: Data Pre-processing

1. First, examine the starter code in run_nn_tf.py. There is code for reading in the CIFAR-10 dataset and dividing it into train data, validation data, and test data. This allows us to iterate over the data in “mini-batches”. In stochastic gradient descent, we often use only one data point at a time to update the model parameters. Here we will use mini-batches so that gradient updates are not performed quite so frequently (but also not as infrequently as if we had used the entire “batch”). Mini-batches are a middle ground between these two approaches.
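To make the mini-batch idea concrete, here is a small NumPy sketch. It is not the starter code's implementation: the function name iterate_minibatches and the toy all-zero arrays are hypothetical, chosen only to show how 150 examples split into batches of 64.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, shuffle=False, seed=0):
    """Yield (X_batch, y_batch) pairs that together cover the data once."""
    n = X.shape[0]
    idx = np.arange(n)
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

# Toy arrays with the same layout as CIFAR-10 images (values are all zeros).
X = np.zeros((150, 32, 32, 3), dtype=np.float32)
y = np.zeros(150, dtype=np.int64)
batch_sizes = [xb.shape[0] for xb, _ in iterate_minibatches(X, y, batch_size=64)]
print(batch_sizes)  # [64, 64, 22]
```

The last batch is smaller because 150 is not a multiple of 64. Each pass over all the batches is one epoch.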

You don’t need to worry about combine_batches too much - this function reads the data in from a folder and combines data from different files. The load_cifar10 function allows you to choose the number of training, validation, and testing examples. Here, the input data X has 4 dimensions:

shape of X: (n, 32, 32, 3)

where n is the number of examples. Each example is an image of size 32 x 32 pixels. Since we have RGB (red, green, blue) values for each pixel, the depth of the data is 3. We will also refer to 3 as the number of channels.

The first coding step is to modify the load_cifar10 function to subtract off the mean and divide by the standard deviation (normalize the data). This will make our data zero-centered and keep all the features roughly on the same scale (see image below).

Figure from Stanford CS231n course, section “Neural Networks Part 2: Setting up the Data and the Loss”

We will compute the mean and std on the training data only, since we need to make sure to treat our validation and test data the same way we treated our training data (think back to the continuous feature transformation we did for Lab 2). To perform these operations on a matrix, you can use:

mean_pixel = dset.mean(axis=(0, 1, 2), keepdims=True)
std_pixel = dset.std(axis=(0, 1, 2), keepdims=True)

Here dset should be the training data. We compute the mean across the first 3 axes (examples, height, width), so each RGB channel gets its own mean. keepdims=True keeps the result in shape (1, 1, 1, 3), which allows it to be broadcast against the original data. After computing these arrays, subtract the mean and divide by the standard deviation.
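One way the normalization step might look is sketched below. The helper name normalize_splits and the random stand-in pixel values are assumptions for illustration; in the lab this logic belongs inside load_cifar10 and operates on the real data.

```python
import numpy as np

def normalize_splits(X_train, X_val, X_test):
    """Zero-center and scale every split using training-set statistics only."""
    mean_pixel = X_train.mean(axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, 3)
    std_pixel = X_train.std(axis=(0, 1, 2), keepdims=True)    # shape (1, 1, 1, 3)
    return ((X_train - mean_pixel) / std_pixel,
            (X_val - mean_pixel) / std_pixel,
            (X_test - mean_pixel) / std_pixel)

# Random stand-in pixels, used only to check shapes and statistics.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 255, size=(50, 32, 32, 3))
X_val = rng.uniform(0, 255, size=(10, 32, 32, 3))
X_test = rng.uniform(0, 255, size=(10, 32, 32, 3))
X_train_n, X_val_n, X_test_n = normalize_splits(X_train, X_val, X_test)
print(X_train_n.mean(axis=(0, 1, 2)))  # approximately 0 for each channel
print(X_train_n.std(axis=(0, 1, 2)))   # approximately 1 for each channel
```

Note that the validation and test splits will not have exactly zero mean and unit standard deviation, since they are transformed with the training statistics.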

To test this part, run

python3 run_nn_tf.py

And make sure you get the dimensions for train, validation, and test as commented in the starter code.

Now in main, set up train_dset, val_dset, and test_dset using the tf.data.Dataset.from_tensor_slices method. Here we will use a mini-batch size of 64. We will shuffle the train data, but not the validation and test data. To test this part in main, loop over the train data:

for images, labels in train_dset:

and print out the shape of images and labels. Do these values make sense?
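A minimal sketch of this dataset setup follows, assuming TensorFlow 2 with eager execution. The all-zero stand-in arrays and the shuffle buffer size are assumptions; in the lab the arrays come from load_cifar10.

```python
import numpy as np
import tensorflow as tf

# Stand-in arrays with the CIFAR-10 layout; in the lab these come from load_cifar10.
X_train = np.zeros((200, 32, 32, 3), dtype=np.float32)
y_train = np.zeros(200, dtype=np.int32)

train_dset = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
              .shuffle(buffer_size=X_train.shape[0])  # shuffle the training data only
              .batch(64))

for images, labels in train_dset:
    print(images.shape, labels.shape)
```

For the validation and test data, the same pattern without the shuffle call keeps the examples in their original order.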

We are now ready to begin using the data to train neural networks.

Part 2: Fully Connected Neural Network

In fc_nn.py we will implement a fully connected neural network with two layers:

Layer one: connect the input nodes to hidden nodes
Layer two: connect hidden nodes to the output

Ultimately we will create an FCmodel object. Investigate the documentation for the following types of layers:

tensorflow.keras.layers.Flatten
tensorflow.keras.layers.Dense

In the constructor, we first want to use Flatten to “unravel” an image into a vector of pixels. We want to do this for all images at once, but we don’t want to unravel them all together. So if we have 100 images, each of shape (32,32,3), we want to return a tensor of shape (100,32 * 32 * 3).
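The shape change described above can be checked with a plain NumPy reshape, which produces the same output shape as Flatten. This is only an illustration of the shapes, not a replacement for the Keras layer.

```python
import numpy as np

# 100 images, each of shape (32, 32, 3).
x = np.zeros((100, 32, 32, 3), dtype=np.float32)

# Keep the first (example) axis and unravel everything else into one vector per image.
flat = x.reshape(x.shape[0], -1)
print(flat.shape)  # (100, 3072), since 32 * 32 * 3 = 3072
```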

Then add two fully connected layers (dense). The first one should have 4000 hidden units and use ReLU activation. The second one should have the number of units matching the number of classes, and use softmax (multi-class logistic regression) as the activation function.

After creating these layers in the constructor, we need to write a call method that will take an input x (what is the shape of x?), apply the function, and produce the outputs. Each member variable from the constructor is actually a function, so we can apply them to x one at a time until we get to the output (then return this output).
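The constructor and call method described above might be structured roughly as follows. This is a sketch under assumptions, not the starter code: the class name TwoLayerFC, its constructor arguments, and the member names are hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten

class TwoLayerFC(tf.keras.Model):
    """Hypothetical stand-in for FCmodel: flatten -> dense ReLU -> dense softmax."""

    def __init__(self, hidden_dim, num_classes):
        super().__init__()
        self.flatten = Flatten()
        self.fc1 = Dense(hidden_dim, activation="relu")
        self.fc2 = Dense(num_classes, activation="softmax")

    def call(self, x):
        x = self.flatten(x)   # (n, 32, 32, 3) -> (n, 3072)
        x = self.fc1(x)       # (n, hidden_dim)
        return self.fc2(x)    # (n, num_classes), each row sums to 1

model = TwoLayerFC(hidden_dim=4000, num_classes=10)
probs = model(np.zeros((64, 32, 32, 3), dtype=np.float32))
print(probs.shape)  # (64, 10)
```

Because the last layer uses softmax, each row of the output is a probability distribution over the 10 classes.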

To test this part, comment out one of the types of input x_np and run fc_nn.py. What output is produced? Does this output make sense? Switch to the other type of x_np - what does the output look like now?

python3 fc_nn.py

Now we are ready to train a fully connected neural network. Investigate and read through the run_training function in run_nn_tf.py. We can use this function to train any type of model using gradient descent. Work through the TODOs in the starter code to complete this function and the associated helper functions.

(Hint: look up documentation for tf.keras.losses and tf.keras.optimizers.)
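For orientation, a single gradient-descent update on one mini-batch can be sketched as below. The tiny Sequential model, the choice of SGD, the learning rate, and the name train_step are all assumptions for illustration; run_training in the starter code may be organized differently.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model and data; the lab uses FCmodel or CNNmodel and CIFAR-10 batches.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
# This loss expects integer class labels and predicted probabilities.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2)  # assumed learning rate

def train_step(images, labels):
    """Run one gradient update on a single mini-batch and return the loss."""
    with tf.GradientTape() as tape:
        probs = model(images, training=True)
        loss = loss_fn(labels, probs)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

images = np.zeros((64, 32, 32, 3), dtype=np.float32)
labels = np.zeros(64, dtype=np.int32)
loss = train_step(images, labels)
print(float(loss))
```

Training for one epoch amounts to calling a step like this once for every mini-batch in the training dataset.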

After each epoch we will check the accuracy on the validation dataset. We will not actually perform hyper-parameter tuning here (although you are welcome to do that later), but the idea is that we will not use the testing data until the very end. We use this small validation set to check convergence.

Now in main in run_nn_tf.py, call run_training and pass in the appropriate model and parameters for a 2-layer fully connected network. Then run the code:

python3 run_nn_tf.py

This trains the fully connected network on the training data. After a few epochs (an epoch is one pass through all the training data), you should be able to get accuracies above 40%. We will return to the fully connected network later for comparison, but for now we turn to a different architecture.

Part 3: Convolutional Neural Network

In cnn.py, we will set up the CNN architecture in the class CNNmodel. Our architecture will have two convolutional layers and one fully connected layer. Here is a summary:

A convolutional layer (with bias) with 32 filters, each with shape 5 x 5
ReLU nonlinearity
A convolutional layer (with bias) with 16 filters, each with shape 3 x 3
ReLU nonlinearity
Fully-connected layer with bias, producing ~~scores~~ probabilities for each class

See the documentation for:

tensorflow.keras.layers.Conv2D
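The architecture summarized above could be laid out along these lines. This is a sketch, not the expected solution: the class name ThreeLayerConvNet and member names are hypothetical, and padding="same" is an assumption (the lab text does not specify padding) that keeps the spatial size at 32 x 32.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, Flatten

class ThreeLayerConvNet(tf.keras.Model):
    """Hypothetical stand-in for CNNmodel: conv -> conv -> fully connected."""

    def __init__(self, num_classes):
        super().__init__()
        # padding="same" is an assumption; it keeps the 32 x 32 spatial size.
        # Conv2D and Dense include a bias term by default (use_bias=True).
        self.conv1 = Conv2D(32, (5, 5), padding="same", activation="relu")
        self.conv2 = Conv2D(16, (3, 3), padding="same", activation="relu")
        self.flatten = Flatten()
        self.fc = Dense(num_classes, activation="softmax")

    def call(self, x):
        x = self.conv1(x)     # (n, 32, 32, 32)
        x = self.conv2(x)     # (n, 32, 32, 16)
        x = self.flatten(x)   # (n, 32 * 32 * 16)
        return self.fc(x)     # (n, num_classes)

model = ThreeLayerConvNet(num_classes=10)
probs = model(np.zeros((64, 32, 32, 3), dtype=np.float32))
print(probs.shape)  # (64, 10)
```

With the default padding="valid" instead, each convolution would shrink the spatial dimensions, so the intermediate shapes in the comments would differ.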

Run the function three_layer_convnet_test to confirm the dimensions of your model. Do the results make sense?

python3 cnn.py

Finally, in run_nn_tf.py, train the 3-layer CNN using the run_training function with the appropriate input functions and parameters. You can use the same learning rate as for the fully connected network. After a few epochs you should be able to get over 50% accuracy.

Part 4: Comparison and Analysis

Training Curve

One common way to see how training is progressing is to plot accuracy against the training iteration (usually measured in terms of number of ~~mini-batches~~ epochs). Increase the number of epochs to 10, and then create a plot of both training accuracy and validation accuracy as a function of the training iteration, for both the fully connected NN and the CNN. Here is how the axes should be set up (note that you should actually have 4 curves, though).

Include this image in your submission (specify which file in your README), along with some brief analysis. Which method did better overall? Do you consider the networks to have converged after 10 epochs? If yes, at what point did they converge?

Edit: you can instead make two plots, one for FC and one for CNN (each with two curves, one for train and one for validation).
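A plotting helper for one such figure (two curves, train and validation) might look like the sketch below. It assumes matplotlib is available and that you have recorded one accuracy value per epoch in two lists; the function name plot_curves and its arguments are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt

def plot_curves(train_acc, val_acc, title, filename):
    """Plot per-epoch training and validation accuracy and save the figure."""
    epochs = range(1, len(train_acc) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_acc, label="train")
    ax.plot(epochs, val_acc, label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    ax.set_title(title)
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)
    return filename
```

Calling it once with the fully connected network's accuracy lists and once with the CNN's would give the two plots; the file names you pass in are the ones to mention in your README.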

Confusion Matrix

Devise a way to use your results to create a confusion matrix for the test data (which we so far have not used at all). Include these as either tables or images in your README (one matrix for each method). If you want to copy your results onto a visualization, you are welcome to download the figure below:

Include brief analysis about which classes seemed the most difficult.
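One possible way to build the matrix is sketched below in NumPy. The function name and the small made-up label arrays are for illustration only; in the lab, y_pred would be the class with the highest predicted probability for each test example.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Return a matrix whose entry [i, j] counts true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)  # accumulate counts, including repeated pairs
    return cm

# Small made-up labels to illustrate the layout (3 classes).
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 0, 2])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
```

Rows correspond to true classes and columns to predicted classes, so large off-diagonal entries point to the pairs of classes that are most often confused.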

Best Network (open-ended)

Edit: this part is now optional, and the deadline (just for this part) is extended to Sunday, December 1.

In run_best.py, modify either your fully connected network or your CNN to increase the accuracy on the validation data as much as possible. Suggestions to try:

change the optimizer
change the activation function
change the number of layers
change the number of hidden units in each layer
change the number of filters and/or the filter size
use pooling for the CNN network
include regularization

By the end of this part, you should have improved over the networks in the previous sections. Report your testing accuracy and testing confusion matrix in your README. You should be able to get at least 60-65% accuracy on the test data. Briefly discuss what you tried and what had the biggest impact in your README. Make sure to use the machines with GPUs for faster training!

There will be a small prize for the highest testing accuracy, under the following conditions: you must create the network yourself, and I must be able to run it.