{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Atelier Day 3\n",
"\n",
"# Introduction to deep learning with `keras` (and `tensorflow` backend)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"import tensorflow\n",
"from tensorflow import keras\n",
"\n",
"from tensorflow.keras import layers\n",
"from tensorflow.keras import models\n",
"\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense, Dropout, Flatten\n",
"from tensorflow.keras.layers import Conv2D, MaxPooling2D\n",
"from tensorflow.keras import activations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 0. Introduction: about `tensorflow` and `keras`\n",
"\n",
"The `numpy` library does some expensive operations outside Python using efficient code (Fortran, C/C++). However, switching back to python after each operation cause a big overhead because of unnecessary copies of the data. \n",
"\n",
"The library `tensorflow` does all the computations outside of Python: the python API is used to define a graph of operations, that will run entirely using C++ binaries. This architecture allows to get rid of the overhead. Besides, knowing the computational graph beforehand allows to parallelize and/or distribute the computation more easily. As a result, `tensorflow` can run the computations on multiple CPUs or GPUs, and on multiple servers.\n",
"\n",
"However, for quick an easy model prototying, the library `keras` is simpler to use than `tensorflow`. \n",
"Deep learning models can be constructed thanks to `keras` in few lines of python. So in this notebook, we won't see direct calls to `tensorflow`, but only to `keras`, even if the computations are done by `tensorflow`.\n",
"\n",
"# 1. Handwritten digit recognition with MNIST\n",
"\n",
"For the first part of this tutorial, we will use the [MNIST](http://yann.lecun.com/exdb/mnist) dataset.\n",
"This dataset contains images representing handwritten digits. \n",
"Each image is 28 x 28 pixels, and each pixel is represented by a number (gray level). \n",
"These arrays can be flattened into vectors of 28 x 28 = 784 numbers.\n",
"You can then see each image as a point in a 784-dimensional vector space. \n",
"You can find interesting visualisations of this vector space [http://colah.github.io/posts/2014-10-Visualizing-MNIST/](http://colah.github.io/posts/2014-10-Visualizing-MNIST/).\n",
"\n",
"## 1.1. Introduction\n",
"\n",
"The labels in $\\{0, 1, 2, \\ldots, 9\\}$ giving the digit on the image are be represented using one-hot encoding: labels in $\\{0, 1, 2, \\ldots, 9\\}$ are replaced by labels in $\\{ 0, 1\\}^{10}$, namely $0$ is replaced by $(1, 0, \\ldots 0)$, $1$ is replaced by $(0, 1, 0, \\ldots 0)$, $2$ is replaced by $(0, 0, 1, 0, \\ldots, 0)$, etc.\n",
"\n",
"Also, MNIST data is grayscale pixels in $\\{0, \\ldots, 255\\}$. The pixels should be normalized to belong to $[0, 1]$.\n",
"Indeed, working with big floats can lead to important numerical errors, in particular in deep learning models."
]
},
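{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small worked illustration of the two preprocessing steps described above (the label $3$ and the pixel value $51$ are arbitrary choices made for this example, not values read from the dataset), the one-hot encoding and the rescaling read:\n",
"\n",
"$$\n",
"3 \\;\\longmapsto\\; (0, 0, 0, 1, 0, 0, 0, 0, 0, 0) \\in \\{0, 1\\}^{10}, \\qquad 51 \\;\\longmapsto\\; \\frac{51}{255} = 0.2 \\in [0, 1].\n",
"$$\n",
"\n",
"The position of the single $1$ is the label itself, counting from $0$, so the original label can always be recovered as the index of the largest entry."
]
},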
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.2. Load the data\n",
"\n",
"MNIST is a very old and standard benchmark dataset for image classification, so it's built-in (ready to be downloaded) in all machine learning libraries (including `keras` and `tensorflow`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import os\n",
"\n",
"url = 'https://stephanegaiffas.github.io/files/formation_cnrs/MNIST.pickle.zip'\n",
"r = requests.get(url)\n",
"path_data = '../data/' \n",
"\n",
"with open(os.path.join(path_data, 'MNIST.pickle.zip'), 'wb') as f:\n",
" f.write(r.content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### pour charger les données si le fichier 'MNIST.pickle' est zippé et est en local\n",
"import pickle as pkl\n",
"import os\n",
"import zipfile\n",
"\n",
"path_data = '../data/'\n",
"filename = 'MNIST.pickle.zip'\n",
"archive = zipfile.ZipFile(os.path.join(path_data, filename), 'r')\n",
"\n",
"with archive.open('MNIST.pickle') as f:\n",
" data = pkl.load(f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"from tensorflow.keras import backend as K\n",
"\n",
"# Number of classes\n",
"num_labels = 10\n",
"# input image dimensions\n",
"img_rows, img_cols = 28, 28\n",
"\n",
"# the data, shuffled and split between train and test sets\n",
"# chargement sur le web_keras\n",
"#(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
"# chargement local\n",
"(x_train, y_train), (x_test, y_test) = data\n",
"\n",
"\n",
"if K.image_data_format() == 'channels_first':\n",
" x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)\n",
" x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)\n",
" input_shape = (1, img_rows, img_cols)\n",
"else:\n",
" x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)\n",
" x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)\n",
" input_shape = (img_rows, img_cols, 1)\n",
"\n",
"x_train = x_train.astype('float32')\n",
"x_test = x_test.astype('float32')\n",
"\n",
"print('x_train shape:', x_train.shape)\n",
"print('x_test shape:', x_test.shape)\n",
"print('y_train shape:', y_train.shape)\n",
"print('y_test shape:', y_test.shape)\n",
"\n",
"print(x_train.shape[0], 'train samples')\n",
"print(x_test.shape[0], 'test samples')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"input_shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import seaborn as sns\n",
"\n",
"y_counts = pd.DataFrame({\n",
" 'data': np.array(['train'] * num_labels + ['test'] * num_labels),\n",
" 'class': np.tile(np.arange(num_labels), 2),\n",
" 'prop': np.hstack([np.bincount(y_train) / y_train.shape[0], \n",
" np.bincount(y_test) / y_test.shape[0]])\n",
"})\n",
"\n",
"fig, ax = plt.subplots(figsize=(8, 4))\n",
"sns.barplot(x='class', y='prop', hue='data', data=y_counts, ax=ax)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.3. A first look at the data\n",
"\n",
"In the next cell we illustrate the first for elements of the training data: \n",
"pixels grayscale of the digit and their corresponding label."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"plt.figure(figsize=(8, 2))\n",
"for i in range(4):\n",
" plt.subplot(1, 4, i+1)\n",
" plt.imshow(x_train[i].reshape(28, 28), \n",
" interpolation=\"none\", cmap=\"gray_r\")\n",
" plt.title('Label=%d' % y_train[i], fontsize=14)\n",
" plt.axis(\"off\")\n",
"plt.tight_layout()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"n_rows = 4\n",
"n_cols = 8\n",
"plt.figure(figsize=(8, 4))\n",
"for i in range(n_rows * n_cols):\n",
" plt.subplot(n_rows, n_cols, i+1)\n",
" plt.imshow(x_train[i].reshape(28, 28),\n",
" interpolation=\"none\", cmap=\"gray_r\")\n",
" plt.axis(\"off\")\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first character is a 5 digit, encoded in grayscale matrix as follows"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(np.array2string(x_train[0].astype(np.int).reshape(28, 28), \n",
" max_line_width=150))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.4. Normalization and preprocessing of the data\n",
"\n",
"We need to normalize the images and one-hot encode the labels.\n",
"\n",
"**Warning:** call this cell only once (otherwise you'll devide several times by 255), which might be problematic later on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_train /= x_train.max()\n",
"x_test /= x_test.max()\n",
"print(x_train.min(axis=None), x_train.max(axis=None))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# convert class vectors to binary class matrices\n",
"y_train = keras.utils.to_categorical(y_train, num_labels)\n",
"y_test = keras.utils.to_categorical(y_test, num_labels)\n",
"y_train[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# 2. A first model: softmax (or multinomial logistic) regression \n",
"\n",
"Remember that each image $x$ is a $p\\times p = 28\\times 28\\times 1$ matrix ( $x=(x_{ij} )$ ) or a $pp = 784$ vector ( $x = (x_{j})$ ). \n",
"\n",
"We want to classify these pictures or equivalently to predict the digit $k$ varying in $\\{0, \\ldots, 9\\}$ they represent.\n",
"A simple model allowing to do that is softmax regression.\n",
"\n",
"## 2.1. Description of the model\n",
"\n",
"\n",
"The idea behind this model is to produce a score for each input image $x$ using a simple linear model. \n",
"To do so, we assume that belonging to a class $k$ (corresponding to digit $k$) can be expressed by a weigthed sum of the pixel intensities, with weights $W_{k, 1}, \\ldots, W_{k, 784}$ and to a bias $b_k$ capturing variability independent of the input:\n",
"$$\n",
"\\text{score}_k(x) = \\sum_{j=1}^{784} W_{k, j} x_j + b_k,\n",
"$$\n",
"These scores are sometimes called the \"logits\" in the deep learning community.\n",
"We then use the softmax function to convert the scores into predicted probabilities $p_k$:\n",
"$$\n",
"\\forall k =0,\\ldots,9,\\quad p_k(x) = \\text{softmax}(\\text{score}_k(x)) = \\frac{\\exp(\\text{score}_k(x))}{\\sum_{\\ell =0}^{9}\\exp(\\text{score}_{\\ell}(x))}.\n",
"$$"
]
},
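{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the softmax step concrete, here is a worked example with only three classes instead of ten. The scores $(0, \\log 2, \\log 3)$ are hypothetical values, chosen for this illustration so that their exponentials are the integers $1, 2, 3$:\n",
"\n",
"$$\n",
"\\exp(0) + \\exp(\\log 2) + \\exp(\\log 3) = 1 + 2 + 3 = 6\n",
"\\quad \\Longrightarrow \\quad\n",
"(p_0, p_1, p_2) = \\Big( \\frac{1}{6}, \\frac{2}{6}, \\frac{3}{6} \\Big) = \\Big( \\frac{1}{6}, \\frac{1}{3}, \\frac{1}{2} \\Big).\n",
"$$\n",
"\n",
"The resulting probabilities are positive, sum to one, and preserve the ordering of the scores. Adding the same constant to every score leaves them unchanged, since the constant factors out of the numerator and the denominator."
]
},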
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2. The computational graph for training of softmax regression\n",
"\n",
"To train the model parameters (the bias $b_k$ and the weights $W_{k, j}$ where $k=0, \\ldots, 9$ and $j=1, \\ldots, 784$), the considered goodness-of-fit is the negative log-likelihood defined by the cross-entropy between the score $p_k(x)$ and the label $y$ :\n",
"$$\n",
"- \\sum_{k=0}^{9} y_{k} \\log(p_k(x))\n",
"$$\n",
"For this first model, we simply use stochastic gradient descent over small batches of data. It can be done easily with TensorFlow, as it (automatically and efficiently) computes the gradient from the graph, then apply an optimization algorithm of your choice to perform the parameters update."
]
},
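{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked example of this loss, assume a hypothetical three-class problem in which the model predicts $(p_0, p_1, p_2) = (1/6, 1/3, 1/2)$ for some image (these numbers are made up for the illustration). Because $y$ is one-hot, only the term of the true class survives in the sum:\n",
"\n",
"$$\n",
"y = (0, 0, 1): \\; -\\log\\Big(\\frac{1}{2}\\Big) = \\log 2 \\approx 0.693,\n",
"\\qquad\n",
"y = (1, 0, 0): \\; -\\log\\Big(\\frac{1}{6}\\Big) = \\log 6 \\approx 1.792.\n",
"$$\n",
"\n",
"The loss is small when the model puts a high probability on the correct class, and it grows without bound as that probability approaches zero."
]
},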
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# We use a sequential model: we stack layers\n",
"model_softmax = Sequential()\n",
"# First we need to flatten the data: replace 28 * 28 matrices by flat vectors\n",
"# This is always necessary before feeding data to a fully-connected layer (Dense object)\n",
"model_softmax.add(Flatten(input_shape=input_shape, name='flatten'))\n",
"# We add one dense (fully connected layer) with softmax activation function\n",
"# Since it's the first layer, we need to give the size of input data\n",
"model_softmax.add(Dense(num_labels, activation='softmax', name='output'))\n",
"\n",
"# We \"compile\" this model, \n",
"model_softmax.compile(\n",
" # specifying the loss as the cross-entropy\n",
" loss=keras.losses.categorical_crossentropy,\n",
" # We choose the Adagrad solver, but you can choose others\n",
" optimizer=keras.optimizers.Adagrad(),\n",
" # We will monitor the accuracy on a testing set along optimization\n",
" metrics=['accuracy']\n",
")\n",
"model_softmax.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.3. Run the training of the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"batch_size = 64\n",
"\n",
"# number of steps\n",
"epochs = 5\n",
"\n",
"# Run the train\n",
"history_softmax = model_softmax.fit(x_train, y_train,\n",
" batch_size=batch_size,\n",
" epochs=epochs,\n",
" verbose=1,\n",
" validation_data=(x_test, y_test))\n",
"score_softmax = model_softmax.evaluate(x_test, y_test, verbose=0)\n",
"print('Test loss:', score_softmax[0])\n",
"print('Test accuracy:', score_softmax[1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def plot_history(history, title=''):\n",
" plt.figure(figsize=(7, 5))\n",
" plt.plot(history.epoch, history.history['accuracy'], lw=3, label='Training')\n",
" plt.plot(history.epoch, history.history['val_accuracy'], lw=3, label='Testing')\n",
" plt.legend(fontsize=14)\n",
" plt.title(title, fontsize=16)\n",
" plt.xlabel('Epoch', fontsize=14)\n",
" plt.ylabel('Accuracy', fontsize=14)\n",
" plt.tight_layout()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"plot_history(history_softmax, title='Accuracy of softmax regression')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**QUESTION**\n",
"\n",
"- Run 70 epochs and look at the training and testing accuracy curves"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.4. Visualisation of the model weights\n",
"\n",
"Weight matrices plots show that the learned weights are consistant with the digits they should predict (see below).\n",
"You should be able to see rough shapes corresponding to the digits 0, 1, 2, 3, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"weights, biases = model_softmax.get_layer('output').get_weights()\n",
"imgs = weights.reshape(28, 28, 10)\n",
"\n",
"fig = plt.figure(figsize=(10, 5))\n",
"vmin, vmax = imgs.min(), imgs.max()\n",
"for i in range(10):\n",
" ax = plt.subplot(2, 5, i + 1)\n",
" im = imgs[:, :, i]\n",
" mappable = ax.imshow(im, interpolation=\"nearest\", \n",
" vmin=vmin, vmax=vmax, cmap='gray')\n",
" ax.axis('off')\n",
" ax.set_title(\"%i\" % i)\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.5. Prediction of the labels for new images"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# prediction de test avec Logistique\n",
"pred_log = model_softmax.predict(x_test);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.6. Saving the model\n",
"\n",
"In the next cell we save the model in a file, so that it can be used later on.\n",
"This is particularly helpful when the training of models is long: we can save it every once in a while, and \n",
"eventually continue to train it later on.\n",
"\n",
"**Warning:** You need to create a `\"models\"` folder in the folder containing this notebook"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"!mkdir models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"models_path = 'models/'\n",
"model_softmax.save(os.path.join(models_path, 'mnist_softmax.h5'))\n",
"with open(os.path.join(models_path, 'mnist_softmax_history.pkl'), 'wb') as f:\n",
" pkl.dump(history_softmax.history, f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!ls -al models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.6. Conclusion with MNIST\n",
"\n",
"You should have reached an accuracy better than 0.9 with this simple model. \n",
"**Too easy !** You almost solved the problem using a simple softmax regression. \n",
"Weight matrices plots show that the learned weights are consistant with the digits they should predict (see below).\n",
"You should be able to see rough shapes corresponding to the digits 0, 1, 2, 3, etc.\n",
"\n",
"# 3. Feed-Forward Neural Network (FFNN)\n",
"\n",
"Now, let's build a better model for MNIST using more layers. \n",
"Let's start with a feed-forward neural net (FFNN) with one hidden layer and relu activation.\n",
"\n",
"## 3.1 Description\n",
"\n",
"The softmax regression is a linear model, with $(784+1)\\times 10 = 7850$ parameters. \n",
"It is easy to fit, numerically stable, but might be too simple for some tasks. \n",
"The aim of neural networks is to consider nonlinear models, while keeping the nice features of linear ones. \n",
"The idea is to keep parameters into linear functions, and link these small linear model using non linear operations.\n",
"\n",
"A simple nonlinearity which is often used to do this is the **Rectified Linear Unit**: $\\quad \\text{ReLU}(x) = \\max(0, x)$\n",
"\n",
"The derivative of this function is very easy to compute, and it is parameter-free. If we stack models such as softmax regression and ReLUs, it is still very easy to compute the gradient using the chain rule, as the model is a combination of simple functions.\n",
"\n",
"The backpropagation algorithm allows efficient computation of complex derivatives as long as the function is made of simple blocks with simple derivatives. \n",
"This algorithm efficiency is based on data reuse: when working with parallel architectures such as GPUs, you want to minimize communication (data transfer) as it is very time consuming in comparison to the computing time.\n",
"\n",
"## 3.2. Computational graph for a single hidden layer FFNN\n",
"\n",
"We create the graph for a fully connected feed-forward neural network with one hidden layer with 128 units and a relu activation function. We use what you did for softmax regression : we just need to add a **single line** to the code creating the softmax regression."
]
},
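{
"cell_type": "markdown",
"metadata": {},
"source": [
"Written as equations, the network described above can be sketched as follows. The symbols $W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}$ and $h$ are notation introduced here for illustration only; $x$ is the flattened image with 784 entries:\n",
"\n",
"$$\n",
"h = \\text{ReLU}\\big(W^{(1)} x + b^{(1)}\\big) \\in \\mathbb{R}^{128},\n",
"\\qquad\n",
"p = \\text{softmax}\\big(W^{(2)} h + b^{(2)}\\big) \\in \\mathbb{R}^{10}.\n",
"$$\n",
"\n",
"Counting the weights and biases of both layers gives the number of trainable parameters, which can be compared with the total reported by `model_ffnn.summary()`:\n",
"\n",
"$$\n",
"(784 + 1) \\times 128 + (128 + 1) \\times 10 = 100480 + 1290 = 101770.\n",
"$$\n",
"\n",
"This is roughly 13 times the 7850 parameters of the softmax regression."
]
},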
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Define the model\n",
"model_ffnn = Sequential()\n",
"\n",
"model_ffnn.add(Flatten(input_shape=input_shape, name='flatten'))\n",
"## The new next line adds the extra layer\n",
"model_ffnn.add(Dense(128, activation='relu', name='dense'))\n",
"model_ffnn.add(Dense(num_labels, activation='softmax', name='output'))\n",
"\n",
"model_ffnn.compile(\n",
" loss=keras.losses.categorical_crossentropy,\n",
" optimizer=keras.optimizers.Adagrad(),\n",
" metrics=['accuracy']\n",
")\n",
"\n",
"model_ffnn.summary()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# Run the model\n",
"batch_size = 32\n",
"epochs = 5\n",
"\n",
"# Run the train\n",
"history_ffnn = model_ffnn.fit(x_train, y_train,\n",
" batch_size=batch_size,\n",
" epochs=epochs,\n",
" verbose=1,\n",
" validation_data=(x_test, y_test))\n",
"score_ffnn = model_ffnn.evaluate(x_test, y_test, verbose=0)\n",
"print('Test loss:', score_ffnn[0])\n",
"print('Test accuracy:', score_ffnn[1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"plot_history(history_ffnn, title='Accuracy of one hidden layer feed forward')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_ffnn.save(os.path.join(models_path, 'mnist_ffnn.h5'))\n",
"with open(os.path.join(models_path, 'mnist_ffnn_history.pkl'), 'wb') as f:\n",
" pkl.dump(history_ffnn.history, f)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.4 Your job\n",
"\n",
"Run 60 epochs and look at the training and testing accuracy curves."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# 4. Convolutional Neural Network (CNN)\n",
"\n",
"In practice, increasing the size of hidden layers is not very effective. \n",
"It is often a better idea to add more layers. \n",
"Intuitively, if the phenomenon you try to learn has a hierarchical structure, adding more layers can be interpreted as a way to learn more levels of abstraction. \n",
"For example, if you are trying to recognize objects, it is easier to express shapes from edges and objects from shapes, than to express objects from pixels. \n",
"Thus, a good design should try to exploit this hierarchy.\n",
"\n",
"In particular cases, such as grid-like data (time series, images), you might want to detect a pattern which can happen in different locations of the data. \n",
"For example, you try to detect a cat, but the cat can be in the middle or the left of the picture. \n",
"Thus you need to build a model which is translation invariant: it is easier to learn how to recognize an object independently of its location. \n",
"\n",
"## 4.1 Description\n",
"\n",
"When two inputs might contain the same kind of information, then it is useful to share their weights and train the weights jointly for those inputs to learn statistical invariants (things that don't change much on average across time or space). \n",
"Using this concept on images leads to convolutional neural networks (CNNs), on text, it results on recurrent neural networks (RNNs). \n",
"When using CNNs, you set weights to a small kernel that will be used to perform a convolution across the image.\n",
"\n",
"The image is represented as a 3-dimensional tensor: (width, height, depth). Width and height charecterize the size of the image (eg. 28 x 28 pixels), and depth the color space (e.g. 1 for grey levels, 3 for RGB pictures since each pixel is represented by a triplet $(R,G,B)$).\n",
"\n",
"The convolution will map patches of this image, combined with the convolution kernel, for example\n",
"\n",
"$$\n",
"\\text{output} = \\text{ReLU}(\\text{patch} \\times W + b)\n",
"$$\n",
"\n",
"Depending on the shape of the $W$ weights tensor, the tensor resulting from the convolution can have a different depth. Note that in the context of a CNN, the \"kernel\" can be also called a \"filter\".\n",
"\n",
"Performing the convolution between the image and the kernel consist to move the kernel across the image, and to produce an output for each patch. \n",
"The way you move across the image is defined by two parameters:\n",
"\n",
"- **Stride:** the stride is the number of pixels you are shifting each time you move your kernel during the convolution.\n",
"- **Padding:** defines what happens when the kernel reaches a border of the image when doing the convolution. \n",
"\"Valid\" padding means that you stop at the edge, while \"Same\" padding allows to go off the edge and pad with zeros so that the width and the height of the output and input tensors are the same.\n",
"\n",
"For example, a convolution with a stride $> 1$ and valid padding results in a tensor of smaller width and height. \n",
"You can compute the size of a tensor after convolution using the following formulas:\n",
"\n",
"#### Valid padding\n",
"$$\n",
"\\text{out}_{\\text{height}} = \\bigg\\lceil \\frac{\\text{in}_{\\text{height}} - \\text{kernel}_{\\text{height}} + 1}{\\text{stride}_{\\text{vertical}}} \\bigg\\rceil \\quad \\text{ and } \\quad\n",
"\\text{out}_{\\text{width}} = \\bigg\\lceil \\frac{\\text{in}_{\\text{width}} - \\text{kernel}_{\\text{width}} + 1}{\\text{stride}_{\\text{horizontal}}} \\bigg\\rceil\n",
"$$\n",
"\n",
"#### Same padding\n",
"$$\n",
"\\text{out}_{\\text{height}} = \\bigg\\lceil \\frac{\\text{in}_{\\text{height}}}{\\text{strides}_{\\text{vertical}}} \n",
"\\bigg\\rceil \\quad \\text{ and } \\quad \n",
"\\text{out}_{\\text{width}} = \\bigg\\lceil \\frac{\\text{in}_{\\text{width}}}{\\text{strides}_{\\text{horizontal}}} \\bigg\\rceil\n",
"$$\n",
"\n",
"**Example.**\n",
"Assume the input tensor is 28x28x3 and the convolution kernel takes in 4x4x3 tensors and outputs 1x1x32 tensors (height x width x depth), i.e the kernel takes in a patch of size 4x4 and depth 3, and output a patch of size 1x1 and depth 32. To do so, the weights tensor $W$ should be 3x3x3x32 (in-height, in-width, in-depth, out-depth). \n",
"If we are using a stride of 1, the output tensor will be 28x28x32 with same padding, and 25x25x32 with valid padding.\n",
"Using a stride of 2, the output tensor will be 14x14x32 with same padding, and 13x13x32 with valid padding.\n",
"\n",
"Striding is an agressive method to reduce the image size. \n",
"Instead, it can be a better idea to use a stride of 1 and to combine the convolution's outputs being in some neighborhood. Such an operation combining elements of a tensor is called **pooling**. \n",
"Neighborhoods are define by the pooling window dimension (width x height) and the strides you use when moving this window across the image.\n",
"\n",
"**Example.**\n",
"Max pooling aggregate several outputs in a neighborhood $N$ using a max operation: \n",
"\n",
"$$\n",
"\\text{output}'_i = \\max_{j \\in N}\\text{output}_j, \\quad i \\in N.\n",
"$$\n",
"The formulas to compute the size of the ouput tensor are the same as for convolution padding and striding.\n",
"\n",
"Many successful architectures stack convolution layers in a \"pyramidal\" way: each convolution layer result in a tensor with increased depth and decreased height and width. \n",
"Roughly, increasing the depth increases the complexity of the semantic compexity of your representation, and allows to keep the relevant information in a smaller space (height x width). \n",
"\n",
"## 4.2. Computational graph \n",
"\n",
"We implement a CNN having the following structure:\n",
"\n",
"- Convolutional layer with 32 filters and 3 * 3 kernel sizes and 'relu' activation (use the `Conv2D` object)\n",
"- Convolutional layer with 64 filters and 3 * 3 kernel sizes and 'relu' activation (use the `Conv2D` object)\n",
"- Max pooling with pool size 2 * 2 (use the `MaxPooling2D` object)\n",
"- Dropout with probability 0.25 (use the `Dropout` object)\n",
"- Dense layer with 128 units with relu activation\n",
"- Dropout with probability 0.5\n",
"- Dense output layer with softmax activation"
]
},
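{
"cell_type": "markdown",
"metadata": {},
"source": [
"The size formulas of Section 4.1 can be applied to the architecture listed above. The following is a worked sketch, assuming a 28 x 28 x 1 input, valid padding and a stride of 1 for the convolutions (the `Conv2D` defaults), and a pooling stride equal to the pool size (the `MaxPooling2D` default):\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\text{first convolution:} \\quad & (28 - 3 + 1) \\times (28 - 3 + 1) \\times 32 = 26 \\times 26 \\times 32 \\\\\n",
"\\text{second convolution:} \\quad & (26 - 3 + 1) \\times (26 - 3 + 1) \\times 64 = 24 \\times 24 \\times 64 \\\\\n",
"\\text{max pooling:} \\quad & \\Big\\lceil \\frac{24 - 2 + 1}{2} \\Big\\rceil \\times \\Big\\lceil \\frac{24 - 2 + 1}{2} \\Big\\rceil \\times 64 = 12 \\times 12 \\times 64 \\\\\n",
"\\text{flattening:} \\quad & 12 \\times 12 \\times 64 = 9216\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Under the same assumptions, counting one bias per filter or unit gives the number of trainable parameters of each layer:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\text{first convolution:} \\quad & (3 \\times 3 \\times 1 + 1) \\times 32 = 320 \\\\\n",
"\\text{second convolution:} \\quad & (3 \\times 3 \\times 32 + 1) \\times 64 = 18496 \\\\\n",
"\\text{dense layer:} \\quad & (9216 + 1) \\times 128 = 1179776 \\\\\n",
"\\text{output layer:} \\quad & (128 + 1) \\times 10 = 1290 \\\\\n",
"\\text{total:} \\quad & 320 + 18496 + 1179776 + 1290 = 1199882\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Almost all the parameters sit in the dense layer that follows the flattening. The pooling and dropout layers have no parameters. The total can be checked against the output of `model_cnn.summary()`."
]
},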
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"model_cnn = Sequential()\n",
"model_cnn.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape, name='conv2d_1'))\n",
"model_cnn.add(Conv2D(64, kernel_size=(3, 3), activation='relu', name='conv2d_2'))\n",
"model_cnn.add(MaxPooling2D(pool_size=(2, 2), name='max_pool_1'))\n",
"model_cnn.add(Dropout(0.25, name='dropout_1'))\n",
"model_cnn.add(Flatten(name='flatten'))\n",
"model_cnn.add(Dense(128, activation='relu', name='dense'))\n",
"model_cnn.add(Dropout(0.5, name='dropout_2'))\n",
"model_cnn.add(Dense(num_labels, activation='softmax', name='output'))\n",
" \n",
"model_cnn.compile(loss=keras.losses.categorical_crossentropy,\n",
" optimizer=keras.optimizers.Adadelta(),\n",
" metrics=['accuracy'])\n",
"\n",
"model_cnn.summary()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(x_train.shape)\n",
"print(input_shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"#run the model\n",
"batch_size = 32\n",
"epochs = 1\n",
"\n",
"# Run the train\n",
"history_cnn = model_cnn.fit(x_train, y_train,\n",
" batch_size=batch_size,\n",
" epochs=epochs,\n",
" verbose=1,\n",
" validation_data=(x_test, y_test))\n",
"score_cnn = model_cnn.evaluate(x_test, y_test, verbose=0)\n",
"print('Test loss:', score_cnn[0])\n",
"print('Test accuracy:', score_cnn[1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"plot_history(history_cnn, title='Accuracy of CNN')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# save the model and its history using the following cell to continue to train it later\n",
"model_cnn.save(os.path.join(models_path, 'mnist_cnn.h5'))\n",
"with open(os.path.join(models_path, 'mnist_cnn_history.pkl'), 'wb') as f:\n",
" pkl.dump(history_cnn.history, f)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5. MNIST is too easy: let's classify weird letters now (notMNIST)\n",
"\n",
"MNIST is a very very **clean** dataset. Digits are rescaled, smoothed, centered, and pixel values are normalized beforehand. Let's switch to a slightly harder dataset: [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html).\n",
"\n",
"This time, labels are letters from 'A' to 'J' (10 classes). \n",
"These letters are taken from digital fonts instead of handwriting pictures. \n",
"We will use a reduced amount of data to ensure a reasonable training time. \n",
"The training set you will use has 200K labelled examples, while the validation and test sets both contain 10K labelled examples.\n",
"\n",
"**Note** : The notMNIST data that we'll load is already normalized in [-0.5, 0.5] with one-hot encoded labels\n",
"\n",
"## 5.1. Load the notMNIST dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import os\n",
"\n",
"url = 'https://stephanegaiffas.github.io/files/formation_cnrs/notMNIST_100.pkl.gz'\n",
"r = requests.get(url)\n",
"path_data = '../data/' \n",
"\n",
"with open(os.path.join(path_data, 'notMNIST_100.pkl.gz'), 'wb') as f:\n",
" f.write(r.content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import gzip\n",
"\n",
"filename = 'notMNIST_100.pkl.gz'\n",
"with gzip.open(os.path.join(path_data, 'notMNIST_100.pkl.gz')) as f:\n",
" data = pkl.load(f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"data.keys()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"from tensorflow.keras import backend as K\n",
"\n",
"def reshape(x, image_data_format, img_rows, img_cols):\n",
" if image_data_format == 'channels_first':\n",
" return x.astype(np.float32).reshape((-1, 1, img_rows, img_cols))\n",
" else:\n",
" return x.astype(np.float32).reshape((-1, img_rows, img_cols, 1))\n",
"\n",
"img_rows, img_cols = 28, 28\n",
"num_labels = 10\n",
"image_data_format = K.image_data_format()\n",
"\n",
"if image_data_format == 'channels_first':\n",
" input_shape = (1, img_rows, img_cols)\n",
"else:\n",
" input_shape = (img_rows, img_cols, 1)\n",
" \n",
"x_train = reshape(data['train_dataset'], image_data_format, img_rows, img_cols)\n",
"x_valid = reshape(data['valid_dataset'], image_data_format, img_rows, img_cols)\n",
"x_test = reshape(data['test_dataset'], image_data_format, img_rows, img_cols)\n",
"\n",
"y_train = keras.utils.to_categorical(data['train_labels'])\n",
"y_valid = keras.utils.to_categorical(data['valid_labels'])\n",
"y_test = keras.utils.to_categorical(data['test_labels'])\n",
"\n",
"print('x_train shape:', x_train.shape)\n",
"print('x_valid shape:', x_valid.shape)\n",
"print('x_test shape:', x_test.shape)\n",
"print('y_train shape:', y_train.shape)\n",
"print('y_valid shape:', y_valid.shape)\n",
"print('y_test shape:', y_test.shape)\n",
"\n",
"print(x_train.shape[0], 'training samples')\n",
"print(x_valid.shape[0], 'validation samples')\n",
"print(x_test.shape[0], 'testing samples')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# plt.figure(figsize=(8, 4))\n",
"n_rows = 10\n",
"n_cols = 8\n",
"plt.figure(figsize=(n_cols, n_rows))\n",
"\n",
"letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']\n",
"def get_label(y):\n",
" return letters[y.argmax()]\n",
"\n",
"for i in range(n_rows * n_cols):\n",
" ax = plt.subplot(n_rows, n_cols, i+1)\n",
" ax.imshow(x_train[i].reshape(28, 28),\n",
" interpolation=\"none\", cmap=\"gray_r\")\n",
" ax.set_title(get_label(y_train[i]), fontsize=14)\n",
" ax.axis(\"off\")\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.2 Training a softmax model on notMNIST"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Your job**\n",
"\n",
"- train a softmax regression : start with a small number of epochs, and increase the number of epochs later\n",
"- visualize the weight\n",
"- plot the convergence curves\n",
"- save the model and its history"
]
},
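{
"cell_type": "markdown",
"metadata": {},
"source": [
"Saving the history can be sketched as follows. This is only an illustration under assumptions: the helper names `save_history` and `load_history` are hypothetical, the values in `fake_history` are made up, and in the notebook the dictionary `history_softmax.history` returned by `fit` would take their place.\n",
"\n",
"```python\n",
"import json\n",
"import os\n",
"import tempfile\n",
"\n",
"\n",
"def save_history(history_dict, path):\n",
"    # Cast values to float so that numpy scalars are also JSON-serializable\n",
"    clean = {name: [float(v) for v in values]\n",
"             for name, values in history_dict.items()}\n",
"    with open(path, 'w') as f:\n",
"        json.dump(clean, f)\n",
"\n",
"\n",
"def load_history(path):\n",
"    # Read back the dictionary of per-epoch metrics\n",
"    with open(path) as f:\n",
"        return json.load(f)\n",
"\n",
"\n",
"# Made-up values standing in for `history_softmax.history`\n",
"fake_history = {'loss': [0.9, 0.6], 'accuracy': [0.75, 0.85]}\n",
"path = os.path.join(tempfile.gettempdir(), 'history_softmax.json')\n",
"save_history(fake_history, path)\n",
"print(load_history(path) == fake_history)\n",
"```\n",
"\n",
"The model itself can be stored separately with the `save` method of a `keras` model."
]
},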
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.3 Training a one-layer FFNN on notMNIST\n",
"\n",
"**Your job**\n",
"\n",
"- Train an FFNN with one hidden layer of 128 units\n",
"- Visualize the weights\n",
"- Plot the convergence curves\n",
"- Save the model and its history"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.4 Training a deeper CNN for notMNIST\n",
"\n",
"**Your job**\n",
"\n",
"Train a CNN with the following structure:\n",
"\n",
"- Convolutional layer with 32 filters, a 5 * 5 kernel and 'relu' activation\n",
"- Max pooling with pool size 2 * 2\n",
"- Convolutional layer with 64 filters, a 5 * 5 kernel and 'relu' activation\n",
"- Max pooling with pool size 2 * 2\n",
"- Dropout with probability 0.25\n",
"- Dense layer with 1024 units\n",
"- Dropout with probability 0.5\n",
"- Dense output layer with softmax activation\n",
"\n",
"Use the Adam solver. Train for 20 epochs or more (this might take a long time).\n",
"\n",
"You should achieve >= 97% accuracy on the test set.\n",
"\n",
"- Save the model and visualize the last fully connected layers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We use a sequential model: we stack layers\n",
"model_softmax = Sequential()\n",
"# First we flatten the data: each 28 * 28 image becomes a flat vector of size 784.\n",
"# This is always necessary before feeding data to a fully-connected layer (Dense object).\n",
"# Since it is the first layer, it is the one that receives the shape of the input data.\n",
"model_softmax.add(Flatten(input_shape=input_shape, name='flatten'))\n",
"# We add one dense (fully-connected) layer with a softmax activation function\n",
"model_softmax.add(Dense(num_labels, activation='softmax', name='output'))\n",
"\n",
"# We \"compile\" this model,\n",
"model_softmax.compile(\n",
"    # specifying the loss as the cross-entropy\n",
"    loss=keras.losses.categorical_crossentropy,\n",
"    # We choose the Adagrad solver, but you can choose others\n",
"    optimizer=keras.optimizers.Adagrad(),\n",
"    # We will monitor the accuracy during optimization\n",
"    metrics=['accuracy']\n",
")\n",
"model_softmax.summary()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"batch_size = 64\n",
"\n",
"# Number of epochs (full passes over the training set)\n",
"epochs = 2\n",
"\n",
"# Train the model, monitoring the loss and accuracy on the validation set\n",
"history_softmax = model_softmax.fit(x_train, y_train,\n",
"                                    batch_size=batch_size,\n",
"                                    epochs=epochs,\n",
"                                    verbose=1,\n",
"                                    validation_data=(x_valid, y_valid))\n",
"# The test set is only used for the final evaluation\n",
"score_softmax = model_softmax.evaluate(x_test, y_test, verbose=0)\n",
"print('Test loss:', score_softmax[0])\n",
"print('Test accuracy:', score_softmax[1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Each column of the weight matrix is reshaped into a 28 * 28 image, one per class\n",
"weights, biases = model_softmax.get_layer('output').get_weights()\n",
"imgs = weights.reshape(28, 28, 10)\n",
"\n",
"fig = plt.figure(figsize=(10, 5))\n",
"vmin, vmax = imgs.min(), imgs.max()\n",
"for i in range(10):\n",
"    ax = plt.subplot(2, 5, i + 1)\n",
"    im = imgs[:, :, i]\n",
"    mappable = ax.imshow(im, interpolation=\"nearest\",\n",
"                         vmin=vmin, vmax=vmax, cmap='gray')\n",
"    ax.axis('off')\n",
"    # Classes are the letters A to J (list `letters` defined above)\n",
"    ax.set_title(letters[i])\n",
"plt.tight_layout()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# CNN with the structure described in section 5.4\n",
"model_cnn = Sequential()\n",
"model_cnn.add(Conv2D(32, kernel_size=(5, 5), activation='relu', input_shape=input_shape, name='conv2d_1'))\n",
"model_cnn.add(MaxPooling2D(pool_size=(2, 2), name='max_pool_1'))\n",
"model_cnn.add(Conv2D(64, kernel_size=(5, 5), activation='relu', name='conv2d_2'))\n",
"model_cnn.add(MaxPooling2D(pool_size=(2, 2), name='max_pool_2'))\n",
"model_cnn.add(Dropout(0.25, name='dropout_1'))\n",
"# Flatten the feature maps before the fully-connected layers\n",
"model_cnn.add(Flatten(name='flatten'))\n",
"model_cnn.add(Dense(1024, activation='relu', name='dense'))\n",
"model_cnn.add(Dropout(0.5, name='dropout_2'))\n",
"model_cnn.add(Dense(num_labels, activation='softmax', name='output'))\n",
"\n",
"# We use the Adam solver, as requested in the instructions above\n",
"model_cnn.compile(loss=keras.losses.categorical_crossentropy,\n",
"                  optimizer=keras.optimizers.Adam(),\n",
"                  metrics=['accuracy'])\n",
"\n",
"model_cnn.summary()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"batch_size = 64\n",
"# Number of epochs: start small, then increase (20 or more for the target accuracy)\n",
"epochs = 2\n",
"\n",
"# Train the model, monitoring the loss and accuracy on the validation set\n",
"history_cnn = model_cnn.fit(x_train, y_train,\n",
"                            batch_size=batch_size,\n",
"                            epochs=epochs,\n",
"                            verbose=1,\n",
"                            validation_data=(x_valid, y_valid))\n",
"# The test set is only used for the final evaluation\n",
"score_cnn = model_cnn.evaluate(x_test, y_test, verbose=0)\n",
"print('Test loss:', score_cnn[0])\n",
"print('Test accuracy:', score_cnn[1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test-set predictions with the CNN\n",
"pred_cnn = model_cnn.predict(x_test)\n",
"# Test-set predictions with the softmax (logistic) model\n",
"pred_log = model_softmax.predict(x_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Some predictions with the CNN\")\n",
"\n",
"n_rows = 5\n",
"n_cols = 10\n",
"plt.figure(figsize=(n_cols, n_rows))\n",
"\n",
"letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']\n",
"def get_label(y):\n",
"    return letters[y.argmax()]\n",
"\n",
"for i in range(n_rows * n_cols):\n",
"    ax = plt.subplot(n_rows, n_cols, i+1)\n",
"    ax.imshow(x_test[i].reshape(28, 28),\n",
"              interpolation=\"none\", cmap=\"gray_r\")\n",
"    # Title: label predicted by the CNN (use y_test[i] to show the true label instead)\n",
"    ax.set_title(get_label(pred_cnn[i]), fontsize=14)\n",
"    ax.axis(\"off\")\n",
"plt.tight_layout()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Some predictions with the softmax (logistic) model\")\n",
"\n",
"n_rows = 5\n",
"n_cols = 10\n",
"plt.figure(figsize=(n_cols, n_rows))\n",
"\n",
"letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']\n",
"def get_label(y):\n",
"    return letters[y.argmax()]\n",
"\n",
"for i in range(n_rows * n_cols):\n",
"    ax = plt.subplot(n_rows, n_cols, i+1)\n",
"    ax.imshow(x_test[i].reshape(28, 28),\n",
"              interpolation=\"none\", cmap=\"gray_r\")\n",
"    # Title: label predicted by the softmax model (use y_test[i] to show the true label instead)\n",
"    ax.set_title(get_label(pred_log[i]), fontsize=14)\n",
"    ax.axis(\"off\")\n",
"plt.tight_layout()"
]
},
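{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beyond looking at a few titled images, the predictions can be summarized in a confusion matrix. What follows is a minimal sketch using `numpy` only: the function name `confusion_matrix` and the toy arrays are assumptions made for illustration, and in the notebook `y_test` and `pred_cnn` (or `pred_log`) would replace the toy arrays.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def confusion_matrix(y_true, y_pred, n_classes):\n",
"    # y_true: one-hot labels, y_pred: predicted probabilities,\n",
"    # both of shape (n_samples, n_classes)\n",
"    true_idx = y_true.argmax(axis=1)\n",
"    pred_idx = y_pred.argmax(axis=1)\n",
"    cm = np.zeros((n_classes, n_classes), dtype=int)\n",
"    # Rows index the true class, columns the predicted class\n",
"    np.add.at(cm, (true_idx, pred_idx), 1)\n",
"    return cm\n",
"\n",
"\n",
"# Toy data (made up for illustration): 4 samples, 3 classes\n",
"y_true = np.eye(3)[[0, 1, 2, 2]]\n",
"y_pred = np.array([[0.8, 0.1, 0.1],\n",
"                   [0.2, 0.7, 0.1],\n",
"                   [0.1, 0.6, 0.3],\n",
"                   [0.1, 0.2, 0.7]])\n",
"cm = confusion_matrix(y_true, y_pred, 3)\n",
"print(cm)\n",
"```\n",
"\n",
"Off-diagonal entries count the errors: in the toy data, one sample of class 2 is predicted as class 1."
]
},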
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6. A fashionable use case: clothing classification with fashion-mnist\n",
"\n",
"- Create a new notebook\n",
"- Load the data with `fashion_mnist.load_data()`: there is no validation set, only train and test sets\n",
"- The label names are:\n",
"\n",
"`LABEL_NAMES = ['t_shirt', 'trousers', 'pullover', 'dress', 'coat', 'sandal', 'shirt', 'sneakers', 'bag', 'boots']`\n",
"\n",
"- Copy *preliminaries* [1]\n",
"- Copy and adapt *load* [2]: what are the shapes? The label distributions? What does the data look like?\n",
"- Normalize the images [9] and one-hot encode the labels [10]\n",
"- Create a model with 3 CONV+POOL+DROP blocks: take inspiration from [23] and add a block\n",
"- Run the model as [25] does\n",
"- Plot the convergence curves as [26] does\n",
"- Make predictions as [15] does\n",
"- Study the errors: which kinds of clothes are difficult to classify?\n",
"\n",
"\n",
"***Take care with the kernel sizes***"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance:\n",
"\n",
" - Convolutional layer with 64 filters, a 5 * 5 kernel and 'relu' activation\n",
" - Max pooling with pool size 2 * 2\n",
" - Dropout with probability 0.25\n",
"\n",
" - Convolutional layer with 128 filters, a 5 * 5 kernel and 'relu' activation\n",
" - Max pooling with pool size 2 * 2\n",
" - Dropout with probability 0.25\n",
"\n",
" - Convolutional layer with 256 filters, a 3 * 3 kernel and 'relu' activation\n",
" - Max pooling with pool size 2 * 2\n",
" - Dropout with probability 0.25\n",
"\n",
" - Dense layer with 256 units\n",
" - Dropout with probability 0.5\n",
" - Dense output layer with softmax activation"
]
},
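{
"cell_type": "markdown",
"metadata": {},
"source": [
"The warning about kernel sizes can be checked by hand before building the model. The sketch below is plain Python and not part of `keras`: the helper names `conv_out`, `pool_out` and `trace_sizes` are hypothetical, and it assumes 28 x 28 inputs, stride-1 convolutions without padding (the `Conv2D` default, `padding='valid'`) and non-overlapping pooling.\n",
"\n",
"```python\n",
"def conv_out(size, kernel):\n",
"    # Output width of a stride-1 convolution without padding\n",
"    return size - kernel + 1\n",
"\n",
"\n",
"def pool_out(size, pool):\n",
"    # Output width of non-overlapping pooling (incomplete windows are dropped)\n",
"    return size // pool\n",
"\n",
"\n",
"def trace_sizes(size, kernels, pool=2):\n",
"    # Follow the feature-map width through successive CONV + POOL blocks\n",
"    sizes = [size]\n",
"    for kernel in kernels:\n",
"        size = conv_out(size, kernel)\n",
"        if size < pool:\n",
"            raise ValueError('feature map too small for pooling: %d' % size)\n",
"        size = pool_out(size, pool)\n",
"        sizes.append(size)\n",
"    return sizes\n",
"\n",
"\n",
"# Kernel sizes of the three blocks suggested above\n",
"print(trace_sizes(28, [5, 5, 3]))\n",
"```\n",
"\n",
"With kernels 5, 5 and 3 the width goes 28, 12, 4, 1. With three 5 x 5 kernels, the third convolution would receive a 4 x 4 map, which is smaller than its kernel, so the sketch raises an error."
]
},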
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import os\n",
"\n",
"url = 'https://stephanegaiffas.github.io/files/formation_cnrs/fashionMNIST.pickle.zip'\n",
"r = requests.get(url)\n",
"# Stop here if the download failed\n",
"r.raise_for_status()\n",
"path_data = '../data/'\n",
"# Create the data directory if it does not exist yet\n",
"os.makedirs(path_data, exist_ok=True)\n",
"\n",
"with open(os.path.join(path_data, 'fashionMNIST.pickle.zip'), 'wb') as f:\n",
"    f.write(r.content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pickle as pkl\n",
"import zipfile\n",
"\n",
"filename = 'fashionMNIST.pickle.zip'\n",
"# Read the pickled dataset directly from the zip archive\n",
"with zipfile.ZipFile(os.path.join(path_data, filename), 'r') as archive:\n",
"    with archive.open('fashionMNIST.pickle') as f:\n",
"        data = pkl.load(f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}