diff --git a/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.html b/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.html index 115909e66914461da944f9a56e8f143e9a34d262..a35ff386c0fe872153007efa242507ef80b2db74 100644 --- a/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.html +++ b/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.html @@ -11775,7 +11775,7 @@ div#notebook {
-

# (Convolutional) Neural network tutorial - BigMax workshop - Dresden, April 2019¶

##### Authors: Angelo Ziletti, Andreas Leitherer, and Luca M. Ghiringhelli - Fritz Haber Institute of the Max Planck Society, Berlin¶

In this tutorial, we briefly introduce the main ideas behind convolutional neural networks, build a neural network model, and finally explain the classification decision process using attentive response maps.

+

# (Convolutional) Neural network tutorial - BigMax workshop - Dresden, April 2019¶

##### Authors: Angelo Ziletti, Andreas Leitherer, and Luca M. Ghiringhelli - Fritz Haber Institute of the Max Planck Society, Berlin¶

In this tutorial, we briefly introduce the main ideas behind convolutional neural networks, build a neural network model with Keras, and explain the classification decision process using attentive response maps.

@@ -11991,10 +11991,10 @@ Convolutional networks have been tremendously successful in practical applicatio

The name "convolutional neural network" indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation.

A typical layer of a convolutional network consists of three stages:

-
1. Convolution stage: the layer performs several convolutions in parallel to produce a set of linear activations.

+
1. Convolution stage: the layer performs several convolutions in parallel to produce a set of linear activations (see Sec. 3 for more details).

2. Detector stage: each linear activation is run through a nonlinear activation function (e.g. rectified linear -activation function)

+activation function, sigmoid or tanh function)

3. Pooling stage: a pooling function is used to modify (downsample) the output of the layer. A pooling function replaces the output of the network at a certain location with a summary statistic of the nearby outputs. For example, the max pooling operation reports the maximum output within a rectangular neighborhood. Other popular pooling functions include the average of a rectangular neighborhood, the $L^2$ norm of a rectangular neighborhood, or a weighted average based on the distance from the central pixel.

5. diff --git a/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.ipynb b/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.ipynb index 0552178c40ebd67c740a0295119419dd2b62d637..f6cb89a8f5e9dacad7e97404e3b014093839f3a9 100644 --- a/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.ipynb +++ b/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.ipynb @@ -9,7 +9,7 @@ "\n", "##### Authors: Angelo Ziletti, Andreas Leitherer, and Luca M. Ghiringhelli - Fritz Haber Institute of the Max Planck Society, Berlin\n", "\n", - "In this tutorial, we briefly introduce the main ideas behind convolutional neural networks, build a neural network model, and finally explain the classification decision process using attentive response maps." + "In this tutorial, we briefly introduce the main ideas behind convolutional neural networks, build a neural network model with Keras, and explain the classification decision process using attentive response maps." ] }, { @@ -203,10 +203,10 @@ "\n", "\n", "A typical layer of a convolutional network consists of three stages:\n", - "1. **Convolution** stage: the layer performs several convolutions in parallel to produce a set of linear activations. \n", + "1. **Convolution** stage: the layer performs several convolutions in parallel to produce a set of linear activations (see Sec. 3 for more details).\n", "\n", "2. **Detector** stage: each linear activation is run through a nonlinear activation function (e.g. rectified linear \n", - "activation function)\n", + "activation function, sigmoid or tanh function)\n", "\n", "3. **Pooling** stage: a pooling function is used to modify (downsample) the output of the layer. A pooling function replaces the output of the network at a certain location with a summary statistic of the nearby outputs. For example, the max pooling operation reports the maximum output within a rectangular neighborhood. 
Other popular pooling functions include the average of a rectangular neighborhood, the $L^2$ norm of a rectangular neighborhood, or a weighted average based on the distance from the central pixel.\n", "\n", @@ -357,7 +357,9 @@ { "cell_type": "code", "execution_count": 3, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "# this can be skipped because the images are already saved on the server\n", @@ -386,7 +388,8 @@ "metadata": { "code_folding": [ 7 - ] + ], + "collapsed": true }, "outputs": [], "source": [ @@ -428,7 +431,9 @@ { "cell_type": "code", "execution_count": 5, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "# read jpg files as numpy arrays\n", @@ -511,7 +516,9 @@ { "cell_type": "code", "execution_count": 6, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "k_identity = np.array([[0., 0., 0.], \n", @@ -558,7 +565,9 @@ { "cell_type": "code", "execution_count": 7, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "kernels = [k_identity, k_box_blur, k_vlines, k_hlines, k_edges, k_emboss]\n", @@ -1223,7 +1232,9 @@ { "cell_type": "code", "execution_count": 20, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "from vis.visualization import visualize_saliency\n", @@ -1437,16 +1448,18 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python [conda env:py37]", "language": "python", - "name": "python3" + "name": "conda-env-py37-py" }, "language_info": { "codemirror_mode": { diff --git a/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.slides.html b/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.slides.html deleted file mode 100644 index 
35bf50922cc06a7c41872c45c21875a363d1e4bd..0000000000000000000000000000000000000000 --- a/conv-nn-bigmax-2019/bigmax_mnist_example_tutorial.slides.html +++ /dev/null @@ -1,13632 +0,0 @@ - - - - - - - - - - - -bigmax_mnist_example_tutorial slides - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
-
-
-
-
-
-

# (Convolutional) Neural network tutorial - BigMax workshop - Dresden, April 2019¶

##### Authors: Angelo Ziletti, Andreas Leitherer, and Luca M. Ghiringhelli - Fritz Haber Institute of the Max Planck Society, Berlin¶

In this tutorial, we briefly introduce the main ideas behind convolutional neural networks, build a neural network model, and finally explain the classification decision process using attentive response maps.

- -
-
-
-
-
-
-
-

## 0. Install packages needed¶

-
-
-
-
-
-
-
-

We first install the packages that we will need to perform this tutorial, and then we load the necessary Python libraries. This tutorial has been tested on Python 3.5.

- -
-
-
-
-
-
In :
-
-
-
# packages to build convolutional neural networks (and not only)
-! pip install --user tensorflow
-! pip install --user keras
-
-# to visualize images
-! pip install matplotlib
-
-# to calculate convolution
-! pip install scipy
-! pip install numpy
-
-# package for neural network attention map visualization
-! pip install git+https://github.com/raghakot/keras-vis.git -U
-
- -
-
-
- -
-
- - -
- -
- - -
-
Requirement already satisfied: tensorflow in /home/ziletti/.local/lib/python3.6/site-packages (1.12.0)
-Requirement already satisfied: tensorboard<1.13.0,>=1.12.0 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorflow) (1.12.2)
-Requirement already satisfied: keras-preprocessing>=1.0.5 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorflow) (1.0.9)
-Requirement already satisfied: wheel>=0.26 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from tensorflow) (0.31.1)
-Requirement already satisfied: termcolor>=1.1.0 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorflow) (1.1.0)
-Requirement already satisfied: astor>=0.6.0 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorflow) (0.7.1)
-Requirement already satisfied: numpy>=1.13.3 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from tensorflow) (1.15.1)
-Requirement already satisfied: keras-applications>=1.0.6 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorflow) (1.0.7)
-Requirement already satisfied: grpcio>=1.8.6 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorflow) (1.18.0)
-Requirement already satisfied: absl-py>=0.1.6 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorflow) (0.7.0)
-Requirement already satisfied: protobuf>=3.6.1 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorflow) (3.6.1)
-Requirement already satisfied: gast>=0.2.0 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorflow) (0.2.2)
-Requirement already satisfied: six>=1.10.0 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from tensorflow) (1.11.0)
-Requirement already satisfied: markdown>=2.6.8 in /home/ziletti/.local/lib/python3.6/site-packages (from tensorboard<1.13.0,>=1.12.0->tensorflow) (3.0.1)
-Requirement already satisfied: werkzeug>=0.11.10 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from tensorboard<1.13.0,>=1.12.0->tensorflow) (0.14.1)
-Requirement already satisfied: h5py in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras-applications>=1.0.6->tensorflow) (2.8.0)
-Requirement already satisfied: setuptools in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from protobuf>=3.6.1->tensorflow) (40.2.0)
-twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
-You are using pip version 10.0.1, however version 19.0.2 is available.
-Requirement already satisfied: keras in /home/ziletti/.local/lib/python3.6/site-packages (2.2.4)
-Requirement already satisfied: keras-preprocessing>=1.0.5 in /home/ziletti/.local/lib/python3.6/site-packages (from keras) (1.0.9)
-Requirement already satisfied: keras-applications>=1.0.6 in /home/ziletti/.local/lib/python3.6/site-packages (from keras) (1.0.7)
-Requirement already satisfied: six>=1.9.0 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras) (1.11.0)
-Requirement already satisfied: numpy>=1.9.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras) (1.15.1)
-Requirement already satisfied: pyyaml in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras) (3.13)
-Requirement already satisfied: scipy>=0.14 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras) (1.1.0)
-Requirement already satisfied: h5py in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras) (2.8.0)
-twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
-You are using pip version 10.0.1, however version 19.0.2 is available.
-Requirement already satisfied: matplotlib in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (2.2.3)
-Requirement already satisfied: numpy>=1.7.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib) (1.15.1)
-Requirement already satisfied: cycler>=0.10 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib) (0.10.0)
-Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib) (2.2.0)
-Requirement already satisfied: python-dateutil>=2.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib) (2.7.3)
-Requirement already satisfied: pytz in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib) (2018.5)
-Requirement already satisfied: six>=1.10 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib) (1.11.0)
-Requirement already satisfied: kiwisolver>=1.0.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib) (1.0.1)
-Requirement already satisfied: setuptools in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from kiwisolver>=1.0.1->matplotlib) (40.2.0)
-twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
-You are using pip version 10.0.1, however version 19.0.2 is available.
-Requirement already satisfied: scipy in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (1.1.0)
-twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
-You are using pip version 10.0.1, however version 19.0.2 is available.
-Requirement already satisfied: numpy in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (1.15.1)
-twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
-You are using pip version 10.0.1, however version 19.0.2 is available.
-Collecting git+https://github.com/raghakot/keras-vis.git
-  Cloning https://github.com/raghakot/keras-vis.git to /tmp/pip-req-build-epbr_4s0
-Requirement not upgraded as not directly required: keras in /home/ziletti/.local/lib/python3.6/site-packages (from keras-vis==0.4.1) (2.2.4)
-Requirement not upgraded as not directly required: six in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras-vis==0.4.1) (1.11.0)
-Requirement not upgraded as not directly required: scikit-image in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras-vis==0.4.1) (0.14.0)
-Requirement not upgraded as not directly required: matplotlib in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras-vis==0.4.1) (2.2.3)
-Requirement not upgraded as not directly required: h5py in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras-vis==0.4.1) (2.8.0)
-Requirement not upgraded as not directly required: keras-preprocessing>=1.0.5 in /home/ziletti/.local/lib/python3.6/site-packages (from keras->keras-vis==0.4.1) (1.0.9)
-Requirement not upgraded as not directly required: scipy>=0.14 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras->keras-vis==0.4.1) (1.1.0)
-Requirement not upgraded as not directly required: keras-applications>=1.0.6 in /home/ziletti/.local/lib/python3.6/site-packages (from keras->keras-vis==0.4.1) (1.0.7)
-Requirement not upgraded as not directly required: numpy>=1.9.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras->keras-vis==0.4.1) (1.15.1)
-Requirement not upgraded as not directly required: pyyaml in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from keras->keras-vis==0.4.1) (3.13)
-Requirement not upgraded as not directly required: networkx>=1.8 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from scikit-image->keras-vis==0.4.1) (2.1)
-Requirement not upgraded as not directly required: pillow>=4.3.0 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from scikit-image->keras-vis==0.4.1) (5.2.0)
-Requirement not upgraded as not directly required: PyWavelets>=0.4.0 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from scikit-image->keras-vis==0.4.1) (1.0.0)
-Requirement not upgraded as not directly required: dask[array]>=0.9.0 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from scikit-image->keras-vis==0.4.1) (0.19.1)
-Requirement not upgraded as not directly required: cloudpickle>=0.2.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from scikit-image->keras-vis==0.4.1) (0.5.5)
-Requirement not upgraded as not directly required: cycler>=0.10 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib->keras-vis==0.4.1) (0.10.0)
-Requirement not upgraded as not directly required: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib->keras-vis==0.4.1) (2.2.0)
-Requirement not upgraded as not directly required: python-dateutil>=2.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib->keras-vis==0.4.1) (2.7.3)
-Requirement not upgraded as not directly required: pytz in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib->keras-vis==0.4.1) (2018.5)
-Requirement not upgraded as not directly required: kiwisolver>=1.0.1 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from matplotlib->keras-vis==0.4.1) (1.0.1)
-Requirement not upgraded as not directly required: decorator>=4.1.0 in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from networkx>=1.8->scikit-image->keras-vis==0.4.1) (4.3.0)
-Requirement not upgraded as not directly required: toolz>=0.7.3; extra == "array" in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from dask[array]>=0.9.0->scikit-image->keras-vis==0.4.1) (0.9.0)
-Requirement not upgraded as not directly required: setuptools in /home/ziletti/anaconda2/envs/py36/lib/python3.6/site-packages (from kiwisolver>=1.0.1->matplotlib->keras-vis==0.4.1) (40.2.0)
-Building wheels for collected packages: keras-vis
-  Running setup.py bdist_wheel for keras-vis ... done
-  Stored in directory: /tmp/pip-ephem-wheel-cache-3ikp3hx6/wheels/c5/ae/e7/b34d1cb48b1898f606a5cce08ebc9521fa0588f37f1e590d9f
-Successfully built keras-vis
-twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
-Installing collected packages: keras-vis
-  Found existing installation: keras-vis 0.4.1
-    Uninstalling keras-vis-0.4.1:
-      Successfully uninstalled keras-vis-0.4.1
-Successfully installed keras-vis-0.4.1
-You are using pip version 10.0.1, however version 19.0.2 is available.
-
-
-
- -
-
- -
-
-
-
In :
-
-
-
from __future__ import print_function
-%matplotlib inline
-
-import keras
-from keras.datasets import mnist
-from keras.models import Sequential
-from keras.layers import Dense, Dropout, Flatten
-from keras.layers import Conv2D, MaxPooling2D
-from keras import backend as K
-import matplotlib
-import matplotlib.pyplot as plt
-import numpy as np
-from scipy import signal
-import scipy.misc
-import urllib.request
-
- -
-
-
- -
-
- - -
- -
- - -
-
Using TensorFlow backend.
-
-
-
- -
-
- -
-
-
-
-
-

## 1. Introduction to Convolutional Neural Networks¶

This introduction is mainly taken from Ref. , to which we refer the interested reader for more details.

- -
-
-
-
-
-
-
-

Convolutional networks are a specialized kind of neural network for processing data that has a known grid-like topology; they are networks that use convolution in place of general matrix multiplication in at least one of their layers.

-

Examples of such data include time-series data (1-D grid with samples at regular time intervals) and image data (2-D grid of pixels).
-Convolutional networks have been tremendously successful in practical applications, especially in computer vision.

-

The name "convolutional neural network" indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation.

-

A typical layer of a convolutional network consists of three stages:

-
-
1. Convolution stage: the layer performs several convolutions in parallel to produce a set of linear activations.

-
-
2. Detector stage: each linear activation is run through a nonlinear activation function (e.g. the rectified linear activation function).

-
-
3. Pooling stage: a pooling function is used to modify (downsample) the output of the layer. A pooling function replaces the output of the network at a certain location with a summary statistic of the nearby outputs. For example, the max pooling operation reports the maximum output within a rectangular neighborhood. Other popular pooling functions include the average of a rectangular neighborhood, the $L^2$ norm of a rectangular neighborhood, or a weighted average based on the distance from the central pixel.

-
-
-

#### Max pooling example¶ -Figure from http://cs231n.github.io/convolutional-networks/

-

#### Average pooling example¶ -Figure from https://github.com/vdumoulin/conv_arithmetic
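
The two pooling operations described above can be sketched in a few lines of NumPy. This is only an illustration: the helper `pool2d` and the 4x4 array `activations` are hypothetical names and values introduced here, not part of the tutorial code, and the window size and stride of 2 are assumptions.

```python
import numpy as np


def pool2d(x, size=2, stride=2, reduce_fn=np.max):
    """Pool a 2-D array over size x size windows taken every `stride` pixels."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            # replace the window by a summary statistic (max, mean, ...)
            out[i, j] = reduce_fn(window)
    return out


# hypothetical 4x4 activation map (values chosen only for illustration)
activations = np.array([[1., 1., 2., 4.],
                        [5., 6., 7., 8.],
                        [3., 2., 1., 0.],
                        [1., 2., 3., 4.]])

max_pooled = pool2d(activations, reduce_fn=np.max)
avg_pooled = pool2d(activations, reduce_fn=np.mean)
print(max_pooled)
print(avg_pooled)
```

Passing a different `reduce_fn` (for example an $L^2$ norm) would give the other pooling variants mentioned in the list.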

- -
-
-
-
-
-
-
-

### 2. Motivation¶

Why should one use convolutional neural networks instead of simple (fully connected) neural networks?

-

Convolution leverages three important ideas that can help improve a machine learning system:

-
-
• sparse interactions
• -
• parameter sharing
• -
• equivariant representations
• -
-

Moreover, convolution provides a means for working with inputs of variable size - while this is not possible with fully connected neural networks (also called multi-layer perceptrons).

-

#### 2.1 Sparse interactions¶

##### Fully connected NN¶

It uses matrix multiplication by a matrix of parameters, with a separate parameter describing the interaction between each input unit and each output unit. This means that every output unit interacts with every input unit. This does not scale well to full images. For example, an image of size 200x200x3 would lead to neurons that each have 200x200x3 = 120,000 weights. Moreover, we would almost certainly want to have several such neurons. Clearly, this full connectivity is wasteful, and the huge number of parameters would quickly lead to overfitting.

-
##### CNN¶

It achieves sparse interactions (sparse connectivity) by making the kernel smaller than the input. When processing an image, we can detect small, meaningful features such as edges with kernels that occupy only tens or hundreds of pixels (see Sec. 3.3.2 for two concrete examples).
-This means that we need to store fewer parameters, which both reduces the memory requirements of the model and improves its statistical efficiency. It also means that computing the output requires fewer operations. If there are $m$ inputs and $n$ outputs, then matrix multiplication requires $m \times n$ parameters, and the algorithms used in practice have $O(m \times n)$ runtime (per example). If we limit the number of connections each output may have to $k$, then the sparsely connected approach requires only $k \times n$ parameters and $O(k \times n)$ runtime. For many practical applications, $k$ is several orders of magnitude smaller than $m$.
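
To make the $m \times n$ versus $k \times n$ comparison concrete, here is a small arithmetic sketch. The sizes are assumptions chosen for illustration (the 200x200x3 image from the text, as many outputs as inputs, and a 5x5x3 kernel); they do not describe a specific network from this tutorial.

```python
# illustrative sizes (assumptions, not taken from a specific network)
m = 200 * 200 * 3   # number of inputs: a 200x200 RGB image
n = m               # assume as many output units as input units
k = 5 * 5 * 3       # connections per output unit for a 5x5x3 kernel

dense_params = m * n    # fully connected: one weight per (input, output) pair
sparse_params = k * n   # sparse connectivity: only k weights per output unit

print("fully connected :", dense_params)
print("sparsely connected:", sparse_params)
print("ratio m / k     :", m // k)
```

With these assumed sizes, the sparsely connected layer needs 1600 times fewer weights than the fully connected one.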

-

#### 2.2 Parameter sharing¶

It refers to using the same parameter for more than one function in a model.

-
##### Fully connected NN¶

Each element of the weight matrix is used exactly once when computing the output of a layer.

-
##### CNN¶

Each member of the kernel is used at every position of the input. The parameter sharing used by the convolution operation means that rather than learning a separate set of parameters for every location, we learn only one set. This further reduces the storage requirements of the model to $k$ parameters. Recall that $k$ is usually several orders of magnitude smaller than $m$. Since $m$ and $n$ are usually roughly the same size, $k$ is practically insignificant compared to $m \times n$. Convolution is thus dramatically more efficient than dense matrix multiplication in terms of memory requirements and statistical efficiency.
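
One way to see parameter sharing is to write a 1-D convolution as a matrix multiplication: the equivalent weight matrix is mostly zeros, and every row reuses the same $k$ kernel values. The sketch below assumes a toy input `x` and kernel `w` (hypothetical values) and uses `np.convolve` as the reference.

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5., 6.])   # toy input, m = 6
w = np.array([1., 0., -1.])              # toy kernel, k = 3

n_out = len(x) - len(w) + 1              # number of 'valid' outputs
W = np.zeros((n_out, len(x)))
for i in range(n_out):
    # every row holds the same k weights, shifted by one position;
    # the kernel is reversed because convolution flips it
    W[i, i:i + len(w)] = w[::-1]

dense_result = W @ x                               # matrix multiplication
conv_result = np.convolve(x, w, mode='valid')      # convolution

print(W)
print(dense_result, conv_result)
```

The matrix `W` has `n_out * len(x)` entries, but only `len(w)` distinct learnable values.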

-

#### 2.3 Equivariant representations¶

Parameter sharing causes the layer to have equivariance to translation. To say a function is equivariant means that if the input changes, the output changes in the same way.

-

When processing time-series data, this means that convolution produces a sort of timeline that shows when different features appear in the input. If we move an event later in time in the input, the exact same representation of it will appear in the output, just later. Similarly with images, convolution creates a 2-D map of where certain features appear in the input. If we move the object in the input, its representation will move the same amount in the output. This is useful for when we know that some function of a small number of neighboring pixels is useful when applied to multiple input locations.
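
Translation equivariance can be checked numerically on a toy time series. In this sketch the signal `x`, the kernel `w`, and the shift of 3 samples are arbitrary assumptions; the event is kept away from the array borders so that `np.roll` does not wrap anything around.

```python
import numpy as np

w = np.array([1., 2., 3.])   # toy kernel
x = np.zeros(10)
x[2] = 1.0                   # a single "event" at time index 2

shift = 3
x_shifted = np.roll(x, shift)            # same event, 3 steps later

y = np.convolve(x, w, mode='full')
y_shifted = np.convolve(x_shifted, w, mode='full')

# shifting the input shifts the output by the same amount
print(np.allclose(y_shifted, np.roll(y, shift)))
```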

- -
-
-
-
-
-
-
-

## 3. The convolution operation¶

### 3.1 Summary and intuition¶

The convolutional layer's parameters consist of a set of learnable filters. Every filter is small spatially (along width and height), but extends through the full depth of the input volume. For example, a typical filter on a first layer of a ConvNet might have size 5x5x3 (i.e. 5 pixels width and height, and 3 because images have depth 3, the color channels).

-
-
• During the forward pass, we slide (more precisely, convolve) each filter across the width and height of the input volume and compute dot products between the entries of the filter and the input at any position. Intuitively, a convolution can be thought of as a sliding (weighted) average.

-
• -
• As we slide the filter over the width and height of the input volume we will produce a 2-dimensional activation map that gives the responses of that filter at every spatial position. Intuitively, the network will learn filters that activate when they see some type of visual feature such as an edge of some orientation or a blotch of some color on the first layer, or eventually entire honeycomb or wheel-like patterns on higher layers of the network.

-
• -
• At this stage, we have an entire set of filters in each convolutional layer (e.g. 12 filters), and each of them produces a separate 2-dimensional activation map. We stack these activation maps along the depth dimension and produce the output volume.

-
• -
-
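
The three steps above can be sketched with plain NumPy loops. This is a deliberately naive illustration, not how Keras implements a convolutional layer: the 32x32x3 input, the random filter values, and the absence of padding, stride, and bias are all assumptions made here.

```python
import numpy as np

rng = np.random.RandomState(0)
image = rng.rand(32, 32, 3)        # hypothetical 32x32 RGB input volume
filters = rng.rand(12, 5, 5, 3)    # 12 filters, each of size 5x5x3

out_h = image.shape[0] - 5 + 1     # no padding, stride 1
out_w = image.shape[1] - 5 + 1
output = np.zeros((out_h, out_w, len(filters)))

for f, filt in enumerate(filters):
    for i in range(out_h):
        for j in range(out_w):
            # dot product between the filter and the patch under it,
            # taken over the full depth of the input
            patch = image[i:i + 5, j:j + 5, :]
            output[i, j, f] = np.sum(patch * filt)

# one 2-D activation map per filter, stacked along the depth dimension
print(output.shape)
```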

Below, you can see a representation of how the convolution operation is performed.
-
-
-
-
-
-
-

### 3.2 Mathematical formulation - from Ref. ¶

#### Main idea¶

Suppose we are tracking the location of a spaceship with a laser sensor. Our laser sensor provides a single output $x(t)$, the position of the spaceship at time $t$. Now suppose that our laser sensor is somewhat noisy. To obtain a less noisy estimate of the spaceship’s position, we would like to average several measurements. Of course, more recent measurements are more relevant, so we will want this to be a weighted average that gives more weight to recent measurements. We can do this with a weighting function $w(a)$, where $a$ is the age of a measurement.
-If we apply such a weighted average operation at every moment, we obtain a new function $s$ providing a smoothed estimate of the position of the spaceship:

-

$s(t) = \int x(a)w(t - a)da$

-

This operation is called convolution.

-

The convolution operation is typically denoted with an asterisk:

-

$s(t) = ( x ∗ w )( t )$

-

In convolutional network terminology, the first argument (in this example, the function $x$) to the convolution is often referred to as the input, and the second argument (in this example, the function $w$) as the kernel. The output is sometimes referred to as the feature map.

-

#### Discrete version - 1D [optional]¶

Let us assume that the time index $t$ can take on only integer values. If we now assume that $x$ and $w$ are defined only on integer $t$, we can define the discrete convolution:

-

$s(t) = ( x ∗ w )( t ) = \sum_{a=-\infty}^{+\infty} x(a)w(t - a)$
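
The discrete sum can be implemented literally and compared with `np.convolve`. The function name `conv1d_direct` and the toy arrays are hypothetical; the sketch assumes that $x$ and $w$ are zero outside the stored samples, so the infinite sum reduces to a finite one.

```python
import numpy as np


def conv1d_direct(x, w):
    """s(t) = sum_a x(a) w(t - a), with x and w zero outside their support."""
    s = np.zeros(len(x) + len(w) - 1)
    for t in range(len(s)):
        for a in range(len(x)):
            if 0 <= t - a < len(w):     # skip terms where w(t - a) is zero
                s[t] += x[a] * w[t - a]
    return s


x = np.array([1., 2., 3.])    # toy measurements
w = np.array([0.5, 0.5])      # toy weighting function (two-point average)

s = conv1d_direct(x, w)
print(s)
print(np.convolve(x, w))      # NumPy's 'full' convolution for comparison
```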

-

#### Discrete version - 2D [optional]¶

$S(i,j) = (I ∗ K)(i,j) = \sum_{m}\sum_{n} I(m,n)K(i-m,j-n)$

-

Convolution is commutative, so we can write:

-

$S(i,j) = (K ∗ I)(i,j) = \sum_{m}\sum_{n} I(i-m,j-n)K(m,n)$

-

Usually the latter formula is more straightforward to implement in a machine learning library, because there is less variation in the range of valid values of $m$ and $n$. The commutative property of convolution arises because we have flipped the kernel relative to the input, in the sense that as $m$ increases, the index into the input increases, but the index into the kernel decreases. The only reason to flip the kernel is to obtain the commutative property. While the commutative property is useful for writing proofs, it is not usually an important property of a neural network implementation.

-

Instead, many neural network libraries implement a related function called the cross-correlation, which is the same as convolution but without flipping the kernel:

-

$S(i,j) = (I ∗ K)(i,j) = \sum_{m}\sum_{n} I(i+m,j+n)K(m,n)$

-

Many machine learning libraries implement cross-correlation but call it convolution. In the context of machine learning, the learning algorithm will learn the appropriate values of the kernel in the appropriate place, so an algorithm based on convolution with kernel flipping will learn a kernel that is flipped relative to the kernel learned by an algorithm without the flipping.
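
The relation between convolution and cross-correlation can be verified with SciPy. The sketch below uses a toy 4x4 input and a 2x2 kernel (hypothetical values): cross-correlation with a kernel gives the same result as convolution with that kernel flipped along both axes.

```python
import numpy as np
from scipy import signal

I = np.arange(16, dtype=float).reshape(4, 4)   # toy input
K = np.array([[1., 2.],
              [3., 4.]])                       # toy (non-symmetric) kernel

conv = signal.convolve2d(I, K, mode='valid')     # flips the kernel
xcorr = signal.correlate2d(I, K, mode='valid')   # does not flip the kernel

# convolving with the flipped kernel reproduces the cross-correlation
xcorr_via_flip = signal.convolve2d(I, K[::-1, ::-1], mode='valid')

print(np.allclose(xcorr, xcorr_via_flip))
print(np.allclose(conv, xcorr))
```

For a kernel that is symmetric under this flip (such as the box blur), the two operations coincide.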

- -
-
-
-
-
-
-
-

## 3.3 Examples¶

### 3.3.1 Example: computing output value of a discrete convolution (from Ref. )¶

We present below the calculation of the discrete convolution of a 3x3 kernel $K_{\rm ex}$ (with no padding and stride 1):
-$K_{\rm ex} = \begin{pmatrix} 0 & 1 & 2 \\ 2 & 2 & 0 \\ 0 & 1 & 2 \end{pmatrix}$
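
A sketch of this calculation with SciPy follows. The 5x5 input `img` is a hypothetical array introduced here, not the input shown in the original figure. The sketch also assumes the convention common in deep learning libraries, where the kernel is slid over the input without flipping, which is why `correlate2d` is used.

```python
import numpy as np
from scipy import signal

k_ex = np.array([[0., 1., 2.],
                 [2., 2., 0.],
                 [0., 1., 2.]])

# hypothetical 5x5 input (NOT the input of the original figure)
img = np.arange(25, dtype=float).reshape(5, 5)

# mode='valid' means no padding; the kernel moves one pixel at a time (stride 1)
out = signal.correlate2d(img, k_ex, mode='valid')

print(out.shape)   # a 5x5 input and a 3x3 kernel give a 3x3 output
print(out)
```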

- - -
-
-
-
-
-
-
-

### 3.3.2 Example: convolution in practice on real images¶

-
-
-
-
-
-
-
-

We now perform a convolution operation on real images. We use a photo of Max Planck, and a Berlin landscape.

- -
-
-
-
-
-
In :
-
-
-
# this can be skipped because the images are already saved on the server
-
-# retrieve image of Max Planck from wikipedia
-#print("Retrieving picture of Max Planck. Saving image to './img_max_planck.jpg'.")
-
-# retrieve a picture of Berlin
-#print("Retrieving picture of Berlin landscape. Saving image to './img_berlin_landscape.jpg'.")
-#urllib.request.urlretrieve("http://vivalifestyleandtravel.com/images/cache/c-1509326560-44562570.jpg", "./img_berlin_landscape.jpg")
-
-#print("Done.")
-
- -
-
-
- -
-
-
-
-
-

We define a function to display images in a single figure; it is not important for the purpose of this tutorial to understand this function implementation.

- -
-
-
-
-
-
In :
-
-
-
# function to display multiple images in a single figure
-def show_images(images, cols=1, titles=None, cmap='viridis', filename_out=None):
-    """Display a list of images in a single figure with matplotlib.
-
-    Taken from https://stackoverflow.com/questions/11159436/multiple-figures-in-a-single-window
-
-    Parameters:
-
-    images: list of np.arrays
-        Images to be plotted. It must be compatible with plt.imshow.
-
-    cols: int,  optional, (default = 1)
-        Number of columns in figure (number of rows is
-        set to np.ceil(n_images/float(cols))).
-
-    titles: list of strings
-        List of titles corresponding to each image.
-
-    """
-    plt.clf()
-    assert ((titles is None) or (len(images) == len(titles)))
-    n_images = len(images)
-    if titles is None:
-        titles = ['Image (%d)' % i for i in range(1, n_images + 1)]
-    fig = plt.figure()
-    for n, (image, title) in enumerate(zip(images, titles)):
-        a = fig.add_subplot(cols, int(np.ceil(n_images / float(cols))), n + 1)
-        plt.imshow(image, cmap=cmap)
-        a.set_title(title, fontsize=40)
-        a.axis('off')  # clear x- and y-axes
-    fig.set_size_inches(np.array(fig.get_size_inches()) * n_images)
-    if filename_out is not None:
-        plt.savefig(filename_out, dpi=100, format='png')
-
- -
-
-
- -
-
-
-
In :
-
-
-
# read jpg files as numpy arrays
-
- -
-
-
- -
-
-
-
-
-

#### Type of Kernels¶

In Sec. 3.3.1, we used a randomly chosen matrix to perform our convolution; it turns out that there are some "special" kernel matrices that perform specific (and useful) transformations when convolved with an image. Below, we present some examples of these kernels.

-

-

$K_{\rm identity} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$

-

$K_{\rm boxblur} = \dfrac{1}{9}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$

-

$K_{\rm gaussianblur3x3} = \dfrac{1}{16}\begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}$

-

$K_{\rm gaussianblur5x5} = \dfrac{1}{256}\begin{pmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{pmatrix}$

-

$K_{\rm vlines} = \begin{pmatrix} -1 & 2 & -1 \\ -1 & 2 & -1 \\ -1 & 2 & -1 \end{pmatrix}$

-

$K_{\rm hlines} = \begin{pmatrix} -1 & -1 & -1 \\ 2 & 2 & 2 \\ -1 & -1 & -1 \end{pmatrix}$

-

$K_{\rm edges} = \begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}$

-

$K_{\rm emboss} = \begin{pmatrix} -2 & -1 & 0 \\ -1 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix}$

- -
-
-
-
-
-
-
-

Now we apply the convolution operation to both images (the photo of Max Planck and the Berlin landscape) using each of the kernels above. In particular, we use the SciPy function signal.convolve2d to perform the convolution.

-

Please refer to the Scipy documentation for more details on this function: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve2d.html
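
Before applying `signal.convolve2d` to the photos, its two options used in this tutorial can be checked on synthetic data. In this sketch, the random 6x8 array `img` and the constant image are hypothetical stand-ins for real grayscale images: `mode='same'` keeps the output the same size as the input, and `boundary='symm'` mirrors the image at its borders so that edge pixels are not darkened by implicit zero padding.

```python
import numpy as np
from scipy import signal

k_identity = np.array([[0., 0., 0.],
                       [0., 1., 0.],
                       [0., 0., 0.]])
k_box_blur = np.ones((3, 3)) / 9.

# hypothetical stand-in for a grayscale image
img = np.random.RandomState(0).rand(6, 8)

# the identity kernel returns the image unchanged, with the same shape
same_as_input = signal.convolve2d(img, k_identity, boundary='symm', mode='same')

# blurring a constant image leaves it constant, also at the borders,
# because the symmetric boundary mirrors the image instead of padding with zeros
constant_img = np.full((6, 8), 2.0)
blurred_constant = signal.convolve2d(constant_img, k_box_blur,
                                     boundary='symm', mode='same')

print(same_as_input.shape, np.allclose(same_as_input, img))
print(np.allclose(blurred_constant, 2.0))
```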

- -
-
-
-
-
-
In :
-
-
-
k_identity = np.array([[0., 0., 0.],
-                       [0., 1., 0.],
-                       [0., 0., 0.]])
-
-k_box_blur = 1./9. * np.array([[1., 1., 1.],
-                              [1., 1., 1.],
-                              [1., 1., 1.]])
-
-k_gauss_blur_3x3 = 1./16. * np.array([[1., 2., 1.],
-                                      [2., 4., 2.],
-                                      [1., 2., 1.]])
-
-k_gauss_blur_5x5 = 1./256.* np.array([[1.,  4.,  6.,  4., 1.],
-                                      [4., 16., 24., 16., 4.],
-                                      [6., 24., 36., 24., 6.],
-                                      [4., 16., 24., 16., 4.],
-                                      [1.,  4.,  6.,  4., 1.]])
-
-k_vlines = np.array([[-1., 2., -1.],
-                     [-1., 2., -1.],
-                     [-1., 2., -1.]])
-
-k_hlines = np.array([[-1., -1., -1.],
-                     [ 2.,  2.,  2.],
-                     [-1., -1., -1.]])
-
-k_edges = np.array([[-1., -1., -1.],
-                    [-1.,  8., -1.],
-                    [-1., -1., -1.]])
-
-# the emboss kernel gives the illusion of depth by emphasizing the differences of pixels in a given direction
-# in this case, in a direction along a line from the top left to the bottom right.
-k_emboss = np.array([[-2., -1., 0.],
-                     [-1.,  1., 1.],
-                     [ 0.,  1., 2.]])
-
- -
-
-
- -
-
-
-
In :
-
-
-
kernels = [k_identity, k_box_blur, k_vlines, k_hlines, k_edges, k_emboss]
-titles = ['original', 'box blur', 'vertical lines', 'horizontal lines', 'edges', 'emboss']
-
-# now apply the convolution for each kernel above
-max_planck_feature_maps = []
-berlin_landscape_feature_maps = []
-for kernel in kernels:
-    max_planck_feature_maps.append(signal.convolve2d(img_max_planck, kernel, boundary='symm', mode='same'))
-    berlin_landscape_feature_maps.append(signal.convolve2d(img_berlin_landscape, kernel, boundary='symm', mode='same'))
-
- -
-
-
- -
-
-
-
In :
-
-
-
show_images(images=max_planck_feature_maps, cols=2, titles=titles, cmap='gray')
-
- -
-
-
- -
-
- - -
- -
- - - - -
-
<Figure size 432x288 with 0 Axes>
-
- -
- -
- -
- - - - -
- -
- -
- -
-
- -
-
-
-
In :
-
-
-
show_images(images=berlin_landscape_feature_maps, cols=2, titles=titles, cmap='gray')
-
- -
-
-
- -
-
- - -
- -
- - - - -
-
<Figure size 432x288 with 0 Axes>
-
- -
- -
- -
- - - - -
-