`March 2019`

Welcome to my **Hello World** of **Backpropagation**.

This is my tiny and straightforward Haskell implementation of a basic neural net using **gradient descent**. I coded this from scratch, along reading the first two chapters of *Neural Networks and Deep Learning* [1].

Be warned that the following implementation aims at clarity and readability, but not performance! In another article I will probably discuss, how to optimize it heavily, utilizing *Parallel Programming* / *Tensor Flow (CUDA)*. We can even run it on a cluster someday…

The source code below was auto-fetched from: https://gitweb.softwarefools.com/?p=miguel/haskell.git;a=blob;f=mnist/Neuronet.hs

```
module Neuronet
( Neuronet -- the neuronet
,newnet -- initalize neuronet
,train -- train with batch
,asknet -- ask the neuroal net
)where
import Data.List
import Numeric.LinearAlgebra (Matrix,Vector,tr,scale,cmap,(#>),randn,
toList,fromList,toLists,fromLists,outer)
-- | A layer of our network consists of a weight matrix with input
-- weights and a vector holding the bias at each neuron.
type Layer = (Matrix Double,Vector Double)
-- | Our neural network is simply a list of layers
type Neuronet = [Layer]
-- | Initialize a fresh neuronal network given the number of neurons on
-- each layer, as a list. Weights and biases are initialized randomly
-- using gaussian distribution with mean 0 and standard deviation 1.
newnet :: [Int] -> IO Neuronet
newnet l = mapM nl $ zip l (tail l)
where nl (i,l) = (,) <$> randn l i <*>
(randn 1 l >>= return.fromList.head.toLists)
-- | Given the input vector calculate the `weighted inputs` and
-- `activations` for all layers of our network.
wghtact :: Neuronet -> Vector Double -> [(Vector Double,Vector Double)]
wghtact [] _ = []
wghtact ((w,b):lx) x = (z,a):wghtact lx a where z = w #> x + b
a = cmap sigmoid z
-- | Given the input vector calculate the final output
asknet :: Neuronet -> Vector Double -> Vector Double
asknet net x = snd . last $ wghtact net x
-- | Given the input and output vectors calculate the gradient of our
-- cost function, utilizing backpropagation (output list by layer and
-- split in the weight and bias partial derivatives respectively).
-- Keep the required assumptions about the cost function in mind!
backprop :: Neuronet -> Vector Double -> Vector Double -> [(Matrix Double,Vector Double)]
backprop net x y = zipWith (\a e->(outer e a,e)) (x:map snd wa) (go $ zip ws wa)
where ws = (++[fromLists []]) . tail . map fst $ net
wa = wghtact net x
go [(w,(z,a))] = [cost_derivative a y * cmap sigmoid' z]
go ((w,(z,a)):lx) =let r@(e:_)=go lx in tr w #> e * cmap sigmoid' z:r
-- | Sigmoid function
sigmoid :: Double -> Double
sigmoid x = 1/(1+exp(-x))
-- | Derivative of sigmoid function
sigmoid' :: Double -> Double
sigmoid' x = sigmoid x * (1-sigmoid x)
-- | Returs vector of partial derivatives of the cost function
cost_derivative :: Vector Double -> Vector Double -> Vector Double
cost_derivative a y = a-y
-- | Train on a batch of samples
train :: Double -> Neuronet -> [Vector Double] -> [Vector Double] -> Neuronet
train r net xs ys = zipWith (upd r) net bp
where bp = foldl1' fc $ map (uncurry $ backprop net) (zip xs ys)
fc v a = zipWith ff v a
ff (a,b) (c,d) = (a+c,b+d)
-- | Update a single Layer given the `direction` and `training rate`
upd :: Double -> Layer -> (Matrix Double,Vector Double) -> Layer
upd r (a,b) (c,d) = (a-scale r c,b-scale r d)
```

Even this simple *vanilla* network can already give us some tangible results, when run on the well-known MNIST Database [2] of handwritten digits, which contains a set of 60.000 training samples and another set of 10.000 test samples.

The following screencast shows the success rates of the simple network for the first ten epochs. (The speed of the cast was increased quite a bit)

I created a little javascript version with 100 random test samples where you can test the trained neuronal network inside your browser, if it supports HTML5.

A loose collection of notes and some important terms, you should make yourself familiar with, when learning about neural networks.

- sigmoid neurons vs. perceptrons
- recurrent neural networks
- cost / loss / objective function
- required assumptions about the cost function
- quadratic cost function / mean squared error
- cross-entropy cost function (recheck proof)
- binary entropy
- gradient descent
- gradient (vector of partial derivatives)
- stochastic gradient descent
- on-line / incremental learning
- deep neural networks
- mini-batches
- backpropagation (based on 4 fudnamental equations / check proofs)
- weighted input
- saturated neuron
- softmax
- log-likehood
- overfitting
- hyper-parameters
- validation data
- regularization (l1,l2,dropout)

- [1] http://neuralnetworksanddeeplearning.com/
- [2] http://yann.lecun.com/exdb/mnist/
- [3] http://www.deeplearningbook.org/
- [4] https://medium.com/tebs-lab/how-to-classify-mnist-digits-with-different-neural-network-architectures-39c75a0f03e3
- [5] https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html