www.idziorek.net | blog | contact
March 2019

# Neural Networks

## Hello Sigmoid

Welcome to my Hello World of Backpropagation.

This is my tiny and straightforward Haskell implementation of a basic neural net using gradient descent. I coded this from scratch, along reading the first two chapters of Neural Networks and Deep Learning .

Be warned that the following implementation aims at clarity and readability, but not performance! In another article I will probably discuss, how to optimize it heavily, utilizing Parallel Programming / Tensor Flow (CUDA). We can even run it on a cluster someday…

The source code below was auto-fetched from: https://gitweb.softwarefools.com/?p=miguel/haskell.git;a=blob;f=mnist/Neuronet.hs

module Neuronet
( Neuronet      -- the neuronet
,newnet        -- initalize neuronet
,train         -- train with batch
)where

import Data.List
import Numeric.LinearAlgebra (Matrix,Vector,tr,scale,cmap,(#>),randn,
toList,fromList,toLists,fromLists,outer)

-- | A layer of our network consists of a weight matrix with input
-- weights and a vector holding the bias at each neuron.
type Layer    = (Matrix Double,Vector Double)

-- | Our neural network is simply a list of layers
type Neuronet = [Layer]

-- | Initialize a fresh neuronal network given the number of neurons on
-- each layer, as a list. Weights and biases are initialized randomly
-- using gaussian distribution with mean 0 and standard deviation 1.
newnet :: [Int] -> IO Neuronet
newnet l = mapM nl $zip l (tail l) where nl (i,l) = (,) <$> randn l i <*>

-- | Given the input vector calculate the weighted inputs and
-- activations for all layers of our network.
wghtact :: Neuronet -> Vector Double -> [(Vector Double,Vector Double)]
wghtact [] _ = []
wghtact ((w,b):lx) x = (z,a):wghtact lx a where z = w #> x + b
a = cmap sigmoid z

-- | Given the input vector calculate the final output
asknet :: Neuronet -> Vector Double -> Vector Double
asknet net x = snd . last $wghtact net x -- | Given the input and output vectors calculate the gradient of our -- cost function, utilizing backpropagation (output list by layer and -- split in the weight and bias partial derivatives respectively). -- Keep the required assumptions about the cost function in mind! backprop :: Neuronet -> Vector Double -> Vector Double -> [(Matrix Double,Vector Double)] backprop net x y = zipWith (\a e->(outer e a,e)) (x:map snd wa) (go$ zip ws wa)
where ws = (++[fromLists []]) . tail . map fst $net wa = wghtact net x go [(w,(z,a))] = [cost_derivative a y * cmap sigmoid' z] go ((w,(z,a)):lx) =let r@(e:_)=go lx in tr w #> e * cmap sigmoid' z:r -- | Sigmoid function sigmoid :: Double -> Double sigmoid x = 1/(1+exp(-x)) -- | Derivative of sigmoid function sigmoid' :: Double -> Double sigmoid' x = sigmoid x * (1-sigmoid x) -- | Returs vector of partial derivatives of the cost function cost_derivative :: Vector Double -> Vector Double -> Vector Double cost_derivative a y = a-y -- | Train on a batch of samples train :: Double -> Neuronet -> [Vector Double] -> [Vector Double] -> Neuronet train r net xs ys = zipWith (upd r) net bp where bp = foldl1' fc$ map (uncurry \$ backprop net) (zip xs ys)
fc v a = zipWith ff v a
ff (a,b) (c,d) = (a+c,b+d)

-- | Update a single Layer given the direction and training rate
upd :: Double -> Layer -> (Matrix Double,Vector Double) -> Layer
upd r (a,b) (c,d) = (a-scale r c,b-scale r d)

## MNIST

Even this simple vanilla network can already give us some tangible results, when run on the well-known MNIST Database  of handwritten digits, which contains a set of 60.000 training samples and another set of 10.000 test samples. The following screencast shows the success rates of the simple network for the first ten epochs. (The speed of the cast was increased quite a bit)

I created a little javascript version with 100 random test samples where you can test the trained neuronal network inside your browser, if it supports HTML5.

## Technical Vocublary

A loose collection of notes and some important terms, you should make yourself familiar with, when learning about neural networks.

• sigmoid neurons vs. perceptrons
• recurrent neural networks
• cost / loss / objective function
• required assumptions about the cost function
• quadratic cost function / mean squared error
• cross-entropy cost function (recheck proof)
• binary entropy
• gradient (vector of partial derivatives)
• on-line / incremental learning
• deep neural networks
• mini-batches
• backpropagation (based on 4 fudnamental equations / check proofs)
• weighted input
• saturated neuron
• softmax
• log-likehood
• overfitting
• hyper-parameters
• validation data
• regularization (l1,l2,dropout)