Welcome to my Hello World of Backpropagation.
This is my tiny and straightforward Haskell implementation of a basic neural net trained with gradient descent. I coded it from scratch while reading the first two chapters of Neural Networks and Deep Learning.
Be warned that the following implementation aims at clarity and readability, not performance! In another article I will probably discuss how to optimize it heavily using parallel programming / TensorFlow (CUDA). We might even run it on a cluster someday…
The source code below was auto-fetched from: https://gitweb.softwarefools.com/?p=miguel/haskell.git;a=blob;f=mnist/Neuronet.hs
module Neuronet
    ( Neuronet  -- the neuronet
    , newnet    -- initialize neuronet
    , train     -- train with batch
    , asknet    -- ask the neural net
    ) where

import Data.List
import Numeric.LinearAlgebra (Matrix,Vector,tr,scale,cmap,(#>),randn,
                              toList,fromList,toLists,fromLists,outer)

-- | A layer of our network consists of a weight matrix with input
-- weights and a vector holding the bias at each neuron.
type Layer = (Matrix Double,Vector Double)

-- | Our neural network is simply a list of layers
type Neuronet = [Layer]

-- | Initialize a fresh neuronal network given the number of neurons on
-- each layer, as a list. Weights and biases are initialized randomly
-- using a Gaussian distribution with mean 0 and standard deviation 1.
newnet :: [Int] -> IO Neuronet
newnet l = mapM nl $ zip l (tail l)
  where nl (i,o) = (,) <$> randn o i
                       <*> (randn 1 o >>= return . fromList . head . toLists)

-- | Given the input vector calculate the `weighted inputs` and
-- `activations` for all layers of our network.
wghtact :: Neuronet -> Vector Double -> [(Vector Double,Vector Double)]
wghtact []         _ = []
wghtact ((w,b):lx) x = (z,a) : wghtact lx a
  where z = w #> x + b
        a = cmap sigmoid z

-- | Given the input vector calculate the final output
asknet :: Neuronet -> Vector Double -> Vector Double
asknet net x = snd . last $ wghtact net x

-- | Given the input and output vectors calculate the gradient of our
-- cost function, utilizing backpropagation (output list by layer and
-- split in the weight and bias partial derivatives respectively).
-- Keep the required assumptions about the cost function in mind!
backprop :: Neuronet -> Vector Double -> Vector Double
         -> [(Matrix Double,Vector Double)]
backprop net x y = zipWith (\a e -> (outer e a,e)) (x : map snd wa) (go $ zip ws wa)
  where -- pair each layer with the *next* layer's weights; the last
        -- entry is a placeholder that the base case of `go` never uses
        ws = (++ [fromLists [[0]]]) . tail . map fst $ net
        wa = wghtact net x
        go [(_,(z,a))]    = [cost_derivative a y * cmap sigmoid' z]
        go ((w,(z,a)):lx) = let r@(e:_) = go lx
                            in  tr w #> e * cmap sigmoid' z : r

-- | Sigmoid function
sigmoid :: Double -> Double
sigmoid x = 1/(1+exp(-x))

-- | Derivative of sigmoid function
sigmoid' :: Double -> Double
sigmoid' x = sigmoid x * (1-sigmoid x)

-- | Returns vector of partial derivatives of the cost function
cost_derivative :: Vector Double -> Vector Double -> Vector Double
cost_derivative a y = a-y

-- | Train on a batch of samples
train :: Double -> Neuronet -> [Vector Double] -> [Vector Double] -> Neuronet
train r net xs ys = zipWith (upd r) net bp
  where bp = foldl1' fc $ map (uncurry $ backprop net) (zip xs ys)
        fc v a = zipWith ff v a
        ff (a,b) (c,d) = (a+c,b+d)

-- | Update a single Layer given the `direction` and `training rate`
upd :: Double -> Layer -> (Matrix Double,Vector Double) -> Layer
upd r (a,b) (c,d) = (a - scale r c, b - scale r d)
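To make the forward pass concrete without pulling in hmatrix, here is a minimal list-based sketch of what `wghtact` and `asknet` compute: each layer multiplies its weight rows with the input, adds the biases, and applies the sigmoid. The names `dot`, `layer` and `forward` are mine, not part of the module above.

```haskell
-- A list-based sketch of the forward pass: a layer is a list of
-- weight rows plus a list of biases.
sigmoid :: Double -> Double
sigmoid x = 1 / (1 + exp (-x))

dot :: [Double] -> [Double] -> Double
dot a b = sum (zipWith (*) a b)

-- Mirrors `z = w #> x + b` and `a = cmap sigmoid z` from wghtact.
layer :: ([[Double]], [Double]) -> [Double] -> [Double]
layer (w, b) x = map sigmoid (zipWith (+) (map (`dot` x) w) b)

-- Folding the input through all layers corresponds to asknet.
forward :: [([[Double]], [Double])] -> [Double] -> [Double]
forward net x = foldl (flip layer) x net

main :: IO ()
main = print (forward [([[1,1],[1,-1]], [0,0])] [0,0])
-- each weighted input is 0, so every neuron outputs sigmoid 0 = 0.5
```

With a zero input and zero biases every neuron sees a weighted input of 0 and therefore fires at exactly 0.5, which is a handy sanity check.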
Even this simple vanilla network can already give us some tangible results when run on the well-known MNIST database of handwritten digits, which contains a set of 60,000 training samples and another set of 10,000 test samples.
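For MNIST training, each digit label has to be turned into the 10-element target vector that `train` compares against: a 1.0 at the label's index and 0.0 everywhere else. A hypothetical helper (the name `oneHot` is mine, not part of the module) might look like this:

```haskell
-- Encode a digit label 0..9 as a one-hot target vector for training.
oneHot :: Int -> [Double]
oneHot d = [ if i == d then 1 else 0 | i <- [0 .. 9] ]

main :: IO ()
main = print (oneHot 3)  -- a single 1.0 at index 3, zeros elsewhere
```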
The following screencast shows the success rates of the simple network over the first ten epochs. (The playback speed of the cast was increased quite a bit.)
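The success rate shown in the cast can be computed by taking the index of the strongest output neuron as the predicted digit and counting how often it matches the label. A sketch, with hypothetical names of my own (`predict`, `successRate`):

```haskell
import Data.List (maximumBy)
import Data.Ord  (comparing)

-- The predicted digit is the index of the largest output activation.
predict :: [Double] -> Int
predict out = fst $ maximumBy (comparing snd) (zip [0 ..] out)

-- Fraction of test samples whose prediction matches the label.
successRate :: [[Double]] -> [Int] -> Double
successRate outs labels =
  fromIntegral (length (filter id (zipWith (\o l -> predict o == l) outs labels)))
    / fromIntegral (length labels)

main :: IO ()
main = print (successRate [[0.1,0.9],[0.8,0.2]] [1,1])  -- one of two correct
```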
A loose collection of notes and important terms you should familiarize yourself with when learning about neural networks.