# AI News, Implementing a Neural Network from Scratch in Python &#8211; An Introduction ## Implementing a Neural Network from Scratch in Python &#8211; An Introduction

Get the code: To follow along, all the code is also available as an iPython notebook on Github.

You can think of the blue dots as male patients and the red dots as female patients, with the x- and y- axis being medical measurements.

Our goal is to train a Machine Learning classifier that predicts the correct class (male of female) given the x- and y- coordinates.

This means that linear classifiers, such as Logistic Regression, won&#8217;t be able to fit the data unless you hand-engineer non-linear features (such as polynomials) that work well for the given dataset.

(Because we only have 2 classes we could actually get away with only one output node predicting 0 or 1, but having 2 makes it easier to extend the network to more classes later on).

The input to the network will be x- and y- coordinates and its output will be two probabilities, one for class 0 (&#8220;female&#8221;) and one for class 1 (&#8220;male&#8221;).

Because we want our network to output probabilities the activation function for the output layer will be the softmax, which is simply a way to convert raw scores to probabilities.

Our network makes predictions using forward propagation, which is just a bunch of matrix multiplications and the application of the activation function(s) we defined above.

is the input of layer and is the output of layer after applying the activation function.

If we have training examples and classes then the loss for our prediction with respect to the true labels is given by:

The formula looks complicated, but all it really does is sum over our training examples and add to the loss if we predicted the incorrect class.

The further away the two probability distributions (the correct labels) and (our predictions) are, the greater our loss will be.

We can use gradient descent to find the minimum and I will implement the most vanilla version of gradient descent, also called batch gradient descent with a fixed learning rate.

As an input, gradient descent needs the gradients (vector of derivatives) of the loss function with respect to our parameters: , , , .

To calculate these gradients we use the famous backpropagation algorithm, which is a way to efficiently calculate the gradients starting from the output.

We start by defining some useful variables and parameters for gradient descent: First let&#8217;s implement the loss function we defined above.

If we were to evaluate our model on a separate test set (and you should!) the model with a smaller hidden layer size would likely perform better due to better generalization.

Here are some things you can try to become more familiar with the code: All of the code is available as an iPython notebook on Github. Please leave questions or feedback in the comments!

Neural Network learns XOR function | Hidden Layer Output Visualization |

The code is available here: A Feed-Forward Neural network with one hidden layer has been used to learn the XOR ..

Neural Networks (1): Basics

The basic form of a feed-forward multi-layer perceptron / neural network; example activation functions.

What is a Neural Network - Ep. 2 (Deep Learning SIMPLIFIED)

With plenty of machine learning tools currently available, why would you ever choose an artificial neural network over all the rest? This clip and the next could ...

Solution of the XOR problem using back propagation and a hidden layer (a), 11/2/2015

Neural networks [2.4] : Training neural networks - hidden layer gradient

Solution of the XOR problem using back propagation and a hidden layer (c), 16/2/2015

Neural Networks 6: solving XOR with a hidden layer

XOR as Perceptron Network Quiz Solution - Georgia Tech - Machine Learning

Watch on Udacity: Check out the full Advanced Operating Systems ..

Solution of the XOR problem using back propagation and a hidden layer (b), 16/2/2015

Overview of a neural network with a hidden layer, 9/2/2015