AI News

2. Planar data classification with a one-hidden-layer neural network, an implementation from scratch

Since we train our model with gradient descent, we need to compute gradients. Specifically, we need the derivative of the loss function with respect to each weight, $\partial \mathcal{L} / \partial W$.

Momentum is a method that helps accelerate SGD in the relevant direction and dampen oscillations, as can be seen in the image below.
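The momentum update can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's own code; the function name, hyperparameter defaults (`lr`, `beta`), and the toy quadratic loss are assumptions for the example.

```python
import numpy as np

def momentum_update(w, grad, velocity, lr=0.1, beta=0.9):
    """One momentum step: keep an exponentially decaying average of
    past gradients (the velocity) and move the weights along it.
    Hyperparameter defaults here are illustrative, not from the article."""
    velocity = beta * velocity + (1 - beta) * grad
    w = w - lr * velocity
    return w, velocity

# Toy usage: minimize L(w) = w^2, whose gradient is 2w.
w, v = np.array([1.0]), np.zeros(1)
for _ in range(100):
    w, v = momentum_update(w, 2 * w, v)
```

Because the velocity averages past gradients, steps in a consistent direction compound while steps that flip sign partially cancel, which is exactly the dampening of oscillations described above.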

We also need to implement the RMSProp algorithm, which uses squared gradients to adapt the learning rate as follows:

$s_t = \beta s_{t-1} + (1 - \beta)\, g_t^2, \qquad w_t = w_{t-1} - \frac{\eta}{\sqrt{s_t} + \epsilon}\, g_t$

The following animation shows how the decision surface and the cross-entropy loss change across batches with SGD + RMSProp, with batch size 4.
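As a sketch, the RMSProp update above translates directly into NumPy. Again, the function name and hyperparameter defaults (`lr`, `beta`, `eps`) are illustrative assumptions, not taken from the article's implementation.

```python
import numpy as np

def rmsprop_update(w, grad, sq_avg, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp step: maintain a running average of squared
    gradients and divide the step size by its square root, so
    parameters with large recent gradients take smaller steps.
    Hyperparameter defaults are illustrative, not from the article."""
    sq_avg = beta * sq_avg + (1 - beta) * grad ** 2
    w = w - lr * grad / (np.sqrt(sq_avg) + eps)
    return w, sq_avg

# Toy usage: minimize L(w) = w^2, whose gradient is 2w.
w, s = np.array([1.0]), np.zeros(1)
for _ in range(500):
    w, s = rmsprop_update(w, 2 * w, s)
```

Note that near the optimum the effective step is roughly `lr` regardless of gradient scale, so with a fixed learning rate RMSProp oscillates in a small band around the minimum rather than converging exactly.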