7.1 Units
What is a computational unit in a neural network?
A computational unit takes a set of real-valued numbers as input, performs some computation on them, and produces an output. In its simplest form, the unit takes a weighted sum of its inputs plus a bias term: z = w · x + b. The result z is then fed into an activation function (a nonlinear function) to compute the final output of the unit.
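As a minimal sketch of the computation just described, the Python snippet below forms the weighted sum z = w · x + b and passes it through a sigmoid. The function names and the example weights, inputs, and bias are illustrative assumptions, not values taken from these notes.

```python
import math

def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def unit_output(weights, inputs, bias):
    """Weighted sum of the inputs plus a bias, followed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Illustrative values (assumed for this example only)
w = [0.2, 0.3, 0.9]
x = [0.5, 0.6, 0.1]
b = 0.5

# z = 0.2*0.5 + 0.3*0.6 + 0.9*0.1 + 0.5 = 0.87, and sigmoid(0.87) is about 0.70
y = unit_output(w, x, b)
print(round(y, 2))
```

Changing only the activation function (for example, swapping the sigmoid for tanh or ReLU) leaves the weighted-sum step untouched.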
What are three popular nonlinear activation functions?

Sigmoid

Tanh

ReLU
Sigmoid maps z into the range (0, 1). This is useful for handling outliers, which are squashed toward 0 or 1 instead of growing without bound. Sigmoid is also differentiable, which is handy for learning. The whole flow from input to computational unit (with activation function) to output is shown below.
A similar activation function that almost always performs better is tanh. Tanh is a variant of the sigmoid with a range between -1 and 1. The simplest and perhaps most commonly used activation function is the rectified linear unit (ReLU): ReLU(z) = max(0, z). Both the tanh and ReLU activation functions are shown below.
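To make the three activation functions concrete, here is a small sketch in plain Python that evaluates each at a few sample points. The sample values of z are arbitrary assumptions chosen only to show the different output ranges.

```python
import math

def sigmoid(z):
    """Range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Range (-1, 1); equal to 2 * sigmoid(2z) - 1, hence a variant of the sigmoid."""
    return math.tanh(z)

def relu(z):
    """max(0, z): zero for negative inputs, the identity otherwise."""
    return max(0.0, z)

# Arbitrary sample points (assumed for illustration)
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):+.3f}  relu={relu(z):.1f}")
```

Note that sigmoid and tanh flatten out for large positive or negative z, while ReLU keeps growing linearly for positive z.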
7.2 The XOR problem
Why do we need multilayer networks?
A popular demonstration used the AND, OR, and XOR problems: a single neural unit (a perceptron with no nonlinear activation function) can compute AND and OR, but it cannot compute XOR. The limitation arises because a perceptron is a linear classifier, whereas XOR is not a linearly separable function. Computing XOR requires a layered network with more than one computational unit and a nonlinear activation. A network made only of linear units cannot compute XOR, because stacked linear layers reduce to a single linear transformation.
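The sketch below illustrates the point in Python: single threshold units handle AND and OR, while XOR is computed with two ReLU hidden units feeding one linear output unit. The specific weights and biases are one choice that happens to work and are assumptions for illustration; these notes do not specify any.

```python
def relu(z):
    return max(0, z)

def perceptron(x1, x2, w1, w2, b):
    """Single linear threshold unit: outputs 1 if w . x + b > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def logical_and(x1, x2):
    return perceptron(x1, x2, 1, 1, -1)   # fires only when both inputs are 1

def logical_or(x1, x2):
    return perceptron(x1, x2, 1, 1, 0)    # fires when at least one input is 1

def xor(x1, x2):
    """Two ReLU hidden units followed by a linear output unit."""
    h1 = relu(x1 + x2)        # hidden unit 1: weights (1, 1), bias 0
    h2 = relu(x1 + x2 - 1)    # hidden unit 2: weights (1, 1), bias -1
    return h1 - 2 * h2        # output unit: weights (1, -2), bias 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, logical_and(x1, x2), logical_or(x1, x2), xor(x1, x2))
```

The hidden layer maps the inputs (0, 1) and (1, 0) to the same point, which makes the transformed problem linearly separable for the output unit.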
7.3 Feedforward Neural Networks
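What is an FFNN?
An FFNN (feedforward neural network) is a multilayer network in which the units are connected with no cycles: the outputs from units in each layer are passed to units in the next higher layer, and no outputs are passed back to lower layers. An FFNN is made up of three kinds of nodes, as shown below: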

Input units

Hidden units

Output units
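The flow from input units through hidden units to output units can be sketched as a forward pass in plain Python. The layer sizes, weight values, and the names `layer` and `forward` are hypothetical, chosen only for this example, and the sigmoid in the hidden layer is one possible activation choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, biases, inputs, activation=None):
    """Compute one layer; each row of `weights` holds one unit's input weights."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def forward(x, W, b_hidden, U, b_out):
    h = layer(W, b_hidden, x, activation=sigmoid)  # input units -> hidden units
    return layer(U, b_out, h)                      # hidden units -> output units (raw scores)

# Hypothetical network: 3 input units, 2 hidden units, 2 output units
x = [1.0, 0.5, -1.0]
W = [[0.1, 0.2, 0.3], [-0.3, 0.4, 0.0]]
b_hidden = [0.0, 0.1]
U = [[1.0, -1.0], [0.5, 0.5]]
b_out = [0.0, 0.0]

scores = forward(x, W, b_hidden, U, b_out)
print(scores)
```

Information moves in one direction only: nothing computed in the output layer is fed back into the hidden layer.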
What is the softmax function used for?
It is used to create a probability distribution from an output vector of real-valued numbers: each value is mapped into the range (0, 1), and the values sum to 1.
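A minimal sketch of softmax in plain Python follows, using the standard definition softmax(z_i) = exp(z_i) / sum_j exp(z_j). Subtracting the maximum score before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result. The example scores are made up for illustration.

```python
import math

def softmax(z):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(z)                              # shift by the max to avoid overflow in exp
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up output scores (assumed for this example)
probs = softmax([0.6, 1.1, -1.5, 1.2, 3.2, -1.1])
print([round(p, 3) for p in probs])
print(sum(probs))
```

The largest score receives the largest probability, and the ordering of the scores is preserved.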