#### 7.1 Units

###### What is a computational unit in a neural network?

A computation unit takes a set of vectors as input, performs some computation, and produces an output. In the simplest form, the unit is taking a weighted sum of its inputs with an additional bias term. The output of the unit (z) is feed into an activation function (a non-linear function) to compute the final output of the network.

###### What are the three popular non-linear functions?
1. Sigmoid

2. Tanh

3. ReLU

Sigmoid maps the output z into a range of 0 – 1. This is useful for dealing with outliers which are forced to be within the range. Sigmoid is differentiable. The whole flow from input to computation unit (with activation function) to output is shown below.

A similar activation function that almost always perform better is the tanh function. Tanh is a variant of the sigmoid and has range between -1 and 1. The simplest and most commonly used activation function is the ReLU. ReLU is the max between 0 and x. Both Tanh and ReLU activation function is shown below.

#### 7.2 The XOR problem

###### What’s the need for multi-layer network?

A popular demonstrations was the use of AND, OR, and XOR problem, where it was shown that a single neural unit (a perceptron without activation function) can compute AND and OR outputs but it’s impossible to compute XOR outputs. This limitation is due to the fact that a perceptron is a linear classifier whereas XOR is not a linearly separable function. To compute XOR, we would need to have neural networks and have more than one computational unit with non-linear activation. A network with simple linear units won’t be able to compute XOR problem.

#### 7.3 Feed-Forward Neural Networks

###### What is a FFNN?

A FFNN is a multi-layer network where units are connected and the outputs from units in each layer are passed to units in higher layer. A FFNN is made up of three nodes as shown below:

1. Input units

2. Hidden units

3. Output units

###### What’s the use of softmax function?

It is use to create a probability distribution from an output vector of real-valued numbers

Data Scientist