Notes on "Practical deep learning": foundation

While trying to run the model, I discovered that the book had become outdated, so from now on these notes follow the video lessons more than the book chapters.

These are my notes on Lesson 3: how neural networks work and how to optimize them.

How to learn?

  1. Watch the lessons
  2. (Read the chapters)
  3. Run notebooks and experiment.
  4. Reproduce results
  5. Repeat with a different dataset

Run the notebooks in the /clean folder of the book repo.

About the exported model

The exported model file (.pkl) contains two things:

  1. The preprocessing steps that turn raw data into model input: the DataLoaders part.
  2. The trained model, available in the .model attribute. It's a tree of submodules, one for each neural network layer; individual submodules can be retrieved with the model.get_submodule() method (see the sketch after this list).
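
A minimal sketch of inspecting an exported learner, assuming it was exported with fastai's learn.export() and saved as 'export.pkl' (the file name and the submodule name '0' are just examples; actual names depend on your setup and architecture):

from fastai.learner import load_learner

# Load the exported learner: it bundles the DataLoaders (preprocessing)
# together with the trained PyTorch model.
learn = load_learner('export.pkl')

print(learn.dls)    # the DataLoaders part: preprocessing and batching steps
print(learn.model)  # the trained model: a tree of layers (an nn.Module)

# Individual layers can be looked up by name with get_submodule().
first_layer = learn.model.get_submodule('0')
print(first_layer)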

@interact(a=1, b=2, c=3) is the ipywidgets decorator that makes function parameters interactive (sliders) in Jupyter.
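
For example, here is a small sketch that uses @interact to play with the quadratic's parameters (assuming matplotlib and ipywidgets are installed; plot_quadratic is a made-up helper name):

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

@interact(a=1.0, b=2.0, c=3.0)
def plot_quadratic(a, b, c):
    # Redraw the curve every time one of the sliders changes.
    x = np.linspace(-2, 2, 100)
    plt.plot(x, c * x**2 + b * x + a)
    plt.show()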

How does a neural network work?

A neural network tries to fit a function to data: it keeps adjusting the function's parameters until the function's output is close to the data.

After each adjustment, a loss function is used to see how close the function's output is to the data. Mean squared error, ((output - data) ** 2).mean(), is the most popular loss function.
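
As a sketch, the loss can be written directly in PyTorch (the function and variable names are mine):

import torch

def mean_squared_error(output, target):
    # Average of the squared differences between predictions and data.
    return ((output - target) ** 2).mean()

loss = mean_squared_error(torch.tensor([1.0, 2.0]), torch.tensor([1.5, 1.5]))
print(loss)  # tensor(0.2500)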

To automate the adjustment driven by the loss function, we calculate the derivative. The derivative tells us how much the loss changes when a parameter value is increased slightly, i.e. whether that change moves the output toward or away from the data. This rate of change is called the slope or gradient.

Python tip: func(*params). The * unpacks the params sequence into separate function arguments, e.g. a, b, and c.
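
A tiny illustration of the unpacking:

def quad_params(a, b, c):
    return f"a={a}, b={b}, c={c}"

params = (1, 2, 3)
print(quad_params(*params))  # same as quad_params(1, 2, 3) -> "a=1, b=2, c=3"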

The PyTorch library has built-in derivative calculation. It's exposed as tensor.backward(), a method on tensors.

How to enable derivatives:

  1. Create a tensor: abc = torch.tensor([1.5, 1.5, 1.5]). This example creates a rank-one tensor.
  2. Enable derivative calculation on the tensor: abc.requires_grad_().
  3. Calculate the loss, then calculate the derivative with .backward(). This call fills the tensor's .grad attribute with the slope (gradient) values; see the sketch after this list.
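
A minimal sketch of those three steps (the loss below is just a placeholder for illustration):

import torch

# 1. Create a rank-one tensor holding the three parameters.
abc = torch.tensor([1.5, 1.5, 1.5])

# 2. Enable derivative (gradient) tracking on the tensor.
abc.requires_grad_()

# 3. Calculate a loss, then call .backward() to compute the derivative.
loss = (abc ** 2).sum()  # placeholder loss
loss.backward()

print(abc.grad)  # tensor([3., 3., 3.]) -- the slope for each parameter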

Once the gradient values are available, we can iterate many times, adjusting the parameters by a small step against the slope.

This loop is called optimization; its goal is to decrease the loss value.
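
In code, the loop looks roughly like this (the placeholder loss and the learning rate of 0.01 are arbitrary choices of mine):

import torch

# Start from some parameter values with gradient tracking enabled.
abc = torch.tensor([1.5, 1.5, 1.5], requires_grad=True)
learning_rate = 0.01

for step in range(100):
    loss = (abc ** 2).sum()              # placeholder loss
    loss.backward()                      # compute the slope for each parameter
    with torch.no_grad():
        abc -= learning_rate * abc.grad  # step against the slope to reduce the loss
        abc.grad.zero_()                 # reset gradients before the next iteration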

Example

Assume we have random dots on a graph generated by the equation c*x^2 + b*x + a. We let the optimization find the values of a, b, and c.

We use many dots, not just one, because a single dot could be fit by any line drawn through it.

Initially, we pick some random numbers as the starting point. Then we calculate the loss with mean squared error, passing in the random dots and the predictions made from our initial values.

The parameter values are stored as tensors with derivative tracking enabled. Finally, we adjust the values step by step until the loss no longer decreases significantly.
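
Putting the whole example together, here is a sketch of fitting a, b, and c to noisy quadratic data (the true values 3, 2, 1, the noise level, and the learning rate are arbitrary choices of mine):

import torch

def f(x, params):
    c, b, a = params
    return c * x**2 + b * x + a

def mse(preds, targets):
    return ((preds - targets) ** 2).mean()

# Noisy "dots": generated from c=3, b=2, a=1 plus random noise.
x = torch.linspace(-2, 2, 20)
y = f(x, torch.tensor([3.0, 2.0, 1.0])) + torch.randn(20) * 0.3

# Random starting point with gradient tracking enabled.
params = torch.randn(3, requires_grad=True)
lr = 0.01

for step in range(1000):
    loss = mse(f(x, params), y)
    loss.backward()
    with torch.no_grad():
        params -= lr * params.grad
        params.grad.zero_()

print(params)  # should end up close to (3, 2, 1)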

ReLU

ReLU is short for Rectified Linear Unit:

import torch

def rectified_linear(m, b, x):
    # A line (m * x + b) whose negative outputs are clamped to zero.
    y = m * x + b
    return torch.clip(y, 0.)

This is simply a linear function whose negative values are turned into 0.

Combining multiple ReLUs creates a flexible function that can approximate almost any relationship, as shown in the sketch below.
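
For example, adding just two ReLUs already gives a function with two bends (double_relu is a made-up name, built on the rectified_linear defined above):

def double_relu(m1, b1, m2, b2, x):
    # The sum of two clipped lines can bend twice, so it can fit
    # more complicated shapes than a single line.
    return rectified_linear(m1, b1, x) + rectified_linear(m2, b2, x)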

This is pretty much the foundation on which all neural networks are built.