Notes on "Practical Deep Learning for Coders"

31 October 2023

Course: https://course.fast.ai/

Book: https://github.com/fastai/fastbook

Intro

Deep Learning is a computer technique to extract and transform data using multiple neural network layers.

Most common use cases:

  1. NLP: answer questions; speech recognition; summarize; classify; find; search
  2. Computer Vision: satellite image interpretation, face recognition, image captioning
  3. Medicine: finding anomalies; counting features in slides; measuring; diagnosing
  4. Biology: classification tasks in genomics; analyzing cell interactions
  5. Image Generation
  6. Recommendation Systems: search engines, home page layout, product recommendation
  7. Playing Games
  8. Robotics
  9. Financial, Logistics, etc...

In 1943, Warren McCulloch and Walter Pitts developed a mathematical model of the artificial neuron, presented in "A Logical Calculus of the Ideas Immanent in Nervous Activity."

The neuron can be represented as addition and thresholding: a weighted sum of the inputs is compared against a threshold to decide whether the neuron fires.

[Figure: mathematical model of an artificial neuron]
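A minimal sketch of such a neuron (my own illustration, not code from the book):

    def artificial_neuron(inputs, weights, threshold):
        """McCulloch-Pitts style neuron: weighted addition followed by thresholding."""
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0

    # Example: fires only when both inputs are active (logical AND)
    artificial_neuron([1, 1], [1, 1], threshold=2)  # -> 1
    artificial_neuron([1, 0], [1, 1], threshold=2)  # -> 0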

Frank Rosenblatt gave artificial neurons the ability to learn. He even created the first device based on this principle: the Mark I Perceptron.

We are now about to witness the birth of such a machine - a machine capable of perceiving, recognizing, and identifying its surroundings without any human training or control.

"The Design of an Intelligent Automata"

Frank Rosenblatt

Marvin Minsky and Seymour Papert wrote a book called "Perceptrons." They showed that a single layer of these devices cannot learn some simple but critical mathematical functions (such as XOR), and that using multiple layers addresses the limitation.

Perhaps the most pivotal work in neural networks was "Parallel Distributed Processing" by David Rumelhart, James McClelland, and the PDP Research Group, released in 1986.

The PDP approach defines a programming framework that requires:

  1. a set of processing units
  2. a state of activation
  3. an output function for each unit
  4. a pattern of connectivity among units
  5. a propagation rule for propagating patterns of activity through the network of connectivities
  6. an activation rule for combining the inputs impinging on a unit with the current state of that unit to produce an output for the unit
  7. a learning rule whereby patterns of connectivity are modified by experience
  8. an environment within which the system must operate

Modern Neural Networks use an approach similar to PDP. 

The book says the theory was misunderstood: although a network with just two layers is theoretically enough, in practice such networks were too big and too slow to be helpful, and these issues held back development until around 2010. Recent advances in hardware made it practical to train neural networks with more layers.

For learning, it is best to use Jupyter notebooks. Jupyter is software that lets you combine text, images, and executable code in a single document.

Deep learning uses neural networks and is a part of machine learning. The way a deep learning model is built and trained follows the general concepts of machine learning.

Machine learning is another form of programming: instead of writing the algorithm, you specify the desired output and let the software find the way to produce it.

The term "machine learning" was coined by Arthur Samuel of IBM in 1949. In his 1962 classic essay "Artificial Intelligence: A Frontier of Automation," he described the technique:

Rather than describing each step needed to solve the task, show the computer examples of the problem being solved, and let the program figure out the algorithm itself.


Samuel's scheme requires the actual performance of any current weight assignment, an automatic means of testing that performance, and a mechanism for improving performance by changing the weight assignments.

Weights are just variables, and a weight assignment is a particular choice of values for those variables. The program's inputs are also variables.

In modern terminology, machine learning programs are called models.

The weight assignment is a second kind of input to the program, alongside the actual input it processes. In modern terminology, weight assignments are called model parameters.

Automatic testing, in Samuel's case, meant letting two models play against each other; actual performance meant how close the outcome was to the desired result. After each iteration, the model is adjusted and the outcome is tested again.

If you automate the iteration that adjusts the model parameters, the program can learn from experience.

This automatic iteration is called model training.

Once the weights are adjusted, they become part of the model.

Neural networks are mathematical functions. A mathematical proof called the Universal Approximation Theorem shows that, in theory, this kind of function can solve any problem to any level of accuracy.

The general approach to adjusting the weights is Stochastic Gradient Descent (SGD), a mathematical method.
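A rough sketch of the idea (my own illustration, not code from the course): SGD learning a single weight w for the model y = w * x by minimizing a squared-error loss.

    import random

    # Training data for the model y = w * x; the true weight is 2.0.
    data = [(x, 2.0 * x) for x in range(1, 11)]

    w = 0.0      # initial weight assignment
    lr = 0.01    # learning rate: how big a step to take on each update

    for epoch in range(20):
        random.shuffle(data)              # "stochastic": visit examples in random order
        for x, y in data:
            pred = w * x                  # the model's prediction
            grad = 2 * (pred - y) * x     # gradient of the loss (pred - y)**2 with respect to w
            w -= lr * grad                # adjust the weight to reduce the loss

    print(w)  # converges toward 2.0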

A model includes the weights, a function that produces results from inputs and weights, and a performance measure used to update the weights. The functional part is called the architecture.

The prediction is the result the model produces. The loss is the measurement of its performance, computed against the labels.

Transforms are code applied to the data during training; there are item transforms and batch transforms. Not all of the inputs are used for training: part of the dataset is kept separate to test the training.

The data used for training is called the training set, while the held-out data is the validation set. If trained too long, the model starts memorizing the labels of the training set instead of finding generalizing patterns. This is called overfitting.
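A minimal sketch of holding data out (an illustrative 80/20 split, not the course's code):

    import random

    data = list(range(100))      # placeholder dataset
    random.seed(42)
    random.shuffle(data)

    cut = int(len(data) * 0.8)
    training_set = data[:cut]    # used to fit the model's parameters
    validation_set = data[cut:]  # never trained on; used only to measure performance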

Overfitting is the single most important issue in model training.

ResNet is one such architecture. It comes in variants with 18, 34, 50, 101, and 152 layers. More layers mean longer training, and with less data the model will be more prone to overfitting.

A model that comes with already-trained weights is called a pre-trained model. It is better to build on pre-trained models, replacing some of their layers with your own. Using a pre-trained model for a task different from what it was originally trained on is called transfer learning.

Fine-tuning is a transfer learning technique where the parameters of a pre-trained model are updated by training for additional epochs on the new task. An epoch is one complete pass through the data.
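The course's first example ties these pieces together: a pre-trained ResNet, an item transform, a held-out validation set, and fine-tuning. A sketch along those lines (the book's pets example; exact names such as vision_learner can vary between fastai versions):

    from fastai.vision.all import *

    path = untar_data(URLs.PETS)/'images'

    def is_cat(x): return x[0].isupper()  # in this dataset, cat filenames start with an uppercase letter

    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2, seed=42,  # hold out 20% as the validation set
        label_func=is_cat, item_tfms=Resize(224))             # item transform applied to each image

    learn = vision_learner(dls, resnet34, metrics=error_rate) # pre-trained ResNet-34 architecture
    learn.fine_tune(1)                                        # transfer learning: one additional epoch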

Machine learning has some limitations:

  1. A model cannot be created without data
  2. It can only learn patterns present in the input data used to train it

Deep Learning Vocabulary

Label - the data we are trying to predict

Architecture - the template of the model we are trying to fit; the actual mathematical function

Parameters - the values in the model that change what task it can do, and that are updated through the training process

Fit - update the parameters such that the model's predictions on the input data match the target labels. Also called Train.

Fine Tune - a transfer learning technique for updating a pre-trained model

Epoch - one complete pass through the input data

Loss - a measure of how good the model is, computed on the validation set

Training set - data for training

Validation set - data for measuring loss

Overfitting - training a model in a way that makes it remember specific features of the input data rather than generalizing patterns

CNN - convolutional neural network; a type of neural network that works particularly well for computer vision