The Role of Optimizers in Deep Learning
Starting Note
This article aims to provide a simple, abstract view of optimizers for those who have some knowledge of deep learning but are seeking clarity on this critical concept. We won’t dive into the nitty-gritty details or limitations of optimizers here; instead, we’ll offer a gentle introduction.
What Is an Optimizer?
Optimizers, also referred to as optimization techniques, are the methods or algorithms that help us find the best parameter values (weights and biases) for a neural network.
Need for Optimizer in Deep Learning
To understand the need for optimizers in deep learning, let's start with a simple example.
Suppose we have the equation ax + b = 7 and we know that x = 3. What are the values of a and b?
With simple intuition, we can answer this easily. One possibility is a = 2, b = 1, but there are other valid answers, such as a = 3, b = -2. In fact, there are infinitely many pairs of values for a and b that make this equation correct, and infinitely many that make it wrong.
How can we find the best values from this infinite set of possibilities? In this toy example, picking the right values by intuition works fine.
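As a quick check, here is a tiny Python sketch (the variable names are just for illustration) that verifies a few of the infinitely many (a, b) pairs satisfying a·x + b = 7 when x = 3:

```python
# A few of the infinitely many (a, b) pairs that satisfy a*x + b = 7 when x = 3.
x = 3
candidates = [(2, 1), (3, -2), (0, 7), (-1, 10)]

for a, b in candidates:
    print(f"a={a}, b={b} -> a*x + b = {a * x + b}")  # each line prints 7
```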
But in deep learning, there are a huge number of equations like this, far more complex and larger than this simple one, and we have to find the right values (weights and biases) for all of them. These values influence each other and the final outcome.
Here, a very small neural network is shown; even this network has many values (weights) to choose in order to produce the correct output.
Why Not Choose Values Randomly Until We Find the Right Ones?
You might think of choosing these values (weights) randomly, again and again, until something works. But this isn't practical at all: the number of possible combinations is infinite, and we cannot simply wait to get lucky and stumble on the best one.
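To make this concrete, here is a minimal sketch of naive random guessing on the same toy equation. With only two values it can get close by pure luck after many draws, but real networks have millions of values, where this approach collapses completely:

```python
import random

x, target = 3, 7

best_error = float("inf")
best_pair = None
for _ in range(10_000):
    # Guess a and b completely at random and keep the luckiest guess so far.
    a, b = random.uniform(-10, 10), random.uniform(-10, 10)
    error = abs(a * x + b - target)
    if error < best_error:
        best_error, best_pair = error, (a, b)

print(f"best random guess: a={best_pair[0]:.3f}, b={best_pair[1]:.3f}, error={best_error:.5f}")
```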
How Optimizers Solve the Problem
This is an optimization problem: the problem of finding the set of values that gives the best result. And this is exactly what optimizers solve.
First, the values (weights and biases) are initialized randomly. The optimizer then calculates the derivatives of the error for each layer of the neural network and updates the values accordingly in each iteration. After many iterations, a good set of values is found.
There are many variations of optimizers, but the basic technique is the same: calculate the derivatives and update the values according to them.
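As a concrete illustration (a minimal sketch of this basic technique, not any particular library's implementation), here is plain gradient descent applied to the toy equation: start from random values, compute the derivative of a squared error with respect to a and b, and repeatedly step against it:

```python
import random

x, target = 3, 7
learning_rate = 0.01

# Step 1: start from random values, just like a network's initial weights.
a, b = random.uniform(-1, 1), random.uniform(-1, 1)

for step in range(500):
    prediction = a * x + b
    loss = (prediction - target) ** 2        # squared error

    # Step 2: derivatives of the loss with respect to a and b.
    grad_a = 2 * (prediction - target) * x
    grad_b = 2 * (prediction - target)

    # Step 3: update each value by a small step against its derivative.
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(f"after training: a={a:.3f}, b={b:.3f}, a*x + b = {a * x + b:.3f}")
```

After a few hundred iterations the printed value of a·x + b is essentially 7: the loop has settled on one of the valid (a, b) pairs without any guessing.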
Many optimizers have been created so far. Some well-known ones are Gradient Descent, Stochastic Gradient Descent, Adam, and Adagrad.
The details and nitty-gritty of each optimizer will be discussed in upcoming articles.
Limitations of Optimizers
Optimizers have not solved this problem completely: for various reasons they do not always find the best values (weights and biases), which can lead to poor results.
All the limitations of optimizers and the challenges of the optimization problem will be described in detail in upcoming articles, along with a comprehensive look at how optimizers work.