Gradient Descent — I like to go downhill

It helps so much to reduce cost! Hehe! Read the blog to learn gradient descent with step-by-step code, and also learn to optimize learning rates, weights, biases, and much more, in a beginner-friendly fashion, in Python and R.

Dear reader,

Here is another blog from the much forgotten #CodeMe series. Enjoy Reading.

In life, too, we have an urge to reduce the cost of the things we buy. It is part of money management, or rather personal finance management. In India, it is common for people to bargain in the markets: they buy groceries, fruits, and daily supplies, even sanitizer, at a low cost by asking the seller to lower the price. Let's say that 2 kg of potatoes cost 150 bucks. The mind of a bargainer quickly predicts the amount to deduct and sizes up the market. Usually the person aims for a deal in which both sides gain, though sometimes people bargain so that only one side gains. So you first tell the grocer that you will give 135 bucks, then you say, okay, I will give 120 bucks, and the deal gets locked.

Now, the cost has reduced, so how did it reduce? There are iterations, that is, the number of times the cost is reduced, and there is the amount by which the cost goes down the slope each time, which is the learning rate, or the rate of deduction: in the above case, 15 bucks. So after deducting 15 + 15 = 30 bucks the deal was closed, because going any further would pass the minimum cost. This works in the market because loose goods in Indian markets do not carry an MRP (Maximum Retail Price, the printed price cap on packaged goods), so the price is open to negotiation.
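The haggling above can be written as a tiny loop. This is only a sketch of the analogy: the names `bargain`, `step`, and `floor_price` are mine, and the 120-buck floor is assumed to be the lowest price the grocer will accept.

```python
def bargain(price, step, floor_price):
    """Lower the price by a fixed step until another cut would go below the floor."""
    history = [price]
    iterations = 0
    while price - step >= floor_price:
        price -= step          # one "iteration" of bargaining
        iterations += 1
        history.append(price)
    return history, iterations

history, iterations = bargain(price=150, step=15, floor_price=120)
print(history, iterations)  # [150, 135, 120] 2
```

In real gradient descent the step is not fixed: it is the learning rate times the slope, so it shrinks as you get close to the bottom.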

But see the beauty of maths. It is a religion. Such simple equations, such functions, and these graphs, yet they demystify our minds and their workings, and help us to analyze and build predictive models that really do help us. So, to put it in the simplest fashion: math is simple but can do complex stuff. Don't worry.

Let me put forward the algorithm, and then we will break it down.

At first, take a look at this gigantic equation:-
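For linear regression, the update rules are commonly written as below. Treat this as a reconstruction under assumptions rather than the exact equation meant here: it borrows the names m (slope), c (intercept), and n (number of points) from the R code at the end of this post, the symbol α for the learning rate is my notation, and the cost carries a factor of 1/2 so that its derivatives match the updates in that code.

```latex
\begin{aligned}
\hat{y}_i &= m x_i + c \\
J(m, c) &= \frac{1}{2n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 \\
\frac{\partial J}{\partial m} &= \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i \\
\frac{\partial J}{\partial c} &= \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \\
m &\leftarrow m - \alpha \frac{\partial J}{\partial m}, \qquad
c \leftarrow c - \alpha \frac{\partial J}{\partial c}
\end{aligned}
```

With the plain mean squared error (no 1/2), both derivatives pick up a factor of 2, which only rescales the learning rate.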

Liked that? I did not. I will leave a link to my tweet so that you can read why I hate the representation of maths and theories in general. It is a thread, just for you.

I can understand your fear of maths. Computing those huge sums by hand is a monstrous process, and we are lazy: who likes writing these out when calculators are there to do it? And if it is too hard, then a simple Python code is enough for us, for we are coders. This is also a reason to learn to code, because it simplifies tasks. I have seen that when an equation is broken down into simpler components, it becomes easier to analyze and solve. We often hate the representation of maths in terms of X's, functions, hard computations, and infinitely long series of calculations. Mathematicians are lazy too, they seem to optimize definitions in the laziest fashion, but coders are lazier.

Math is a simple and powerful tool because it can range from the multiplication table of 7 (the one I got wrong the most in school) to how the mind works. In the coming years we may be able to map the mind of the very first human, and maybe prove whether God is there or not. Whether simple or complex, math surprises us in the simplest fashion.

I have also seen people spending time on concepts and theories while forgetting that theory is nothing but the written and formulated version of practical knowledge. Around 3000 BC, when some of the oldest civilizations were thriving, people invented the first forms of writing to express themselves, moving from elaborate drawings to shorthand notes that were easily understood. See, laziness was the key to invention (and necessity, obviously): people who did not like to draw started to write, and soon writing was preferred to drawing because it took less time to read.

I have seen optimization taught in a hard manner and left to the end of courses, but to me optimization was the most important concept, because I was greedy for accuracy and cost reduction.

Now let’s start:-

The algorithm is something like this:-

  • Predict the outcomes using sample values of m and b (here, the weight and the bias). In the Cartesian plane, a line is defined as x multiplied by some number, plus a constant: y = mx + b. The straight line y = x, which passes through the origin (0,0), has m = 1 and b = 0. That is how simple it is.
  • After computing the outcomes, measure how wrong they are; it takes real luck to start with Goldilocks weights and biases. You can use many different cost functions. For linear regression, I prefer MSE (mean squared error): subtract the real value from the predicted value, square it so that it has no negative value, sum these up, and divide by the number of observations. One drawback: you need a lot of data to optimize properly with this cost function. (With a small dataset the accuracy is lower but still okay; by a lot of data, I mean at least 200 samples in the validation set.)
  • Then comes the most important part, where you find out why it is called gradient descent. We take the gradient of our cost function. To be technical, the gradient is the set of partial derivatives of the cost, one for each parameter. I won't bore you with definitions; in short, a partial derivative is a derivative where we hold some variables constant. If a function takes 2 variables, then to find the derivative with respect to one variable we treat the other as a constant. For details, see Math is Fun on partial derivatives and on limits, evaluating limits, and derivatives.
  • Then I welcome 2 characters I have already introduced: the learning rate and the iterations. Yes, those 2 cool, hard-coded dudes. They need to be set by hand or else tuned with an optimization routine. Here is the link to my Kaggle notebook on optimization: link. Then you multiply the partial derivatives by the learning rate and subtract the result from the current weights and biases.
  • Then you are all set: just tweak the learning rate, and maybe the number of iterations, and you can optimize the model.
  • Ready, get set, code!
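To see what "hold the other variable constant" means in practice, here is a small sketch that compares the analytic partial derivatives of the MSE with a finite-difference estimate. It borrows the sample data from the R example at the end of this post; the helper names (`mse`, `analytic_grad`, `numeric_partial`) and the starting point m = b = 0.5 are mine, chosen only for illustration.

```python
def mse(m, b, xs, ys):
    """Mean squared error of the line y = m*x + b."""
    n = len(xs)
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

def analytic_grad(m, b, xs, ys):
    """Partial derivatives of the MSE with respect to m and b."""
    n = len(xs)
    dm = (2 / n) * sum((m * x + b - y) * x for x, y in zip(xs, ys))
    db = (2 / n) * sum((m * x + b - y) for x, y in zip(xs, ys))
    return dm, db

def numeric_partial(f, m, b, wrt, h=1e-6):
    """Central difference: nudge one variable, keep the other fixed."""
    if wrt == "m":
        return (f(m + h, b) - f(m - h, b)) / (2 * h)
    return (f(m, b + h) - f(m, b - h)) / (2 * h)

xs = [1, 2, 5, 7, 8]
ys = [0, 3, 9, 13, 15]
m, b = 0.5, 0.5
cost = lambda mm, bb: mse(mm, bb, xs, ys)

dm, db = analytic_grad(m, b, xs, ys)
num_dm = numeric_partial(cost, m, b, "m")
num_db = numeric_partial(cost, m, b, "b")
print("dMSE/dm:", dm, "numeric:", num_dm)
print("dMSE/db:", db, "numeric:", num_db)
```

Both pairs should agree to several decimal places. Both derivatives are negative at this point, which tells us that increasing m and b lowers the cost.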


I will take you through it step by step, but I will not repeat the process; please refer back to the algorithm above whenever you need to, and then return to the code section. Let's make a function. Thank you.

Step 1:-
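A minimal sketch of what this step could look like in Python, assuming Step 1 is the prediction step from the algorithm above. The function name `predict` and the starting values are illustrative, not the original code.

```python
def predict(xs, m, b):
    """Return the predicted y for every x using the line y = m*x + b."""
    return [m * x + b for x in xs]

xs = [1, 2, 5, 7, 8]
# Starting guesses for the weight (m) and the bias (b); any values work as a start.
m, b = 1.0, 0.0
preds = predict(xs, m, b)
print(preds)  # [1.0, 2.0, 5.0, 7.0, 8.0]
```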

Step 2:-
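A minimal sketch of the cost step, assuming Step 2 measures the error with the mean squared error described in the algorithm. The name `mean_squared_error` is mine, and the predictions are those of the line y = 1*x + 0 on the sample x values 1, 2, 5, 7, 8.

```python
def mean_squared_error(ys, preds):
    """Average of the squared differences between predictions and real values."""
    n = len(ys)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / n

ys = [0, 3, 9, 13, 15]
preds = [1.0, 2.0, 5.0, 7.0, 8.0]
cost = mean_squared_error(ys, preds)
print(cost)  # (1 + 1 + 16 + 36 + 49) / 5 = 20.6
```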

Step 3:-
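A minimal sketch of the update step, assuming Step 3 is where the gradient is multiplied by the learning rate and subtracted from the weight and bias. The name `gradient_step`, the learning rate of 0.01, and the 5000 iterations are my choices for illustration. This version uses the full MSE gradient (with the factor of 2), whereas the R code below leaves that factor out.

```python
def gradient_step(xs, ys, m, b, learn_rate):
    """One update: move m and b against the gradient of the MSE."""
    n = len(xs)
    errors = [(m * x + b) - y for x, y in zip(xs, ys)]
    dm = (2 / n) * sum(e * x for e, x in zip(errors, xs))  # dMSE/dm
    db = (2 / n) * sum(errors)                             # dMSE/db
    return m - learn_rate * dm, b - learn_rate * db

xs = [1, 2, 5, 7, 8]
ys = [0, 3, 9, 13, 15]
m, b = 1.0, 0.0
for _ in range(5000):
    m, b = gradient_step(xs, ys, m, b, learn_rate=0.01)
print("slope:", m, "intercept:", b)
```

On this sample data the result should land close to the least-squares line, with a slope of about 2.10 and an intercept of about -1.65.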

And we are done. We just developed a gradient descent function for linear regression using the MSE cost. Now you see how easy maths is, and how it can help solve complex problems, like the workings of the brain. So here is the whole code, in R and Python. You may Ctrl+C and Ctrl+V; I will not impose copyright.


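# R:-
gradientDesc <- function(x, y, learn_rate = 0.01, conv_threshold = 0.001, n = length(x), max_iter = 5000) {
  # Scatter plot of the data; the fitted line is drawn on top once the loop stops
  plot(x, y, col = "blue", pch = 20)
  # Start from a random slope (m) and intercept (c) between 0 and 1
  m <- runif(1, 0, 1)
  c <- runif(1, 0, 1)
  yhat <- m * x + c
  MSE <- sum((y - yhat) ^ 2) / n
  converged <- FALSE
  iterations <- 0
  while (!converged) {
    ## Implement the gradient descent algorithm
    m_new <- m - learn_rate * ((1 / n) * (sum((yhat - y) * x)))
    c_new <- c - learn_rate * ((1 / n) * (sum(yhat - y)))
    m <- m_new
    c <- c_new
    yhat <- m * x + c
    MSE_new <- sum((y - yhat) ^ 2) / n
    # Note: MSE is the error of the starting guess and is never updated in the
    # loop, so this check compares against the starting error. Add
    # `MSE <- MSE_new` after this check to compare consecutive iterations
    # (conv_threshold may then need to be tighter).
    if (MSE - MSE_new <= conv_threshold) {
      abline(c, m)
      converged <- TRUE
      return(paste("Optimal intercept:", c, "Optimal slope:", m))
    }
    iterations <- iterations + 1
    # Stop once the iteration budget is used up
    if (iterations > max_iter) {
      abline(c, m)
      converged <- TRUE
      return(paste("Optimal intercept:", c, "Optimal slope:", m))
    }
  }
}

# Run the function
x <- c(1, 2, 5, 7, 8)
y <- c(0, 3, 9, 13, 15)
gradientDesc(x, y, 0.0000293, 0.001, 5, 2500000)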
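
For Python readers, here is a sketch of a port of the R function above. It is my translation, not the author's original Python code, and it differs on purpose in a few places: it skips the plot, takes n from the length of x, adds a hypothetical `seed` parameter for repeatable starting values, returns a tuple instead of a string, and compares the error of consecutive iterations when checking for convergence. The learning rate and threshold in the example call are also my choices.

```python
import random

def gradient_desc(x, y, learn_rate=0.01, conv_threshold=0.001, max_iter=5000, seed=None):
    """Fit y = m*x + c by gradient descent; returns (c, m, iterations)."""
    rng = random.Random(seed)
    n = len(x)
    # Random starting slope and intercept between 0 and 1, as in the R code
    m = rng.uniform(0, 1)
    c = rng.uniform(0, 1)
    yhat = [m * xi + c for xi in x]
    mse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat)) / n
    for iteration in range(1, max_iter + 1):
        # Same update as the R code (gradient of half the MSE)
        grad_m = sum((yh - yi) * xi for yh, yi, xi in zip(yhat, y, x)) / n
        grad_c = sum(yh - yi for yh, yi in zip(yhat, y)) / n
        m -= learn_rate * grad_m
        c -= learn_rate * grad_c
        yhat = [m * xi + c for xi in x]
        mse_new = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat)) / n
        # Stop when the error improves by no more than the threshold
        if mse - mse_new <= conv_threshold:
            return c, m, iteration
        mse = mse_new
    return c, m, max_iter

x = [1, 2, 5, 7, 8]
y = [0, 3, 9, 13, 15]
c, m, iterations = gradient_desc(x, y, learn_rate=0.01, conv_threshold=1e-12,
                                 max_iter=100000, seed=0)
print("Optimal intercept:", c, "Optimal slope:", m, "after", iterations, "iterations")
```

A loose threshold stops the loop early, before the intercept has settled, which is why the example call uses a very small `conv_threshold`.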

Thank you for reading. Stay safe, take care. Happy learning.

Divyosmi Goswami: A digital nomad's journal wandering through the physical and cyber city discovering himself.