If MACHINES are DUMB then how do they LEARN….
Alright, everyone, let's get ready! We are going to explore the world of machine learning (ML). But first things first, without optimization techniques, machine learning is like a car without an engine - technically a car, but it won't take you anywhere. Let's dive into why optimization is crucial for machine learning and explore the techniques that drive those ML engines.
What Is Machine Learning? (No, It’s Not Magic or maybe it is…..)
First, let’s clear the air. Machine Learning sounds like some Harry Potter stuff, right? Machines that learn. But before you start imagining robots taking over the world, let me reel you back to reality. At its core, ML is just another computer program. Yep, you heard me—just another program.
“But wait,” you might say, “isn’t it supposed to be super smart?” Yes, it’s smart, but not because it magically understands things. It’s smart because it’s optimized. It repeatedly learns from its mistakes, which are measured against pre-existing datasets and corrected with the help of optimization techniques.
Optimization: The Unsung Hero of Machine Learning
So, what’s this optimization thing I keep yapping about? Optimization is the process of tweaking a model’s parameters so that it learns better and faster. Without it, our enthusiastic student (ML) is just going to keep messing up their homework. You don’t want that, and neither do we.
Here’s why optimization is essential:
1. Efficiency: Without optimization, your ML model is like trying to find a needle in a haystack with your eyes closed. It’s going to take forever. Optimization opens your eyes.
2. Accuracy: You want your model to not just learn, but to learn well. Optimization helps in fine-tuning the learning process, so your model doesn’t end up predicting that penguins can fly (they can’t, FYI).
3. Speed: Time is money, my friend. A non-optimized ML model will take ages to train. Optimization is like giving it a turbo boost.
The Various Optimization Techniques (or How to Make ML Less Dumb)
Now, let’s get to the juicy part—how do we optimize? There are several techniques, and while they all sound a bit like something out of a sci-fi movie, they’re actually pretty cool.
1. Gradient Descent:
Gradient Descent is the bread and butter of optimization. Imagine you’re hiking down a hill and want to reach the bottom (which represents the lowest error in your model). Gradient Descent is like checking your slope at every step and making sure you’re going downhill. If you take steps that are too big, you might overshoot the bottom and stumble right past it, and if you take steps that are too small, well, you’re just wasting time.
There’s also Stochastic Gradient Descent (SGD), where instead of surveying the whole hill (the full dataset) before every step, you estimate the slope from a single example (or a small batch) and step right away. It’s faster but a bit noisier. You might take a wrong step now and then, but overall, you get to the bottom quicker.
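If you’d rather see the hike than hear about it, here’s a minimal sketch of full-batch gradient descent versus SGD on a toy least-squares problem. The toy data, learning rates, and function names are my own illustrative picks, not anything standard.

```python
import numpy as np

# Toy least-squares problem: find w minimizing ||X @ w - y||^2.
# The data and function names are illustrative, not from this article.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

def gradient(w, Xb, yb):
    # Gradient of the mean squared error for the batch (Xb, yb).
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

def gradient_descent(lr=0.1, steps=200):
    w = np.zeros(3)
    for _ in range(steps):
        w -= lr * gradient(w, X, y)        # full batch: survey the whole "hill"
    return w

def sgd(lr=0.05, epochs=20):
    w = np.zeros(3)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # one noisy step per example
            w -= lr * gradient(w, X[i:i+1], y[i:i+1])
    return w

print(gradient_descent())  # both should land near true_w
print(sgd())
```

The only real difference between the two loops is how much of the hill each step gets to see: the whole dataset versus a single example.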
2. Momentum:
Momentum is a method that helps accelerate mini-batch gradient descent in the relevant direction and dampens oscillations. It does this by adding a fraction γ of the update vector of the past time step to the current update vector.
Essentially, when using momentum, we push a ball down a hill. The ball accumulates momentum as it rolls downhill, becoming faster and faster on the way (until it reaches its terminal velocity, if there is air resistance, i.e., γ < 1). The same thing happens to our parameter updates: the momentum term grows for dimensions whose gradients keep pointing in the same direction and shrinks the updates for dimensions whose gradients change direction. As a result, we gain faster convergence and reduced oscillation.
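In code, the rolling ball is just two lines: accumulate a velocity v = γ·v + lr·gradient, then move by it. The toy quadratic valley, the step size, and γ = 0.9 below are assumptions chosen for illustration.

```python
import numpy as np

# A minimal sketch of the momentum update described above:
#   v = gamma * v + lr * grad
#   w = w - v
def momentum_step(w, v, grad, lr=0.01, gamma=0.9):
    v = gamma * v + lr * grad   # accumulate a fraction of past updates
    w = w - v                   # move by the accumulated "velocity"
    return w, v

# Example: a narrow quadratic valley where plain gradient steps oscillate.
A = np.diag([1.0, 50.0])
w = np.array([5.0, 5.0])
v = np.zeros(2)
for _ in range(300):
    grad = A @ w                # gradient of f(w) = 0.5 * w @ A @ w
    w, v = momentum_step(w, v, grad)
print(w)  # approaches the minimum at the origin
```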
3. Adam Optimizer: The All-Rounder
The Adam optimizer is like Gradient Descent’s cooler, smarter cousin who’s been hitting the gym. It’s a combination of momentum and adaptive learning rate techniques. Imagine it as Gradient Descent with a brain that learns from its mistakes and doesn’t keep tripping over the same rocks. It’s efficient, fast, and reliable. That’s why it’s a fan favorite in the ML community.
Adam is a first-order, gradient-based optimization algorithm for stochastic objective functions, based on adaptive estimates of lower-order moments. It is one of the most widely used optimizers among machine learning practitioners. The first moment estimate (a running mean of recent gradients), normalized by the square root of the second moment estimate (a running mean of recent squared gradients), gives the direction and scale of each update.
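Here is a minimal sketch of that update on a one-dimensional toy problem. The β values and ε are the commonly quoted defaults, and the learning rate is raised above the usual 1e-3 so the tiny example converges quickly; all of these choices are illustrative assumptions, not values taken from this article.

```python
import numpy as np

# A minimal sketch of the Adam update described above.
def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad           # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for zero-initialized moments
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # first moment normalized by the second moment
    return w, m, v

# Example: minimize f(w) = (w - 3)^2 starting from w = 0.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)  # lr raised for this tiny example
print(w)  # settles close to 3
```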
4. RMSprop: The Adaptive One
RMSprop (Root Mean Square Propagation) is a bit like a self-aware car that adjusts its speed based on the terrain. It’s an optimization technique that adapts the learning rate depending on how rough or smooth the landscape (error function) is. It makes sure your model isn’t speeding on a rough patch or crawling on a smooth one.
RMSProp is a gradient-based optimization technique used in training neural networks; it was developed as a stochastic technique for mini-batch learning. Gradients of very complex functions like neural networks have a tendency to either vanish or explode as they propagate through the network.
RMSProp deals with this by keeping a moving average of squared gradients and using it to normalize the gradient. This normalization balances the step size, decreasing it for large gradients to avoid exploding updates and increasing it for small gradients to avoid vanishing ones.
Simply put, RMSProp adapts the effective learning rate for each parameter instead of treating it as one fixed hyperparameter. This means the step size changes over time as the gradients change.
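As a sketch, the whole trick is one moving average. The decay rate, learning rate, and the reuse of the toy valley from the momentum example are illustrative assumptions, not prescribed values.

```python
import numpy as np

# A minimal sketch of the RMSProp update described above.
def rmsprop_step(w, grad, s, lr=0.01, decay=0.9, eps=1e-8):
    s = decay * s + (1 - decay) * grad ** 2  # moving average of squared gradients
    w = w - lr * grad / (np.sqrt(s) + eps)   # normalize the step by that average
    return w, s

# Example: the same elongated quadratic valley used for momentum above.
A = np.diag([1.0, 50.0])
w = np.array([5.0, 5.0])
s = np.zeros(2)
for _ in range(1000):
    grad = A @ w
    w, s = rmsprop_step(w, grad, s)
print(w)  # approaches the minimum at the origin
```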
5. Learning Rate Scheduling: Speed Control
Speaking of speed, learning rate scheduling is like a smart cruise control for your ML model. You start fast, but as you get closer to the goal (minimum error), you slow down. This prevents your model from overshooting the minimum and also helps it fine-tune the last steps.
The learning rate in machine learning is a hyperparameter that determines to what extent newly acquired information overrides old information. It is often cited as the single most important hyperparameter to tune when training deep neural networks.
The learning rate is crucial because it controls both the speed of convergence and the ultimate performance of the network.
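Here is a small sketch of two common ways to schedule the learning rate: step decay (drop it every few epochs) and cosine decay (glide it smoothly toward zero). The base rate, drop factor, and epoch counts below are made-up illustrative values.

```python
import math

def step_decay(base_lr, epoch, drop=0.5, every=10):
    # Halve the learning rate every `every` epochs.
    return base_lr * (drop ** (epoch // every))

def cosine_decay(base_lr, epoch, total_epochs=100):
    # Smoothly decay from base_lr toward 0 over the whole run.
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

for epoch in (0, 10, 50, 99):
    print(epoch, step_decay(0.1, epoch), round(cosine_decay(0.1, epoch), 4))
```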
Why You Should Care About Optimization (No, Really)
Okay, so why does all this matter? Well, if you’re building a machine learning model and you want it to, you know, actually work, you need optimization. Without it, your model is just a blob of untrained neural networks—a pretty useless computer program.
Optimization makes sure that your ML model doesn’t just learn, but learns well, learns fast, and gives you accurate predictions.
In Conclusion: Optimization is the Key
So, the next time someone tells you how amazing machine learning is, remember that it’s not magic. It’s a computer program—one that’s optimized to the hilt. Without those fancy optimization techniques like Gradient Descent, Momentum, and Adam, ML would just be another program trying (and failing) to figure out why cats always land on their feet.
Remember, optimization is what turns a dumb program into a smart one. It’s the difference between a machine that just exists and one that actually learns. And that, my friends, is what makes machine learning more than just another line of code.