ML4LM- How does Lasso bring sparsity?


Published: December 25, 2023

Many of us have heard about Lasso and its ability to bring sparsity to models, but not everyone understands the nitty-gritty of how it actually works. In a nutshell, Lasso is like a superhero for overfitting problems, tackling them through a technique called regularization. If you’re not familiar with regularization and how it fights overfitting, I’d recommend checking that out first. For now, let’s dive into the magic of how Lasso brings sparsity.

Lasso’s cost function helps find the best model by balancing two goals: making accurate predictions (a low mean squared error) and keeping the coefficients small (a low sum of their absolute values). It does this by adding a penalty on that sum, which encourages simpler, more focused models. A tuning parameter \( \lambda \) controls how strongly the penalty counts.
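Written out, one common form of this cost function looks like the block below. The \( \frac{1}{n} \) scaling and the symbols \( n \) (number of samples), \( p \) (number of coefficients), \( x_i \), and \( y_i \) are assumptions made here for illustration. Some references scale the squared-error term differently, which only changes the scale of \( \lambda \).

```latex
\[
J(\beta) \;=\; \underbrace{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i^{\top} \beta \right)^2}_{\text{MSE}}
\;+\; \lambda \underbrace{\sum_{j=1}^{p} \left| \beta_j \right|}_{\text{L1 penalty}}
\]
```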


Objective of Lasso

The objective of Lasso is to find the values of \( \beta \) that minimize this combined loss function, striking a balance between fitting the data well (MSE) and keeping the magnitude of the coefficients small (\( \left|\beta_i\right| \)). This helps in preventing overfitting and encourages sparsity in the model by pushing some coefficients to exactly zero, effectively performing variable selection.
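To make "exactly zero" concrete, here is a minimal sketch of Lasso fitted by coordinate descent in plain Python. It assumes the objective is the MSE plus \( \lambda \) times the sum of absolute coefficients. The function names, the synthetic data, and the choice of \( \lambda \) are all illustrative assumptions, not something taken from a particular library.

```python
import random

def soft_threshold(rho, threshold):
    """Shrink rho toward zero; return exactly 0.0 when |rho| <= threshold."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize (1/n) * sum((y_i - x_i . beta)^2) + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # average of x_ij * (residual with feature j left out)
            z = 0.0    # average of x_ij squared
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # Exact one-dimensional minimizer for coordinate j.
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta

# Synthetic data (an assumption for illustration): only the first two
# of the five features actually influence y.
rng = random.Random(0)
n_samples, n_features = 100, 5
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
true_beta = [3.0, -2.0, 0.0, 0.0, 0.0]
y = [sum(row[k] * true_beta[k] for k in range(n_features)) + rng.gauss(0, 0.5)
     for row in X]

beta_no_penalty = lasso_coordinate_descent(X, y, lam=0.0)
beta_with_penalty = lasso_coordinate_descent(X, y, lam=1.0)
print("lambda = 0.0:", [round(b, 3) for b in beta_no_penalty])
print("lambda = 1.0:", [round(b, 3) for b in beta_with_penalty])
```

Without the penalty, the three irrelevant features pick up small non-zero coefficients from noise. With the penalty, the soft-thresholding step sets them to exactly 0.0 on this data, while the two real effects survive in shrunken form.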

How Lasso Brings Sparsity

To understand how Lasso brings sparsity, let’s consider a simple example with two coefficients \( \beta_1 \) and \( \beta_2 \) in a 2D plane. We have the constraint:

\[ \left|\beta_1\right| + \left|\beta_2\right| \leq t \]


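Think of it as trying to make these coefficients as small as possible, but there’s a twist. We introduce a constraint that puts a limit on the sum of the absolute values of these coefficients. It’s like saying, ‘Hey, keep the combined size of these coefficients at or below a certain threshold.’ This threshold is denoted as ‘t,’ and its size matters: a larger ‘t’ allows for larger coefficients, and it is inversely related to \( \lambda \) (a larger \( \lambda \) corresponds to a smaller ‘t’). In two dimensions, the region allowed by this constraint is a diamond whose four corners sit on the axes, where one of the two coefficients is exactly zero.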
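The link between \( \lambda \) and ‘t’ can be sketched numerically. The example below assumes a simplified setting: standardized, uncorrelated features and an objective of MSE plus \( \lambda \) times the sum of absolute coefficients. In that case each Lasso coefficient is the least-squares coefficient soft-thresholded by \( \lambda / 2 \). The least-squares values and the helper names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 when |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

# Hypothetical unpenalized (least-squares) coefficients.
ols_coefficients = [3.0, -1.5, 0.4]

def l1_budget_used(lam):
    """Sum of absolute Lasso coefficients, i.e. the 't' implied by this lambda."""
    shrunk = [soft_threshold(b, lam / 2.0) for b in ols_coefficients]
    return sum(abs(b) for b in shrunk)

lambdas = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
budgets = [l1_budget_used(lam) for lam in lambdas]
for lam, budget in zip(lambdas, budgets):
    print(f"lambda = {lam:>3}: sum of |beta| = {budget:.2f}")
```

As \( \lambda \) grows, the implied budget shrinks steadily, until every coefficient has been pushed to zero.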

Now, let’s extend this to three dimensions with:

\[ \left|\beta_1\right| + \left|\beta_2\right| + \left|\beta_3\right| \leq t \]

As the number of dimensions increases, the constraint region gains more corners and edges. In 3D space, the shape is an octahedron whose corners sit on the axes, and at each corner all but one coefficient are zero. For instance, a corner on the Z-axis (the \( \beta_3 \) axis) corresponds to \( \beta_1 \) and \( \beta_2 \) both being zero, and so on.
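A short sketch can list those corners directly. The helper name `l1_ball_corners` is made up for this illustration. It places ±t on one axis at a time, which is where the corners of the region \( \sum_j \left|\beta_j\right| \leq t \) lie.

```python
def l1_ball_corners(dim, t):
    """Return the 2 * dim corners of the region sum(|beta_j|) <= t."""
    corners = []
    for axis in range(dim):
        for sign in (1.0, -1.0):
            corner = [0.0] * dim
            corner[axis] = sign * t
            corners.append(corner)
    return corners

corners_3d = l1_ball_corners(3, t=1.0)
for corner in corners_3d:
    print(corner, "-> zero coefficients:", corner.count(0.0))
```

In three dimensions this gives six corners, each with two of the three coefficients equal to zero.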


If we plot the contours of the loss function (MSE), we get ellipses representing the Residual Sum of Squares, centered on the unconstrained least-squares solution. The Lasso solution is the point where the smallest of these ellipses touches the constraint region. Because the corners of that region stick out along the axes, the ellipses are more likely to touch it at a corner or edge, where some coefficients are exactly zero, than they would be with a smooth, rounded constraint.
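The geometry can be checked in a simplified case. The sketch below assumes the loss contours are circles rather than general ellipses, so the constrained solution is simply the closest point of the constraint region to the unconstrained solution. It uses the well-known sort-based projection onto the region \( \sum_j \left|\beta_j\right| \leq t \). A round region (sum of squares at most \( t^2 \)) is included only for contrast and is not discussed in the text above. The starting point and function names are hypothetical.

```python
import math

def project_onto_l1_ball(v, t):
    """Closest point to v (Euclidean distance) with sum(|x_j|) <= t."""
    if sum(abs(x) for x in v) <= t:
        return list(v)
    magnitudes = sorted((abs(x) for x in v), reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, m in enumerate(magnitudes, start=1):
        cumulative += m
        candidate = (cumulative - t) / k
        if m > candidate:
            theta = candidate  # keep the shift from the largest valid k
    result = []
    for x in v:
        shrunk = max(abs(x) - theta, 0.0)
        if shrunk == 0.0:
            result.append(0.0)
        else:
            result.append(shrunk if x > 0 else -shrunk)
    return result

def project_onto_l2_ball(v, t):
    """Closest point to v (Euclidean distance) with sqrt(sum(x_j^2)) <= t."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return list(v)
    return [x * t / norm for x in v]

# Hypothetical unconstrained solution (center of the circular contours).
unconstrained = [2.0, 0.5]
diamond_solution = project_onto_l1_ball(unconstrained, t=1.0)
round_solution = project_onto_l2_ball(unconstrained, t=1.0)
print("diamond constraint:", diamond_solution)
print("round constraint:  ", [round(x, 3) for x in round_solution])
```

With the diamond, the solution lands on a corner and the second coefficient becomes exactly zero. With the round region, both coefficients are scaled down but stay non-zero.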


Summary

In a nutshell, Lasso’s unique constraint shape, with its sharp corners, makes it great at pushing some coefficients to exactly zero, introducing a simplicity to the model that other techniques might not achieve.

Hoyath Ali
