Session 1/2

In this reading group session, we review the LassoNet method, from its motivation to its theoretical justification.

In the next session, we will review experimental results for LassoNet and discuss potential applications and extensions of the algorithm.

The LassoNet paper, by Ismael Lemhadri, Feng Ruan, Louis Abraham and Rob Tibshirani, is to be published in JMLR, and the paper has a companion website. Most images throughout this presentation are taken from the paper.


Task: Find a minimal set of features $k \subset [d]$ to model $y$

Problem: The relationship between the response $y$ and the inputs $x_i \in \mathbb{R}^d$ is nonlinear, but is well modelled by a neural network function approximator.

Goal: Can we efficiently select variables $x_i$ when the mapping from $x_i$ to $y_i$ is a neural network?
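To make the task concrete, here is a minimal synthetic example (my own illustration, not from the paper): the response depends nonlinearly on only two of ten features, so a linear selector like the classical Lasso would struggle, while the relevant subset is still small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: d = 10 input features, but the response depends
# nonlinearly on only the first two (the "minimal set" we hope to recover).
n, d = 500, 10
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# Features 3..10 are pure noise: a good selection procedure should assign
# them zero weight, while a linear model cannot capture sin(.) or the
# squared term in the first two features.
print(X.shape, y.shape)  # (500, 10) (500,)
```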


The LassoNet procedure that the authors propose can be broken down into three steps:

  1. Augment the neural network model space with a skip connection

  2. Define a sparsity-inducing loss function to select variables in those models

    $$\min_{\theta,\,W} \quad L(\theta, W) + \lambda \|\theta\|_1 \\ \text{ } \\ \text{s.t.} \quad \forall\, 1 \le j \le d, \quad \|W_j^{(1)}\|_{\infty} \le M\,|\theta_j|$$


    $$L(\theta,~W) = \frac{1}{n} \sum_{1 \le i \le n} l(f_{\theta,W}(x_i), y_i)$$


    1. Penalize skip connection weights $\theta$ with a Lasso-like L1 penalty

      → For $\lambda$ large enough, only a subset $k \subset [d]$ of the input variables retains a non-zero skip-connection weight $\theta_j$.
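The objective and constraint above can be sketched in a few lines of numpy. This is a minimal illustration under assumed names (it is not the authors' implementation): the model is $f(x) = \theta^\top x + g_W(x)$, a linear skip connection plus a one-hidden-layer ReLU network, with the penalized loss and the per-feature constraint checked explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)

n, d, h = 50, 5, 8
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

theta = rng.normal(size=d)    # skip-connection weights (Lasso-penalized)
W1 = rng.normal(size=(d, h))  # first-layer weights W^{(1)}; row j ties feature j
b1 = np.zeros(h)
w2 = rng.normal(size=h)       # output weights of the network part

def forward(X):
    hidden = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer g_W
    return X @ theta + hidden @ w2         # skip connection + network

def objective(lam):
    residuals = forward(X) - y
    L = np.mean(residuals ** 2)             # empirical loss L(theta, W)
    return L + lam * np.sum(np.abs(theta))  # plus lambda * ||theta||_1

# The constraint couples each feature's first-layer row to its skip weight:
# ||W_j^{(1)}||_inf <= M * |theta_j|, so theta_j = 0 forces feature j out of
# the network entirely -- this is what makes the L1 penalty select features.
M = 10.0
feasible = np.all(np.max(np.abs(W1), axis=1) <= M * np.abs(theta))
print(objective(0.1), feasible)
```

Note how the constraint, not the penalty alone, is what removes a feature from the nonlinear part: zeroing $\theta_j$ forces the entire $j$-th row of $W^{(1)}$ to zero.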