Author: Arnaud Autef

Contents

Session 1/2

<aside> 💡 In this reading group session: we review the LassoNet method, from its motivation to its theoretical justification.

In the next session: we review experimental results with LassoNet and discuss potential applications and extensions of this algorithm.

</aside>

The LassoNet paper, by Ismael Lemhadri, Feng Ruan, Louis Abraham, and Rob Tibshirani, is to be published in JMLR. The paper's website is available at https://lassonet.ml, and most images throughout this presentation are taken from the paper.

Setup

<aside> 💡 Task: Find a minimal set of features $k \subset [d]$ to model $y$

Problem: The relationship between the response $y$ and the inputs $x_i \in \mathbb{R}^d$ is nonlinear, but well modelled by a neural network function approximator.

Goal: Can we efficiently select input features (coordinates of $x$) when the mapping from $x_i$ to $y_i$ is a neural network?

</aside>

Approach

The LassoNet procedure that the authors propose can be broken down into 3 steps:

  1. Augment the neural network model space with a skip connection

  2. Define a sparsity-inducing loss function to select variables in those models (both steps are sketched in code at the end of this section)

    $$\min_{\theta,~W} \quad L(\theta,~W) + \lambda \|\theta\|_1\\ \text{ }\\ \text{s.t.}\quad \forall\, 1 \le j \le d,\quad \|W_j^{(1)}\|_{\infty} \le M\,|\theta_j|$$

    With,

    $$L(\theta,~W) = \frac{1}{n} \sum_{1 \le i \le n} l(f_{\theta,W}(x_i), y_i)$$

    Discussion

    1. Penalize skip connection weights $\theta$ with a Lasso-like L1 penalty

      → Only a subset $k \subset [d]$ of input variables will have a non-zero weight $\theta_j$ in the skip connection, for high enough $\lambda$.
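To make the two steps above concrete, here is a minimal PyTorch sketch of the augmented model space (a linear skip connection $\theta$ added to a feed-forward network $g_W$) and of the penalized objective. The layer sizes, the single hidden layer, and the squared-error loss are illustrative assumptions rather than the paper's implementation, and the hierarchy constraint $\|W_j^{(1)}\|_{\infty} \le M|\theta_j|$ is not enforced here (the paper handles it with a dedicated proximal step during training).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LassoNetSketch(nn.Module):
    """Sketch of the augmented model space:
    f_{theta,W}(x) = theta^T x + g_W(x),
    i.e. a linear skip connection plus a small feed-forward network.
    Sizes and depth are illustrative, not the paper's implementation."""

    def __init__(self, d, hidden_dim=32):
        super().__init__()
        self.skip = nn.Linear(d, 1, bias=False)   # theta: skip-connection weights
        self.layer1 = nn.Linear(d, hidden_dim)    # W^(1): first hidden layer
        self.layer2 = nn.Linear(hidden_dim, 1)    # remaining weights of g_W

    def forward(self, x):
        return self.skip(x) + self.layer2(F.relu(self.layer1(x)))


def penalized_objective(model, x, y, lam):
    """L(theta, W) + lambda * ||theta||_1, with l taken to be squared error.
    The constraint ||W_j^(1)||_inf <= M |theta_j| is NOT enforced here."""
    data_loss = F.mse_loss(model(x), y)            # L(theta, W)
    l1_penalty = model.skip.weight.abs().sum()     # ||theta||_1
    return data_loss + lam * l1_penalty
```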