Author: Arnaud Autef

<aside>
💡 In this reading group session, we present the 2021 Deep Learning (DL) paper **"Regularization is all you Need: Simple Neural Nets can Excel on Tabular Data"**.

In this empirical paper, the authors argue that **simple neural network architectures can reach state-of-the-art performance for supervised learning on tabular data**.

This goes **against the common wisdom that Gradient-Boosted Decision Trees (GBDT) are superior to DL** approaches on tabular data.

The main insight of the paper is that **regularization techniques are key to unlocking higher performance with neural networks**. As long as a broad array of regularization approaches is considered during hyperparameter optimization, neural networks should prevail.

In this session, we:

- Review some DL basics relevant to the paper
- Review some basics of regularization in supervised learning, and regularization techniques for Deep Learning
- Discuss the empirical results presented in the paper and derive conclusions for practitioners

</aside>

**Contents**

# 1 - What is Deep Learning?

https://twitter.com/ylecun/status/1209497021398343680?lang=fr

Unfortunately, as we can read above, **DL is not precisely defined,** and we will not attempt to define it here!

Here, we restrict ourselves to **Multi-Layer Perceptrons (MLPs)**:

- They are the "simplest" of DL models, almost always covered first in introductory DL lectures.
- This is the neural network architecture used in the paper discussed today.

## What is an MLP?

### Setup

We consider a supervised regression setting with dataset $\mathcal{D} = (x_i, y_i)_{1 \le i \le n}$ where

- $x_i \in \mathbb{R}^d$ are input features of dimension $d$
- $y_i \in \mathbb{R}$ are the responses to model
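To make this setup concrete, here is a minimal NumPy sketch of such a dataset together with a one-hidden-layer MLP forward pass (a preview of the architecture discussed in this section). The data is synthetic and all sizes, names, and the choice of ReLU activation are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression dataset D = (x_i, y_i), i = 1..n:
# inputs x_i in R^d, scalar responses y_i (synthetic, for illustration only).
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# One-hidden-layer MLP: affine map -> ReLU -> affine map.
h = 16  # hidden width (arbitrary choice for this sketch)
W1, b1 = rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = rng.normal(size=(h, 1)), np.zeros(1)

def mlp(x):
    z = np.maximum(x @ W1 + b1, 0.0)  # hidden layer with ReLU activation
    return (z @ W2 + b2).squeeze(-1)  # one scalar prediction per input

preds = mlp(X)  # predictions for all n inputs, shape (n,)
```

Training such a model means adjusting `W1, b1, W2, b2` to make `preds` close to `y`, e.g. by gradient descent on a squared-error loss.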