Sunday, September 28, 2014

Regularization and Bayesian Interpretation

For a linear model y = Xb + e, where b is the coefficient vector and e is the error term:

1. Minimizing MSE ||y - Xb||^2 is equivalent to maximum likelihood with no prior on b and a normal distribution on e. The normal density is proportional to exp(-0.5 * x^2 / sigma^2), so the negative log-likelihood is ||y - Xb||^2 / (2 * sigma^2) plus a constant, and minimizing it is the same as minimizing ||y - Xb||^2. (A numerical check is sketched after this list.)
 
2. Minimizing the mean absolute deviation, i.e. the sum of |y_i - x_i'b|, is equivalent to maximum likelihood with no prior on b and a Laplace distribution on e. The Laplace density is proportional to exp(-|x| / scale), so the negative log-likelihood is the sum of absolute residuals divided by the scale, plus a constant.

3. Minimizing MSE with L2 regularization (ridge) is equivalent to maximizing the posterior (MAP, i.e. likelihood times prior) with a normal distribution on both b and e: the MSE term comes from the normal e, and the L2 penalty ||b||^2 comes from the normal prior on b. (See the second sketch below.)

4. Minimizing MSE with L1 regularization (lasso) is equivalent to MAP with a normal distribution on e and a Laplace prior on b: the L1 penalty |b| is the negative log of the Laplace density.

5. Minimizing MSE with an L0 penalty on b (counting the nonzero coefficients) is equivalent to MAP with a normal distribution on e and a prior on b that piles extra mass exactly at zero and is flat elsewhere, i.e. a spike-and-slab-style prior: extremely concentrated at zero, extremely heavy tails.

6. MAP with normal e and a Cauchy prior on b is roughly like minimizing MSE with an "L0.00001" penalty on b. The "L0.00001" is shorthand for the fact that, instead of |b| for L1 or |b|^2 for L2, the Cauchy prior contributes a penalty of log(1 + b^2) per coefficient, which grows more slowly than |b|^p for any p > 0. (See the third sketch below.)
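
A minimal numerical check of point 1, on a small synthetic dataset. The sizes (n = 200, p = 3), the noise scale sigma = 0.7, and the use of scipy's general-purpose optimizer are illustrative choices, not anything from the post. The data are fit once by minimizing ||y - Xb||^2 directly and once by maximizing the Gaussian log-likelihood, and the two estimates coincide.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 3                                      # illustrative sizes
X = rng.normal(size=(n, p))
b_true = np.array([1.0, -2.0, 0.5])
sigma = 0.7                                        # assumed noise scale
y = X @ b_true + rng.normal(scale=sigma, size=n)   # normal errors e

# (a) minimize ||y - Xb||^2 directly: the ordinary least-squares solution
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# (b) maximize the Gaussian likelihood: log p(e_i) = -0.5 * e_i^2 / sigma^2 + const,
#     so the negative log-likelihood is ||y - Xb||^2 / (2 * sigma^2) + const and the
#     minimizing b does not depend on sigma.
def neg_log_lik(b):
    resid = y - X @ b
    return np.sum(0.5 * resid**2 / sigma**2)

b_mle = minimize(neg_log_lik, x0=np.zeros(p)).x

print(np.round(b_ols - b_mle, 4))                  # ~ [0. 0. 0.]: the same estimate
```

Point 2 is the same argument with np.abs(resid) in place of resid**2, since the Laplace log-density is -|x| / scale plus a constant.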
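A similar sketch for points 3 and 4, with an assumed noise scale sigma and prior scale tau (both made up for the example). With the objective written as 0.5 * ||y - Xb||^2 + lam * penalty, the penalized solutions match the MAP solutions when lam = sigma^2 / (2 * tau^2) for ridge and lam = sigma^2 / tau for lasso, because the two objectives then differ only by the constant factor 1 / sigma^2.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.0]) + rng.normal(scale=0.5, size=n)
sigma, tau = 0.5, 1.0                    # assumed noise scale and prior scale

def penalized(b, lam, power):
    # 0.5 * ||y - Xb||^2 + lam * sum(|b_j|^power)
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b) ** power)

def neg_log_posterior(b, neg_log_prior):
    # -log p(y | b) - log p(b), dropping additive constants
    return 0.5 * np.sum((y - X @ b) ** 2) / sigma**2 + np.sum(neg_log_prior(b))

# Point 3: ridge (L2) <-> normal prior with -log p(b_j) = b_j^2 / (2 * tau^2) + const
b_ridge = minimize(penalized, np.zeros(p), args=(sigma**2 / (2 * tau**2), 2)).x
b_map_n = minimize(neg_log_posterior, np.zeros(p),
                   args=(lambda b: b**2 / (2 * tau**2),)).x

# Point 4: lasso (L1) <-> Laplace prior with -log p(b_j) = |b_j| / tau + const
# (Powell is used because the L1 objective is not smooth at zero)
b_lasso = minimize(penalized, np.zeros(p), args=(sigma**2 / tau, 1),
                   method="Powell").x
b_map_l = minimize(neg_log_posterior, np.zeros(p),
                   args=(lambda b: np.abs(b) / tau,), method="Powell").x

print(np.round(b_ridge - b_map_n, 3))    # close to [0. 0. 0.] up to optimizer tolerance
print(np.round(b_lasso - b_map_l, 3))    # close to [0. 0. 0.] up to optimizer tolerance
```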
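Finally, a small sketch of the penalty shapes behind points 5 and 6. Each prior's negative log-density, up to additive constants, is the penalty it induces in the MAP problem; the grid of b values is arbitrary.

```python
import numpy as np

b = np.array([0.0, 0.1, 1.0, 10.0, 100.0])

l2_pen     = b**2                      # normal prior   -> ridge
l1_pen     = np.abs(b)                 # Laplace prior  -> lasso
cauchy_pen = np.log1p(b**2)            # Cauchy prior   -> log penalty, the "L0.00001"
l0_pen     = (b != 0).astype(float)    # L0: just counts nonzero coefficients

for name, pen in [("L2", l2_pen), ("L1", l1_pen),
                  ("Cauchy", cauchy_pen), ("L0", l0_pen)]:
    print(f"{name:7s}", np.round(pen, 3))
# The Cauchy penalty barely grows as b goes from 10 to 100, while L2 explodes:
# heavy-tailed priors penalize large coefficients only weakly, and the L0 penalty
# does not care about magnitude at all, only about being exactly zero.
```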
