%% Cell type:markdown id: tags:
# SWMAL Exercise
## Hyperparameters and Gridsearch
When instantiating a Scikit-learn model in Python, most or all constructor parameters have _default_ values. These values are not part of the internal model and are hence called ___hyperparameters___---in contrast to the _normal_ model parameters, for example the neuron weights, $\mathbf w$, of an `MLP` model.
### Manual Tuning Hyperparameters
Below is an example of the Python constructor for the support-vector classifier `sklearn.svm.SVC`, with, say, the `kernel` hyperparameter having the default value `'rbf'`. If you had to choose, what would you set it to, other than `'rbf'`?
```python
class sklearn.svm.SVC(
    C=1.0,
    kernel='rbf',
    degree=3,
    gamma='auto_deprecated',
    coef0=0.0,
    shrinking=True,
    probability=False,
    tol=0.001,
    cache_size=200,
    class_weight=None,
    verbose=False,
    max_iter=-1,
    decision_function_shape='ovr',
    random_state=None
)
```
The default values might be a sensible general starting point, but for your data, you might want to optimize the hyperparameters to yield a better result.
To set `kernel` to a sensible value you need to go into the documentation for `SVC`, understand what the kernel parameter represents and what values it can take, and understand the consequences of setting `kernel` to something other than the default...and the story repeats for every other hyperparameter!
### Brute Force Search
An alternative to this structured, but time-consuming, approach is simply to __brute-force__ a search over interesting hyperparameters, and choose the 'best' parameters according to a fit-predict cycle and some score, say 'f1'.
<img src="https://itundervisning.ase.au.dk/SWMAL/L10/Figs/gridsearch.png" alt="WARNING: could not get image from server." style="width:350px">
<img src="https://itundervisning.ase.au.dk/SWMAL/L09/Figs/gridsearch.png" alt="WARNING: could not get image from server." style="width:350px">
<small><em>
<center> Conceptual graphical view of grid search for two distinct hyperparameters. </center>
<center> Notice that you would normally search hyperparameters like `alpha` with an exponential range, say [0.01, 0.1, 1, 10] or similar.</center>
</em></small>
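As a small aside, such an exponential range is easy to generate with `numpy`; a minimal sketch, assuming `numpy` is available:
```python
import numpy as np

# four logarithmically spaced values from 10^-2 to 10^1: [0.01, 0.1, 1, 10]
alphas = np.logspace(-2, 1, num=4)
print(alphas)
```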
Now, you just pick out some hyperparameters, that you figure are important, set them to a suitable range, say
```python
'kernel':('linear', 'rbf'),
'C':[1, 10]
```
and fire up a full (grid) search on this hyperparameter set, which will try out all your specified combinations of `kernel` and `C` for the model, and then print the hyperparameter set with the highest score...
The demo code below sets up some of our well-known 'hello-world' data and then runs a _grid search_ on a particular model, here a _support-vector classifier_ (SVC).
Other models and datasets ('mnist', 'iris', 'moon') can also be examined.
### Qa Explain GridSearchCV
There are two code cells below: 1) function setup, 2) the actual grid-search.
Review the code cells and write a __short__ summary. Mainly focus on __cell 2__, but dig into cell 1 if you find it interesting (notice the use of local functions, a nifty feature in Python).
In detail, examine the lines:
```python
grid_tuned = GridSearchCV(model, tuning_parameters, ..
grid_tuned.fit(X_train, y_train)
..
FullReport(grid_tuned , X_test, y_test, time_gridsearch)
```
and write a short description of how `GridSearchCV` works: explain how the search parameter set is created and how the overall search mechanism functions (without going into too much detail).
What role does the parameter `scoring='f1_micro'` play in the `GridSearchCV`, and what does `n_jobs=-1` mean?
%% Cell type:code id: tags:
``` python
# TODO: Qa, code review..cell 1) function setup

from time import time
import numpy as np

from sklearn import svm
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.metrics import classification_report, f1_score
from sklearn import datasets

from libitmal import dataloaders as itmaldataloaders # Needed for load of iris, moon and mnist

currmode="N/A" # GLOBAL var!

def SearchReport(model):

    def GetBestModelCTOR(model, best_params):
        def GetParams(best_params):
            ret_str=""
            for key in sorted(best_params):
                value = best_params[key]
                temp_str = "'" if str(type(value))=="<class 'str'>" else ""
                if len(ret_str)>0:
                    ret_str += ','
                ret_str += f'{key}={temp_str}{value}{temp_str}'
            return ret_str
        try:
            param_str = GetParams(best_params)
            return type(model).__name__ + '(' + param_str + ')'
        except:
            return "N/A(1)"

    print("\nBest model set found on train set:")
    print()
    print(f"\tbest parameters={model.best_params_}")
    print(f"\tbest '{model.scoring}' score={model.best_score_}")
    print(f"\tbest index={model.best_index_}")
    print()
    print(f"Best estimator CTOR:")
    print(f"\t{model.best_estimator_}")
    print()
    try:
        print(f"Grid scores ('{model.scoring}') on development set:")
        means = model.cv_results_['mean_test_score']
        stds  = model.cv_results_['std_test_score']
        i=0
        for mean, std, params in zip(means, stds, model.cv_results_['params']):
            print("\t[%2d]: %0.3f (+/-%0.03f) for %r" % (i, mean, std * 2, params))
            i += 1
    except:
        print("WARNING: the random search does not provide means/stds")

    global currmode
    assert "f1_micro"==str(model.scoring), f"come on, we need to fix the scoring to be able to compare model-fits! Your scoring={str(model.scoring)}...remember to add scoring='f1_micro' to the search"
    return f"best: dat={currmode}, score={model.best_score_:0.5f}, model={GetBestModelCTOR(model.estimator,model.best_params_)}", model.best_estimator_

def ClassificationReport(model, X_test, y_test, target_names=None):
    assert X_test.shape[0]==y_test.shape[0]
    print("\nDetailed classification report:")
    print("\tThe model is trained on the full development set.")
    print("\tThe scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = y_test, model.predict(X_test)
    print(classification_report(y_true, y_pred, target_names))
    print()

def FullReport(model, X_test, y_test, t):
    print(f"SEARCH TIME: {t:0.2f} sec")
    beststr, bestmodel = SearchReport(model)
    ClassificationReport(model, X_test, y_test)
    print(f"CTOR for best model: {bestmodel}\n")
    print(f"{beststr}\n")
    return beststr, bestmodel

def LoadAndSetupData(mode, test_size=0.3):
    assert test_size>=0.0 and test_size<=1.0

    def ShapeToString(Z):
        n = Z.ndim
        s = "("
        for i in range(n):
            s += f"{Z.shape[i]:5d}"
            if i+1!=n:
                s += ";"
        return s+")"

    global currmode
    currmode=mode
    print(f"DATA: {currmode}..")

    if mode=='moon':
        X, y = itmaldataloaders.MOON_GetDataSet(n_samples=5000, noise=0.2)
        itmaldataloaders.MOON_Plot(X, y)
    elif mode=='mnist':
        X, y = itmaldataloaders.MNIST_GetDataSet(load_mode=0)
        if X.ndim==3:
            X=np.reshape(X, (X.shape[0], -1))
    elif mode=='iris':
        X, y = itmaldataloaders.IRIS_GetDataSet()
    else:
        raise ValueError(f"could not load data for that particular mode='{mode}', only 'moon'/'mnist'/'iris' supported")

    print(f'  org. data:  X.shape      ={ShapeToString(X)}, y.shape      ={ShapeToString(y)}')

    assert X.ndim==2
    assert X.shape[0]==y.shape[0]
    assert y.ndim==1 or (y.ndim==2 and y.shape[1]==0)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0, shuffle=True
    )

    print(f'  train data: X_train.shape={ShapeToString(X_train)}, y_train.shape={ShapeToString(y_train)}')
    print(f'  test data:  X_test.shape ={ShapeToString(X_test)}, y_test.shape ={ShapeToString(y_test)}')
    print()

    return X_train, X_test, y_train, y_test

print('OK(function setup, hope the MNIST load works; it seems best if you have Keras or Tensorflow installed!)')
```
%% Cell type:code id: tags:
``` python
# TODO: Qa, code review..cell 2) the actual grid-search

# Setup data
X_train, X_test, y_train, y_test = LoadAndSetupData('iris')  # 'iris', 'moon', or 'mnist'

# Setup search parameters
model = svm.SVC(
    gamma=0.001
)   # NOTE: gamma="scale" does not work in older Scikit-learn frameworks,
    # FIX: replace with model = svm.SVC(gamma=0.001)

tuning_parameters = {
    'kernel': ('linear', 'rbf'),
    'C': [0.1, 1, 10]
}

CV = 5
VERBOSE = 0

# Run GridSearchCV for the model
start = time()
grid_tuned = GridSearchCV(model,
                          tuning_parameters,
                          cv=CV,
                          scoring='f1_micro',
                          verbose=VERBOSE,
                          n_jobs=-1)
grid_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b0, m0 = FullReport(grid_tuned, X_test, y_test, t)

print('OK(grid-search)')
```
%% Cell type:markdown id: tags:
### Qb Hyperparameter Grid Search using an SGD classifier
Now, replace the `svm.SVC` model with an `SGDClassifier` and a suitable set of hyperparameters for that model.
You need at least four or five different hyperparameters from the `SGDClassifier` in the search space before the full grid search begins to take considerable compute time.
So, repeat the search with the `SGDClassifier`, and be sure to add enough hyperparameters to the grid search that it takes a considerable time to run, that is, a couple of minutes or up to some hours..
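For reference, here is a minimal sketch of what such a search could look like, reusing the setup from the cells above; the chosen hyperparameters and value ranges are purely illustrative assumptions, not a recommended grid:
```python
# Hedged sketch: a grid search over an SGDClassifier instead of the SVC.
# The parameter names exist in SGDClassifier, but the ranges are only examples.
model = SGDClassifier()

tuning_parameters = {
    'loss':          ['hinge', 'modified_huber'],
    'penalty':       ['l2', 'l1', 'elasticnet'],
    'alpha':         [1e-4, 1e-3, 1e-2, 1e-1],
    'learning_rate': ['optimal', 'invscaling'],
    'eta0':          [1e-4, 1e-2],
}

start = time()
grid_tuned = GridSearchCV(model, tuning_parameters,
                          cv=CV, scoring='f1_micro', verbose=VERBOSE, n_jobs=-1)
grid_tuned.fit(X_train, y_train)
t = time() - start

b, m = FullReport(grid_tuned, X_test, y_test, t)
```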
%% Cell type:code id: tags:
``` python
# TODO: grid search
assert False, "TODO: make a grid search on the SDG classifier.."
```
%% Cell type:markdown id: tags:
### Qc Hyperparameter Random Search using an SGD classifier
Now, add code to run a `RandomizedSearchCV` instead.
<img src="https://itundervisning.ase.au.dk/SWMAL/L10/Figs/randomsearch.png" alt="WARNING: could not get image from server." style="width:350px" >
<img src="https://itundervisning.ase.au.dk/SWMAL/L09/Figs/randomsearch.png" alt="WARNING: could not get image from server." style="width:350px" >
<small><em>
<center> Conceptual graphical view of randomized search for two distinct hyperparameters. </center>
</em></small>
Use these default parameters for the random search, similar to the default parameters for the grid search
```python
random_tuned = RandomizedSearchCV(
model,
tuning_parameters,
n_iter=20,
random_state=42,
cv=CV,
scoring='f1_micro',
verbose=VERBOSE,
n_jobs=-1
)
```
but with the two new parameters, `n_iter` and `random_state`, added. Since the search type is now random, the `random_state` parameter makes sense, but essential to the random search is the new `n_iter` parameter.
So: investigate the `n_iter` parameter...in code and write a conceptual explanation in text.
Comparing the time (seconds) to complete `GridSearchCV` versus `RandomizedSearchCV` does not necessarily make sense if your grid search completes in a few seconds (as for the iris tiny-data). You need a search that runs for minutes, hours, or days.
But you could compare the best-tuned parameter set and best score for the two methods. Is the best model from the random search close to the one from the grid search?
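A minimal sketch of how the random search could be set up in code (reusing the illustrative `model` and `tuning_parameters` from Qb; only `n_iter` and `random_state` are new):
```python
# Hedged sketch: RandomizedSearchCV samples n_iter parameter combinations at random
# from tuning_parameters, instead of exhaustively trying all of them as GridSearchCV does.
start = time()
random_tuned = RandomizedSearchCV(model,
                                  tuning_parameters,
                                  n_iter=20,        # number of random combinations to try
                                  random_state=42,  # makes the random sampling reproducible
                                  cv=CV,
                                  scoring='f1_micro',
                                  verbose=VERBOSE,
                                  n_jobs=-1)
random_tuned.fit(X_train, y_train)
time_randomsearch = time() - start

b1, m1 = FullReport(random_tuned, X_test, y_test, time_randomsearch)
```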
%% Cell type:code id: tags:
``` python
# TODO:
assert False, "implement a random search for the SGD classifier.."
```
%% Cell type:markdown id: tags:
## Qd MNIST Search Quest II
Finally, a search-quest competition: __who can find the best model+hyperparameters for the MNIST dataset?__
You change to the MNIST data by calling `LoadAndSetupData('mnist')`, and this is a completely different ball game than the iris _tiny-data_: it is much larger (but still far from _big-data_)!
* You might opt for the exhaustive grid search, or use the faster but-less optimal random search...your choice.
* You are free to pick any classifier in Scikit-learn, even algorithms we have not discussed yet---__except Neural Networks and KNeighborsClassifier!__.
* Keep the score function at `f1_micro`, otherwise we will be comparing apples and oranges ('æbler og pærer').
* And, you may also want to scale your input data for some models to perform better (a small sketch of this follows after the list).
* __REMEMBER__, DO NOT USE any Neural Network models. This also means not to use any `Keras` or `Tensorflow` models...since they outperform most other models, and there are also too many examples on the internet to cut-and-paste from!
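Regarding the scaling bullet above: a minimal sketch of how preprocessing could be folded into the search via a `Pipeline`, so the scaler is fitted inside each CV fold. The classifier choice and parameter ranges are only illustrative assumptions:
```python
# Hedged sketch: scaling + classifier in one Pipeline; pipeline parameters are
# addressed in the search grid as '<step name>__<parameter name>'.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = LoadAndSetupData('mnist')

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf',    SGDClassifier()),
])

tuning_parameters = {
    'clf__alpha':   [1e-4, 1e-2, 1.0],
    'clf__penalty': ['l2', 'elasticnet'],
}

start = time()
random_tuned = RandomizedSearchCV(pipe, tuning_parameters, n_iter=5, random_state=42,
                                  cv=CV, scoring='f1_micro', verbose=VERBOSE, n_jobs=-1)
random_tuned.fit(X_train, y_train)
time_randomsearch = time() - start
```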
Check your result by printing the first _return_ value from `FullReport()`
```python
b1, m1 = FullReport(random_tuned , X_test, y_test, time_randomsearch)
print(b1)
```
that will display a result like
```
best: dat=mnist, score=0.90780, model=SGDClassifier(alpha=1.0,eta0=0.0001,learning_rate='invscaling')
```
and paste your current best model into the message box; for ITMAL group 09 it would look like
```
Grp09: best: dat=mnist, score=0.90780, model=SGDClassifier(alpha=1.0,eta0=0.0001,learning_rate='invscaling')
Grp09: CTOR for best model: SGDClassifier(alpha=1.0, average=False, class_weight=None, early_stopping=False,
epsilon=0.1, eta0=0.0001, fit_intercept=True, l1_ratio=0.15,
learning_rate='invscaling', loss='hinge', max_iter=1000,
n_iter_no_change=5, n_jobs=None, penalty='l2', power_t=0.5,
random_state=None, shuffle=True, tol=0.001,
validation_fraction=0.1, verbose=0, warm_start=False)
```
on Brightspace: "L10: Regularisering, optimering og søgning" | "Qd MNIST Search Quest II"
> https://brightspace.au.dk/d2l/le/lessons/27524/topics/674336
> https://brightspace.au.dk/d2l/le/lessons/53939/topics/791969
and check if your score (for MNIST) is better than the currently best score. Republish if you get a better score than your own previous best.
Remember to provide an ITMAL group name manually, so we can identify a winner: the 1st prize is cake!
For the journal hand-in, report your progress in scoring when choosing different models and hyperparameters to search, and how you might need to preprocess your data...and note that the journal will not be accepted unless it contains information about your results as published on the Brightspace 'Search Quest II' page!
%% Cell type:code id: tags:
``` python
# TODO:(in code and text..)
assert False, "participate in the Search Quest---remember to publish your result(s) on Brightspace."
```
%% Cell type:markdown id: tags:
REVISIONS||
---------||
2018-03-01| CEF, initial.
2018-03-05| CEF, updated.
2018-03-06| CEF, updated and spell checked.
2018-03-06| CEF, major overhaul of functions.
2018-03-06| CEF, fixed problem with MNIST load and Keras.
2018-03-07| CEF, modified report functions and changed Qc+d.
2018-03-11| CEF, updated Qd.
2018-03-12| CEF, added grid and random search figs and added bullets to Qd.
2018-03-13| CEF, fixed SVC and gamma issue, and changed dataload to be in fetchmode (non-keras).
2019-10-15| CEF, updated for ITMAL E19
2019-10-19| CEF, minor text update.
2019-10-23| CEF, changed demo model in Qd) from MLPClassifier to SVC.
2020-03-14| CEF, updated to ITMAL F20.
2020-10-20| CEF, updated to ITMAL E20.
2020-10-27| CEF, type fixes and minor update.
2020-10-28| CEF, added extra journal hand-in specs for Search Quest II, Qd.
2020-10-30| CEF, added non-use of KNeighborsClassifier to Search Quest II, Qd.
2020-11-19| CEF, changed load_mode=2 (Keras) to load_mode=0 (auto) for MNIST loader.
2021-03-17| CEF, updated to ITMAL F21.
2021-10-31| CEF, updated to ITMAL E21.
2021-11-05| CEF, removed iid=True parameter from GridSearchCV(), not present in current version of Scikit-learn (0.24.1).
2022-03-31| CEF, updated to SWMAL F22.
......
%% Cell type:markdown id: tags:
# SWMAL Exercise
## Regularizers
### Resume of The Linear Regressor
For our data set $\mathbf{X}$ and target $\mathbf{y}$
$$
\newcommand\rem[1]{}
\rem{SWMAL: CEF def and LaTeX commands, remember: no newlines in defs}
\newcommand\eq[2]{#1 &=& #2\\}
\newcommand\ar[2]{\begin{array}{#1}#2\end{array}}
\newcommand\ac[2]{\left[\ar{#1}{#2}\right]}
\newcommand\st[1]{_{\mbox{\scriptsize #1}}}
\newcommand\norm[1]{{\cal L}_{#1}}
\newcommand\obs[2]{#1_{\mbox{\scriptsize obs}}^{\left(#2\right)}}
\newcommand\diff[1]{\mbox{d}#1}
\newcommand\pown[1]{^{(#1)}}
\def\pownn{\pown{n}}
\def\powni{\pown{i}}
\def\powtest{\pown{\mbox{\scriptsize test}}}
\def\powtrain{\pown{\mbox{\scriptsize train}}}
\def\pred{_{\scriptsize\mbox{pred}}}
\def\bM{\mathbf{M}}
\def\bX{\mathbf{X}}
\def\bZ{\mathbf{Z}}
\def\bw{\mathbf{m}}
\def\bx{\mathbf{x}}
\def\by{\mathbf{y}}
\def\bz{\mathbf{z}}
\def\bw{\mathbf{w}}
\def\btheta{{\boldsymbol\theta}}
\def\bSigma{{\boldsymbol\Sigma}}
\def\half{\frac{1}{2}}
\newcommand\pfrac[2]{\frac{\partial~#1}{\partial~#2}}
\newcommand\dfrac[2]{\frac{\mbox{d}~#1}{\mbox{d}#2}}
\bX =
\ac{cccc}{
x_1\pown{1} & x_2\pown{1} & \cdots & x_d\pown{1} \\
x_1\pown{2} & x_2\pown{2} & \cdots & x_d\pown{2}\\
\vdots & & & \vdots \\
x_1\pownn & x_2\pownn & \cdots & x_d\pownn\\
}
, ~~~~~~~~
\by =
\ac{c}{
y\pown{1} \\
y\pown{2} \\
\vdots \\
y\pown{n} \\
}
%, ~~~~~~~~
%\bx\powni =
% \ac{c}{
% 1\\
% x_1\powni \\
% x_2\powni \\
% \vdots \\
% x_d\powni
% }
$$
a __linear regressor__ model, with the $d$-dimensional (expressed here without the bias term, $w_0$) weight column vector,
$$
\bw =
\ac{c}{
w_1 \\
w_2 \\
\vdots \\
w_d \\
}
$$
was previously found to be of the form
$$
y\powni\pred = \bw^\top \bx\powni
$$
for a single data instance, or for the full data set in a compact matrix notation
$$
\by\pred = \bX \bw
$$
(and remembering to add the bias term $w_0$ to $\bw$ and correspondingly add a fixed '1'-column to the $\bX$ matrix, later.)
An associated cost function could be the MSE
$$
\ar{rl}{
\mbox{MSE}(\bX,\by;\bw) &= \frac{1}{n} \sum_{i=1}^{n} L\powni \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \bw^\top\bx\powni - y\powni \right)^2\\
&\propto ||\bX \bw - \by||_2^2
}
$$
here using the squared Euclidean norm, $\norm{2}^2$, via the $||\cdot||_2^2$ expressions. We used the MSE to express the total cost function, $J$, as
$$
\mbox{MSE} \propto J = ||\bX \bw - \by||_2^2
$$
give or take a few constants, like $1/2$ or $1/n$.
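To make the matrix notation concrete, here is a tiny numerical sketch (the data values are made up purely for illustration):
```python
import numpy as np

# made-up data: n=3 samples, d=2 features, plus a fixed '1'-column for the bias w_0
X  = np.array([[1., 2.],
               [3., 4.],
               [5., 6.]])
Xb = np.c_[np.ones((X.shape[0], 1)), X]   # prepend the '1'-column
w  = np.array([0.5, 1.0, -1.0])           # [w_0, w_1, w_2]
y  = np.array([0.0, 1.0, 2.0])

y_pred = Xb @ w                           # y_pred = X w, one prediction per row
mse    = np.mean((y_pred - y)**2)         # MSE, proportional to ||X w - y||_2^2
print(y_pred, mse)
```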
### Adding Regularization to the Linear Regressor
Now the weights, $\bw$ (previously also known as $\btheta$), in this model are free to take on any value they like, and this can lead to both numerical problems and overfitting, if the algorithm decides to drive the weights to insane, humongous values, say $10^{200}$ or similar.
Also for some models, neural networks in particular, having weights outside the range -1 to 1 (or 0 to 1) may cause complete saturation of some of the internal non-linear components (the activation function).
Now enters ___regularization___ of the model: keep the weights at a sane level while doing the numerical gradient descent (GD) in the search space. This can quite simply be done by adding a ___penalty___ term, $\Omega$, to the $J$ function as
$$
\ar{rl}{
\tilde{J} &= J + \alpha \Omega(\bw)\\
&= \frac{1}{n} ||\bX \bw - \by||_2^2 + \alpha ||\bw||^2_2
}
$$
So, the algorithm now has to find an optimal value (a minimum of $\tilde{J}$) balancing both the usual MSE part and the added penalty, scaled by the $\alpha$ constant.
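Continuing the tiny numerical sketch from above, the regularized cost simply adds the scaled penalty (the values and $\alpha$ are still made up for illustration, and the bias $w_0$ is kept out of the penalty, as discussed in Qa below):
```python
import numpy as np

# same made-up data as in the sketch above
Xb = np.c_[np.ones((3, 1)), np.array([[1., 2.], [3., 4.], [5., 6.]])]
w  = np.array([0.5, 1.0, -1.0])
y  = np.array([0.0, 1.0, 2.0])

alpha   = 0.1                             # illustrative regularization strength
J       = np.mean((Xb @ w - y)**2)        # the usual MSE part
penalty = np.dot(w[1:], w[1:])            # ||w||_2^2, leaving out the bias w_0
J_tilde = J + alpha * penalty             # the regularized cost
print(J, penalty, J_tilde)
```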
### Regularization and Optimization for Neural Networks (NNs)
The regularization method mentioned here is strictly for a linear regression model, but such a model constitutes a major part of the neurons (or perceptrons), used in neural networks.
### Qa The Penalty Factor
Now, let's examine what $||\bw||^2_2$ effectively means. It is our well-known $\norm{2}^2$ norm and can also be expressed as simply as
$$
||\bw||^2_2 = \bw^\top\bw
$$
Construct a penalty function that implements $\bw^\top\bw$, re-using any functions from `numpy` (the implementation can be a tiny _one-liner_).
Take $w_0$ into account: this weight should NOT be included in the norm. Also check up on `numpy`'s `dot` implementation, if you have not done so already: it is a typical pythonic _combo_ function, computing both the vector dot product (inner product) and matrix multiplication, depending on the shapes of the input parameters.
Then run it on the three test vectors below, and explain when the penalty factor is low and when it is high.
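For reference, one possible _one-liner_ (assuming, as in the test code below, that element 0 of the weight vector is the bias $w_0$):
```python
import numpy as np

def Omega(w):
    # penalty w^T w, excluding the bias term w_0 (element 0)
    return np.dot(w[1:], w[1:])
```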
%% Cell type:code id: tags:
``` python
# Qa..first define some numeric helper functions for the test-vectors..

import numpy as np
import collections.abc   # Iterable lives in collections.abc in newer Python versions

def isFloat(x):
    # is there a python single/double float??
    return isinstance(x, float) or isinstance(x, np.float32) or isinstance(x, np.float64)
    # NOT defined on Windows?: or isinstance(x, np.float128)

# Checks that a 'float' is 'sane' (original from libitmal)
def CheckFloat(x, checkrange=False, xmin=1E-200, xmax=1E200, verbose=0):
    if verbose>1:
        print(f"CheckFloat({x}, type={type(x)}")
    if isinstance(x, collections.abc.Iterable):
        for i in x:
            CheckFloat(i, checkrange=checkrange, xmin=xmin, xmax=xmax, verbose=verbose)
    else:
        #if (isinstance(x,int)):
        #    print("you gave me an integer, that was ignored")
        #    return
        assert isFloat(x), f"x={x} is not a float/float64/numpy.float32/64/128, but a {type(x)}"
        assert np.isnan(x)==False , "x is NAN"
        assert np.isinf(x)==False , "x is inf"
        assert np.isinf(-x)==False, "x is -inf"
        # NOTE: missing test for denormalized float
        if checkrange:
            z=np.fabs(x)
            assert z>=xmin, f"abs(x)={z} is smaller than expected min value={xmin}"
            assert z<=xmax, f"abs(x)={z} is larger than expected max value={xmax}"
        if verbose>0:
            print(f"CheckFloat({x}, type={x} => OK")

# Checks that two 'floats' are 'close' (original from libitmal)
def CheckInRange(x, expected, eps=1E-9, autoconverttofloat=True, verbose=0):
    assert eps>=0, "eps is less than zero"
    if autoconverttofloat and (not isFloat(x) or not isFloat(expected) or not isFloat(eps)):
        if verbose>1:
            print(f"notice: autoconverting x={x} to float..")
        return CheckInRange(1.0*x, 1.0*expected, 1.0*eps, False, verbose)
    CheckFloat(x)
    CheckFloat(expected)
    CheckFloat(eps)
    x0 = expected - eps
    x1 = expected + eps
    ok = x>=x0 and x<=x1
    absdiff = np.fabs(x-expected)
    if verbose > 0:
        print(f"CheckInRange(x={x}, expected={expected}, eps={eps}: x in [{x0}; {x1}] => {ok}")
    assert ok, f"x={x} is not within the range [{x0}; {x1}] for eps={eps}, got eps={absdiff}"

print("OK(setup..)")
```
%% Cell type:code id: tags:
``` python
# TODO: code

def Omega(w):
    assert False, "TODO: implement Omega() here and remove this assert.."

# weight vector format: [w_0 w_1 .. w_d], ie. elem. 0 is the 'bias'
w_a = np.array([1., 2., -3.])
w_b = np.array([1E10, -3E10])
w_c = np.array([0.1, 0.2, -0.3, 0])

p_a = Omega(w_a)
p_b = Omega(w_b)
p_c = Omega(w_c)

print(f"P(w0)={p_a}")
print(f"P(w1)={p_b}")
print(f"P(w2)={p_c}")

# TEST VECTORS
e0 = 2*2+(-3)*(-3)
e1 = 9e+20
e2 = 0.13

CheckInRange(p_a, e0)
CheckInRange(p_b, e1)
CheckInRange(p_c, e2)

print("OK")
```
%% Cell type:markdown id: tags:
## Adding Regularization for Linear Regression Models
Adding the penalty $\alpha ||\bw||^2_2$ actually corresponds to the Scikit-learn model `sklearn.linear_model.Ridge`, and there is, as usual, a bewildering array of regularized models to choose from in Scikit-learn, with exotic names like `Lasso` and `Lars`
> https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model
Let us just examine `Ridge`, `Lasso` and `ElasticNet` here.
### Qb Explain the Ridge Plot
First take a peek at the plots (and code) below, which fit `Ridge`, `Lasso` and `ElasticNet` models on top of a polynomial feature expansion. The plots show three fits with different $\alpha$ values (0, 10$^{-5}$, and 1).
First, explain what the different $\alpha$ values do to the actual fit for the `Ridge` model in the plot.
%% Cell type:code id: tags:
``` python
# TODO: Qb, just run the code..
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression, SGDRegressor, Ridge, ElasticNet, Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler

def FitAndPlotModel(name, model_class, X, X_new, y, **model_kargs):
    plt.figure(figsize=(16,8))
    alphas=(0, 10**-5, 1)
    random_state=42

    for alpha, style in zip(alphas, ("b-", "g--", "r:")):
        #print(model_kargs)
        model = model_class(alpha, **model_kargs) if alpha > 0 else LinearRegression()
        model_pipe = Pipeline([
            ("poly_features", PolynomialFeatures(degree=12, include_bias=False)),
            ("std_scaler", StandardScaler()),
            ("regul_reg", model),
        ])
        model_pipe.fit(X, y)
        y_new_regul = model_pipe.predict(X_new)
        lw = 2 if alpha > 0 else 1
        plt.plot(X_new, y_new_regul, style, linewidth=lw, label=r"$\alpha = {}$".format(alpha))

    plt.plot(X, y, "b.", linewidth=3)
    plt.legend(loc="upper left", fontsize=15)
    plt.xlabel("$x_1$", fontsize=18)
    plt.title(name)
    plt.axis([0, 3, 0, 4])

def GenerateData():
    np.random.seed(42)
    m = 20
    X = 3 * np.random.rand(m, 1)
    y = 1 + 0.5 * X + np.random.randn(m, 1) / 1.5
    X_new = np.linspace(0, 3, 100).reshape(100, 1)
    return X, X_new, y

X, X_new, y = GenerateData()

FitAndPlotModel('ridge',      Ridge,      X, X_new, y)
FitAndPlotModel('lasso',      Lasso,      X, X_new, y)
FitAndPlotModel('elasticnet', ElasticNet, X, X_new, y, l1_ratio=0.1)

print("OK(plot)")
```
%% Cell type:markdown id: tags:
### Qc Explain the Ridge, Lasso and ElasticNet Regularized Methods
Then explain the different regularization methods used for the `Ridge`, `Lasso` and `ElasticNet` models, by looking at the math formulas for the methods in the Scikit-learn documentation and/or in [HOML].
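As a starting point, the cost functions as they appear in [HOML] and the Scikit-learn user guide are roughly of the form below (the exact constants, such as $1/2$ and $1/n$, vary between sources, and $r$ denotes the `l1_ratio` mixing parameter of `ElasticNet`):
$$
\ar{ll}{
\mbox{Ridge:}      & \tilde{J} = J + \alpha ||\bw||_2^2 \\
\mbox{Lasso:}      & \tilde{J} = J + \alpha ||\bw||_1 \\
\mbox{ElasticNet:} & \tilde{J} = J + r \alpha ||\bw||_1 + \frac{1-r}{2} \alpha ||\bw||_2^2
}
$$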
%% Cell type:code id: tags:
``` python
# TODO:(in text..)
assert False, "Explain the math of Ridge, Lasso and ElasticNet.."
```
%% Cell type:markdown id: tags:
### Qd Regularization and Overfitting
Finally, comment on how regularization may be used to reduce a potential tendency to overfit the data.
Describe the situation as a ___tug-of-war___ between the MSE ($J$) and regularizer ($\Omega$) terms in $\tilde{J}$
$$
\tilde{J} = J + \alpha \Omega(\bw)\\
$$
and the potential problem of $\bw^*$ being far, far away from the origin, say for a fixed $\alpha=1$ in the regularizer term (normally, for real data, $\alpha \ll 1$).
<img src="https://itundervisning.ase.au.dk/SWMAL/L10/Figs/weights_regularization_l2.png" alt="WARNING: could not get image from server." style="width:240px">
<img src="https://itundervisning.ase.au.dk/SWMAL/L09/Figs/weights_regularization_l2.png" alt="WARNING: could not get image from server." style="width:240px">
OPTIONAL part: Would data preprocessing in the form of scaling, standardization or normalization be of any help to that particular situation? If so, describe.
%% Cell type:code id: tags:
``` python
# TODO: (in text..)
assert False, "Explain the tug-of-war.."
```
%% Cell type:markdown id: tags:
REVISIONS||
---------||
2018-03-01| CEF, initial.
2018-03-06| CEF, updated.
2018-03-07| CEF, split Qb into Qb+c+d and added NN comment.
2018-03-11| CEF, updated Qa and $w_0$ issues.
2018-03-11| CEF, updated Qd with plot and Q.
2018-03-11| CEF, clarified $w_0$ issue and update $\tilde{J}$'s.
2019-10-15| CEF, updated for ITMAL E19.
2019-10-19| CEF, updated text, added float-check functions.
2020-03-23| CEF, updated to ITMAL F20.
2020-10-20| CEF, updated to ITMAL E20.
2020-10-27| CEF, minor updates.
2020-10-28| CEF, made preprocessing optional part of Qq (tug-of-war).
2020-03-17| CEF, updated to ITMAL F21.
2021-10-31| CEF, updated to ITMAL E21.
2022-03-31| CEF, updated to SWMAL F22.
......