Commit 901823c1 authored by Carsten Eie Frigaard's avatar Carsten Eie Frigaard

update

parent f33844cc
%% Cell type:markdown id: tags:
# ITMAL Exercise
## Generalization Error
In this exercise, we explain all the important overall concepts in training. Let's begin with Figure 5.3 from Deep Learning, Ian Goodfellow et al. [DL], which pretty much sums it all up:
<img src="https://itundervisning.ase.au.dk/GITMAL/L07/Figs/dl_generalization_error.png" alt="WARNING: you need to be logged into Blackboard to view images" style="height:500px">
<img src="https://itundervisning.ase.au.dk/GITMAL/L09/Figs/dl_generalization_error.png" alt="WARNING: you need to be logged into Blackboard to view images" style="height:500px">
### Qa) On Generalization Error
Write a detailed description of figure 5.3 (above) for your hand-in.
All concepts in the figure must be explained
* training/generalization error,
* underfit/overfit zone,
* optimal capacity,
* generalization gap,
* and the two axes: x/capacity, y/error.
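A tiny numerical illustration of the _generalization gap_ (with numbers borrowed from the training output shown in Qb below): if a model reaches `mse_train=1.49` while `mse_val=2.35`, the gap is $2.35 - 1.49 = 0.86$, and the larger this gap, the more the model overfits.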
%% Cell type:code id: tags:
``` python
# TODO: ...in text
assert False, "TODO: write some text.."
```
%% Cell type:markdown id: tags:
### Qb) An MSE-Epoch/Error Plot
Next, we look at an SGD model for fitting a polynomial, that is, _polynomial regression_ similar to what Géron describes in [HOML] ("Polynomial Regression" + "Learning Curves").
Review the code below for plotting the RMSE vs. the iteration number or epoch (three cells, parts I/II/III).
Write a short description of the code, and comment on the important points in the generation of the (R)MSE array.
The training phase outputs lots of lines like
> `epoch= 104, mse_train=1.50, mse_val=2.37` <br>
> `epoch= 105, mse_train=1.49, mse_val=2.35`
What is an ___epoch___, and what are `mse_train` and `mse_val`?
NOTE$_1$: the generalization plot in figure 5.3 in [DL] (above) and the plots below have different x-axes, and are not to be compared directly!
NOTE$_2$: notice that a degree-90 polynomial is used for the polynomial regression. This is just to produce a model with an extremely high capacity.
%% Cell type:code id: tags:
``` python
# Run code: Qb(part I)
# NOTE: modified code from [GITHOML], 04_training_linear_models.ipynb

%matplotlib inline

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

np.random.seed(42)

def GenerateData():
    m = 100
    X = 6 * np.random.rand(m, 1) - 3
    y = 2 + X + 0.5 * X**2 + np.random.randn(m, 1)
    return X, y

X, y = GenerateData()
X_train, X_val, y_train, y_val = train_test_split(X[:50], y[:50].ravel(),
                                                  test_size=0.5,
                                                  random_state=10)

print("X_train.shape=", X_train.shape)
print("X_val .shape=",  X_val.shape)
print("y_train.shape=", y_train.shape)
print("y_val .shape=",  y_val.shape)

poly_scaler = Pipeline([
    ("poly_features", PolynomialFeatures(degree=90, include_bias=False)),
    ("std_scaler",    StandardScaler()),
])

X_train_poly_scaled = poly_scaler.fit_transform(X_train)
X_val_poly_scaled   = poly_scaler.transform(X_val)

X_new = np.linspace(-3, 3, 100).reshape(100, 1)

plt.plot(X, y, "b.", label="All X-y Data")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.legend(loc="upper left", fontsize=14)
plt.axis([-3, 3, 0, 10])
plt.show()

print('OK')
```
%% Cell type:code id: tags:
``` python
# Run code: Qb(part II)
def Train(X_train, y_train, X_val, y_val, n_epochs, verbose=False):
    print("Training...n_epochs=", n_epochs)
    train_errors, val_errors = [], []
    # NOTE: max_iter=1 plus warm_start=True means that each call to fit()
    #       continues from the current weights and runs exactly one more epoch
    sgd_reg = SGDRegressor(max_iter=1,
                           penalty=None,
                           eta0=0.0005,
                           warm_start=True,
                           early_stopping=False,
                           learning_rate="constant",
                           tol=-float("inf"),
                           random_state=42)
    for epoch in range(n_epochs):
        sgd_reg.fit(X_train, y_train)
        y_train_predict = sgd_reg.predict(X_train)
        y_val_predict   = sgd_reg.predict(X_val)
        mse_train = mean_squared_error(y_train, y_train_predict)
        mse_val   = mean_squared_error(y_val,   y_val_predict)
        train_errors.append(mse_train)
        val_errors  .append(mse_val)
        if verbose:
            print(f"  epoch={epoch:4d}, mse_train={mse_train:4.2f}, mse_val={mse_val:4.2f}")
    return train_errors, val_errors

n_epochs = 500
train_errors, val_errors = Train(X_train_poly_scaled, y_train, X_val_poly_scaled, y_val, n_epochs, True)

print('OK')
```
%% Cell type:code id: tags:
``` python
# Run code: Qb(part III)
best_epoch    = np.argmin(val_errors)
best_val_rmse = np.sqrt(val_errors[best_epoch])

plt.figure(figsize=(10,5))
plt.annotate('Best model',
             xy=(best_epoch, best_val_rmse),
             xytext=(best_epoch, best_val_rmse + 1),
             ha="center",
             arrowprops=dict(facecolor='black', shrink=0.05),
             fontsize=16,
            )

best_val_rmse -= 0.03  # just to make the graph look better
plt.plot([0, n_epochs], [best_val_rmse, best_val_rmse], "k:", linewidth=2)
plt.plot(np.sqrt(train_errors), "b--", linewidth=2, label="Training set")
plt.plot(np.sqrt(val_errors),   "g-",  linewidth=3, label="Validation set")
plt.legend(loc="upper right", fontsize=14)
plt.xlabel("Epoch", fontsize=14)
plt.ylabel("RMSE", fontsize=14)
plt.show()
```
%% Cell type:code id: tags:
``` python
# TODO: code review..
assert False, "TODO: code review in text form"
```
%% Cell type:markdown id: tags:
### Qc) Early Stopping
How would you implement ___early stopping___ in the code above?
Write an explanation of the early-stopping concept...that is, just write some pseudo code that 'implements' early stopping.
OPTIONAL: also implement your early-stopping pseudo code in Python, and get it to work with the code above (and not just by flipping the hyperparameter to `early_stopping=True` on the `SGDRegressor`).
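As inspiration, here is a minimal patience-based sketch (one possible way among many, assuming the `SGDRegressor` setup from Qb with `max_iter=1` and `warm_start=True`, so that each `fit()` call runs one extra epoch):
```python
# Sketch only: patience-based early stopping, assuming the Qb setup above.
from copy import deepcopy
from sklearn.metrics import mean_squared_error

def TrainWithEarlyStopping(sgd_reg, X_train, y_train, X_val, y_val,
                           max_epochs=500, patience=20):
    best_val_mse = float("inf")
    best_model   = None
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        sgd_reg.fit(X_train, y_train)   # one more epoch (warm_start=True)
        mse_val = mean_squared_error(y_val, sgd_reg.predict(X_val))
        if mse_val < best_val_mse:
            best_val_mse = mse_val
            best_model   = deepcopy(sgd_reg)  # snapshot the best model so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"early stop at epoch={epoch}, best mse_val={best_val_mse:4.2f}")
            break
    return best_model, best_val_mse
```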
%% Cell type:code id: tags:
``` python
# TODO: early stopping..
assert False, "TODO: explain early stopping"
```
%% Cell type:markdown id: tags:
### Qd) Explain the Polynomial RMSE-Capacity plot
Now we revisit the concepts from `capacity_under_overfitting.ipynb` notebook and the polynomial fitting with a given capacity (polynomial degree).
Peek into the cell below (code similar to what we saw in `capacity_under_overfitting.ipynb`), and explain the generated RMSE-Capacity plot. Why does the _training error_ keep dropping, while the _CV-error_ drops until around capacity 3 and then begins to rise again?
What does the x-axis _Capacity_ and y-axis _RMSE_ represent?
Try increasing the model capacity (see the sketch below). What happens when you do plots for `degrees` larger than around 10? Relate this to what you found via Qa+b in `capacity_under_overfitting.ipynb`.
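To increase the capacity, you can for example edit the `degrees` range inside `GenerateData` in the cell below (an illustrative change only):
```python
# Illustrative edit for GenerateData() in the cell below:
degrees = range(1, 16)   # degrees above ~10 tend to make the validation RMSE explode
```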
%% Cell type:code id: tags:
``` python
# Run and review this code
# NOTE: modified code from [GITHOML], 04_training_linear_models.ipynb

%matplotlib inline

from math import sqrt

import numpy as np
import matplotlib.pyplot as plt

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

def GenerateData():
    n_samples = 30
    #degrees = [1, 4, 15]
    degrees = range(1, 8)
    X = np.sort(np.random.rand(n_samples))
    y = true_fun(X) + np.random.randn(n_samples) * 0.1
    return X, y, degrees

np.random.seed(0)
X, y, degrees = GenerateData()

print("Iterating...degrees=", degrees)
capacities, rmses_training, rmses_validation = [], [], []

for i in range(len(degrees)):
    d = degrees[i]
    polynomial_features = PolynomialFeatures(degree=d, include_bias=False)
    linear_regression = LinearRegression()
    pipeline = Pipeline([
        ("polynomial_features", polynomial_features),
        ("linear_regression",   linear_regression)
    ])
    Z = X[:, np.newaxis]
    pipeline.fit(Z, y)
    p = pipeline.predict(Z)
    train_rms = mean_squared_error(y, p)

    # Evaluate the models using cross-validation
    scores = cross_val_score(pipeline, Z, y, scoring="neg_mean_squared_error", cv=10)
    score_mean = -scores.mean()

    rmse_training   = sqrt(train_rms)
    rmse_validation = sqrt(score_mean)

    print(f"  degree={d:4d}, rmse_training={rmse_training:4.2f}, rmse_cv={rmse_validation:4.2f}")

    capacities      .append(d)
    rmses_training  .append(rmse_training)
    rmses_validation.append(rmse_validation)

plt.figure(figsize=(7,4))
plt.plot(capacities, rmses_training,  "b--", linewidth=2, label="training RMSE")
plt.plot(capacities, rmses_validation,"g-",  linewidth=2, label="validation RMSE")
plt.legend(loc="upper right", fontsize=14)
plt.xlabel("Capacity", fontsize=14)
plt.ylabel("RMSE", fontsize=14)
plt.show()

print('OK')
```
%% Cell type:code id: tags:
``` python
# TODO: investigate..
assert False, "TODO: ...answer in text form"
```
%% Cell type:markdown id: tags:
REVISIONS| |
---------| |
2018-1219| CEF, initial.
2018-0214| CEF, major update and put in sync with under/overfitting exe.
2018-0220| CEF, fixed revision table malformatting.
2018-0225| CEF, minor text updates, and made Qc optional.
2018-0225| CEF, updated code, made more functions.
2018-0311| CEF, corrected RSME to RMSE.
2019-1008| CEF, updated to ITMAL E19.
2020-0314| CEF, updated to ITMAL F20.
2020-1015| CEF, updated to ITMAL E20.
2020-1117| CEF, added comment on 90 degree polynomial, made early stopping a pseudo code exe.
2021-0322| CEF, changed crossref from "capacity_under_overfitting.ipynb Qc" to Qa+b in QdExplain the Polynomial RMSE-Capacity Plot.
2021-0323| CEF, changed 'cv RMSE' legend to 'validation RMSE'.
2021-1031| CEF, updated to ITMAL E21.
......
%% Cell type:markdown id: tags:
# ITMAL Exercise
## Hyperparameters and Gridsearch
When instantiating a Scikit-learn model in Python, most or all constructor parameters have _default_ values. These values are not part of the internal model and are hence called ___hyperparameters___---in contrast to _normal_ model parameters, for example the neuron weights, $\mathbf w$, for an `MLP` model.
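A quick way to see the difference in code (a small sketch; any Scikit-learn estimator will do): the hyperparameters are what `get_params()` returns, while the normal, fitted parameters, like `coef_`, only exist after training.
```python
# Sketch: hyperparameters vs. fitted (normal) model parameters
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()
print(model.get_params())  # hyperparameters: set in the constructor, not learned from data

# normal model parameters, e.g. model.coef_, only exist after calling model.fit(X, y)
```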
### Manual Tuning Hyperparameters
Below is an example of the Python constructor for the support-vector classifier `sklearn.svm.SVC`, with, say, the `kernel` hyperparameter having the default value `'rbf'`. If you had to choose, what would you set it to, other than `'rbf'`?
```python
class sklearn.svm.SVC(
    C=1.0,
    kernel='rbf',
    degree=3,
    gamma='auto_deprecated',
    coef0=0.0,
    shrinking=True,
    probability=False,
    tol=0.001,
    cache_size=200,
    class_weight=None,
    verbose=False,
    max_iter=-1,
    decision_function_shape='ovr',
    random_state=None
)
```
The default values might be a sensible general starting point, but for your data, you might want to optimize the hyperparameters to yield a better result.
To be able to set `kernel` to a sensible value, you need to go into the documentation for the `SVC`, understand what the kernel parameter represents and what values it can be set to, and understand the consequences of setting `kernel` to something other than the default...and the story repeats for every other hyperparameter!
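In code, manual tuning is just setting the hyperparameter explicitly in the constructor (a sketch; whether `'linear'` is a better choice than `'rbf'` depends entirely on your data):
```python
# Manual-tuning sketch: override the default kernel by hand
from sklearn import svm

model = svm.SVC(kernel='linear', C=1.0)  # 'linear' instead of the default 'rbf'
```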
### Brute Force Search
An alternative to this structured but time-consuming approach is just to __brute-force__ a search of interesting hyperparameters, and choose the 'best' parameters according to a fit-predict and some score, say 'f1'.
<img src="https://itundervisning.ase.au.dk/GITMAL/L10/Figs/gridsearch.png" alt="WARNING: you need to be logged into Blackboard to view images" style="width:350px">
<small><em>
<center> Conceptual graphical view of grid search for two distinct hyperparameters. </center>
<center> Notice that you would normally search hyperparameters like `alpha` with an exponential range, say [0.01, 0.1, 1, 10] or similar.</center>
</em></small>
Now, you just pick out some hyperparameters that you figure are important, set them to a suitable range, say
```python
'kernel':('linear', 'rbf'),
'C':[1, 10]
```
and fire up a full (grid) search on this hyperparameter set. The search will try out all specified combinations of `kernel` and `C` for the model, and then print the hyperparameter set with the highest score...
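Under the hood, the grid is simply the Cartesian product of all the listed values, as this small sketch with Scikit-learn's `ParameterGrid` shows:
```python
# Sketch: a grid search tries every combination in the Cartesian product
from sklearn.model_selection import ParameterGrid

grid = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
print(list(ParameterGrid(grid)))  # 2 x 2 = 4 candidate hyperparameter sets
```
With `cv=5` cross-validation, each of the 4 candidates is fitted 5 times, that is, 20 fits in total.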
The demo code below sets up some of our well-known 'hello-world' data and then runs a _grid search_ on a particular model, here a _support-vector classifier_ (SVC).
Other models and datasets ('mnist', 'iris', 'moon') can also be examined.
### Qa Explain GridSearchCV
There are two code cells below: 1) function setup, 2) the actual grid-search.
Review the code cells and write a __short__ summary. Mainly focus on __cell 2__, but dig into cell 1 if you find it interesting (notice the use of local functions, a nifty feature in Python).
In detail, examine the lines:
```python
grid_tuned = GridSearchCV(model, tuning_parameters, ..
grid_tuned.fit(X_train, y_train)
..
FullReport(grid_tuned , X_test, y_test, time_gridsearch)
```
and write a short description of how the `GridSearchCV` works: explain how the search parameter set is created and how the overall search mechanism functions (without going into too much detail).
What role does the parameter `scoring='f1_micro'` play in the `GridSearchCV`, and what does `n_jobs=-1` mean?
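As a hint for the `scoring='f1_micro'` part (a small sketch with made-up labels): micro-averaging computes the F1 score from the global counts of true/false positives and negatives over all classes, which for single-label multiclass problems equals the plain accuracy.
```python
# Sketch: what the 'f1_micro' metric computes, on made-up labels
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 0]
print(f1_score(y_true, y_pred, average='micro'))  # 0.6, equals plain accuracy here
```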
NOTICE: you need the dataloader module from `libitmal`, clone
```
> git clone https://cfrigaard@bitbucket.org/cfrigaard/itmal
```
or pull the Git repository to get the latest version, and put `libitmal` into the Python path.
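If the import fails, one way to do this (the path below is just a hypothetical placeholder for your local checkout) is to append the clone directory to the Python path before importing:
```python
# Sketch: make the libitmal package importable; the path is hypothetical,
# point it to the directory that contains the libitmal/ folder
import sys
sys.path.append('/path/to/itmal')

from libitmal import dataloaders
```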
%% Cell type:code id: tags:
``` python
# TODO: Qa, code review..cell 1) function setup
from time import time

import numpy as np
from sklearn import svm
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.metrics import classification_report, f1_score
from sklearn import datasets
from libitmal import dataloaders as itmaldataloaders  # Needed for load of iris, moon and mnist

currmode = "N/A"  # GLOBAL var!

def SearchReport(model):

    def GetBestModelCTOR(model, best_params):
        def GetParams(best_params):
            ret_str = ""
            for key in sorted(best_params):
                value = best_params[key]
                temp_str = "'" if str(type(value)) == "<class 'str'>" else ""
                if len(ret_str) > 0:
                    ret_str += ','
                ret_str += f'{key}={temp_str}{value}{temp_str}'
            return ret_str
        try:
            param_str = GetParams(best_params)
            return type(model).__name__ + '(' + param_str + ')'
        except:
            return "N/A(1)"

    print("\nBest model set found on train set:")
    print()
    print(f"\tbest parameters={model.best_params_}")
    print(f"\tbest '{model.scoring}' score={model.best_score_}")
    print(f"\tbest index={model.best_index_}")
    print()
    print(f"Best estimator CTOR:")
    print(f"\t{model.best_estimator_}")
    print()
    try:
        print(f"Grid scores ('{model.scoring}') on development set:")
        means = model.cv_results_['mean_test_score']
        stds  = model.cv_results_['std_test_score']
        i = 0
        for mean, std, params in zip(means, stds, model.cv_results_['params']):
            print("\t[%2d]: %0.3f (+/-%0.03f) for %r" % (i, mean, std * 2, params))
            i += 1
    except:
        print("WARNING: the random search does not provide means/stds")

    global currmode
    assert "f1_micro" == str(model.scoring), f"come on, we need to fix the scoring to be able to compare model-fits! Your scoring={str(model.scoring)}...remember to add scoring='f1_micro' to the search"
    return f"best: dat={currmode}, score={model.best_score_:0.5f}, model={GetBestModelCTOR(model.estimator, model.best_params_)}", model.best_estimator_

def ClassificationReport(model, X_test, y_test, target_names=None):
    assert X_test.shape[0] == y_test.shape[0]
    print("\nDetailed classification report:")
    print("\tThe model is trained on the full development set.")
    print("\tThe scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = y_test, model.predict(X_test)
    print(classification_report(y_true, y_pred, target_names=target_names))  # NOTE: target_names must be passed by keyword
    print()

def FullReport(model, X_test, y_test, t):
    print(f"SEARCH TIME: {t:0.2f} sec")
    beststr, bestmodel = SearchReport(model)
    ClassificationReport(model, X_test, y_test)
    print(f"CTOR for best model: {bestmodel}\n")
    print(f"{beststr}\n")
    return beststr, bestmodel

def LoadAndSetupData(mode, test_size=0.3):
    assert test_size >= 0.0 and test_size <= 1.0

    def ShapeToString(Z):
        n = Z.ndim
        s = "("
        for i in range(n):
            s += f"{Z.shape[i]:5d}"
            if i + 1 != n:
                s += ";"
        return s + ")"

    global currmode
    currmode = mode
    print(f"DATA: {currmode}..")

    if mode == 'moon':
        X, y = itmaldataloaders.MOON_GetDataSet(n_samples=5000, noise=0.2)
        itmaldataloaders.MOON_Plot(X, y)
    elif mode == 'mnist':
        X, y = itmaldataloaders.MNIST_GetDataSet(load_mode=0)
        if X.ndim == 3:
            X = np.reshape(X, (X.shape[0], -1))
    elif mode == 'iris':
        X, y = itmaldataloaders.IRIS_GetDataSet()
    else:
        raise ValueError(f"could not load data for that particular mode='{mode}', only 'moon'/'mnist'/'iris' supported")

    print(f'  org. data:  X.shape      ={ShapeToString(X)}, y.shape      ={ShapeToString(y)}')

    assert X.ndim == 2
    assert X.shape[0] == y.shape[0]
    assert y.ndim == 1 or (y.ndim == 2 and y.shape[1] == 0)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0, shuffle=True
    )

    print(f'  train data: X_train.shape={ShapeToString(X_train)}, y_train.shape={ShapeToString(y_train)}')
    print(f'  test data:  X_test.shape ={ShapeToString(X_test)}, y_test.shape ={ShapeToString(y_test)}')
    print()

    return X_train, X_test, y_train, y_test

print('OK(function setup; hoping the MNIST load works, it seems best if you have Keras or Tensorflow installed!)')
```
%% Cell type:code id: tags:
``` python
# TODO: Qa, code review..cell 2) the actual grid-search
# Setup data
X_train, X_test, y_train, y_test = LoadAndSetupData('iris')  # 'iris', 'moon', or 'mnist'

# Setup search parameters
model = svm.SVC(gamma=0.001)  # NOTE: gamma="scale" does not work in older Scikit-learn frameworks,
                              # FIX:  replace with model = svm.SVC(gamma=0.001)

tuning_parameters = {
    'kernel': ('linear', 'rbf'),
    'C': [0.1, 1, 10]
}

CV = 5
VERBOSE = 0

# Run GridSearchCV for the model
# NOTE: the deprecated iid parameter is omitted here; it was removed in Scikit-learn 0.24
start = time()
grid_tuned = GridSearchCV(model,
                          tuning_parameters,
                          cv=CV,
                          scoring='f1_micro',
                          verbose=VERBOSE,
                          n_jobs=-1)
grid_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b0, m0 = FullReport(grid_tuned, X_test, y_test, t)

print('OK(grid-search)')
```
%% Cell type:markdown id: tags:
### Qb Hyperparameter Grid Search using an SGD classifier
Now, replace the `svm.SVC` model with an `SGDClassifier` and a suitable set of the hyperparameters for that model.
You need at least four or five different hyperparameters from the `SGDClassifier` in the search-space before it begins to take considerable compute time doing the full grid search.
So, repeat the search with the `SGDClassifier`, and be sure to add enough hyperparameters to the grid-search that the search takes a considerable time to run, that is, a couple of minutes up to some hours (a search-space sketch is given below)..
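As a starting point, a search-space sketch for the `SGDClassifier` could look like the following (the hyperparameter names are from the Scikit-learn docs; the value ranges are illustrative guesses, and `'log_loss'` is spelled `'log'` in older Scikit-learn versions):
```python
# Sketch: a possible (and deliberately large) search-space for SGDClassifier
tuning_parameters = {
    'loss':          ('hinge', 'log_loss'),        # 'log' in older Scikit-learn
    'penalty':       ('l2', 'l1', 'elasticnet'),
    'alpha':         [1e-4, 1e-3, 1e-2, 1e-1],     # exponential range, as noted above
    'learning_rate': ('constant', 'optimal', 'invscaling'),
    'eta0':          [0.001, 0.01, 0.1],
}
# 2*3*4*3*3 = 216 combinations; with cv=5 that is 1080 fits in the grid search
```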
%% Cell type:code id: tags:
``` python
# TODO: grid search
assert False, "TODO: make a grid search on the SDG classifier.."
```
%% Cell type:markdown id: tags:
### Qc Hyperparameter Random Search using an SGD classifier
Now, add code to run a `RandomizedSearchCV` instead.
<img src="https://itundervisning.ase.au.dk/GITMAL/L10/Figs/randomsearch.png" alt="WARNING: you need to be logged into Blackboard to view images" style="width:350px" >
<small><em>
<center> Conceptual graphical view of randomized search for two distinct hyperparameters. </center>
</em></small>
Use these default parameters for the random search, similar to the default parameters for the grid search
```python
random_tuned = RandomizedSearchCV(
    model,
    tuning_parameters,
    n_iter=20,
    random_state=42,
    cv=CV,
    scoring='f1_micro',
    verbose=VERBOSE,
    n_jobs=-1   # NOTE: the deprecated iid parameter is omitted; it was removed in Scikit-learn 0.24
)
```
but with the two new parameters, `n_iter` and `random_state`, added. Since the search-type is now random, the `random_state` parameter makes sense, but essential to random search is the new `n_iter` parameter.
So: investigate the `n_iter` parameter...in code and write a conceptual explanation in text.
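A small sketch of what `n_iter` does, using Scikit-learn's `ParameterSampler` (the sampling helper behind `RandomizedSearchCV`):
```python
# Sketch: n_iter caps how many hyperparameter combinations are sampled,
# so the search cost is n_iter * cv fits, regardless of the full grid size
from sklearn.model_selection import ParameterSampler

params = {'kernel': ('linear', 'rbf'), 'C': [0.1, 1, 10]}
sampler = ParameterSampler(params, n_iter=4, random_state=42)
print(list(sampler))  # only 4 of the 2*3=6 possible combinations are tried
```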
Comparing the time (in seconds) to complete a `GridSearchCV` versus a `RandomizedSearchCV` does not necessarily make sense if your grid search completes in a few seconds (as for the iris tiny-data). You need a search that runs for minutes, hours, or days.