In this exercise, we need to explain all the important overall concepts in training. Let's begin with Figure 5.3 from _Deep Learning_ by Ian Goodfellow et al. [DL], which pretty much sums it all up.
<imgsrc="https://itundervisning.ase.au.dk/GITMAL/L07/Figs/dl_generalization_error.png"alt="WARNING: you need to be logged into Blackboard to view images"style="height:500px">
<imgsrc="https://itundervisning.ase.au.dk/GITMAL/L09/Figs/dl_generalization_error.png"alt="WARNING: you need to be logged into Blackboard to view images"style="height:500px">
### Qa) On Generalization Error
Write a detailed description of figure 5.3 (above) for your hand-in.
All concepts in the figure must be explained:
* training/generalization error,
* underfit/overfit zone,
* optimal capacity,
* generalization gap,
* and the two axes: x/capacity, y/error.
%% Cell type:code id: tags:
``` python
# TODO: ...in text
assertFalse,"TODO: write some text.."
```
%% Cell type:markdown id: tags:
### Qb) An MSE-Epoch/Error Plot
Next, we look at an SGD model for fitting a polynomial, that is, _polynomial regression_ similar to what Géron describes in [HOML] ("Polynomial Regression" + "Learning Curves").
Review the code below for plotting the RMSE vs. the iteration number or epoch (three cells, parts I/II/III).
Write a short description of the code, and comment on the important points in the generation of the (R)MSE array.
The training phase outputs lots of lines like
> `epoch= 104, mse_train=1.50, mse_val=2.37` <br>
> `epoch= 105, mse_train=1.49, mse_val=2.35`
What is an ___epoch___ and what is `mse_train` and `mse_val`?
NOTE$_1$: the generalization plot in figure 5.3 in [DL] (above) and the plots below have different x-axes and are not to be compared directly!
NOTE$_2$: notice that a degree-90 polynomial is used for the polynomial regression. This is just to produce a model with an extremely high capacity.
%% Cell type:code id: tags:
``` python
# Run code: Qb (part I)
# NOTE: modified code from [GITHOML], 04_training_linear_models.ipynb
```
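%% Cell type:markdown id: tags:
Since the part I/II/III cells are only shown in outline here, the following is a minimal sketch of how such an epoch/MSE log can be produced: an `SGDRegressor` fitted one epoch at a time (the `max_iter=1` + `warm_start=True` pattern from [GITHOML]), recording train and validation MSE after each pass. The data, polynomial degree, and learning rate below are illustrative assumptions, not the notebook's exact values.
```python
# Hedged sketch: per-epoch train/validation MSE for high-capacity polynomial regression.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.RandomState(42)
X = 6 * rng.rand(100, 1) - 3                       # toy 1-D data (assumption)
y = 0.5 * X[:, 0]**2 + X[:, 0] + 2 + rng.randn(100)

poly = PolynomialFeatures(degree=90, include_bias=False)  # extremely high capacity
X_poly = StandardScaler().fit_transform(poly.fit_transform(X))
X_train, X_val, y_train, y_val = train_test_split(X_poly, y, test_size=0.5, random_state=42)

# max_iter=1 + warm_start=True => each .fit() call runs exactly one epoch,
# continuing from the previous weights.
sgd = SGDRegressor(max_iter=1, tol=None, warm_start=True, penalty=None,
                   learning_rate="constant", eta0=0.0005, random_state=42)

mse_train, mse_val = [], []
for epoch in range(500):                           # one epoch = one full pass over X_train
    sgd.fit(X_train, y_train)
    mse_train.append(mean_squared_error(y_train, sgd.predict(X_train)))
    mse_val.append(mean_squared_error(y_val, sgd.predict(X_val)))
    print(f"epoch={epoch:4d}, mse_train={mse_train[-1]:.2f}, mse_val={mse_val[-1]:.2f}")
```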
%% Cell type:markdown id: tags:
### Qc) Early Stopping
How would you implement ___early stopping___ in the code above?
Write an explanation of the early stopping concept...that is, just write some pseudo code that 'implements' early stopping.
OPTIONAL: also implement your early stopping pseudo code in Python, and get it to work with the code above (and not just by flipping the hyperparameter to `early_stopping=True` on the `SGDRegressor`).
%% Cell type:code id: tags:
``` python
# TODO: early stopping..
assertFalse,"TODO: explain early stopping"
```
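%% Cell type:markdown id: tags:
One way the pseudo code could look, sketched here in Python against the incremental `SGDRegressor` pattern from the Qb sketch above (the names `sgd`, `X_train`, `X_val`, `y_train`, `y_val` are assumptions carried over from that sketch): keep training epoch by epoch, track the best validation error seen so far, and stop when it has not improved for a given number of epochs (the _patience_).
```python
# Hedged sketch: early stopping on top of an incrementally fitted SGDRegressor.
from copy import deepcopy
from sklearn.metrics import mean_squared_error

best_val_mse, best_model = float("inf"), None
patience, epochs_since_best = 50, 0

for epoch in range(1000):
    sgd.fit(X_train, y_train)                    # one more epoch (warm_start=True)
    val_mse = mean_squared_error(y_val, sgd.predict(X_val))
    if val_mse < best_val_mse:                   # validation error improved: remember this model
        best_val_mse, best_model = val_mse, deepcopy(sgd)
        epochs_since_best = 0
    else:                                        # no improvement this epoch
        epochs_since_best += 1
        if epochs_since_best >= patience:        # give up and roll back to the best model
            print(f"early stop at epoch {epoch}, best mse_val={best_val_mse:.2f}")
            break
```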
%% Cell type:markdown id: tags:
### Qd) Explain the Polynomial RMSE-Capacity plot
Now we revisit the concepts from the `capacity_under_overfitting.ipynb` notebook and the polynomial fitting with a given capacity (polynomial degree).
Peek into the cell below (code similar to what we saw in `capacity_under_overfitting.ipynb`), and explain the generated RMSE-Capacity plot. Why does the _training error_ keep dropping, while the _CV error_ drops until around capacity 3 and then begins to rise again?
What does the x-axis _Capacity_ and y-axis _RMSE_ represent?
Try increasing the model capacity. What happens when you do plots for `degrees` larger than around 10? Relate this to what you found via Qa+b in `capacity_under_overfitting.ipynb`.
%% Cell type:code id: tags:
``` python
# Run and review this code
# NOTE: modified code from [GITHOML], 04_training_linear_models.ipynb
```
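%% Cell type:markdown id: tags:
Since the cell is only shown in outline here, the following is a minimal sketch of the kind of RMSE-Capacity sweep it performs (the data and degree range are illustrative assumptions, not the notebook's exact values): fit one polynomial model per degree and record the training RMSE next to a cross-validated RMSE.
```python
# Hedged sketch: training vs. cross-validation RMSE as a function of capacity (degree).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(42)
X = 6 * rng.rand(100, 1) - 3                     # toy quadratic data (assumption)
y = 0.5 * X[:, 0]**2 + X[:, 0] + 2 + rng.randn(100)

for degree in range(1, 11):                      # capacity = polynomial degree
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))
    cv_mse = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    print(f"degree={degree:2d}  RMSE(train)={train_rmse:.3f}  RMSE(CV)={np.sqrt(cv_mse):.3f}")
```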
%% Cell type:markdown id: tags:
When instantiating a Scikit-learn model in Python, most or all constructor parameters have _default_ values. These values are not part of the internal model and are hence called ___hyperparameters___---in contrast to _normal_ model parameters, for example the neuron weights, $\mathbf w$, for an `MLP` model.
### Manual Tuning Hyperparameters
Below is an example of the Python constructor for the support-vector classifier `sklearn.svm.SVC`, with, say, the `kernel` hyperparameter having the default value `'rbf'`. If you had to choose a value other than `'rbf'`, what would you set it to?
```python
class sklearn.svm.SVC(
    C=1.0,
    kernel='rbf',
    degree=3,
    gamma='auto_deprecated',
    coef0=0.0,
    shrinking=True,
    probability=False,
    tol=0.001,
    cache_size=200,
    class_weight=None,
    verbose=False,
    max_iter=-1,
    decision_function_shape='ovr',
    random_state=None
)
```
The default values are a sensible general starting point, but for your data you may want to tune the hyperparameters to yield a better result.
To be able to set `kernel` to a sensible value, you need to go into the documentation for the `SVC`, understand what the kernel parameter represents and what values it can be set to, and understand the consequences of setting `kernel` to something other than the default...and the story repeats for every other hyperparameter!
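For instance, a minimal sketch of overriding the default at construction time (`'linear'` is a common first alternative; the `C` value here is illustrative):
```python
# Hedged sketch: setting the kernel hyperparameter explicitly.
from sklearn.svm import SVC

clf = SVC(kernel='linear', C=1.0)   # 'linear' instead of the default 'rbf'
# clf.fit(X_train, y_train)         # fit/score as usual to judge the choice
```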
### Brute Force Search
An alternative to this structured, but time-consuming, approach is just to __brute-force__ a search over interesting hyperparameters, choosing the 'best' parameters according to a fit-predict and some score, say 'f1'.
<imgsrc="https://itundervisning.ase.au.dk/GITMAL/L08/Figs/gridsearch.png"alt="WARNING: you need to be logged into Blackboard to view images"style="width:350px">
<imgsrc="https://itundervisning.ase.au.dk/GITMAL/L10/Figs/gridsearch.png"alt="WARNING: you need to be logged into Blackboard to view images"style="width:350px">
<small><em>
<center> Conceptual graphical view of grid search for two distinct hyperparameters. </center>
<center> Notice that you would normally search hyperparameters like `alpha` with an exponential range, say [0.01, 0.1, 1, 10] or similar.</center>
</em></small>
Now, you just pick out some hyperparameters that you figure are important, set them to a suitable range, say
```python
'kernel':('linear','rbf'),
'C':[1,10]
```
and fire up a full (grid) search on this hyperparameter set. The search tries out all your specified combinations of `kernel` and `C` for the model (here 2 × 2 = 4 combinations, each cross-validated) and then prints the hyperparameter set with the highest score...
The demo code below sets up some of our well-known 'hello-world' data and then runs a _grid search_ on a particular model, here a _support-vector classifier_ (SVC).
Other models and datasets ('mnist', 'iris', 'moon') can also be examined.
### Qa Explain GridSearchCV
There are two code cells below: 1) function setup, 2) the actual grid-search.
Review the code cells and write a __short__ summary. Mainly focus on __cell 2__, but dig into cell 1 if you find it interesting (notice the use of local functions, a nifty feature in Python), and write a short description of how the `GridSearchCV` works: explain how the search parameter set is created and how the overall search mechanism functions (without going into too much detail).
What role does the parameter `scoring='f1_micro'` play in the `GridSearchCV`, and what does `n_jobs=-1` mean?
NOTICE: you need the dataloader module from `libitmal`; clone it before running the cells below.
%% Cell type:code id: tags:
``` python
# Qa (cell 1): function setup -- fragments of the helpers FullReport() and LoadAndSetupData()
        print("\t[%2d]: %0.3f (+/-%0.03f) for %r" % (i, mean, std * 2, params))
        i += 1
    except:
        print("WARNING: the random search does not provide means/stds")
    global currmode
    assert "f1_micro" == str(model.scoring), f"come on, we need to fix the scoring to be able to compare model-fits! Your scoring={str(model.scoring)}...remember to add scoring='f1_micro' to the search"
    print(f' test data: X_test.shape ={ShapeToString(X_test)}, y_test.shape ={ShapeToString(y_test)}')
    print()
    return X_train, X_test, y_train, y_test

print('OK(function setup, hope the MNIST loading works; seems best if you have Keras or Tensorflow installed!)')
```
%% Cell type:code id: tags:
``` python
# TODO: Qa, code review..cell 2) the actual grid-search
# Setup data
X_train, X_test, y_train, y_test = LoadAndSetupData('iris')  # 'iris', 'moon', or 'mnist'

# Setup search parameters
model = svm.SVC(gamma=0.001)  # NOTE: gamma="scale" does not work in older Scikit-learn versions

tuning_parameters = {
    'kernel': ('linear', 'rbf'),
    'C':      [0.1, 1, 10]
}

CV = 5
VERBOSE = 0

# Run GridSearchCV for the model
start = time()
grid_tuned = GridSearchCV(model,
                          tuning_parameters,
                          cv=CV,
                          scoring='f1_micro',
                          verbose=VERBOSE,
                          n_jobs=-1,
                          iid=True)  # NOTE: iid was removed in newer Scikit-learn versions; drop it there
grid_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b0, m0 = FullReport(grid_tuned, X_test, y_test, t)
print('OK(grid-search)')
```
%% Cell type:markdown id: tags:
### Qb Hyperparameter Grid Search using an SGD classifier
Now, replace the `svm.SVC` model with an `SGDClassifier` and a suitable set of hyperparameters for that model.
You need at least four or five different hyperparameters from the `SGDClassifier` in the search space before the full grid search begins to take considerable compute time.
So, repeat the search with the `SGDClassifier`, and be sure to add enough hyperparameters to the grid search that it takes a considerable time to run, that is, a couple of minutes or up to some hours...
%% Cell type:code id: tags:
``` python
# TODO: grid search
assertFalse,"TODO: make a grid search on the SDG classifier.."
```
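%% Cell type:markdown id: tags:
A hedged sketch of what such a search grid could look like (the keys are real `SGDClassifier` parameters, but the value ranges are illustrative, not tuned; the grid plugs into the same `GridSearchCV` call as in the SVC cell above):
```python
# Hedged sketch: an SGDClassifier parameter grid for GridSearchCV.
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(random_state=42)
tuning_parameters = {
    'loss':     ['hinge', 'log_loss', 'modified_huber'],  # 'log_loss' is named 'log' in older Scikit-learn
    'penalty':  ['l2', 'l1', 'elasticnet'],
    'alpha':    [1e-4, 1e-3, 1e-2, 1e-1],                 # exponential range, as noted earlier
    'max_iter': [500, 1000],
    'tol':      [1e-3, 1e-4],
}
# 3 * 3 * 4 * 2 * 2 = 144 combinations; with cv=5 that is 720 fits in total.
# grid_tuned = GridSearchCV(model, tuning_parameters, cv=CV,
#                           scoring='f1_micro', n_jobs=-1)
```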
%% Cell type:markdown id: tags:
### Qc Hyperparameter Random Search using an SGD classifier
Now, add code to run a `RandomizedSearchCV` instead.
<imgsrc="https://itundervisning.ase.au.dk/GITMAL/L08/Figs/randomsearch.png"alt="WARNING: you need to be logged into Blackboard to view images"style="width:350px">
<imgsrc="https://itundervisning.ase.au.dk/GITMAL/L10/Figs/randomsearch.png"alt="WARNING: you need to be logged into Blackboard to view images"style="width:350px">
<small><em>
<center> Conceptual graphical view of randomized search for two distinct hyperparameters. </center>
</em></small>
Use these default parameters for the random search, similar to the default parameters for the grid search:
```python
random_tuned = RandomizedSearchCV(
    model,
    tuning_parameters,
    n_iter=20,
    random_state=42,
    cv=CV,
    scoring='f1_micro',
    verbose=VERBOSE,
    n_jobs=-1,
    iid=True
)
```
but with the two new parameters, `n_iter` and `random_state`, added. Since the search type is now random, the `random_state` makes sense, but essential to random search is the new `n_iter` parameter.
So: investigate the `n_iter` parameter...in code and write a conceptual explanation in text.
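A minimal sketch of what `n_iter` controls, reusing `model` from the grid-search cell above (the `loguniform` distribution and its range are illustrative assumptions): unlike the grid search, which tries every combination, `RandomizedSearchCV` samples exactly `n_iter` parameter settings from the given lists/distributions and cross-validates only those.
```python
# Hedged sketch: n_iter bounds the number of sampled parameter settings.
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

tuning_parameters = {
    'kernel': ['linear', 'rbf'],
    'C': loguniform(1e-2, 1e2),   # a continuous distribution -- impossible in a plain grid
}
random_tuned = RandomizedSearchCV(model, tuning_parameters,
                                  n_iter=20,        # => exactly 20 (kernel, C) draws,
                                  random_state=42,  #    i.e. 20 * cv fits in total
                                  cv=5, scoring='f1_micro', n_jobs=-1)
```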
Comparing the time (in seconds) to complete `GridSearchCV` versus `RandomizedSearchCV` does not necessarily make sense if your grid search completes in a few seconds (as for the tiny iris data). You need a search that runs for minutes, hours, or days.