Commit 62e7c909 authored by Carsten Eie Frigaard's avatar Carsten Eie Frigaard
Browse files


parent 8f904bd4
%% Cell type:markdown id: tags:
# ITMAL Exercise
## Generalization Error
In this exercise, we need to explain all important overall concepts in training. Let's begin with Figure 5.3 from Deep Learning, Ian Goodfellow, et. al. [DL], that pretty much sums it all up
<img src="" alt="WARNING: you need to be logged into Blackboard to view images" style="height:500px">
### Qa On Generalization Error
### Qa) On Generalization Error
Write a detailed description of figure 5.3 (above) for your hand-in.
All concepts in the figure must be explained
* training/generalization error,
* underfit/overfit zone,
* optimal capacity,
* generalization gab,
* and the two axes: x/capacity, y/error.
%% Cell type:code id: tags:
``` python
# TODO: text
assert False, "TODO: write some text.."
%% Cell type:markdown id: tags:
### Qb A MSE-Epoch/Error Plot
Next, we look at a SGD model for fitting polynomial, that is _polynomial regression_ similar to what Géron describes in [HOML] ("Polynomial Regression" + "Learning Curves").
Review the code below for plotting the RMSE vs. the iteration number or epoch below (three cells, part I/II/III).
Write a short description of the code, and comment on the important points in the generation of the (R)MSE array.
The training phase output lots of lines like
> `epoch= 104, mse_train=1.50, mse_val=2.37` <br>
> `epoch= 105, mse_train=1.49, mse_val=2.35`
What is an ___epoch___ and what is `mse_train` and `mse_val`?
NOTE$_1$: the generalization plot figure 5.3 in [DL] (above) and the plots below have different x-axis, and are not to be compared directly!
NOTE$_2$: notice that a 90 degree polynomial is used for the polynomial regression. This is just to produce a model with an extremly high capacity.
%% Cell type:code id: tags:
``` python
# Run code: Qb(part I)
# NOTE: modified code from [GITHOML], 04_training_linear_models.ipynb
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
def GenerateData():
m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 2 + X + 0.5 * X**2 + np.random.randn(m, 1)
return X, y
X, y = GenerateData()
X_train, X_val, y_train, y_val = \
train_test_split( \
X[:50], y[:50].ravel(), \
test_size=0.5, \
print("X_val .shape=",X_val.shape)
print("y_val .shape=",y_val.shape)
poly_scaler = Pipeline([
("poly_features", PolynomialFeatures(degree=90, include_bias=False)),
("std_scaler", StandardScaler()),
X_train_poly_scaled = poly_scaler.fit_transform(X_train)
X_val_poly_scaled = poly_scaler.transform(X_val)
X_new=np.linspace(-3, 3, 100).reshape(100, 1)
plt.plot(X, y, "b.", label="All X-y Data")
plt.xlabel("$x_1$", fontsize=18, )
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.legend(loc="upper left", fontsize=14)
plt.axis([-3, 3, 0, 10])
%% Cell type:code id: tags:
``` python
# Run code: Qb(part II)
def Train(X_train, y_train, X_val, y_val, n_epochs, verbose=False):
train_errors, val_errors = [], []
sgd_reg = SGDRegressor(max_iter=1,
for epoch in range(n_epochs):, y_train)
y_train_predict = sgd_reg.predict(X_train)
y_val_predict = sgd_reg.predict(X_val)
mse_train=mean_squared_error(y_train, y_train_predict)
mse_val =mean_squared_error(y_val , y_val_predict)
val_errors .append(mse_val)
if verbose:
print(f" epoch={epoch:4d}, mse_train={mse_train:4.2f}, mse_val={mse_val:4.2f}")
return train_errors, val_errors
n_epochs = 500
train_errors, val_errors = Train(X_train_poly_scaled, y_train, X_val_poly_scaled, y_val, n_epochs, True)
%% Cell type:code id: tags:
``` python
# Run code: Qb(part III)
best_epoch = np.argmin(val_errors)
best_val_rmse = np.sqrt(val_errors[best_epoch])
plt.annotate('Best model',
xy=(best_epoch, best_val_rmse),
xytext=(best_epoch, best_val_rmse + 1),
arrowprops=dict(facecolor='black', shrink=0.05),
best_val_rmse -= 0.03 # just to make the graph look better
plt.plot([0, n_epochs], [best_val_rmse, best_val_rmse], "k:", linewidth=2)
plt.plot(np.sqrt(train_errors), "b--", linewidth=2, label="Training set")
plt.plot(np.sqrt(val_errors), "g-", linewidth=3, label="Validation set")
plt.legend(loc="upper right", fontsize=14)
plt.xlabel("Epoch", fontsize=14)
plt.ylabel("RMSE", fontsize=14)
%% Cell type:code id: tags:
``` python
# TODO: code review..
assert False, "TODO: code review in text form"
%% Cell type:markdown id: tags:
### Qc Early Stopping
### Qc) Early Stopping
How would you implement ___early stopping___, in the code above?
Write an explanation of the early stopping concept...that is, just write some pseudo code that 'implements' the early stopping.
OPTIONAL: also implement your early stopping pseudo code in Python, and get it to work with the code above (and not just flipping the hyperparameter to `early_stopping=True` on the `SGDRegressor`).
%% Cell type:code id: tags:
``` python
# TODO: early stopping..
assert False, "TODO: explain early stopping"
%% Cell type:markdown id: tags:
### Qd Explain the Polynomial RMSE-Capacity plot
### Qd) Explain the Polynomial RMSE-Capacity plot
Now we revisit the concepts from `capacity_under_overfitting.ipynb` notebook and the polynomial fitting with a given capacity (polynomial degree).
Peek into the cell below (code similar to what we saw in `capacity_under_overfitting.ipynb`), and explain the generated RMSE-Capacity plot. Why does the _training error_ keep dropping, while the _CV-error_ drops until around capacity 3, and then begin to rise again?
What does the x-axis _Capacity_ and y-axis _RMSE_ represent?
Try increasing the model capacity. What happens when you do plots for `degrees` larger than around 10? Relate this with what you found via Qa+b in `capacity_under_overfitting.ipynb`.
%% Cell type:code id: tags:
``` python
# Run and review this code
# NOTE: modified code from [GITHOML], 04_training_linear_models.ipynb
%matplotlib inline
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error
def true_fun(X):
return np.cos(1.5 * np.pi * X)
def GenerateData():
n_samples = 30
#degrees = [1, 4, 15]
degrees = range(1,8)
X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1
return X, y, degrees
X, y, degrees = GenerateData()
capacities, rmses_training, rmses_validation= [], [], []
for i in range(len(degrees)):
polynomial_features = PolynomialFeatures(degree=d, include_bias=False)
linear_regression = LinearRegression()
pipeline = Pipeline([
("polynomial_features", polynomial_features),
("linear_regression", linear_regression)
Z = X[:, np.newaxis], y)
p = pipeline.predict(Z)
train_rms = mean_squared_error(y,p)
# Evaluate the models using crossvalidation
scores = cross_val_score(pipeline, Z, y, scoring="neg_mean_squared_error", cv=10)
score_mean = -scores.mean()
print(f" degree={d:4d}, rmse_training={rmse_training:4.2f}, rmse_cv={rmse_validation:4.2f}")
capacities .append(d)
rmses_training .append(rmse_training)
plt.plot(capacities, rmses_training, "b--", linewidth=2, label="training RMSE")
plt.plot(capacities, rmses_validation,"g-", linewidth=2, label="validation RMSE")
plt.legend(loc="upper right", fontsize=14)
plt.xlabel("Capacity", fontsize=14)
plt.ylabel("RMSE", fontsize=14)
%% Cell type:code id: tags:
``` python
# TODO: investigate..
assert False, "TODO: ...answer in text form"
%% Cell type:markdown id: tags:
---------| |
2018-1219| CEF, initial.
2018-0214| CEF, major update and put in sync with under/overfitting exe.
2018-0220| CEF, fixed revision table malformatting.
2018-0225| CEF, minor text updates, and made Qc optional.
2018-0225| CEF, updated code, made more functions.
2018-0311| CEF, corrected RSME to RMSE.
2019-1008| CEF, updated to ITMAL E19.
2020-0314| CEF, updated to ITMAL F20.
2020-1015| CEF, updated to ITMAL E20.
2020-1117| CEF, added comment on 90 degree polynomial, made early stopping a pseudo code exe.
2021-0322| CEF, changed crossref from "capacity_under_overfitting.ipynb Qc" to Qa+b in QdExplain the Polynomial RMSE-Capacity Plot.
2021-0323| CEF, changed 'cv RMSE' legend to 'validation RMSE'.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment