Commit 8f904bd4 authored by Carsten Eie Frigaard
parent 7a6aad31
%% Cell type:markdown id: tags:
# ITMAL Exercise
## Model capacity and under/overfitting
NOTE: text and code to the exercise taken from
This example demonstrates the problems of underfitting and overfitting and
how we can use linear regression with polynomial features to approximate
nonlinear functions.
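
To make the idea of polynomial features concrete, here is a minimal sketch of what the expansion does to a one-dimensional input. The array `x` and the choice `degree=3` are illustrative assumptions, not part of the exercise code; each sample $x$ is mapped to the columns $x, x^2, x^3$, and a linear model fitted on these columns is a cubic polynomial in $x$.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Three hypothetical 1-D samples as a column vector, shape (3, 1).
x = np.array([[1.0], [2.0], [3.0]])

# degree=3 without a bias column yields the columns x, x^2, x^3.
expanded = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)
print(expanded)
# [[ 1.  1.  1.]
#  [ 2.  4.  8.]
#  [ 3.  9. 27.]]
```

The regression stays linear in its coefficients; only the input representation becomes nonlinear.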
The plot below shows the function that we want to approximate,
which is a part of the cosine function. In addition, the samples from the
real function and the approximations of different models are displayed. The
models have polynomial features of different degrees.
We can see that a linear function (polynomial with degree 1) is not sufficient to fit the
training samples. This is called **underfitting**.
A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will **overfit** the training data, i.e. it learns the noise of the
training data.
We evaluate **overfitting**/**underfitting** quantitatively by using
cross-validation. We calculate the mean squared error (MSE) on the validation
set: the higher the MSE, the less likely it is that the model generalizes
correctly from the training data.
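
For reference, the MSE mentioned above can be written out as below. This is the standard definition; the symbols $n$ (number of validation samples), $y_i$ (true target value) and $\hat{y}_i$ (model prediction) are notation assumed here, not taken from the exercise text.

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

With $k$-fold cross-validation this value is computed once per fold, each time on the samples held out from training.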
### Qa) Explain the polynomial fitting via code review
Review the code below, write a __short__ code review summary, and explain how the polynomial fitting is implemented.
NOTE: Do not dig into the plotting details (they are unimportant compared to the rest of the code); just explain the outcome of the plots.
%% Cell type:code id: tags:
``` python
# TODO: code review
#assert False, "TODO: remove me, and review this code"

# NOTE: code from

%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

def GenerateData(n_samples = 30):
    X = np.sort(np.random.rand(n_samples))
    y = true_fun(X) + np.random.randn(n_samples) * 0.1
    return X, y

X, y = GenerateData()
degrees = [1, 4, 15]

plt.figure(figsize=(14, 5))
for i in range(len(degrees)):
    ax = plt.subplot(1, len(degrees), i + 1)
    plt.setp(ax, xticks=(), yticks=())

    polynomial_features = PolynomialFeatures(degree=degrees[i], include_bias=False)
    linear_regression = LinearRegression()
    pipeline = Pipeline([
        ("polynomial_features", polynomial_features),
        ("linear_regression", linear_regression)
    ])
    pipeline.fit(X[:, np.newaxis], y)

    # Evaluate the models using cross-validation
    scores = cross_val_score(pipeline, X[:, np.newaxis], y, scoring="neg_mean_squared_error", cv=10)
    score_mean = scores.mean()
    print(f"  degree={degrees[i]:4d}, score_mean={score_mean:4.2f},  {polynomial_features}")

    X_test = np.linspace(0, 1, 100)
    y_pred = pipeline.predict(X_test[:, np.newaxis])

    # Plotting details
    plt.plot(X_test, y_pred, label="Model")
    plt.plot(X_test, true_fun(X_test), label="True function")
    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    plt.xlim((0, 1))
    plt.ylim((-2, 2))
    plt.title("Degree {}\nScore(-MSE) = {:.2e}(+/- {:.2e})".format(degrees[i], scores.mean(), scores.std()))

    # CEF: loop added, prints each score per CV-fold.
    #      NOTICE the sub-means when degree=15!
    print(f"    CV sub-scores:  mean = {scores.mean():.2},  std = {scores.std():.2}")
    # Separate loop variable, so the outer index i is not shadowed.
    for fold in range(len(scores)):
        print(f"      CV fold {fold}  =>  score = {scores[fold]:.2}")
```
%% Cell type:code id: tags:
``` python
# TODO: code review..
assert False, "TODO: review in text"
```
%% Cell type:markdown id: tags:
### Qb) Explain the capacity and under/overfitting concept
Write a textual description of the capacity and under/overfitting concepts using the plots in the code above.
What happens when the polynomial degree is low/medium/high with respect to the under/overfitting concepts? Explain in detail.
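
As a possible aid when writing the description, the sketch below compares the MSE on the training folds with the MSE on the validation folds for each degree. It is an illustration under stated assumptions, not part of the original exercise: the data is regenerated with a fixed seed through `np.random.default_rng(0)`, and `cross_validate` with `return_train_score=True` is used instead of the `cross_val_score` call in the code above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data, generated like in the exercise code but with a
# fixed seed so that the run is repeatable.
rng = np.random.default_rng(0)
X = np.sort(rng.random(30))
y = np.cos(1.5 * np.pi * X) + rng.normal(scale=0.1, size=30)

train_mse = {}
val_mse = {}
for degree in [1, 4, 15]:
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    result = cross_validate(
        model, X[:, np.newaxis], y,
        scoring="neg_mean_squared_error", cv=10, return_train_score=True,
    )
    # The scorer returns negated MSE values, so flip the sign back.
    train_mse[degree] = -result["train_score"].mean()
    val_mse[degree] = -result["test_score"].mean()
    print(f"degree={degree:2d}  train MSE={train_mse[degree]:.2e}  "
          f"validation MSE={val_mse[degree]:.2e}")
```

Comparing the two columns per degree shows how the fit to the training folds and the fit to the unseen folds develop as the capacity grows.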
%% Cell type:code id: tags:
``` python
# TODO: plot explanations..
assert False, "TODO: text"
```
%% Cell type:markdown id: tags:
### Qc) Score method
Why is the scoring method called `neg_mean_squared_error` in the code?

Explain why a well-known $J$-function, the $MSE$, conceptually moves from being a cost function to being a score function. How can that be?

What happens if you try to set it to `mean_squared_error`? Does it work, or does it raise an exception, as in

    scores = cross_val_score(pipeline, X[:, np.newaxis], y, scoring="mean_squared_error", cv=10)

Remember to document the outcome in your journal.

What are the theoretical minimum and maximum score values (remember that the score range was $[-\infty;1]$ for the $r^2$ score)? Why does the Degree 15 model have a `Score(-MSE) = -1.8E8`? And why is this by no means the best model?
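
One way to examine the scorer empirically is to compare its output with an MSE computed by hand on the same folds. The sketch below is a minimal illustration under stated assumptions: the data, the plain `LinearRegression` model and the 5-fold `KFold` split are hypothetical choices made here, not taken from the exercise code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical data with a fixed seed for repeatability.
rng = np.random.default_rng(0)
X = rng.random((30, 1))
y = np.cos(1.5 * np.pi * X[:, 0]) + rng.normal(scale=0.1, size=30)

# An explicit, unshuffled splitter so both computations use the same folds.
cv = KFold(n_splits=5)
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=cv)

# Recompute the MSE per fold by hand with the plain metric function.
manual_mse = []
for train_idx, val_idx in cv.split(X):
    fold_model = LinearRegression().fit(X[train_idx], y[train_idx])
    y_val_pred = fold_model.predict(X[val_idx])
    manual_mse.append(mean_squared_error(y[val_idx], y_val_pred))
manual_mse = np.array(manual_mse)

print("scorer values:", scores)
print("manual MSE:   ", manual_mse)
```

Comparing the two printed arrays, in particular their signs, gives a starting point for the explanation asked for above.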
More on Score funs at
%% Cell type:code id: tags:
``` python
# TODO: examine the score method..
assert False, "TODO: explain and test the neg_mean_squared_error in the code above"
```
%% Cell type:markdown id: tags:
REVISIONS| |
---------| |
2018-1218| CEF, initial.
2018-0214| CEF, major update.
2018-0220| CEF, added code reference.
2018-0220| CEF, fixed revision table malformatting.
2018-0225| CEF, minor text updates, and made Qc optional.
2019-1008| CEF, updated to ITMAL E19.
2020-0314| CEF, updated to ITMAL F20.
2020-1015| CEF, updated to ITMAL E20.
2020-1029| CEF, changed sign of score(-MSE) for score=neg_mean_squared_error.
2020-1004| CEF, update to ITMAL E21.