This example demonstrates the problems of underfitting and overfitting and
how we can use linear regression with polynomial features to approximate
nonlinear functions.
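
Here is a minimal sketch of the idea. It assumes scikit-learn's `PolynomialFeatures` and `LinearRegression`, which the linked scikit-learn example also uses. A single feature $x$ is expanded into $[x, x^2, \dots, x^d]$, and an ordinary linear model is then fit on the expanded columns. The cubic target below is made up for illustration and is not data from this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Expand a single input feature x = 2.0 into [x, x^2, x^3].
# include_bias=False leaves the constant term to LinearRegression's intercept.
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(np.array([[2.0]]))
print(X_poly)  # [[2. 4. 8.]]

# The model stays linear in its coefficients, but the fitted curve
# y = w0 + w1*x + w2*x^2 + w3*x^3 is nonlinear in x.
x = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y = 1.0 - 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 3  # an exact cubic, no noise (illustrative)
model = LinearRegression().fit(poly.fit_transform(x), y)
print(model.intercept_, model.coef_)  # approximately 1.0 and [-2.0, 0.0, 0.5]
```
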
The plot below shows the function that we want to approximate,
which is a part of the cosine function. In addition, the samples from the
real function and the approximations of different models are displayed. The
models have polynomial features of different degrees.
We can see that a linear function (polynomial with degree 1) is not sufficient to fit the
training samples. This is called **underfitting**.
A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will **overfit** the training data, i.e. it learns the noise of the
training data.
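
The following sketch reproduces the three models numerically. It assumes the target $\cos(1.5\pi x)$, 30 samples, and a noise level of 0.1 from the linked scikit-learn example; `true_fun` and the other names are illustrative. Each polynomial model class contains the lower-degree ones, so the training MSE can only go down as the degree grows. A low training MSE alone therefore cannot reveal overfitting. The MSE against the true function on a dense grid is the more telling quantity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def true_fun(X):
    # Assumed target, as in the linked scikit-learn example: part of a cosine.
    return np.cos(1.5 * np.pi * X)

rng = np.random.RandomState(0)
X = np.sort(rng.rand(30))
y = true_fun(X) + rng.randn(30) * 0.1  # noisy training samples
X_dense = np.linspace(0, 1, 100)       # grid for comparing against the true function

train_mse = {}
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())
    model.fit(X[:, np.newaxis], y)
    train_mse[degree] = mean_squared_error(y, model.predict(X[:, np.newaxis]))
    true_mse = mean_squared_error(true_fun(X_dense), model.predict(X_dense[:, np.newaxis]))
    print(f"degree {degree:2d}: train MSE = {train_mse[degree]:.2e}, "
          f"MSE vs true function = {true_mse:.2e}")
```
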
We evaluate **overfitting**/**underfitting** quantitatively using
cross-validation: we calculate the mean squared error (MSE) on the validation
set. The higher the validation MSE, the less likely it is that the model
generalizes correctly from the training data.
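
Here is a minimal sketch of the cross-validation step. It assumes the same data generation as the linked example and `cv=10`. `cross_val_score` with `scoring="neg_mean_squared_error"` returns one negated MSE per fold, so the sign is flipped back to report the MSE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.rand(30))[:, np.newaxis]
y = np.cos(1.5 * np.pi * X[:, 0]) + rng.randn(30) * 0.1

cv_mse = {}
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())
    # scikit-learn scorers follow "greater is better", so the MSE comes back negated.
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=10)
    cv_mse[degree] = -scores.mean()
    print(f"degree {degree:2d}: CV MSE = {cv_mse[degree]:.2e} (+/- {scores.std():.2e})")
```
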
### Qa Explain the polynomial fitting via code review
Review the code below, write a __short__ code review summary, and explain how the polynomial fitting is implemented.
NOTE: Do not dig into the plotting details (they are unimportant compared to the rest of the code); just explain the outcome of the plots.
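
For orientation, the fitting part of the example reduces to a two-step `Pipeline` once the plotting is removed. This is a sketch assuming scikit-learn. The step names are chosen to mirror the linked example, and the degree of 4 and the five test points are arbitrary. The last lines check that `pipeline.predict` applies the feature expansion first and then the linear model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.rand(30))
y = np.cos(1.5 * np.pi * X) + rng.randn(30) * 0.1

# Step 1 expands x into [x, x^2, ..., x^d]; step 2 fits ordinary least squares on them.
pipeline = Pipeline([
    ("polynomial_features", PolynomialFeatures(degree=4, include_bias=False)),
    ("linear_regression", LinearRegression()),
])
pipeline.fit(X[:, np.newaxis], y)  # scikit-learn expects a 2-D (n_samples, n_features) array

lin = pipeline.named_steps["linear_regression"]
print("intercept:", lin.intercept_)
print("coefficients for x^1..x^4:", lin.coef_)

# predict() runs the same feature expansion before applying the linear model.
X_test = np.linspace(0, 1, 5)[:, np.newaxis]
manual = lin.predict(pipeline.named_steps["polynomial_features"].transform(X_test))
print(np.allclose(pipeline.predict(X_test), manual))  # True
```
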
%% Cell type:code id: tags:
``` python
# TODO: code review
#assert False, "TODO: remove me, and review this code"
# NOTE: code from https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html
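# assumes `scores` (the cross_val_score result) and `plt` (matplotlib.pyplot) are defined by the example code from the URL above
print(f" CV sub-scores: mean = {scores.mean():.2}, std = {scores.std():.2}")
for i in range(len(scores)):
    print(f" CV fold {i} => score = {scores[i]:.2}")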
plt.show()
print('OK')
```
%% Cell type:code id: tags:
``` python
# TODO: code review..
assert False, "TODO: review in text"
```
%% Cell type:markdown id: tags:
### Qb Explain the capacity and under/overfitting concept
Write a textual description of the capacity and under/overfitting concepts using the plots from the code above.
What happens when the polynomial degree is low, medium, or high with respect to underfitting and overfitting? Explain in detail.
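
One way to see capacity as a continuum, rather than as three snapshots, is to sweep the degree and compare training and validation error. The sketch below assumes scikit-learn's `validation_curve` and the data setup of the linked example; the range of degrees 1 to 15 is arbitrary. At low degrees both errors should be high (underfitting). At high degrees the training error keeps falling while the validation error grows (overfitting).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.rand(30))[:, np.newaxis]
y = np.cos(1.5 * np.pi * X[:, 0]) + rng.randn(30) * 0.1

model = make_pipeline(PolynomialFeatures(include_bias=False), LinearRegression())
degrees = list(range(1, 16))
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="polynomialfeatures__degree",  # make_pipeline's auto-generated step name
    param_range=degrees,
    cv=10,
    scoring="neg_mean_squared_error",
)
# Scores have shape (n_degrees, n_folds); negate and average over folds to get MSE.
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)
for d, tr, va in zip(degrees, train_mse, val_mse):
    print(f"degree {d:2d}: train MSE = {tr:.2e}, validation MSE = {va:.2e}")
```
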
%% Cell type:code id: tags:
``` python
# TODO: plot explanations..
assert False, "TODO: answer...in text"
```
%% Cell type:markdown id: tags:
### Qc Score method
Why is the scoring method called `neg_mean_squared_error` in the code?
Explain how a well-known $J$-function, the $MSE$, can conceptually move from being a cost function to being a score function.
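
Here is a small sketch, assuming scikit-learn's `get_scorer` and `mean_squared_error`. The `neg_mean_squared_error` scorer returns exactly the negated cost, so maximizing the score is the same as minimizing the MSE. The single-feature linear model is only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, mean_squared_error

rng = np.random.RandomState(0)
X = rng.rand(30, 1)
y = np.cos(1.5 * np.pi * X[:, 0]) + rng.randn(30) * 0.1
model = LinearRegression().fit(X, y)

# The scorer wraps mean_squared_error with greater_is_better=False, which flips the sign.
scorer = get_scorer("neg_mean_squared_error")
score = scorer(model, X, y)
mse = mean_squared_error(y, model.predict(X))
print(score, -mse)  # the same value: the score is the negated cost
```
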
What happens if you try to set it to `mean_squared_error`: does it work, or does it raise an exception?
Remember to document the outcome in your journal.
What are the theoretical minimum and maximum score values (remember that the score range was $[-\infty;1]$ for the $r^2$ score)? Why does the Degree 15 model have a `Score(-MSE) = -1.8E8`? And why is it by no means the best model?
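
Two observations can be checked directly. First, $-\text{MSE}$ reaches its maximum of 0 only for perfect predictions and has no lower bound. Second, with `cv=10` on a regressor, `cross_val_score` uses `KFold` without shuffling. On sorted $x$ values, each validation fold is therefore a contiguous block with no training samples inside it. The sketch below assumes the sorted data from the linked example. A high-degree polynomial then has to predict over a gap, or beyond the data for the two end folds, which is one plausible source of the huge degree-15 error.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# -MSE is at most 0 (perfect predictions) and unbounded below.
y_true = np.array([1.0, 2.0, 3.0])
print(-mean_squared_error(y_true, y_true))         # -0.0, i.e. zero
print(-mean_squared_error(y_true, y_true + 10.0))  # -100.0

# On sorted x values, every unshuffled KFold validation fold is a contiguous block:
# no training sample lies strictly inside the validation x-range.
X = np.sort(np.random.RandomState(0).rand(30))[:, np.newaxis]
insides = []
for train_idx, val_idx in KFold(n_splits=10).split(X):
    lo, hi = X[val_idx, 0].min(), X[val_idx, 0].max()
    inside = bool(((X[train_idx, 0] > lo) & (X[train_idx, 0] < hi)).any())
    insides.append(inside)
    print(f"validation x in [{lo:.2f}, {hi:.2f}], training points inside: {inside}")
```
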