Commit 901823c1 by Carsten Eie Frigaard

### update

parent f33844cc


 %% Cell type:markdown id: tags:

# ITMAL Exercise

## Generalization Error

In this exercise, we need to explain all the important overall concepts in training. Let's begin with Figure 5.3 from _Deep Learning_, Ian Goodfellow et al. [DL], that pretty much sums it all up.

### Qa) On Generalization Error

Write a detailed description of figure 5.3 (above) for your hand-in.

All concepts in the figure must be explained:

* training/generalization error,
* underfit/overfit zone,
* optimal capacity,
* generalization gap,
* and the two axes: x/capacity, y/error.

%% Cell type:code id: tags:

``` python
# TODO: ...in text
assert False, "TODO: write some text.."
```

%% Cell type:markdown id: tags:

### Qb) An MSE-Epoch/Error Plot

Next, we look at an SGD model for fitting a polynomial, that is, _polynomial regression_ similar to what Géron describes in [HOML] ("Polynomial Regression" + "Learning Curves").

Review the code below for plotting the RMSE vs. the iteration number or epoch (three cells, part I/II/III). Write a short description of the code, and comment on the important points in the generation of the (R)MSE array.

The training phase outputs lots of lines like

> `epoch= 104, mse_train=1.50, mse_val=2.37`
> `epoch= 105, mse_train=1.49, mse_val=2.35`

What is an ___epoch___, and what are `mse_train` and `mse_val`?

NOTE\$_1\$: the generalization plot in figure 5.3 in [DL] (above) and the plots below have different x-axes, and are not to be compared directly!

NOTE\$_2\$: notice that a 90-degree polynomial is used for the polynomial regression. This is just to produce a model with an extremely high capacity.

%% Cell type:code id: tags:

``` python
# Run code: Qb(part I)
# NOTE: modified code from [GITHOML], 04_training_linear_models.ipynb
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

np.random.seed(42)

def GenerateData():
    m = 100
    X = 6 * np.random.rand(m, 1) - 3
    y = 2 + X + 0.5 * X**2 + np.random.randn(m, 1)
    return X, y

X, y = GenerateData()
X_train, X_val, y_train, y_val = train_test_split(
    X[:50], y[:50].ravel(), test_size=0.5, random_state=10)

print("X_train.shape=", X_train.shape)
print("X_val  .shape=", X_val.shape)
print("y_train.shape=", y_train.shape)
print("y_val  .shape=", y_val.shape)

poly_scaler = Pipeline([
    ("poly_features", PolynomialFeatures(degree=90, include_bias=False)),
    ("std_scaler", StandardScaler()),
])

X_train_poly_scaled = poly_scaler.fit_transform(X_train)
X_val_poly_scaled   = poly_scaler.transform(X_val)

X_new = np.linspace(-3, 3, 100).reshape(100, 1)

plt.plot(X, y, "b.", label="All X-y Data")
plt.xlabel("\$x_1\$", fontsize=18)
plt.ylabel("\$y\$", rotation=0, fontsize=18)
plt.legend(loc="upper left", fontsize=14)
plt.axis([-3, 3, 0, 10])
plt.show()

print('OK')
```

%% Cell type:code id: tags:

``` python
# Run code: Qb(part II)

def Train(X_train, y_train, X_val, y_val, n_epochs, verbose=False):
    print("Training...n_epochs=", n_epochs)
    train_errors, val_errors = [], []
    sgd_reg = SGDRegressor(max_iter=1, penalty=None, eta0=0.0005,
                           warm_start=True, early_stopping=False,
                           learning_rate="constant",
                           tol=-float("inf"), random_state=42)
    for epoch in range(n_epochs):
        sgd_reg.fit(X_train, y_train)
        y_train_predict = sgd_reg.predict(X_train)
        y_val_predict   = sgd_reg.predict(X_val)
        mse_train = mean_squared_error(y_train, y_train_predict)
        mse_val   = mean_squared_error(y_val,   y_val_predict)
        train_errors.append(mse_train)
        val_errors  .append(mse_val)
        if verbose:
            print(f"  epoch={epoch:4d}, mse_train={mse_train:4.2f}, mse_val={mse_val:4.2f}")
    return train_errors, val_errors

n_epochs = 500
train_errors, val_errors = Train(X_train_poly_scaled, y_train,
                                 X_val_poly_scaled, y_val, n_epochs, True)

print('OK')
```

%% Cell type:code id: tags:

``` python
# Run code: Qb(part III)

best_epoch = np.argmin(val_errors)
best_val_rmse = np.sqrt(val_errors[best_epoch])

plt.figure(figsize=(10, 5))
plt.annotate('Best model',
             xy=(best_epoch, best_val_rmse),
             xytext=(best_epoch, best_val_rmse + 1),
             ha="center",
             arrowprops=dict(facecolor='black', shrink=0.05),
             fontsize=16)

best_val_rmse -= 0.03  # just to make the graph look better
plt.plot([0, n_epochs], [best_val_rmse, best_val_rmse], "k:", linewidth=2)
plt.plot(np.sqrt(train_errors), "b--", linewidth=2, label="Training set")
plt.plot(np.sqrt(val_errors),   "g-",  linewidth=3, label="Validation set")
plt.legend(loc="upper right", fontsize=14)
plt.xlabel("Epoch", fontsize=14)
plt.ylabel("RMSE", fontsize=14)
plt.show()
```

%% Cell type:code id: tags:

``` python
# TODO: code review..
assert False, "TODO: code review in text form"
```

%% Cell type:markdown id: tags:

### Qc) Early Stopping

How would you implement ___early stopping___ in the code above?

Write an explanation of the early stopping concept...that is, just write some pseudo code that 'implements' the early stopping.
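As a starting point, one possible sketch is given below. It is not the only way to do it: the `patience` parameter and the `train_with_early_stopping` helper are hypothetical names introduced here for illustration, and the hard-coded error list stands in for the `mse_val` values computed per epoch in the `Train()` function above.

``` python
# Minimal early-stopping sketch (hypothetical `patience` parameter).
# The `val_errors_per_epoch` list stands in for the per-epoch mse_val
# values computed inside the Train() loop above.

def train_with_early_stopping(val_errors_per_epoch, patience=5):
    """Stop when the validation error has not improved for `patience` epochs."""
    best_val = float("inf")
    best_epoch = -1
    for epoch, mse_val in enumerate(val_errors_per_epoch):
        if mse_val < best_val:
            # improvement: remember the best epoch
            # (a real implementation would also snapshot the model weights here)
            best_val = mse_val
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # no improvement for `patience` epochs: stop training
            break
    return best_epoch, best_val

# Simulated validation-error curve: drops, then rises (overfitting sets in)
errors = [5.0, 3.0, 2.0, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1]
best_epoch, best_val = train_with_early_stopping(errors, patience=3)
print(best_epoch, best_val)  # -> 3 1.5
```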
OPTIONAL: also implement your early stopping pseudo code in Python, and get it to work with the code above (and not just by flipping the hyperparameter to `early_stopping=True` on the `SGDRegressor`).

%% Cell type:code id: tags:

``` python
# TODO: early stopping..
assert False, "TODO: explain early stopping"
```

%% Cell type:markdown id: tags:

### Qd) Explain the Polynomial RMSE-Capacity Plot

Now we revisit the concepts from the `capacity_under_overfitting.ipynb` notebook and the polynomial fitting with a given capacity (polynomial degree).

Peek into the cell below (code similar to what we saw in `capacity_under_overfitting.ipynb`), and explain the generated RMSE-Capacity plot.

Why does the _training error_ keep dropping, while the _CV-error_ drops until around capacity 3, and then begins to rise again? What do the x-axis _Capacity_ and y-axis _RMSE_ represent?

Try increasing the model capacity. What happens when you do plots for `degrees` larger than around 10? Relate this to what you found via Qa+b in `capacity_under_overfitting.ipynb`.
%% Cell type:code id: tags:

``` python
# Run and review this code
# NOTE: modified code from [GITHOML], 04_training_linear_models.ipynb
%matplotlib inline

from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

def GenerateData():
    n_samples = 30
    #degrees = [1, 4, 15]
    degrees = range(1, 8)
    X = np.sort(np.random.rand(n_samples))
    y = true_fun(X) + np.random.randn(n_samples) * 0.1
    return X, y, degrees

np.random.seed(0)
X, y, degrees = GenerateData()

print("Iterating...degrees=", degrees)
capacities, rmses_training, rmses_validation = [], [], []

for i in range(len(degrees)):
    d = degrees[i]
    polynomial_features = PolynomialFeatures(degree=d, include_bias=False)
    linear_regression = LinearRegression()
    pipeline = Pipeline([
        ("polynomial_features", polynomial_features),
        ("linear_regression", linear_regression)
    ])

    Z = X[:, np.newaxis]
    pipeline.fit(Z, y)
    p = pipeline.predict(Z)
    train_rms = mean_squared_error(y, p)

    # Evaluate the models using cross-validation
    scores = cross_val_score(pipeline, Z, y,
                             scoring="neg_mean_squared_error", cv=10)
    score_mean = -scores.mean()

    rmse_training   = sqrt(train_rms)
    rmse_validation = sqrt(score_mean)

    print(f"  degree={d:4d}, rmse_training={rmse_training:4.2f}, rmse_cv={rmse_validation:4.2f}")

    capacities      .append(d)
    rmses_training  .append(rmse_training)
    rmses_validation.append(rmse_validation)

plt.figure(figsize=(7, 4))
plt.plot(capacities, rmses_training,  "b--", linewidth=2, label="training RMSE")
plt.plot(capacities, rmses_validation, "g-", linewidth=2, label="validation RMSE")
plt.legend(loc="upper right", fontsize=14)
plt.xlabel("Capacity", fontsize=14)
plt.ylabel("RMSE", fontsize=14)
plt.show()

print('OK')
```

%% Cell type:code id: tags:

``` python
# TODO: investigate..
assert False, "TODO: ...answer in text form"
```

%% Cell type:markdown id: tags:

REVISIONS| |
---------| |
2018-1219| CEF, initial.
2018-0214| CEF, major update and put in sync with under/overfitting exe.
2018-0220| CEF, fixed revision table malformatting.
2018-0225| CEF, minor text updates, and made Qc optional.
2018-0225| CEF, updated code, made more functions.
2018-0311| CEF, corrected RSME to RMSE.
2019-1008| CEF, updated to ITMAL E19.
2020-0314| CEF, updated to ITMAL F20.
2020-1015| CEF, updated to ITMAL E20.
2020-1117| CEF, added comment on 90 degree polynomial, made early stopping a pseudo code exe.
2021-0322| CEF, changed crossref from "capacity_under_overfitting.ipynb Qc" to Qa+b in Qd) Explain the Polynomial RMSE-Capacity Plot.
2021-0323| CEF, changed 'cv RMSE' legend to 'validation RMSE'.
2021-1031| CEF, updated to ITMAL E21.
...
 %% Cell type:markdown id: tags:

# ITMAL Exercise

## Hyperparameters and Gridsearch

When instantiating a Scikit-learn model in Python, most or all constructor parameters have _default_ values. These values are not part of the internal model and are hence called ___hyperparameters___---in contrast to _normal_ model parameters, for example the neuron weights, \$\mathbf w\$, for an `MLP` model.

### Manual Tuning of Hyperparameters

Below is an example of the Python constructor for the support-vector classifier `sklearn.svm.SVC`, with, say, the `kernel` hyperparameter having the default value `'rbf'`. If you had to choose, what would you set it to other than `'rbf'`?

```python
class sklearn.svm.SVC(
    C=1.0,
    kernel='rbf',
    degree=3,
    gamma='auto_deprecated',
    coef0=0.0,
    shrinking=True,
    probability=False,
    tol=0.001,
    cache_size=200,
    class_weight=None,
    verbose=False,
    max_iter=-1,
    decision_function_shape='ovr',
    random_state=None
)
```

The default values might be a sensible general starting point, but for your data you might want to optimize the hyperparameters to yield a better result. To be able to set `kernel` to a sensible value, you need to go into the documentation for the `SVC`, understand what the kernel parameter represents and what values it can be set to, and you need to understand the consequences of setting `kernel` to something different from the default...and the story repeats for every other hyperparameter!

### Brute Force Search

An alternative to this structured, but time-consuming, approach is simply to __brute-force__ a search of interesting hyperparameters, and choose the 'best' parameters according to a fit-predict and some score, say 'f1'.
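A minimal sketch of such a brute-force search, using Scikit-learn's `GridSearchCV`: the candidate values below are illustrative choices (not recommendations), the iris dataset is just a convenient stand-in, and the `'f1_macro'` scoring is the multiclass variant of the 'f1' score mentioned above.

``` python
# Brute-force grid search sketch with sklearn's GridSearchCV.
# The parameter values are illustrative, not recommendations.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One list of candidate values per hyperparameter;
# the search grid is the Cartesian product of these lists.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C":      [0.1, 1, 10],
}

search = GridSearchCV(SVC(), param_grid, scoring="f1_macro", cv=5)
search.fit(X, y)  # fits and cross-validates all 2 x 3 = 6 combinations

print("best params:", search.best_params_)
print("best score :", round(search.best_score_, 3))
```

Note that the number of fits grows multiplicatively with each extra hyperparameter list, which is why a brute-force search quickly becomes expensive.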
Conceptual graphical view of grid search for two distinct hyperparameters.
Notice that you would normally search hyperparameters like `alpha` with an exponential range, say [0.01, 0.1, 1, 10] or similar.
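An exponential range like that can be generated with NumPy's `logspace`, for instance:

``` python
# Exponential (logarithmic) search range for a hyperparameter like `alpha`:
# 4 values evenly spaced in exponent, from 10^-2 to 10^1.
import numpy as np

alphas = np.logspace(-2, 1, num=4)
print(alphas)  # -> [ 0.01  0.1   1.   10.  ]
```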