Commit f787be98 authored by Carsten Eie Frigaard's avatar Carsten Eie Frigaard

pre_l08_update

parent 16cd2d1d
%% Cell type:markdown id: tags:
# ITMAL Demo
## Installing Keras
REVISIONS| |
---------| |
2018-03-25| CEF, initial.
2020-03-05| CEF, F20 ITMAL update.
2020-03-06| CEF, investigated Anaconda 2019.10 on Windows and updated GPU server notes.
2021-10-12| CEF, updated for ITMAL E21.
2022-03-23| CEF, updated for SWMAL F22, rewrote install method for Keras via environments.
### WARNING for SWMAL F22
### Installing Keras and Tensorflow for Anaconda 2021.11
Keras will not install under Anaconda version 2021.11: it ends up in an endless package conflict when installing under `conda`.
A fix is underway.
(This is a new finding for the particular version 2021.11; previous versions did not have this TensorFlow install problem, but typically only a Keras install problem.)
The root cause of the problem must be a missing check when packaging the Anaconda distribution, failing to find the set of conflicts we see when installing `tensorflow` on top of the default packages that come with the Anaconda distribution.
### Install Keras via Anaconda Prompt
#### 1) Prepare and Create a new Environment
So one solution is to create a new conda environment and install `scikit-learn`, `tensorflow` and `keras` into it.
First, launch the __Anaconda Prompt__ console (CMD) via the Start menu:
<img src="https://itundervisning.ase.au.dk/SWMAL/L06/Figs/Screenshot_anaconda_prompt.png" alt="WARNING: could not get image from server." style="width:200px">
Later we will need a package called `nb_conda_kernels`; let us install it before we create and activate the new environment, by running
```bash
> conda install nb_conda_kernels
```
in the anaconda console.
<img src="https://itundervisning.ase.au.dk/SWMAL/L06/Figs/Screenshot_anaconda_prompt_install_0.png" alt="WARNING: could not get image from server." style="width:700px">
Now, let us call our environment `swmal` and create it by running
```bash
> conda create --name swmal
```
and WAIT for 1 to 30 minutes for the spinning progress bar to finish (a problem that makes `conda` extremely slow in the latest two releases of Anaconda!).
<img src="https://itundervisning.ase.au.dk/SWMAL/L06/Figs/Screenshot_anaconda_prompt_install_1.png" alt="WARNING: could not get image from server." style="width:700px">
After installing, you can check the Keras and TensorFlow versions via ```conda list keras``` and ```conda list tensorflow```. Notice that you might also want to install the GPU version of TensorFlow, if your PC has a suitable GPU (CUDA support is needed). Below, I did not install the GPU version, as seen by the call ```conda list tensorflow-gpu```.
<img src="https://itundervisning.ase.au.dk/SWTMAL/L06/Figs/Screenshot_anaconda_prompt_install_2.png" alt="WARNING: could not get image from server." style="width:700px">
and then activate it via
```bash
> conda activate swmal
```
#### 2) Install Needed Packages
Now we have a clean-slate environment, and we need to install the packages we need, `scikit-learn`, `tensorflow` and `keras`, but this is as easy as
```bash
> conda install scikit-learn tensorflow keras nb_conda_kernels
```
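Once the packages are in place, you can double-check from within Python which environment and interpreter you are actually running. A stdlib-only sketch (the `CONDA_DEFAULT_ENV` variable is set by `conda activate`; its value depends on your setup):

```python
import os
import sys

# The active conda environment name, if this interpreter was started from one:
print(os.environ.get("CONDA_DEFAULT_ENV", "<no conda env>"))

# The Python interpreter actually in use (should live under your env's folder):
print(sys.executable)
```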
There are two `keras` interfaces: the stand-alone interface, called via
```keras.(some modules or functions)```
and a similar interface already built into TensorFlow, called via
```tf.keras.(some modules or functions)```
You can use both, but direct access via the stand-alone interface may be the most obvious.
#### 3) Launching Jupyter-Notebooks in the new Environment
(Notice that the package `nb_conda_kernels` also needs to be installed in this new environment; it is what makes the environment's kernel visible to Jupyter.)
<img src="https://itundervisning.ase.au.dk/SWMAL/L06/Figs/Screenshot_anaconda_prompt_install_3.png" alt="WARNING: could not get image from server." style="width:700px">
#### 4) Testing the New Environment Setup
Let's see the versions installed in the new `swmal` environment via the `Versions()` function found in `libitmal`:
%% Cell type:code id: tags:
``` python
# DEMO of Versions in libitmal
from libitmal import versions as itmalversions
itmalversions.Versions()
```
%% Output
Python version: 3.9.7.
Scikit-learn version: 1.0.2.
Keras version: 2.6.0
Tensorflow version: 2.6.0
Tensorflow.keras version: 2.6.0
Opencv2 version: 4.5.5
%% Cell type:markdown id: tags:
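The cell above uses the course's `libitmal` package. If you do not have it at hand, a minimal version-printing function in the same spirit could look like this (a sketch only; `print_versions` is a hypothetical helper, not the course's `Versions()`):

```python
import importlib
import sys

def print_versions(modules=("sklearn", "keras", "tensorflow")):
    # Print the Python version plus each module's __version__,
    # or a warning if the module is not installed at all.
    print(f"Python version: {sys.version.split()[0]}")
    for name in modules:
        try:
            mod = importlib.import_module(name)
            print(f"{name} version: {getattr(mod, '__version__', 'unknown')}")
        except ImportError:
            print(f"WARN: could not find {name}!")

print_versions()
```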
The `Versions()` function should print whatever version you installed, or produce a warning if the package is not installed at all.
For my current Windows/Anaconda setup I got the versions below; yours may differ slightly:
```
Python version: 3.9.7.
Scikit-learn version: 1.0.2.
Keras version: 2.6.0
Tensorflow version: 2.6.0
Tensorflow.keras version: 2.6.0
```
#### 5) Wrapping It All Up in a BAT File
To make development easy, a BAT file (a Windows batch or script file) should be created. This eases the launch of Jupyter-Notebooks, and the BAT file can be pinned as an icon on the taskbar or similar.
The BAT file should contain the following lines; notice that you may have to change the paths to fit your own setup:
```
ECHO OFF
REM my-jupyter-notebook
REM Version: 0.1
REM 2022-03-23: CEF, initial version
echo MY-JUPYTER-NOTEBOOK launcher..
@CALL "%HOMEPATH%\Anaconda3\condabin\conda.bat" activate swmal %* <NUL
REM notebooks start in this directory, you may change it..
cd \
REM then launch the notebook..
jupyter-notebook
echo DONE
```
# Using the ASE GPU Cluster
__NOTE: this section is currently slightly outdated!__
### Client GPU Support
If your own computer has a CUDA-compatible GPU, you might also want to install TensorFlow for the GPU:
```
conda install tensorflow-gpu
```
### Server GPU support
You also have an ITMAL group account on our GPU Cluster server at
* http://gpucluster.st.lab.au.dk/
Find login details etc. in Blackboard ("Kursusinfo | GPU Cluster"):
* https://brightspace.au.dk/d2l/le/lessons/27524/topics/296678
The current GPU-Cluster version setup is (??)
```python
Python version: 3.6.8.
Scikit-learn version: 0.20.3.
Keras version: 2.2.4
Tensorflow version: 1.12.0
```
### Issues regarding the Server GPU Memory
For all users, I've added a startup-script that runs when you log into the GPU server. The startup-script is found in
* /home/shared/00_init.py
and, among other things, it adds your home folder to the Python path.
When running on the GPU server you are automatically assigned 10% of the GPU memory. This is also done via the startup-script, and you are allowed to increase your GPU memory fraction if needed by calling the enable-GPU function in ```/home/shared/00_init.py``` (or the module ```kernelfuns``` in ```libitmal```) like
```
StartupSequence_GPU(verbose=True)
```
or with a larger memory-fraction argument, thereby allocating, say, 30% of the GPU memory.
NOTE 1: processes using more than 50% of the GPU memory will automatically be killed at intervals of about 5 minutes, and Python kernels running for more than about a week will also be terminated automatically.
NOTE 2: most (if not all) Scikit-learn ML algorithms do NOT use the GPU at all. You need to move to TensorFlow/Keras to get true GPU hardware support.
NOTE 3: notebooks keep running on the server even if you shut down your web connection to it. Print output will then be lost, but you can still start a long-running model training on the server and come back later to see if it has finished (on the same node).
NOTE 4: if you need to stop your server, use "Control Panel (upper right) | Stop my server" to shut down all your kernels and release all memory.
%% Cell type:code id: tags:
``` python
# DEMO of setting the GPU memory fraction in libitmal
from libitmal import kernelfuns as itmalkernelfuns
itmalkernelfuns.StartupSequence_GPU(verbose=True)

# See the kernels running; this only works if you have CUDA installed
! nvidia-smi
```
%% Cell type:markdown id: tags:
### GPU Server GIT/PYTHONPATH setup
On the GPU server you can clone the git repository from inside a Jupyter notebook via
```bash
! git clone https://gitlab.au.dk/au204573/GITMAL.git
! cd GITMAL && git pull
echo DONE
```
The `PYTHONPATH` environment variable should already point to your home folder via the startup-script (described above).
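You can verify this from within a notebook (stdlib-only sketch; the exact folder names depend on your account):

```python
import os
import sys

# PYTHONPATH as seen by this process (may be unset if the startup-script did not run):
print(os.environ.get("PYTHONPATH", "<not set>"))

# The effective module search path; your home folder should appear here:
for p in sys.path:
    print(p)
```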
%% Cell type:markdown id: tags:
# SWMAL Exercise
## Model capacity and under/overfitting
NOTE: text and code to the exercise taken from
* https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html
This example demonstrates the problems of underfitting and overfitting and
how we can use linear regression with polynomial features to approximate
nonlinear functions.
The plot below shows the function that we want to approximate,
which is a part of the cosine function. In addition, the samples from the
real function and the approximations of different models are displayed. The
models have polynomial features of different degrees.
We can see that a linear function (polynomial with degree 1) is not sufficient to fit the
training samples. This is called **underfitting**.
A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will **overfit** the training data, i.e. it learns the noise of the
training data.
We evaluate **overfitting**/**underfitting** quantitatively by using
cross-validation: we calculate the mean squared error (MSE) on the validation
set; the higher it is, the less likely it is that the model generalizes correctly from the
training data.
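As a reminder of the metric itself, the MSE is just the average of the squared residuals. A minimal numpy sketch (not part of the exercise code):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: the average squared residual.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
print(mse(y_true, y_pred))  # (0 + 0 + 4) / 3 ≈ 1.33
```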
### Qa) Explain the polynomial fitting via code review
Review the code below, write a __short__ code review summary, and explain how the polynomial fitting is implemented.
NOTE: do not dig into the plotting details (they are unimportant compared to the rest of the code); just explain the outcome of the plots.
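As background for the review: `PolynomialFeatures(include_bias=False)` simply expands the 1-D input into columns of powers of $x$, which the linear regression then fits. A numpy-only sketch of that expansion (`polynomial_features` here is a hypothetical helper mirroring the sklearn transformer for the 1-D case):

```python
import numpy as np

def polynomial_features(x, degree):
    # Expand 1-D x into columns [x, x^2, ..., x^degree] (no bias column),
    # mirroring PolynomialFeatures(degree, include_bias=False) on 1-D input.
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.array([2.0, 3.0])
print(polynomial_features(x, 3))
# [[ 2.  4.  8.]
#  [ 3.  9. 27.]]
```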
%% Cell type:code id: tags:
``` python
# TODO: code review
#assert False, "TODO: remove me, and review this code"

# NOTE: code from https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

def GenerateData(n_samples=30):
    X = np.sort(np.random.rand(n_samples))
    y = true_fun(X) + np.random.randn(n_samples) * 0.1
    return X, y

np.random.seed(0)
X, y = GenerateData()

degrees = [1, 4, 15]
print("Iterating...degrees=", degrees)

plt.figure(figsize=(14, 5))
for i in range(len(degrees)):
    ax = plt.subplot(1, len(degrees), i + 1)
    plt.setp(ax, xticks=(), yticks=())

    polynomial_features = PolynomialFeatures(degree=degrees[i], include_bias=False)
    linear_regression = LinearRegression()
    pipeline = Pipeline([
        ("polynomial_features", polynomial_features),
        ("linear_regression", linear_regression)
    ])
    pipeline.fit(X[:, np.newaxis], y)

    # Evaluate the models using cross-validation
    scores = cross_val_score(pipeline, X[:, np.newaxis], y, scoring="neg_mean_squared_error", cv=10)
    score_mean = scores.mean()
    print(f"  degree={degrees[i]:4d}, score_mean={score_mean:4.2f}, {polynomial_features}")

    X_test = np.linspace(0, 1, 100)
    y_pred = pipeline.predict(X_test[:, np.newaxis])

    # Plotting details
    plt.plot(X_test, y_pred, label="Model")
    plt.plot(X_test, true_fun(X_test), label="True function")
    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.xlim((0, 1))
    plt.ylim((-2, 2))
    plt.legend(loc="best")
    plt.title("Degree {}\nScore(-MSE) = {:.2e}(+/- {:.2e})".format(degrees[i], scores.mean(), scores.std()))

    # CEF: loop added, prints each score per CV-fold.
    #      NOTICE the sub-means when degree=15!
    print(f"  CV sub-scores: mean = {scores.mean():.2}, std = {scores.std():.2}")
    for j in range(len(scores)):
        print(f"    CV fold {j} => score = {scores[j]:.2}")

plt.show()
print('OK')
```
%% Cell type:code id: tags:
``` python
# TODO: code review..
assert False, "TODO: review in text"
```
%% Cell type:markdown id: tags:
### Qb) Explain the capacity and under/overfitting concept
Write a textual description of the capacity and under/overfitting concepts using the plots in the code above.
What happens when the polynomial degree is low/medium/high with respect to the under/overfitting concepts? Explain in detail.
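The effect can also be seen numerically with a plain train/validation split (a numpy-only sketch with hypothetical data and split; degree 9 stands in for a high-capacity model to keep the tiny fit well-posed): training error keeps dropping as the degree grows, while validation error typically rises again for high degrees.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.random(30))
y = np.cos(1.5 * np.pi * x) + rng.normal(scale=0.1, size=30)

# Simple split: even indices train, odd indices validate.
x_tr, y_tr = x[::2], y[::2]
x_va, y_va = x[1::2], y[1::2]

results = {}
for deg in (1, 4, 9):
    coef = np.polyfit(x_tr, y_tr, deg)  # least-squares polynomial fit
    mse_tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    mse_va = np.mean((np.polyval(coef, x_va) - y_va) ** 2)
    results[deg] = (mse_tr, mse_va)
    print(f"degree={deg}  train MSE={mse_tr:.4f}  val MSE={mse_va:.4f}")
```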
%% Cell type:code id: tags:
``` python
# TODO: plot explanations..
assert False, "TODO: answer...in text"
```
%% Cell type:markdown id: tags:
### Qc) Score method
Why is the scoring method called `neg_mean_squared_error` in the code?
Explain why a well-known $J$ function, the $MSE$, here conceptually moves from being a cost function to being a score function. How can that be?
What happens if you try to set it to `mean_squared_error`, i.e. does it work, or does it raise an exception, like
```python
scores = cross_val_score(pipeline, X[:, np.newaxis], y, scoring="mean_squared_error", cv=10)
```
Remember to document the outcome in your journal.
What are the theoretical minimum and maximum score values (remember that the score range was $[-\infty;1]$ for the $r^2$ score)? Why does the degree-15 model have a `Score(-MSE) = -1.8E8`? And why is this by no means the best model?
More on score functions at
* https://scikit-learn.org/stable/modules/model_evaluation.html
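The sign convention can be illustrated directly (a numpy-only sketch; `neg_mse_score` is a hypothetical helper): sklearn scorers follow a greater-is-better convention, so the scorer returns $-MSE$, making the best possible score 0 and worse models more negative.

```python
import numpy as np

def neg_mse_score(y_true, y_pred):
    # Score = -MSE: a perfect model scores 0; worse models score more negative.
    return -np.mean((y_true - y_pred) ** 2)

y    = np.array([1.0, 2.0, 3.0])
good = np.array([1.1, 2.0, 2.9])
bad  = np.array([3.0, 0.0, 6.0])

print(neg_mse_score(y, good))
print(neg_mse_score(y, bad))
assert neg_mse_score(y, good) > neg_mse_score(y, bad)  # greater is better
```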
%% Cell type:code id: tags:
``` python
# TODO: examine the score method..
assert False, "TODO: explain and test the neg_mean_squared_error in the code above"
```
%% Cell type:markdown id: tags:
REVISIONS||
---------||
2018-12-18| CEF, initial.
2019-02-14| CEF, major update.
2019-02-20| CEF, added code reference.
2019-02-20| CEF, fixed revision table malformatting.
2019-02-25| CEF, minor text updates, and made Qc optional.
2019-10-08| CEF, updated to ITMAL E19.
2020-03-14| CEF, updated to ITMAL F20.
2020-10-15| CEF, updated to ITMAL E20.
2021-10-29| CEF, changed sign of score(-MSE) for score=neg_mean_squared_error.
2021-10-04| CEF, update to ITMAL E21.
2022-03-25| CEF, updated to SWMAL F22.