Commit 7f19f18f authored by Christian Marius Lillelund's avatar Christian Marius Lillelund

Made a data loader for the Complete and Fall cases

parent 28ea7586
Pipeline #43481 passed with stage
in 2 minutes and 47 seconds
Complete case using 5 clfs and 9 scalers
Results for MLP, None:
Accuracy: 0.597
Precision: 0.589
Recall: 0.431
Results for MLP, Standard:
Accuracy: 0.729
Precision: 0.725
Recall: 0.682
Results for MLP, MinMax:
Accuracy: 0.766
Precision: 0.769
Recall: 0.743
Results for MLP, MinMaxRange:
Accuracy: 0.76
Precision: 0.79
Recall: 0.707
Results for MLP, Robust:
Accuracy: 0.767
Precision: 0.744
Recall: 0.755
Results for MLP, MaxAbs:
Accuracy: 0.77
Precision: 0.784
Recall: 0.735
Results for MLP, QuantileTransformer:
Accuracy: 0.764
Precision: 0.787
Recall: 0.735
Results for MLP, QuantileTransformerNorm:
Accuracy: 0.759
Precision: 0.782
Recall: 0.712
########################################
Results for SVM, None:
Accuracy: 0.534
Precision: 0.519
Recall: 0.294
Results for SVM, Standard:
Accuracy: 0.683
Precision: 0.684
Recall: 0.63
Results for SVM, MinMax:
Accuracy: 0.776
Precision: 0.779
Recall: 0.743
Results for SVM, MinMaxRange:
Accuracy: 0.741
Precision: 0.749
Recall: 0.696
Results for SVM, Robust:
Accuracy: 0.717
Precision: 0.72
Recall: 0.664
Results for SVM, MaxAbs:
Accuracy: 0.776
Precision: 0.778
Recall: 0.745
Results for SVM, QuantileTransformer:
Accuracy: 0.774
Precision: 0.773
Recall: 0.747
Results for SVM, QuantileTransformerNorm:
Accuracy: 0.774
Precision: 0.773
Recall: 0.747
########################################
Results for RF, None:
Accuracy: 0.709
Precision: 0.706
Recall: 0.67
Results for RF, Standard:
Accuracy: 0.71
Precision: 0.708
Recall: 0.67
Results for RF, MinMax:
Accuracy: 0.707
Precision: 0.705
Recall: 0.666
Results for RF, MinMaxRange:
Accuracy: 0.709
Precision: 0.707
Recall: 0.668
Results for RF, Robust:
Accuracy: 0.708
Precision: 0.705
Recall: 0.668
Results for RF, MaxAbs:
Accuracy: 0.707
Precision: 0.705
Recall: 0.666
Results for RF, QuantileTransformer:
Accuracy: 0.711
Precision: 0.711
Recall: 0.666
Results for RF, QuantileTransformerNorm:
Accuracy: 0.711
Precision: 0.711
Recall: 0.666
########################################
Results for XGB, None:
Accuracy: 0.694
Precision: 0.693
Recall: 0.654
Results for XGB, Standard:
Accuracy: 0.694
Precision: 0.693
Recall: 0.654
Results for XGB, MinMax:
Accuracy: 0.694
Precision: 0.693
Recall: 0.654
Results for XGB, MinMaxRange:
Accuracy: 0.694
Precision: 0.693
Recall: 0.654
Results for XGB, Robust:
Accuracy: 0.694
Precision: 0.693
Recall: 0.654
Results for XGB, MaxAbs:
Accuracy: 0.694
Precision: 0.693
Recall: 0.654
Results for XGB, QuantileTransformer:
Accuracy: 0.695
Precision: 0.694
Recall: 0.656
Results for XGB, QuantileTransformerNorm:
Accuracy: 0.695
Precision: 0.694
Recall: 0.656
########################################
Results for CB, None:
Accuracy: 0.713
Precision: 0.711
Recall: 0.672
Results for CB, Standard:
Accuracy: 0.713
Precision: 0.711
Recall: 0.672
Results for CB, MinMax:
Accuracy: 0.713
Precision: 0.711
Recall: 0.672
Results for CB, MinMaxRange:
Accuracy: 0.713
Precision: 0.711
Recall: 0.672
Results for CB, Robust:
Accuracy: 0.713
Precision: 0.711
Recall: 0.672
Results for CB, MaxAbs:
Accuracy: 0.713
Precision: 0.711
Recall: 0.672
Results for CB, QuantileTransformer:
Accuracy: 0.713
Precision: 0.711
Recall: 0.672
Results for CB, QuantileTransformerNorm:
Accuracy: 0.713
Precision: 0.711
Recall: 0.672
########################################
Fall case using 5 clfs and 9 scalers
Results for MLP, None:
Accuracy: 0.888
Precision: 0.828
Recall: 0.507
Results for MLP, Standard:
Accuracy: 0.888
Precision: 0.853
Recall: 0.52
Results for MLP, MinMax:
Accuracy: 0.888
Precision: 0.839
Recall: 0.516
Results for MLP, MinMaxRange:
Accuracy: 0.887
Precision: 0.846
Recall: 0.524
Results for MLP, Robust:
Accuracy: 0.889
Precision: 0.849
Recall: 0.52
Results for MLP, MaxAbs:
Accuracy: 0.889
Precision: 0.838
Recall: 0.512
Results for MLP, QuantileTransformer:
Accuracy: 0.888
Precision: 0.84
Recall: 0.517
Results for MLP, QuantileTransformerNorm:
Accuracy: 0.888
Precision: 0.834
Recall: 0.524
########################################
Results for SVM, None:
Accuracy: 0.856
Precision: 0.857
Recall: 0.313
Results for SVM, Standard:
Accuracy: 0.887
Precision: 0.886
Recall: 0.482
Results for SVM, MinMax:
Accuracy: 0.887
Precision: 0.879
Recall: 0.485
Results for SVM, MinMaxRange:
Accuracy: 0.885
Precision: 0.88
Recall: 0.473
Results for SVM, Robust:
Accuracy: 0.879
Precision: 0.868
Recall: 0.446
Results for SVM, MaxAbs:
Accuracy: 0.887
Precision: 0.879
Recall: 0.485
Results for SVM, QuantileTransformer:
Accuracy: 0.887
Precision: 0.877
Recall: 0.488
Results for SVM, QuantileTransformerNorm:
Accuracy: 0.887
Precision: 0.877
Recall: 0.488
########################################
Results for RF, None:
Accuracy: 0.882
Precision: 0.783
Recall: 0.545
Results for RF, Standard:
Accuracy: 0.883
Precision: 0.786
Recall: 0.545
Results for RF, MinMax:
Accuracy: 0.883
Precision: 0.787
Recall: 0.546
Results for RF, MinMaxRange:
Accuracy: 0.883
Precision: 0.785
Recall: 0.546
Results for RF, Robust:
Accuracy: 0.882
Precision: 0.785
Recall: 0.544
Results for RF, MaxAbs:
Accuracy: 0.883
Precision: 0.788
Recall: 0.545
Results for RF, QuantileTransformer:
Accuracy: 0.882
Precision: 0.784
Recall: 0.546
Results for RF, QuantileTransformerNorm:
Accuracy: 0.882
Precision: 0.784
Recall: 0.546
########################################
Results for XGB, None:
Accuracy: 0.891
Precision: 0.87
Recall: 0.518
Results for XGB, Standard:
Accuracy: 0.891
Precision: 0.87
Recall: 0.518
Results for XGB, MinMax:
Accuracy: 0.891
Precision: 0.87
Recall: 0.518
Results for XGB, MinMaxRange:
Accuracy: 0.891
Precision: 0.87
Recall: 0.518
Results for XGB, Robust:
Accuracy: 0.891
Precision: 0.87
Recall: 0.518
Results for XGB, MaxAbs:
Accuracy: 0.891
Precision: 0.87
Recall: 0.518
Results for XGB, QuantileTransformer:
Accuracy: 0.891
Precision: 0.87
Recall: 0.518
Results for XGB, QuantileTransformerNorm:
Accuracy: 0.891
Precision: 0.87
Recall: 0.518
########################################
Results for CB, None:
Accuracy: 0.892
Precision: 0.871
Recall: 0.521
Results for CB, Standard:
Accuracy: 0.892
Precision: 0.871
Recall: 0.521
Results for CB, MinMax:
Accuracy: 0.892
Precision: 0.871
Recall: 0.521
Results for CB, MinMaxRange:
Accuracy: 0.892
Precision: 0.871
Recall: 0.521
Results for CB, Robust:
Accuracy: 0.892
Precision: 0.871
Recall: 0.521
Results for CB, MaxAbs:
Accuracy: 0.892
Precision: 0.871
Recall: 0.521
Results for CB, QuantileTransformer:
Accuracy: 0.892
Precision: 0.871
Recall: 0.521
Results for CB, QuantileTransformerNorm:
Accuracy: 0.892
Precision: 0.871
Recall: 0.521
########################################
Standard: 73.94% (3.40%)
MinMax (-1, 1): 75.60% (3.56%)
MinMax (0, 1): 77.48% (2.84%)
\ No newline at end of file
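The RF, XGB and CB rows above are nearly identical across all scalers, which is expected: axis-aligned tree splits depend only on the ordering of feature values, so any monotone rescaling leaves the learned trees unchanged. A minimal sketch of this (synthetic data, not the study's dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic two-class data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_scaled = MinMaxScaler().fit_transform(X)

# Same hyperparameters, same seed: min-max scaling is monotone per feature,
# so the split thresholds move but the induced partitions (and predictions) don't.
pred_raw = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)
pred_sc = DecisionTreeClassifier(random_state=0).fit(X_scaled, y).predict(X_scaled)

assert (pred_raw == pred_sc).all()
```

By contrast, the MLP and SVM rows vary strongly with the scaler (e.g. SVM recall 0.294 unscaled vs. 0.743 with MinMax), since distance- and gradient-based learners are sensitive to feature magnitudes.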
#!/usr/bin/env python
import numpy as np
import pandas as pd
import config as cfg
import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.preprocessing import RobustScaler, MaxAbsScaler, QuantileTransformer
from sklearn.metrics import roc_auc_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from tools import file_reader, mlp_classifier, tree_classifier, kernel_classifier
from pathlib import Path
from sklearn.base import BaseEstimator, TransformerMixin
class DummyScaler(BaseEstimator, TransformerMixin):
    def fit_transform(self, X):
        return np.array(X)
CASE = "Complete"
OUTPUT_FILENAME = f"{CASE} scaling results.txt"
def load_complete():
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR,
                              'complete_with_embeddings.csv')
    return df

def load_fall():
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR,
                              'fall_with_embeddings.csv')
    return df
def main():
    if CASE == "Complete":
        df = load_complete()
        X = df.drop(['Complete'], axis=1)
        y = df['Complete']
        n_scale_cols = 17
    else:
        df = load_fall()
        X = df.drop(['Fall'], axis=1)
        y = df['Fall']
        n_scale_cols = 3

    clf_names = ["MLP", "SVM", "RF", "XGB", "CB"]
    clfs = [mlp_classifier.train_mlp_cv, kernel_classifier.train_svm_cv,
            tree_classifier.train_rf_cv, tree_classifier.train_xgb_cv,
            tree_classifier.train_cb_cv]
    scaler_names = ["None", "Standard", "MinMax", "MinMaxRange", "Robust",
                    "MaxAbs", "QuantileTransformer", "QuantileTransformerNorm"]
    # One scaler per name: the original list held a stray extra QuantileTransformer(),
    # so zip() silently dropped the normal-output variant.
    scalers = [DummyScaler(), StandardScaler(), MinMaxScaler(), MinMaxScaler((-1, 1)),
               RobustScaler(), MaxAbsScaler(), QuantileTransformer(random_state=0),
               QuantileTransformer(output_distribution='normal', random_state=0)]

    with open(Path.joinpath(cfg.REPORTS_DIR, OUTPUT_FILENAME), "w+") as text_file:
        text_file.write(f"{CASE} case using {len(clfs)} clfs and {len(scalers)} scalers\n\n")

    for clf_name, clf in zip(clf_names, clfs):
        for scaler_name, scaler in zip(scaler_names, scalers):
            with open(Path.joinpath(cfg.REPORTS_DIR, OUTPUT_FILENAME), "a") as text_file:
                text_file.write(f"Results for {clf_name}, {scaler_name}:\n")
            X_sc = pd.DataFrame(scaler.fit_transform(X.iloc[:, :n_scale_cols]))
            X_new = pd.concat([X_sc, X.iloc[:, n_scale_cols:]], axis=1)
            _, valid_acc, valid_pre, valid_recall = clf(X_new, y)
            with open(Path.joinpath(cfg.REPORTS_DIR, OUTPUT_FILENAME), "a") as text_file:
                text_file.write(f"Accuracy: {round(np.mean(valid_acc), 3)}\n")
                text_file.write(f"Precision: {round(np.mean(valid_pre), 3)}\n")
                text_file.write(f"Recall: {round(np.mean(valid_recall), 3)}")
                text_file.write("\n\n")

if __name__ == '__main__':
    main()
\ No newline at end of file
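The loop above scales only the first `n_scale_cols` columns and concatenates the untouched remainder (the embeddings) back on. One subtlety: `pd.DataFrame(scaler.fit_transform(...))` gets a fresh RangeIndex, and `pd.concat(axis=1)` aligns on index labels, so the pattern silently introduces NaNs whenever `X` carries a non-default index. A sketch of an index-safe variant (helper name and column names are illustrative, not from the repo):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def scale_first_columns(X: pd.DataFrame, n_scale_cols: int, scaler) -> pd.DataFrame:
    """Scale only the leading columns, leaving the rest (e.g. embeddings) untouched."""
    X_sc = pd.DataFrame(scaler.fit_transform(X.iloc[:, :n_scale_cols]),
                        columns=X.columns[:n_scale_cols],
                        index=X.index)  # reuse the original index so concat stays aligned
    return pd.concat([X_sc, X.iloc[:, n_scale_cols:]], axis=1)

# With a non-default index a naive concat would misalign; this stays NaN-free:
X = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0]}, index=[10, 11, 12])
X_new = scale_first_columns(X, 1, StandardScaler())
assert X_new.shape == (3, 2)
assert not X_new.isna().any().any()
```

The script above is safe only because `file_reader.read_csv` presumably returns a default RangeIndex; passing `index=X.index` makes the helper robust either way.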
#!/usr/bin/env python
-from tensorflow.python.ops.gen_math_ops import Min
 import config as cfg
-from tools import file_reader
+from tools import file_reader, data_loader
 import tensorflow as tf
 from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
 from sklearn.model_selection import StratifiedKFold
 from sklearn.model_selection import cross_val_score
 import numpy as np

-CASE = "Fall"
+COMPLETE_FILE = "complete_with_embeddings.csv"
+FALL_FILE = "fall_with_embeddings.csv"
 NUM_ITERATIONS = 5
+CASE = "Complete"
+SCALING_STRATEGY = "Standard"

-def load_complete():
-    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR,
-                              'complete_with_embeddings.csv')
-    return df
-
-def load_fall():
-    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR,
-                              'fall_with_embeddings.csv')
-    return df

 def create_baseline(input_dim):
     model = tf.keras.models.Sequential()
@@ -35,27 +28,15 @@ def create_baseline(input_dim):
     return model

 def main():
-    result_acc, result_std = list(), list()
     if CASE == "Complete":
-        df = load_complete()
-        X = df.drop(['Complete'], axis=1)
-        y = df['Complete']
-        n_scale_cols = 17
+        X, y = data_loader.CompleteDataLoader(COMPLETE_FILE, cfg.COMPLETE_N_SCALE_COLS) \
+            .load_data().prepare_data(SCALING_STRATEGY)
     else:
-        df = load_fall()
-        X = df.drop(['Fall'], axis=1)
-        y = df['Fall']
-        n_scale_cols = 3
+        X, y = data_loader.FallDataLoader(FALL_FILE, cfg.FALL_N_SCALE_COLS) \
+            .load_data().prepare_data(SCALING_STRATEGY)
+    result_acc, result_std = list(), list()
     for k in range(NUM_ITERATIONS):
         X = np.array(X)
         y = np.array(y)
-        scaler = MinMaxScaler()
-        X_sc = scaler.fit_transform(X[:, :n_scale_cols])
-        X = np.concatenate([X_sc, X[:, n_scale_cols:]], axis=1)
         estimator = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_baseline,
                                                                    input_dim=X.shape[1],
                                                                    epochs=20,
......
@@ -4,13 +4,8 @@ from analysis.evaluate_dataset_baseline import NUM_ITERATIONS
import config as cfg
from tools import file_reader, file_writer, preprocessor
from sklearn.model_selection import train_test_split, StratifiedKFold
import xgboost as xgb
from sklearn.metrics import accuracy_score
import time
import xgboost as xgb
import pandas as pd
from utility.metrics import eval_gini, gini_xgb
import shap
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.utils import shuffle
......
@@ -42,6 +42,8 @@ GENERAL_FEATURES = ['Gender', 'Age', 'Cluster']
 THRESHOLD_WEEKS = 8
 THRESHOLD_TRAINING = 10
+COMPLETE_N_SCALE_COLS = 17
+FALL_N_SCALE_COLS = 3
 PATIENT_ID = 'PatientId'
 CITIZEN_ID = 'CitizenId'
......
@@ -28,7 +28,7 @@ def main():
     y = y[:-test_size]
     cat_features = [str(i)+'Ex' for i in range(1,10)] + [str(i)+'Ats' for i in range(1,11)]
-    model, valid_acc, valid_pre, valid_recall = tree_classifier.make_catboost_cv(X, y, cat_features)
+    model, valid_acc, valid_pre, valid_recall = tree_classifier.train_cb_cv(X, y, cat_features)
     print(f"Mean valid accuracy: {round(np.mean(valid_acc), 3)}")
     print(f"Mean valid precision: {round(np.mean(valid_pre), 3)}")
......
@@ -23,7 +23,7 @@ def main():
     X = X[:-test_size]
     y = y[:-test_size]
-    model, valid_acc, valid_pre, valid_recall = tree_classifier.make_random_forest_cv(X, y)
+    model, valid_acc, valid_pre, valid_recall = tree_classifier.train_rf_cv(X, y)
     print(f"Mean valid accuracy: {round(np.mean(valid_acc), 3)}")
     print(f"Mean valid precision: {round(np.mean(valid_pre), 3)}")
......
@@ -23,7 +23,7 @@ def main():
     X = X[:-test_size]
     y = y[:-test_size]
-    model, valid_acc, valid_pre, valid_recall = tree_classifier.make_xgboost_cv(X, y)
+    model, valid_acc, valid_pre, valid_recall = tree_classifier.train_xgb_cv(X, y)
     print(f"Mean valid accuracy: {round(np.mean(valid_acc), 3)}")
    print(f"Mean valid precision: {round(np.mean(valid_pre), 3)}")
......
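The renamed `train_*_cv` helpers themselves are not shown in this diff; judging by the call sites they return a fitted model plus per-fold accuracy, precision and recall lists. A hypothetical sketch of that signature using `StratifiedKFold` (the real implementation in `tools/tree_classifier.py` may differ in model, folds and hyperparameters):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold

def train_rf_cv(X, y, n_splits=5):
    """Cross-validated training: returns (model, accuracies, precisions, recalls)."""
    X, y = np.asarray(X), np.asarray(y)
    accs, pres, recs = [], [], []
    model = None
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, valid_idx in skf.split(X, y):
        model = RandomForestClassifier(random_state=0)
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[valid_idx])
        accs.append(accuracy_score(y[valid_idx], y_pred))
        pres.append(precision_score(y[valid_idx], y_pred, zero_division=0))
        recs.append(recall_score(y[valid_idx], y_pred, zero_division=0))
    return model, accs, pres, recs
```

The callers then report `np.mean(valid_acc)` etc., matching the `Mean valid accuracy` prints above.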
#!/usr/bin/env python
import config as cfg
import pandas as pd
import numpy as np
......
import config as cfg
import pandas as pd
import numpy as np
import os
from abc import ABC, abstractmethod
from typing import List
from tools import file_reader, preprocessor
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
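The `abc` imports above suggest `data_loader.py` defines an abstract base class with concrete Complete/Fall loaders behind the fluent `load_data().prepare_data(...)` calls used earlier. A hypothetical sketch of that shape (method names taken from the call sites; all internals are assumptions, and the real code reads files through `file_reader` rather than `pd.read_csv`):

```python
import numpy as np
import pandas as pd
from abc import ABC, abstractmethod
from sklearn.preprocessing import StandardScaler, MinMaxScaler

class BaseDataLoader(ABC):
    def __init__(self, file_name, n_scale_cols):
        self.file_name = file_name
        self.n_scale_cols = n_scale_cols
        self.X = None
        self.y = None

    @abstractmethod
    def load_data(self):
        """Read the CSV and split into X/y; return self to allow chaining."""

    def prepare_data(self, scaling_strategy="Standard"):
        # Scale only the leading columns, keep the rest (embeddings) raw
        scaler = StandardScaler() if scaling_strategy == "Standard" else MinMaxScaler()
        X = np.array(self.X)
        X_sc = scaler.fit_transform(X[:, :self.n_scale_cols])
        X = np.concatenate([X_sc, X[:, self.n_scale_cols:]], axis=1)
        return X, np.array(self.y)

class CompleteDataLoader(BaseDataLoader):
    def load_data(self):
        df = pd.read_csv(self.file_name)  # stand-in for file_reader.read_csv
        self.X = df.drop(['Complete'], axis=1)
        self.y = df['Complete']
        return self
```

Returning `self` from `load_data` is what makes the chained `load_data().prepare_data(...)` call in `evaluate_dataset_baseline.py` work.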