Commit 3ce035b8 authored by Christian Marius Lillelund

more results

parent 41400215
Pipeline #44147 passed with stage in 3 minutes and 21 seconds
Complete case 1 - without columns
MLP K=0: Accuracy: 0.5401 - Precision: 0.5159 - Recall: 0.5561
MLP K=1: Accuracy: 0.5310 - Precision: 0.4983 - Recall: 0.5107
MLP K=2: Accuracy: 0.5371 - Precision: 0.5222 - Recall: 0.5283
MLP K=3: Accuracy: 0.5350 - Precision: 0.5088 - Recall: 0.5410
MLP K=4: Accuracy: 0.5371 - Precision: 0.5182 - Recall: 0.5257
SVM K=0: Accuracy: 0.5381 - Precision: 0.5121 - Recall: 0.5474
SVM K=1: Accuracy: 0.5086 - Precision: 0.4802 - Recall: 0.5235
SVM K=2: Accuracy: 0.5330 - Precision: 0.5020 - Recall: 0.5195
SVM K=3: Accuracy: 0.5340 - Precision: 0.5065 - Recall: 0.5324
SVM K=4: Accuracy: 0.5360 - Precision: 0.5070 - Recall: 0.5278
XGB K=0: Accuracy: 0.5563 - Precision: 0.5291 - Recall: 0.5432
XGB K=1: Accuracy: 0.5310 - Precision: 0.5023 - Recall: 0.5561
XGB K=2: Accuracy: 0.5218 - Precision: 0.4913 - Recall: 0.5044
XGB K=3: Accuracy: 0.5289 - Precision: 0.4998 - Recall: 0.5409
XGB K=4: Accuracy: 0.5228 - Precision: 0.4957 - Recall: 0.5129
RF K=0: Accuracy: 0.5726 - Precision: 0.5539 - Recall: 0.4699
RF K=1: Accuracy: 0.5553 - Precision: 0.5350 - Recall: 0.4676
RF K=2: Accuracy: 0.5472 - Precision: 0.5212 - Recall: 0.4612
RF K=3: Accuracy: 0.5421 - Precision: 0.5166 - Recall: 0.4353
RF K=4: Accuracy: 0.5462 - Precision: 0.5224 - Recall: 0.4461
SVM K=0: Accuracy: 0.5553 - Precision: 0.5269 - Recall: 0.5584
SVM K=1: Accuracy: 0.5198 - Precision: 0.4922 - Recall: 0.5215
SVM K=2: Accuracy: 0.5533 - Precision: 0.5224 - Recall: 0.5499
SVM K=3: Accuracy: 0.5553 - Precision: 0.5258 - Recall: 0.5539
SVM K=4: Accuracy: 0.5431 - Precision: 0.5151 - Recall: 0.5215
Complete case 1 - without columns
MLP K=0: Accuracy: 0.5381 - Precision: 0.5172 - Recall: 0.5389
SVM K=0: Accuracy: 0.5381 - Precision: 0.5121 - Recall: 0.5474
XGB K=0: Accuracy: 0.5563 - Precision: 0.5291 - Recall: 0.5432
RF K=0: Accuracy: 0.5726 - Precision: 0.5539 - Recall: 0.4699
SVM K=0: Accuracy: 0.5553 - Precision: 0.5269 - Recall: 0.5584
Complete case 2 - with one-hot-encoding
MLP K=0: Accuracy: 0.5584 - Precision: 0.5311 - Recall: 0.5475
MLP K=1: Accuracy: 0.5553 - Precision: 0.5151 - Recall: 0.5494
MLP K=2: Accuracy: 0.5553 - Precision: 0.5344 - Recall: 0.4873
MLP K=3: Accuracy: 0.5614 - Precision: 0.5409 - Recall: 0.5559
MLP K=4: Accuracy: 0.5401 - Precision: 0.5213 - Recall: 0.5152
SVM K=0: Accuracy: 0.5442 - Precision: 0.5164 - Recall: 0.5582
SVM K=1: Accuracy: 0.5584 - Precision: 0.5310 - Recall: 0.5669
SVM K=2: Accuracy: 0.5340 - Precision: 0.5030 - Recall: 0.5303
SVM K=3: Accuracy: 0.5685 - Precision: 0.5421 - Recall: 0.5734
SVM K=4: Accuracy: 0.5442 - Precision: 0.5169 - Recall: 0.5345
XGB K=0: Accuracy: 0.5330 - Precision: 0.5045 - Recall: 0.5259
XGB K=1: Accuracy: 0.5330 - Precision: 0.5030 - Recall: 0.5128
XGB K=2: Accuracy: 0.5076 - Precision: 0.4759 - Recall: 0.4484
XGB K=3: Accuracy: 0.5310 - Precision: 0.5038 - Recall: 0.5021
XGB K=4: Accuracy: 0.5259 - Precision: 0.4969 - Recall: 0.4741
RF K=0: Accuracy: 0.5604 - Precision: 0.5364 - Recall: 0.4679
RF K=1: Accuracy: 0.5533 - Precision: 0.5304 - Recall: 0.4719
RF K=2: Accuracy: 0.5299 - Precision: 0.5021 - Recall: 0.4420
RF K=3: Accuracy: 0.5533 - Precision: 0.5338 - Recall: 0.4525
RF K=4: Accuracy: 0.5431 - Precision: 0.5190 - Recall: 0.4505
SVM K=0: Accuracy: 0.5746 - Precision: 0.5466 - Recall: 0.5690
SVM K=1: Accuracy: 0.5827 - Precision: 0.5606 - Recall: 0.5646
SVM K=2: Accuracy: 0.5614 - Precision: 0.5366 - Recall: 0.5281
SVM K=3: Accuracy: 0.5817 - Precision: 0.5628 - Recall: 0.5560
SVM K=4: Accuracy: 0.5706 - Precision: 0.5446 - Recall: 0.5367
Complete case 2 - with one-hot-encoding
MLP K=0: Accuracy: 0.5472 - Precision: 0.5242 - Recall: 0.5754
SVM K=0: Accuracy: 0.5421 - Precision: 0.5144 - Recall: 0.5582
XGB K=0: Accuracy: 0.5330 - Precision: 0.5045 - Recall: 0.5259
RF K=0: Accuracy: 0.5604 - Precision: 0.5364 - Recall: 0.4679
SVM K=0: Accuracy: 0.5746 - Precision: 0.5466 - Recall: 0.5690
Complete case 3 - with catboost-encoding
MLP K=0: Accuracy: 0.5411 - Precision: 0.5101 - Recall: 0.5843
MLP K=1: Accuracy: 0.5249 - Precision: 0.5048 - Recall: 0.5387
MLP K=2: Accuracy: 0.5431 - Precision: 0.4981 - Recall: 0.5605
MLP K=3: Accuracy: 0.5371 - Precision: 0.5082 - Recall: 0.5521
MLP K=4: Accuracy: 0.5421 - Precision: 0.5198 - Recall: 0.5754
SVM K=0: Accuracy: 0.5330 - Precision: 0.5065 - Recall: 0.5344
SVM K=1: Accuracy: 0.5198 - Precision: 0.4912 - Recall: 0.5344
SVM K=2: Accuracy: 0.5198 - Precision: 0.4883 - Recall: 0.4979
SVM K=3: Accuracy: 0.5239 - Precision: 0.4957 - Recall: 0.5280
SVM K=4: Accuracy: 0.5472 - Precision: 0.5168 - Recall: 0.5515
XGB K=0: Accuracy: 0.5228 - Precision: 0.4921 - Recall: 0.4846
XGB K=1: Accuracy: 0.5056 - Precision: 0.4760 - Recall: 0.4998
XGB K=2: Accuracy: 0.5310 - Precision: 0.5021 - Recall: 0.5215
XGB K=3: Accuracy: 0.5320 - Precision: 0.5020 - Recall: 0.5257
XGB K=4: Accuracy: 0.5127 - Precision: 0.4828 - Recall: 0.4914
RF K=0: Accuracy: 0.5574 - Precision: 0.5400 - Recall: 0.4224
RF K=1: Accuracy: 0.5411 - Precision: 0.5166 - Recall: 0.4008
RF K=2: Accuracy: 0.5513 - Precision: 0.5327 - Recall: 0.3922
RF K=3: Accuracy: 0.5523 - Precision: 0.5320 - Recall: 0.4181
RF K=4: Accuracy: 0.5350 - Precision: 0.5069 - Recall: 0.4074
SVM K=0: Accuracy: 0.5635 - Precision: 0.5360 - Recall: 0.5626
SVM K=1: Accuracy: 0.5371 - Precision: 0.5109 - Recall: 0.5237
SVM K=2: Accuracy: 0.5472 - Precision: 0.5154 - Recall: 0.5520
SVM K=3: Accuracy: 0.5574 - Precision: 0.5291 - Recall: 0.5560
SVM K=4: Accuracy: 0.5442 - Precision: 0.5155 - Recall: 0.5194
Complete case 3 - with catboost-encoding
MLP K=0: Accuracy: 0.5340 - Precision: 0.5137 - Recall: 0.6210
SVM K=0: Accuracy: 0.5330 - Precision: 0.5065 - Recall: 0.5344
XGB K=0: Accuracy: 0.5228 - Precision: 0.4921 - Recall: 0.4846
RF K=0: Accuracy: 0.5574 - Precision: 0.5400 - Recall: 0.4224
SVM K=0: Accuracy: 0.5635 - Precision: 0.5360 - Recall: 0.5626
Complete case 4 - with embeddings
MLP K=0: Accuracy: 0.7594 - Precision: 0.7401 - Recall: 0.7608
MLP K=1: Accuracy: 0.7574 - Precision: 0.7328 - Recall: 0.7608
MLP K=2: Accuracy: 0.7584 - Precision: 0.7299 - Recall: 0.7566
MLP K=3: Accuracy: 0.7594 - Precision: 0.7377 - Recall: 0.7585
MLP K=4: Accuracy: 0.7655 - Precision: 0.7298 - Recall: 0.7716
SVM K=0: Accuracy: 0.7635 - Precision: 0.7456 - Recall: 0.7609
SVM K=1: Accuracy: 0.7614 - Precision: 0.7375 - Recall: 0.7673
SVM K=2: Accuracy: 0.7675 - Precision: 0.7512 - Recall: 0.7588
SVM K=3: Accuracy: 0.7563 - Precision: 0.7447 - Recall: 0.7413
SVM K=4: Accuracy: 0.7695 - Precision: 0.7464 - Recall: 0.7737
XGB K=0: Accuracy: 0.6893 - Precision: 0.6787 - Recall: 0.6487
XGB K=1: Accuracy: 0.6975 - Precision: 0.6746 - Recall: 0.6940
XGB K=2: Accuracy: 0.7025 - Precision: 0.6894 - Recall: 0.6726
XGB K=3: Accuracy: 0.6975 - Precision: 0.6913 - Recall: 0.6595
XGB K=4: Accuracy: 0.6964 - Precision: 0.6731 - Recall: 0.6899
RF K=0: Accuracy: 0.7330 - Precision: 0.7368 - Recall: 0.6768
RF K=1: Accuracy: 0.7249 - Precision: 0.7311 - Recall: 0.6595
RF K=2: Accuracy: 0.7157 - Precision: 0.7209 - Recall: 0.6467
RF K=3: Accuracy: 0.7249 - Precision: 0.7410 - Recall: 0.6464
RF K=4: Accuracy: 0.7289 - Precision: 0.7395 - Recall: 0.6575
SVM K=0: Accuracy: 0.7178 - Precision: 0.7143 - Recall: 0.6703
SVM K=1: Accuracy: 0.6822 - Precision: 0.6719 - Recall: 0.6400
SVM K=2: Accuracy: 0.7005 - Precision: 0.6910 - Recall: 0.6597
SVM K=3: Accuracy: 0.6954 - Precision: 0.6924 - Recall: 0.6443
SVM K=4: Accuracy: 0.6863 - Precision: 0.6778 - Recall: 0.6423
Complete case 4 - with embeddings
MLP K=0: Accuracy: 0.7574 - Precision: 0.7269 - Recall: 0.7695
SVM K=0: Accuracy: 0.7635 - Precision: 0.7456 - Recall: 0.7609
XGB K=0: Accuracy: 0.6893 - Precision: 0.6787 - Recall: 0.6487
RF K=0: Accuracy: 0.7330 - Precision: 0.7368 - Recall: 0.6768
SVM K=0: Accuracy: 0.7178 - Precision: 0.7143 - Recall: 0.6703
Complete case 5 - with counts
MLP K=0: Accuracy: 0.5340 - Precision: 0.5212 - Recall: 0.5387
MLP K=1: Accuracy: 0.5472 - Precision: 0.5315 - Recall: 0.5044
MLP K=2: Accuracy: 0.5604 - Precision: 0.5283 - Recall: 0.5368
MLP K=3: Accuracy: 0.5614 - Precision: 0.5319 - Recall: 0.5432
MLP K=4: Accuracy: 0.5513 - Precision: 0.5183 - Recall: 0.5409
SVM K=0: Accuracy: 0.5371 - Precision: 0.5106 - Recall: 0.5365
SVM K=1: Accuracy: 0.5452 - Precision: 0.5166 - Recall: 0.5302
SVM K=2: Accuracy: 0.5574 - Precision: 0.5287 - Recall: 0.5476
SVM K=3: Accuracy: 0.5695 - Precision: 0.5449 - Recall: 0.5495
SVM K=4: Accuracy: 0.5574 - Precision: 0.5293 - Recall: 0.5519
XGB K=0: Accuracy: 0.5787 - Precision: 0.5524 - Recall: 0.5540
XGB K=1: Accuracy: 0.5574 - Precision: 0.5301 - Recall: 0.5236
XGB K=2: Accuracy: 0.5584 - Precision: 0.5330 - Recall: 0.5152
XGB K=3: Accuracy: 0.5472 - Precision: 0.5213 - Recall: 0.5151
XGB K=4: Accuracy: 0.5472 - Precision: 0.5195 - Recall: 0.5129
RF K=0: Accuracy: 0.5827 - Precision: 0.5723 - Recall: 0.4677
RF K=1: Accuracy: 0.5919 - Precision: 0.5778 - Recall: 0.5043
RF K=2: Accuracy: 0.6000 - Precision: 0.5892 - Recall: 0.4979
RF K=3: Accuracy: 0.5929 - Precision: 0.5858 - Recall: 0.4827
RF K=4: Accuracy: 0.5726 - Precision: 0.5609 - Recall: 0.4289
SVM K=0: Accuracy: 0.5563 - Precision: 0.5336 - Recall: 0.5129
SVM K=1: Accuracy: 0.5685 - Precision: 0.5451 - Recall: 0.5110
SVM K=2: Accuracy: 0.5756 - Precision: 0.5527 - Recall: 0.5217
SVM K=3: Accuracy: 0.5878 - Precision: 0.5741 - Recall: 0.5216
SVM K=4: Accuracy: 0.5827 - Precision: 0.5613 - Recall: 0.5216
Complete case 5 - with counts
MLP K=0: Accuracy: 0.5391 - Precision: 0.5155 - Recall: 0.5279
SVM K=0: Accuracy: 0.5371 - Precision: 0.5106 - Recall: 0.5365
XGB K=0: Accuracy: 0.5787 - Precision: 0.5524 - Recall: 0.5540
RF K=0: Accuracy: 0.5827 - Precision: 0.5723 - Recall: 0.4677
SVM K=0: Accuracy: 0.5563 - Precision: 0.5336 - Recall: 0.5129
Fall case 1 - without columns
MLP K=0: binary_accuracy: 0.7668 - precision: 0.4349 - recall: 0.6639 - auc: 0.7278
MLP K=1: binary_accuracy: 0.7660 - precision: 0.4327 - recall: 0.6517 - auc: 0.7227
MLP K=2: binary_accuracy: 0.7811 - precision: 0.4569 - recall: 0.6639 - auc: 0.7367
MLP K=3: binary_accuracy: 0.7814 - precision: 0.4556 - recall: 0.6354 - auc: 0.7260
MLP K=4: binary_accuracy: 0.7708 - precision: 0.4393 - recall: 0.6449 - auc: 0.7230
XGB K=0: binary_accuracy: 0.7335 - precision: 0.3929 - recall: 0.6789 - auc: 0.7128
XGB K=1: binary_accuracy: 0.7462 - precision: 0.4029 - recall: 0.6327 - auc: 0.7031
XGB K=2: binary_accuracy: 0.7472 - precision: 0.4130 - recall: 0.7102 - auc: 0.7332
XGB K=3: binary_accuracy: 0.7271 - precision: 0.3869 - recall: 0.6884 - auc: 0.7124
XGB K=4: binary_accuracy: 0.7409 - precision: 0.3987 - recall: 0.6531 - auc: 0.7076
RF K=0: binary_accuracy: 0.7702 - precision: 0.4245 - recall: 0.5088 - auc: 0.6711
RF K=1: binary_accuracy: 0.7626 - precision: 0.4088 - recall: 0.4939 - auc: 0.6607
RF K=2: binary_accuracy: 0.7848 - precision: 0.4547 - recall: 0.5320 - auc: 0.6889
RF K=3: binary_accuracy: 0.7652 - precision: 0.4190 - recall: 0.5347 - auc: 0.6778
RF K=4: binary_accuracy: 0.7626 - precision: 0.4138 - recall: 0.5293 - auc: 0.6741
SVM K=0: binary_accuracy: 0.7996 - precision: 0.4873 - recall: 0.5728 - auc: 0.7136
SVM K=1: binary_accuracy: 0.7970 - precision: 0.4816 - recall: 0.5701 - auc: 0.7109
SVM K=2: binary_accuracy: 0.8081 - precision: 0.5058 - recall: 0.5973 - auc: 0.7281
SVM K=3: binary_accuracy: 0.8065 - precision: 0.5023 - recall: 0.5918 - auc: 0.7251
SVM K=4: binary_accuracy: 0.7978 - precision: 0.4840 - recall: 0.5973 - auc: 0.7217
Fall case 1 - without columns
MLP K=0: Accuracy: 0.7793 - Precision: 0.4560 - Recall: 0.6420
SVM K=0: Accuracy: 0.7915 - Precision: 0.4711 - Recall: 0.5792
XGB K=0: Accuracy: 0.7264 - Precision: 0.3922 - Recall: 0.7392
RF K=0: Accuracy: 0.8154 - Precision: 0.5328 - Recall: 0.4147
SVM K=0: Accuracy: 0.7786 - Precision: 0.4503 - Recall: 0.6237
Fall case 2 - with one-hot-encoding
MLP K=0: binary_accuracy: 0.8306 - precision: 0.5527 - recall: 0.6776 - auc: 0.7726
MLP K=1: binary_accuracy: 0.8137 - precision: 0.5168 - recall: 0.6476 - auc: 0.7507
MLP K=2: binary_accuracy: 0.8274 - precision: 0.5446 - recall: 0.6898 - auc: 0.7752
MLP K=3: binary_accuracy: 0.8242 - precision: 0.5401 - recall: 0.6503 - auc: 0.7583
MLP K=4: binary_accuracy: 0.8240 - precision: 0.5380 - recall: 0.6748 - auc: 0.7674
XGB K=0: binary_accuracy: 0.8348 - precision: 0.5622 - recall: 0.6830 - auc: 0.7773
XGB K=1: binary_accuracy: 0.8282 - precision: 0.5491 - recall: 0.6544 - auc: 0.7623
XGB K=2: binary_accuracy: 0.8425 - precision: 0.5780 - recall: 0.7061 - auc: 0.7908
XGB K=3: binary_accuracy: 0.8277 - precision: 0.5474 - recall: 0.6599 - auc: 0.7640
XGB K=4: binary_accuracy: 0.8377 - precision: 0.5706 - recall: 0.6707 - auc: 0.7744
RF K=0: binary_accuracy: 0.8711 - precision: 0.7160 - recall: 0.5592 - auc: 0.7528
RF K=1: binary_accuracy: 0.8579 - precision: 0.6689 - recall: 0.5333 - auc: 0.7348
RF K=2: binary_accuracy: 0.8772 - precision: 0.7365 - recall: 0.5741 - auc: 0.7623
RF K=3: binary_accuracy: 0.8642 - precision: 0.6907 - recall: 0.5469 - auc: 0.7439
RF K=4: binary_accuracy: 0.8650 - precision: 0.7027 - recall: 0.5306 - auc: 0.7382
SVM K=0: binary_accuracy: 0.8240 - precision: 0.5398 - recall: 0.6463 - auc: 0.7566
SVM K=1: binary_accuracy: 0.8240 - precision: 0.5437 - recall: 0.5918 - auc: 0.7359
SVM K=2: binary_accuracy: 0.8282 - precision: 0.5490 - recall: 0.6558 - auc: 0.7628
SVM K=3: binary_accuracy: 0.8311 - precision: 0.5627 - recall: 0.5918 - auc: 0.7404
SVM K=4: binary_accuracy: 0.8232 - precision: 0.5401 - recall: 0.6136 - auc: 0.7437
Fall case 3 - with catboost-encoding
MLP K=0: binary_accuracy: 0.7880 - precision: 0.4680 - recall: 0.6571 - auc: 0.7384
MLP K=1: binary_accuracy: 0.7954 - precision: 0.4802 - recall: 0.6259 - auc: 0.7311
MLP K=2: binary_accuracy: 0.7875 - precision: 0.4699 - recall: 0.7211 - auc: 0.7623
MLP K=3: binary_accuracy: 0.7785 - precision: 0.4549 - recall: 0.7007 - auc: 0.7490
MLP K=4: binary_accuracy: 0.7914 - precision: 0.4740 - recall: 0.6585 - auc: 0.7410
XGB K=0: binary_accuracy: 0.8507 - precision: 0.6227 - recall: 0.5905 - auc: 0.7520
XGB K=1: binary_accuracy: 0.8377 - precision: 0.5884 - recall: 0.5524 - auc: 0.7295
XGB K=2: binary_accuracy: 0.8536 - precision: 0.6289 - recall: 0.6041 - auc: 0.7590
XGB K=3: binary_accuracy: 0.8433 - precision: 0.5989 - recall: 0.5891 - auc: 0.7469
XGB K=4: binary_accuracy: 0.8438 - precision: 0.6087 - recall: 0.5524 - auc: 0.7333
RF K=0: binary_accuracy: 0.8849 - precision: 0.8807 - recall: 0.4721 - auc: 0.7283
RF K=1: binary_accuracy: 0.8819 - precision: 0.8658 - recall: 0.4653 - auc: 0.7239
RF K=2: binary_accuracy: 0.8886 - precision: 0.8905 - recall: 0.4871 - auc: 0.7363
RF K=3: binary_accuracy: 0.8814 - precision: 0.8689 - recall: 0.4599 - auc: 0.7216
RF K=4: binary_accuracy: 0.8801 - precision: 0.8832 - recall: 0.4422 - auc: 0.7140
SVM K=0: binary_accuracy: 0.8073 - precision: 0.5038 - recall: 0.6272 - auc: 0.7390
SVM K=1: binary_accuracy: 0.8017 - precision: 0.4925 - recall: 0.6286 - auc: 0.7361
SVM K=2: binary_accuracy: 0.8176 - precision: 0.5246 - recall: 0.6680 - auc: 0.7609
SVM K=3: binary_accuracy: 0.8068 - precision: 0.5026 - recall: 0.6639 - auc: 0.7526
SVM K=4: binary_accuracy: 0.7872 - precision: 0.4666 - recall: 0.6558 - auc: 0.7374
Fall case 4 - with embeddings
MLP K=0: binary_accuracy: 0.8195 - precision: 0.5255 - recall: 0.7442 - auc: 0.7909
MLP K=1: binary_accuracy: 0.8393 - precision: 0.5731 - recall: 0.6830 - auc: 0.7800
MLP K=2: binary_accuracy: 0.8377 - precision: 0.5616 - recall: 0.7565 - auc: 0.8069
MLP K=3: binary_accuracy: 0.8422 - precision: 0.5768 - recall: 0.7102 - auc: 0.7922
MLP K=4: binary_accuracy: 0.8343 - precision: 0.5566 - recall: 0.7293 - auc: 0.7945
XGB K=0: binary_accuracy: 0.8462 - precision: 0.5895 - recall: 0.6898 - auc: 0.7869
XGB K=1: binary_accuracy: 0.8385 - precision: 0.5752 - recall: 0.6503 - auc: 0.7672
XGB K=2: binary_accuracy: 0.8502 - precision: 0.5951 - recall: 0.7197 - auc: 0.8007
XGB K=3: binary_accuracy: 0.8359 - precision: 0.5671 - recall: 0.6612 - auc: 0.7697
XGB K=4: binary_accuracy: 0.8396 - precision: 0.5762 - recall: 0.6639 - auc: 0.7730
RF K=0: binary_accuracy: 0.8769 - precision: 0.7336 - recall: 0.5769 - auc: 0.7631
RF K=1: binary_accuracy: 0.8632 - precision: 0.6960 - recall: 0.5265 - auc: 0.7355
RF K=2: binary_accuracy: 0.8814 - precision: 0.7576 - recall: 0.5741 - auc: 0.7649
RF K=3: binary_accuracy: 0.8653 - precision: 0.6928 - recall: 0.5524 - auc: 0.7466
RF K=4: binary_accuracy: 0.8719 - precision: 0.7229 - recall: 0.5537 - auc: 0.7512
SVM K=0: binary_accuracy: 0.8346 - precision: 0.5581 - recall: 0.7184 - auc: 0.7905
SVM K=1: binary_accuracy: 0.8399 - precision: 0.5749 - recall: 0.6789 - auc: 0.7788
SVM K=2: binary_accuracy: 0.8422 - precision: 0.5717 - recall: 0.7537 - auc: 0.8087
SVM K=3: binary_accuracy: 0.8449 - precision: 0.5831 - recall: 0.7116 - auc: 0.7943
SVM K=4: binary_accuracy: 0.8319 - precision: 0.5532 - recall: 0.7075 - auc: 0.7847
Fall case 5 - with counts
MLP K=0: binary_accuracy: 0.8219 - precision: 0.5314 - recall: 0.7129 - auc: 0.7806
MLP K=1: binary_accuracy: 0.8139 - precision: 0.5161 - recall: 0.6966 - auc: 0.7694
MLP K=2: binary_accuracy: 0.8383 - precision: 0.5647 - recall: 0.7361 - auc: 0.7995
MLP K=3: binary_accuracy: 0.8256 - precision: 0.5398 - recall: 0.7007 - auc: 0.7782
MLP K=4: binary_accuracy: 0.8272 - precision: 0.5432 - recall: 0.7007 - auc: 0.7792
XGB K=0: binary_accuracy: 0.8452 - precision: 0.5856 - recall: 0.6980 - auc: 0.7893
XGB K=1: binary_accuracy: 0.8393 - precision: 0.5727 - recall: 0.6857 - auc: 0.7811
XGB K=2: binary_accuracy: 0.8499 - precision: 0.5915 - recall: 0.7388 - auc: 0.8078
XGB K=3: binary_accuracy: 0.8473 - precision: 0.5888 - recall: 0.7129 - auc: 0.7963
XGB K=4: binary_accuracy: 0.8483 - precision: 0.5946 - recall: 0.6925 - auc: 0.7892
RF K=0: binary_accuracy: 0.8790 - precision: 0.7422 - recall: 0.5796 - auc: 0.7655
RF K=1: binary_accuracy: 0.8687 - precision: 0.7029 - recall: 0.5633 - auc: 0.7529
RF K=2: binary_accuracy: 0.8867 - precision: 0.7698 - recall: 0.5959 - auc: 0.7764
RF K=3: binary_accuracy: 0.8724 - precision: 0.7148 - recall: 0.5728 - auc: 0.7588
RF K=4: binary_accuracy: 0.8745 - precision: 0.7318 - recall: 0.5605 - auc: 0.7555
SVM K=0: binary_accuracy: 0.8441 - precision: 0.5877 - recall: 0.6653 - auc: 0.7763
SVM K=1: binary_accuracy: 0.8362 - precision: 0.5716 - recall: 0.6299 - auc: 0.7579
SVM K=2: binary_accuracy: 0.8489 - precision: 0.5958 - recall: 0.6939 - auc: 0.7901
SVM K=3: binary_accuracy: 0.8311 - precision: 0.5559 - recall: 0.6558 - auc: 0.7646
SVM K=4: binary_accuracy: 0.8391 - precision: 0.5773 - recall: 0.6449 - auc: 0.7654
Fall case 2 - with one-hot-encoding
MLP K=0: Accuracy: 0.8183 - Precision: 0.5226 - Recall: 0.6739
SVM K=0: Accuracy: 0.8012 - Precision: 0.4920 - Recall: 0.6759
XGB K=0: Accuracy: 0.8278 - Precision: 0.5466 - Recall: 0.6857
RF K=0: Accuracy: 0.8765 - Precision: 0.7707 - Recall: 0.5200
\ No newline at end of file
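The result files above store records in the regular form `NAME K=i: Metric: value - Metric: value ...`, separated only by carriage returns. As an illustration of how such a dump could be parsed back into structured records (the `parse_results` helper below is hypothetical, not part of this repository):

```python
import re

def parse_results(text: str):
    """Parse 'CLF K=i: Metric: v - Metric: v ...' records from a results dump."""
    pattern = re.compile(r"(\w+) K=(\d+): ((?:\w+: [0-9.]+(?: - )?)+)")
    records = []
    for name, k, metrics in pattern.findall(text):
        fields = {}
        for metric in metrics.split(" - "):
            key, value = metric.split(": ")
            fields[key.lower()] = float(value)
        records.append({"clf": name, "k": int(k), **fields})
    return records

line = "RF K=0: Accuracy: 0.5726 - Precision: 0.5539 - Recall: 0.4699"
print(parse_results(line))
```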
#!/usr/bin/env python
import numpy as np
import config as cfg
from typing import List
from tools import file_reader, file_writer, preprocessor
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
@@ -10,10 +11,9 @@ from tools import mlp_classifier, tree_classifier
from tools import kernel_classifier, log_classifier
USE_BALANCING = True
-NUM_ITERATIONS = 5
-CASE = "Complete"
+NUM_ITERATIONS = 1
+CASE = "Fall"
RESULTS_FILENAME = f"{CASE} baseline results.txt"
-ROC_FILENAME = f"{CASE} baseline ROC"
class Result:
def __init__(self, name, result):
@@ -40,14 +40,14 @@ def load_fall():
return df
def main():
-# Case 1
clf_names = ["MLP", "LR", "XGB", "RF", "SVM"]
+# Case 1
with open(Path.joinpath(cfg.REPORTS_DIR, RESULTS_FILENAME), "w+") as text_file:
text_file.write(f"{CASE} case 1 - without columns")
if CASE == "Complete":
df = load_complete()
X = df.drop(['Complete'], axis=1)
y = df['Complete']
cat_cols = EX_COLS + ATS_COLS
X = X.drop(cat_cols, axis=1)
@@ -65,7 +65,7 @@ def main():
train_xgboost(X, y, n_scale_cols=X.shape[1]),
train_random_forest(X, y, n_scale_cols=X.shape[1]),
train_svm(X, y, n_scale_cols=X.shape[1])]
-make_and_print_roc_curve(y, zip(clf_names, y_pred_probas), "without columns")
+make_plots(y, zip(clf_names, y_pred_probas), CASE, 1, "without columns")
# Case 2
with open(Path.joinpath(cfg.REPORTS_DIR, RESULTS_FILENAME), "a") as text_file:
@@ -87,11 +87,11 @@ def main():
X = np.array(X)
y = np.array(y)
y_pred_probas = [train_mlp(X, y, n_scale_cols=X.shape[1]),
-train_lr(X, y, n_scale_cols=X.shape[1]),
+train_lr(X, y, n_scale_cols=X.shape[1], lr_solver="liblinear"),
train_xgboost(X, y, n_scale_cols=X.shape[1]),
train_random_forest(X, y, n_scale_cols=X.shape[1]),
train_svm(X, y, n_scale_cols=X.shape[1])]
-make_and_print_roc_curve(y, zip(clf_names, y_pred_probas), "with one-hot-encoding")
+make_plots(y, zip(clf_names, y_pred_probas), CASE, 2, "with one-hot-encoding")
# Case 3
with open(Path.joinpath(cfg.REPORTS_DIR, RESULTS_FILENAME), "a") as text_file:
@@ -123,7 +123,7 @@ def main():
train_xgboost(X, y, n_scale_cols=n_scale_cols),
train_random_forest(X, y, n_scale_cols=n_scale_cols),
train_svm(X, y, n_scale_cols=n_scale_cols)]
-make_and_print_roc_curve(y, zip(clf_names, y_pred_probas), "with catboost-encoding")
+make_plots(y, zip(clf_names, y_pred_probas), CASE, 3, "with catboost-encoding")
# Case 4
with open(Path.joinpath(cfg.REPORTS_DIR, RESULTS_FILENAME), "a") as text_file:
@@ -152,7 +152,7 @@ def main():
train_xgboost(X, y, n_scale_cols=n_scale_cols),
train_random_forest(X, y, n_scale_cols=n_scale_cols),
train_svm(X, y, n_scale_cols=n_scale_cols)]
-make_and_print_roc_curve(y, zip(clf_names, y_pred_probas), "with embeddings")
+make_plots(y, zip(clf_names, y_pred_probas), CASE, 4, "with embeddings")
# Case 5
with open(Path.joinpath(cfg.REPORTS_DIR, RESULTS_FILENAME), "a") as text_file:
@@ -171,20 +171,16 @@ def main():
X = np.array(X)
y = np.array(y)
y_pred_probas = [train_mlp(X, y, n_scale_cols=X.shape[1]),
-train_lr(X, y, n_scale_cols=X.shape[1]),
+train_lr(X, y, n_scale_cols=X.shape[1], lr_solver="liblinear"),
train_xgboost(X, y, n_scale_cols=X.shape[1]),
train_random_forest(X, y, n_scale_cols=X.shape[1]),
train_svm(X, y, n_scale_cols=X.shape[1])]
-make_and_print_roc_curve(y, zip(clf_names, y_pred_probas), "with counts")
+make_plots(y, zip(clf_names, y_pred_probas), CASE, 5, "with counts")
def train_mlp(X: np.ndarray, y: np.ndarray, n_scale_cols:int=0) -> np.ndarray:
y_pred_probas = 0
for k in range(NUM_ITERATIONS):
-scaler = StandardScaler()
-X_sc = scaler.fit_transform(X[:,:n_scale_cols])
-X = np.concatenate([X_sc, X[:,n_scale_cols:]], axis=1)
+X = preprocessor.scale_data_standard(X, n_scale_cols)
_, res_acc, res_pre, res_rec, res_probas = mlp_classifier.train_mlp_cv(X, y, k)
y_pred_probas += res_probas[:,1]
make_and_print_scores("MLP", k, res_acc, res_pre, res_rec)
@@ -194,11 +190,7 @@ def train_mlp(X: np.ndarray, y: np.ndarray, n_scale_cols:int=0) -> np.ndarray:
def train_xgboost(X: np.ndarray, y: np.ndarray, n_scale_cols:int=0) -> np.ndarray:
y_pred_probas = 0
for k in range(NUM_ITERATIONS):
-scaler = StandardScaler()
-X_sc = scaler.fit_transform(X[:,:n_scale_cols])
-X = np.concatenate([X_sc, X[:,n_scale_cols:]], axis=1)
+X = preprocessor.scale_data_standard(X, n_scale_cols)
_, res_acc, res_pre, res_rec, res_probas = tree_classifier.train_xgb_cv(X, y, k)
y_pred_probas += res_probas[:,1]
make_and_print_scores("XGB", k, res_acc, res_pre, res_rec)
@@ -208,11 +200,7 @@ def train_xgboost(X: np.ndarray, y: np.ndarray, n_scale_cols:int=0) -> np.ndarra
def train_random_forest(X: np.ndarray, y: np.ndarray, n_scale_cols:int=0) -> np.ndarray:
y_pred_probas = 0
for k in range(NUM_ITERATIONS):
-scaler = StandardScaler()
-X_sc = scaler.fit_transform(X[:,:n_scale_cols])
-X = np.concatenate([X_sc, X[:,n_scale_cols:]], axis=1)
+X = preprocessor.scale_data_standard(X, n_scale_cols)
_, res_acc, res_pre, res_rec, res_probas = tree_classifier.train_rf_cv(X, y, k)
y_pred_probas += res_probas[:,1]
make_and_print_scores("RF", k, res_acc, res_pre, res_rec)
@@ -222,32 +210,26 @@ def train_random_forest(X: np.ndarray, y: np.ndarray, n_scale_cols:int=0) -> np.
def train_svm(X: np.ndarray, y: np.ndarray, n_scale_cols:int=0) -> np.ndarray:
y_pred_probas = 0
for k in range(NUM_ITERATIONS):
-scaler = StandardScaler()
-X_sc = scaler.fit_transform(X[:,:n_scale_cols])
-X = np.concatenate([X_sc, X[:,n_scale_cols:]], axis=1)
+X = preprocessor.scale_data_standard(X, n_scale_cols)
_, res_acc, res_pre, res_rec, res_probas = kernel_classifier.train_svm_cv(X, y, k)
y_pred_probas += res_probas[:,1]
make_and_print_scores("SVM", k, res_acc, res_pre, res_rec)
y_pred_probas /= NUM_ITERATIONS
return y_pred_probas
-def train_lr(X: np.ndarray, y: np.ndarray, n_scale_cols:int=0) -> np.ndarray:
+def train_lr(X: np.ndarray, y: np.ndarray, n_scale_cols:int=0,
+lr_solver:str="lbfgs") -> np.ndarray:
y_pred_probas = 0
for k in range(NUM_ITERATIONS):
-scaler = StandardScaler()
-X_sc = scaler.fit_transform(X[:,:n_scale_cols])
-X = np.concatenate([X_sc, X[:,n_scale_cols:]], axis=1)
-_, res_acc, res_pre, res_rec, res_probas = log_classifier.train_lr_cv(X, y, k)
+X = preprocessor.scale_data_standard(X, n_scale_cols)
+_, res_acc, res_pre, res_rec, res_probas = log_classifier.train_lr_cv(X, y, k, lr_solver)
y_pred_probas += res_probas[:,1]
make_and_print_scores("LR", k, res_acc, res_pre, res_rec)
y_pred_probas /= NUM_ITERATIONS
return y_pred_probas
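Each train_* helper accumulates the positive-class probability column over NUM_ITERATIONS cross-validated runs, then averages it. A minimal numpy sketch of that accumulate-and-average pattern (random probabilities stand in for the *_cv outputs; not the repository's actual models):

```python
import numpy as np

NUM_ITERATIONS = 3
n_samples = 5
rng = np.random.default_rng(0)

# Accumulate the positive-class column over repeated runs, then average,
# mirroring the y_pred_probas logic in the train_* helpers above.
y_pred_probas = 0
for k in range(NUM_ITERATIONS):
    res_probas = rng.random((n_samples, 2))  # stand-in for a *_cv result
    y_pred_probas += res_probas[:, 1]
y_pred_probas /= NUM_ITERATIONS

assert y_pred_probas.shape == (n_samples,)
```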
-def make_and_print_scores(classifer_name, k, res_acc, res_pre, res_rec):
+def make_and_print_scores(classifer_name: str, k: int, res_acc: List,
+res_pre: List, res_rec: List):
results = [Result("Accuracy", res_acc.mean()),
Result("Precision", res_pre.mean()),
Result("Recall", res_rec.mean())]
@@ -256,10 +238,16 @@ def make_and_print_scores(classifer_name, k, res_acc, res_pre, res_rec):
text ="\r{} K={}: {}".format(classifer_name, k, metrics)
with open(Path.joinpath(cfg.REPORTS_DIR, RESULTS_FILENAME), "a") as text_file:
text_file.write(text)
-def make_and_print_roc_curve(y_test, probas_results, case_name):
-file_name = str(ROC_FILENAME) + " " + str(case_name) + ".pdf"
-file_writer.write_roc_curve(y_test, probas_results, cfg.REPORTS_DIR, file_name)
+def make_plots(y_test: np.ndarray, probas_results: np.ndarray,
+case_name: str, case_number: int, case_subtitle):
+roc_file_name = f"{case_name} case {case_number} - ROC curves.pdf"
+score_file_name = f"{case_name} case {case_number} - Scores.pdf"
+probas_results_list = list(probas_results)
+file_writer.write_roc_curve(y_test, probas_results_list,
+cfg.REPORTS_PLOTS_DIR, roc_file_name, case_subtitle)
+file_writer.write_score_plot(y_test, probas_results_list,
+cfg.REPORTS_PLOTS_DIR, score_file_name, case_subtitle)
if __name__ == '__main__':
main()
@@ -17,6 +17,7 @@ PATHS_2020 = ['borgere_hmi_Rasmus_BorgerId_Gender_BirthYear.xlsx',
ROOT_DIR = Path(__file__).absolute().parent.parent
MODELS_DIR = Path.joinpath(ROOT_DIR, 'models')
REPORTS_DIR = Path.joinpath(ROOT_DIR, 'reports')
+REPORTS_PLOTS_DIR = Path.joinpath(ROOT_DIR, 'reports/plots')
LOGS_DIR = Path.joinpath(ROOT_DIR, 'src/logs')
CONFIG_DIR = Path.joinpath(ROOT_DIR, 'src/cfg')
TESTS_FILES_DIR = Path.joinpath(ROOT_DIR, 'tests/files')
@@ -28,7 +28,7 @@ def main():
y_test = np.array(y_test)
n_num_cols = len(list(df.select_dtypes(exclude = ['object']))) - 1
-X_train, X_test, y_train, y_test, _ = preprocessor.scale_data_standard(X_train, X_test,
+X_train, X_test, y_train, y_test, _ = preprocessor.scale_split_data_standard(X_train, X_test,
y_train, y_test, n_num_cols)
model = tf.keras.Sequential([
@@ -30,7 +30,7 @@ def main():
X_train, X_valid, y_train, y_valid, labels = preprocessor.prepare_data_for_embedder(
df, target_name, train_ratio, n_num_cols)
-X_train, X_valid, y_train, y_valid, scaler = preprocessor.scale_data_standard(
+X_train, X_valid, y_train, y_valid, scaler = preprocessor.scale_split_data_standard(
X_train, X_valid, y_train, y_valid, n_num_cols)
metrics = [
+import os
from pathlib import Path
import pandas as pd
from typing import List
@@ -7,7 +8,8 @@ import joblib
import numpy as np
import tensorflow as tf
import csv
-from sklearn.metrics import roc_curve
+from sklearn.metrics import precision_score, recall_score
+from sklearn.metrics import roc_curve, accuracy_score, roc_auc_score
def write_csv(df: pd.DataFrame,
path: Path,
@@ -74,18 +76,45 @@ def write_shap_importance_plot(features, importances, path, file_name):
dpi=300,
bbox_inches = "tight")
-def write_roc_curve(y_test, probas_results, path, file_name):
+def write_roc_curve(y_test: np.ndarray, probas_results: List,
+path: Path, file_name: str, case_subtitle: str):
plt.close()
-plt.title(file_name)
+plt.suptitle(os.path.splitext(file_name)[0])
+plt.title(case_subtitle)
plt.ylabel("TPR")
plt.xlabel("FPR")
-for clf_name, y_probas in list(probas_results):
+for clf_name, y_probas in probas_results:
fpr, tpr, _ = roc_curve(y_test, y_probas)
plt.plot(fpr, tpr, linewidth=2, label=clf_name)
plt.plot([0, 1], [0, 1], 'k--')
plt.legend(loc="lower right")
plt.savefig(Path.joinpath(path, file_name), dpi=300, bbox_inches = "tight")
def write_score_plot(y_test: np.ndarray, probas_results: List,
path: Path, file_name: str, case_subtitle: str):
plt.close()
plt.suptitle(os.path.splitext(file_name)[0])
plt.title(case_subtitle)
plt.ylabel("Score")
plt.xlabel("Classifiers")
labels = [result[0] for result in probas_results]
acc_scores, pre_scores, rec_scores, roc_auc_scores = list(), list(), list(), list()
for _, y_probas in probas_results:
y_scores_new = (y_probas > 0.5)
acc_scores.append(accuracy_score(y_test, y_scores_new))
pre_scores.append(precision_score(y_test, y_scores_new))
rec_scores.append(recall_score(y_test, y_scores_new))
roc_auc_scores.append(roc_auc_score(y_test, y_probas))
plot_range = range(len(probas_results))
plt.plot(plot_range, acc_scores, "bo-", label='Classifier accuracy')
plt.plot(plot_range, pre_scores, "go-", label='Classifier precision')
plt.plot(plot_range, rec_scores, "ro-", label='Classifier recall')
plt.plot(plot_range, roc_auc_scores, "ko-", label='Classifier AUC')
plt.xticks(plot_range, labels, rotation='vertical')
plt.margins(0.2)
plt.legend(loc="lower right")
plt.savefig(Path.joinpath(path, file_name), dpi=300, bbox_inches = "tight")
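write_score_plot derives hard labels from the probabilities with a fixed 0.5 threshold before scoring them. A numpy-only sketch of that scoring step on toy arrays (sklearn's accuracy_score, precision_score, and recall_score would give the same numbers):

```python
import numpy as np

y_test = np.array([1, 0, 1, 1, 0])
y_probas = np.array([0.9, 0.4, 0.3, 0.8, 0.6])

# Threshold at 0.5, as write_score_plot does, then score the hard labels.
y_pred = (y_probas > 0.5).astype(int)  # [1, 0, 0, 1, 1]
accuracy = float((y_pred == y_test).mean())
tp = int(((y_pred == 1) & (y_test == 1)).sum())
precision = tp / int((y_pred == 1).sum())
recall = tp / int((y_test == 1).sum())
print(accuracy, precision, recall)  # accuracy 0.6, precision 2/3, recall 2/3
```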
def write_shap_explanation(expected_value: float,
shap_value: np.ndarray,
row: pd.Series,
@@ -3,8 +3,9 @@ from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.model_selection import StratifiedKFold
-def train_lr_cv(X: pd.DataFrame, y: pd.Series, k: int=0):
+def train_lr_cv(X: pd.DataFrame, y: pd.Series, k: int=0, solver: str="lbfgs"):
model = LogisticRegression(class_weight="balanced",
+solver=solver,
max_iter=400,
random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=k)
@@ -92,7 +92,13 @@ def prepare_data_for_embedder(df: pd.DataFrame,
return X_train, X_valid, y_train, y_valid, labels
-def scale_data_standard(X_train: np.ndarray, X_valid: np.ndarray,
def scale_data_standard(X: np.ndarray, n_scale_cols: int):
scaler = StandardScaler()
X_sc = scaler.fit_transform(X[:,:n_scale_cols])
X = np.concatenate([X_sc, X[:,n_scale_cols:]], axis=1)
return X
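The new scale_data_standard standardizes only the first n_scale_cols columns and passes the trailing (e.g. encoded categorical) columns through unchanged. A numpy-only equivalent for illustration (the `scale_leading_cols` name is hypothetical; StandardScaler applies the same population mean/std):

```python
import numpy as np

def scale_leading_cols(X, n_scale_cols):
    # Zero-mean, unit-variance scaling of the leading columns only,
    # matching what StandardScaler does inside scale_data_standard.
    num = X[:, :n_scale_cols]
    X_sc = (num - num.mean(axis=0)) / num.std(axis=0)
    return np.concatenate([X_sc, X[:, n_scale_cols:]], axis=1)

X = np.array([[1.0, 10.0, 0.0],
              [2.0, 20.0, 1.0],
              [3.0, 30.0, 1.0]])
X_out = scale_leading_cols(X, n_scale_cols=2)
# Leading two columns now have mean 0; the third column is untouched.
```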
def scale_split_data_standard(X_train: np.ndarray, X_valid: np.ndarray,
y_train: np.ndarray, y_valid: np.ndarray,
n_scale_cols: int):
scaler = StandardScaler()
@@ -102,7 +108,7 @@ def scale_data_standard(X_train: np.ndarray, X_valid: np.ndarray,
X_valid = np.concatenate([X_valid_sc, X_valid[:,n_scale_cols:]], axis=1)
return X_train, X_valid, y_train, y_valid, scaler
-def scale_data_min_max(X_train: np.ndarray, X_valid: np.ndarray,
+def scale_split_data_min_max(X_train: np.ndarray, X_valid: np.ndarray,
y_train: np.ndarray, y_valid: np.ndarray, n_scale_cols: int):
scaler = MinMaxScaler()
X_train_sc = scaler.fit_transform(X_train[:,:n_scale_cols])