Commit 14a0f367 authored by Christian Marius Lillelund's avatar Christian Marius Lillelund

updated baseline script for both cases

parent 1223291f
Pipeline #43398 passed with stage
in 2 minutes and 46 seconds
Complete case 1 - without columns
MLP K=0: binary_accuracy: 0.5751 - precision: 0.5571 - recall: 0.5235 - auc: 0.5727
MLP K=1: binary_accuracy: 0.5751 - precision: 0.5625 - recall: 0.4832 - auc: 0.5709
MLP K=2: binary_accuracy: 0.5655 - precision: 0.5371 - recall: 0.6309 - auc: 0.5685
MLP K=3: binary_accuracy: 0.5879 - precision: 0.5617 - recall: 0.6107 - auc: 0.5889
MLP K=4: binary_accuracy: 0.5655 - precision: 0.5489 - recall: 0.4899 - auc: 0.5620
XGB K=0: binary_accuracy: 0.6134 - precision: 0.5909 - recall: 0.6107 - auc: 0.6133
XGB K=1: binary_accuracy: 0.5399 - precision: 0.5175 - recall: 0.4966 - auc: 0.5380
XGB K=2: binary_accuracy: 0.5335 - precision: 0.5099 - recall: 0.5168 - auc: 0.5328
XGB K=3: binary_accuracy: 0.5272 - precision: 0.5032 - recall: 0.5302 - auc: 0.5273
XGB K=4: binary_accuracy: 0.5751 - precision: 0.5513 - recall: 0.5772 - auc: 0.5752
RF K=0: binary_accuracy: 0.5879 - precision: 0.5714 - recall: 0.5369 - auc: 0.5855
RF K=1: binary_accuracy: 0.5847 - precision: 0.5772 - recall: 0.4765 - auc: 0.5797
RF K=2: binary_accuracy: 0.5463 - precision: 0.5255 - recall: 0.4832 - auc: 0.5434
RF K=3: binary_accuracy: 0.5335 - precision: 0.5108 - recall: 0.4765 - auc: 0.5309
RF K=4: binary_accuracy: 0.5367 - precision: 0.5133 - recall: 0.5168 - auc: 0.5358
SVM K=0: binary_accuracy: 0.5623 - precision: 0.5385 - recall: 0.5638 - auc: 0.5624
SVM K=1: binary_accuracy: 0.5591 - precision: 0.5440 - recall: 0.4564 - auc: 0.5544
SVM K=2: binary_accuracy: 0.5974 - precision: 0.5657 - recall: 0.6644 - auc: 0.6005
SVM K=3: binary_accuracy: 0.5495 - precision: 0.5270 - recall: 0.5235 - auc: 0.5483
SVM K=4: binary_accuracy: 0.5495 - precision: 0.5233 - recall: 0.6040 - auc: 0.5520
Complete case 2 - with one-hot-encoding
MLP K=0: binary_accuracy: 0.5559 - precision: 0.5333 - recall: 0.5369 - auc: 0.5550
MLP K=1: binary_accuracy: 0.5304 - precision: 0.5067 - recall: 0.5101 - auc: 0.5294
MLP K=2: binary_accuracy: 0.5847 - precision: 0.5674 - recall: 0.5369 - auc: 0.5825
MLP K=3: binary_accuracy: 0.5208 - precision: 0.4969 - recall: 0.5302 - auc: 0.5212
MLP K=4: binary_accuracy: 0.5176 - precision: 0.4938 - recall: 0.5302 - auc: 0.5181
XGB K=0: binary_accuracy: 0.5495 - precision: 0.5270 - recall: 0.5235 - auc: 0.5483
XGB K=1: binary_accuracy: 0.5495 - precision: 0.5263 - recall: 0.5369 - auc: 0.5489
XGB K=2: binary_accuracy: 0.5463 - precision: 0.5232 - recall: 0.5302 - auc: 0.5456
XGB K=3: binary_accuracy: 0.5240 - precision: 0.5000 - recall: 0.5302 - auc: 0.5242
XGB K=4: binary_accuracy: 0.5335 - precision: 0.5099 - recall: 0.5168 - auc: 0.5328
RF K=0: binary_accuracy: 0.5719 - precision: 0.5547 - recall: 0.5101 - auc: 0.5691
RF K=1: binary_accuracy: 0.5719 - precision: 0.5564 - recall: 0.4966 - auc: 0.5684
RF K=2: binary_accuracy: 0.5783 - precision: 0.5578 - recall: 0.5503 - auc: 0.5770
RF K=3: binary_accuracy: 0.5208 - precision: 0.4965 - recall: 0.4698 - auc: 0.5184
RF K=4: binary_accuracy: 0.5527 - precision: 0.5344 - recall: 0.4698 - auc: 0.5489
SVM K=0: binary_accuracy: 0.5783 - precision: 0.5669 - recall: 0.4832 - auc: 0.5739
SVM K=1: binary_accuracy: 0.6006 - precision: 0.5882 - recall: 0.5369 - auc: 0.5977
SVM K=2: binary_accuracy: 0.5974 - precision: 0.5752 - recall: 0.5906 - auc: 0.5971
SVM K=3: binary_accuracy: 0.5815 - precision: 0.5549 - recall: 0.6107 - auc: 0.5828
SVM K=4: binary_accuracy: 0.5240 - precision: 0.5000 - recall: 0.4161 - auc: 0.5190
Complete case 3 - with catboost-encoding
MLP K=0: binary_accuracy: 0.5335 - precision: 0.5097 - recall: 0.5302 - auc: 0.5334
MLP K=1: binary_accuracy: 0.4952 - precision: 0.4662 - recall: 0.4161 - auc: 0.4916
MLP K=2: binary_accuracy: 0.5367 - precision: 0.5130 - recall: 0.5302 - auc: 0.5364
MLP K=3: binary_accuracy: 0.5272 - precision: 0.5034 - recall: 0.5034 - auc: 0.5261
MLP K=4: binary_accuracy: 0.5463 - precision: 0.5259 - recall: 0.4765 - auc: 0.5431
XGB K=0: binary_accuracy: 0.5176 - precision: 0.4932 - recall: 0.4899 - auc: 0.5163
XGB K=1: binary_accuracy: 0.5176 - precision: 0.4929 - recall: 0.4631 - auc: 0.5151
XGB K=2: binary_accuracy: 0.4601 - precision: 0.4324 - recall: 0.4295 - auc: 0.4587
XGB K=3: binary_accuracy: 0.5016 - precision: 0.4755 - recall: 0.4564 - auc: 0.4995
XGB K=4: binary_accuracy: 0.5463 - precision: 0.5226 - recall: 0.5436 - auc: 0.5462
RF K=0: binary_accuracy: 0.5783 - precision: 0.5752 - recall: 0.4362 - auc: 0.5718
RF K=1: binary_accuracy: 0.5176 - precision: 0.4906 - recall: 0.3490 - auc: 0.5099
RF K=2: binary_accuracy: 0.5016 - precision: 0.4748 - recall: 0.4430 - auc: 0.4989
RF K=3: binary_accuracy: 0.5144 - precision: 0.4872 - recall: 0.3826 - auc: 0.5083
RF K=4: binary_accuracy: 0.5399 - precision: 0.5210 - recall: 0.4161 - auc: 0.5343
SVM K=0: binary_accuracy: 0.5495 - precision: 0.5238 - recall: 0.5906 - auc: 0.5514
SVM K=1: binary_accuracy: 0.5399 - precision: 0.5180 - recall: 0.4832 - auc: 0.5373
SVM K=2: binary_accuracy: 0.5304 - precision: 0.5065 - recall: 0.5235 - auc: 0.5300
SVM K=3: binary_accuracy: 0.5240 - precision: 0.5000 - recall: 0.5168 - auc: 0.5236
SVM K=4: binary_accuracy: 0.5304 - precision: 0.5066 - recall: 0.5168 - auc: 0.5297
Complete case 4 - with embeddings
MLP K=0: binary_accuracy: 0.7668 - precision: 0.7754 - recall: 0.7181 - auc: 0.7645
MLP K=1: binary_accuracy: 0.7668 - precision: 0.7603 - recall: 0.7450 - auc: 0.7658
MLP K=2: binary_accuracy: 0.7859 - precision: 0.7808 - recall: 0.7651 - auc: 0.7850
MLP K=3: binary_accuracy: 0.7636 - precision: 0.7389 - recall: 0.7785 - auc: 0.7643
MLP K=4: binary_accuracy: 0.8179 - precision: 0.7949 - recall: 0.8322 - auc: 0.8185
XGB K=0: binary_accuracy: 0.7029 - precision: 0.7029 - recall: 0.6510 - auc: 0.7005
XGB K=1: binary_accuracy: 0.7061 - precision: 0.7021 - recall: 0.6644 - auc: 0.7042
XGB K=2: binary_accuracy: 0.6773 - precision: 0.6714 - recall: 0.6309 - auc: 0.6752
XGB K=3: binary_accuracy: 0.6901 - precision: 0.6733 - recall: 0.6779 - auc: 0.6895
XGB K=4: binary_accuracy: 0.7061 - precision: 0.7050 - recall: 0.6577 - auc: 0.7039
RF K=0: binary_accuracy: 0.6581 - precision: 0.6780 - recall: 0.5369 - auc: 0.6526
RF K=1: binary_accuracy: 0.7029 - precision: 0.7059 - recall: 0.6443 - auc: 0.7002
RF K=2: binary_accuracy: 0.6773 - precision: 0.6739 - recall: 0.6242 - auc: 0.6749
RF K=3: binary_accuracy: 0.7252 - precision: 0.7143 - recall: 0.7047 - auc: 0.7243
RF K=4: binary_accuracy: 0.7316 - precision: 0.7559 - recall: 0.6443 - auc: 0.7276
SVM K=0: binary_accuracy: 0.7476 - precision: 0.7612 - recall: 0.6846 - auc: 0.7447
SVM K=1: binary_accuracy: 0.7668 - precision: 0.7603 - recall: 0.7450 - auc: 0.7658
SVM K=2: binary_accuracy: 0.7636 - precision: 0.7737 - recall: 0.7114 - auc: 0.7612
SVM K=3: binary_accuracy: 0.7859 - precision: 0.7697 - recall: 0.7852 - auc: 0.7859
SVM K=4: binary_accuracy: 0.8051 - precision: 0.8014 - recall: 0.7852 - auc: 0.8042
Complete case 5 - with counts
MLP K=0: binary_accuracy: 0.5783 - precision: 0.5503 - recall: 0.6242 - auc: 0.5804
MLP K=1: binary_accuracy: 0.5847 - precision: 0.5629 - recall: 0.5705 - auc: 0.5840
MLP K=2: binary_accuracy: 0.5144 - precision: 0.4892 - recall: 0.4564 - auc: 0.5117
MLP K=3: binary_accuracy: 0.5559 - precision: 0.5309 - recall: 0.5772 - auc: 0.5569
MLP K=4: binary_accuracy: 0.5687 - precision: 0.5473 - recall: 0.5436 - auc: 0.5675
XGB K=0: binary_accuracy: 0.5815 - precision: 0.5592 - recall: 0.5705 - auc: 0.5810
XGB K=1: binary_accuracy: 0.5687 - precision: 0.5479 - recall: 0.5369 - auc: 0.5672
XGB K=2: binary_accuracy: 0.5016 - precision: 0.4777 - recall: 0.5034 - auc: 0.5017
XGB K=3: binary_accuracy: 0.5751 - precision: 0.5488 - recall: 0.6040 - auc: 0.5764
XGB K=4: binary_accuracy: 0.6006 - precision: 0.5896 - recall: 0.5302 - auc: 0.5974
RF K=0: binary_accuracy: 0.5815 - precision: 0.5682 - recall: 0.5034 - auc: 0.5779
RF K=1: binary_accuracy: 0.5463 - precision: 0.5289 - recall: 0.4295 - auc: 0.5410
RF K=2: binary_accuracy: 0.5272 - precision: 0.5033 - recall: 0.5168 - auc: 0.5267
RF K=3: binary_accuracy: 0.5655 - precision: 0.5461 - recall: 0.5168 - auc: 0.5633
RF K=4: binary_accuracy: 0.6134 - precision: 0.6148 - recall: 0.5034 - auc: 0.6084
SVM K=0: binary_accuracy: 0.5687 - precision: 0.5515 - recall: 0.5034 - auc: 0.5657
SVM K=1: binary_accuracy: 0.5751 - precision: 0.5548 - recall: 0.5436 - auc: 0.5736
SVM K=2: binary_accuracy: 0.5304 - precision: 0.5066 - recall: 0.5168 - auc: 0.5297
SVM K=3: binary_accuracy: 0.5495 - precision: 0.5282 - recall: 0.5034 - auc: 0.5474
SVM K=4: binary_accuracy: 0.5655 - precision: 0.5430 - recall: 0.5503 - auc: 0.5648
Fall case 1 - without columns
MLP K=0: binary_accuracy: 0.7716 - precision: 0.4405 - recall: 0.6449 - auc: 0.7235
MLP K=1: binary_accuracy: 0.7684 - precision: 0.4349 - recall: 0.6367 - auc: 0.7185
MLP K=2: binary_accuracy: 0.7742 - precision: 0.4466 - recall: 0.6707 - auc: 0.7350
MLP K=3: binary_accuracy: 0.7856 - precision: 0.4626 - recall: 0.6313 - auc: 0.7271
MLP K=4: binary_accuracy: 0.7650 - precision: 0.4328 - recall: 0.6707 - auc: 0.7292
XGB K=0: binary_accuracy: 0.7335 - precision: 0.3929 - recall: 0.6789 - auc: 0.7128
XGB K=1: binary_accuracy: 0.7459 - precision: 0.4026 - recall: 0.6327 - auc: 0.7030
XGB K=2: binary_accuracy: 0.7478 - precision: 0.4138 - recall: 0.7116 - auc: 0.7340
XGB K=3: binary_accuracy: 0.7271 - precision: 0.3869 - recall: 0.6884 - auc: 0.7124
XGB K=4: binary_accuracy: 0.7409 - precision: 0.3987 - recall: 0.6531 - auc: 0.7076
RF K=0: binary_accuracy: 0.7687 - precision: 0.4216 - recall: 0.5088 - auc: 0.6701
RF K=1: binary_accuracy: 0.7612 - precision: 0.4056 - recall: 0.4884 - auc: 0.6578
RF K=2: binary_accuracy: 0.7848 - precision: 0.4545 - recall: 0.5306 - auc: 0.6884
RF K=3: binary_accuracy: 0.7663 - precision: 0.4204 - recall: 0.5320 - auc: 0.6774
RF K=4: binary_accuracy: 0.7626 - precision: 0.4140 - recall: 0.5306 - auc: 0.6746
SVM K=0: binary_accuracy: 0.7996 - precision: 0.4873 - recall: 0.5728 - auc: 0.7136
SVM K=1: binary_accuracy: 0.7970 - precision: 0.4816 - recall: 0.5701 - auc: 0.7109
SVM K=2: binary_accuracy: 0.8081 - precision: 0.5058 - recall: 0.5973 - auc: 0.7281
SVM K=3: binary_accuracy: 0.8065 - precision: 0.5023 - recall: 0.5918 - auc: 0.7251
SVM K=4: binary_accuracy: 0.7978 - precision: 0.4840 - recall: 0.5973 - auc: 0.7217
Fall case 2 - with one-hot-encoding
MLP K=0: binary_accuracy: 0.8258 - precision: 0.5424 - recall: 0.6694 - auc: 0.7665
MLP K=1: binary_accuracy: 0.8137 - precision: 0.5164 - recall: 0.6626 - auc: 0.7564
MLP K=2: binary_accuracy: 0.8280 - precision: 0.5451 - recall: 0.6993 - auc: 0.7792
MLP K=3: binary_accuracy: 0.8232 - precision: 0.5388 - recall: 0.6327 - auc: 0.7509
MLP K=4: binary_accuracy: 0.8256 - precision: 0.5429 - recall: 0.6544 - auc: 0.7607
XGB K=0: binary_accuracy: 0.8348 - precision: 0.5622 - recall: 0.6830 - auc: 0.7773
XGB K=1: binary_accuracy: 0.8282 - precision: 0.5491 - recall: 0.6544 - auc: 0.7623
XGB K=2: binary_accuracy: 0.8425 - precision: 0.5780 - recall: 0.7061 - auc: 0.7908
XGB K=3: binary_accuracy: 0.8277 - precision: 0.5474 - recall: 0.6599 - auc: 0.7640
XGB K=4: binary_accuracy: 0.8375 - precision: 0.5701 - recall: 0.6694 - auc: 0.7737
RF K=0: binary_accuracy: 0.8708 - precision: 0.7140 - recall: 0.5605 - auc: 0.7532
RF K=1: binary_accuracy: 0.8592 - precision: 0.6741 - recall: 0.5347 - auc: 0.7361
RF K=2: binary_accuracy: 0.8777 - precision: 0.7382 - recall: 0.5755 - auc: 0.7631
RF K=3: binary_accuracy: 0.8642 - precision: 0.6907 - recall: 0.5469 - auc: 0.7439
RF K=4: binary_accuracy: 0.8650 - precision: 0.7027 - recall: 0.5306 - auc: 0.7382
SVM K=0: binary_accuracy: 0.8240 - precision: 0.5398 - recall: 0.6463 - auc: 0.7566
SVM K=1: binary_accuracy: 0.8240 - precision: 0.5437 - recall: 0.5918 - auc: 0.7359
SVM K=2: binary_accuracy: 0.8282 - precision: 0.5490 - recall: 0.6558 - auc: 0.7628
SVM K=3: binary_accuracy: 0.8311 - precision: 0.5627 - recall: 0.5918 - auc: 0.7404
SVM K=4: binary_accuracy: 0.8232 - precision: 0.5401 - recall: 0.6136 - auc: 0.7437
Fall case 3 - with catboost-encoding
MLP K=0: binary_accuracy: 0.7827 - precision: 0.4618 - recall: 0.7075 - auc: 0.7542
MLP K=1: binary_accuracy: 0.7808 - precision: 0.4565 - recall: 0.6639 - auc: 0.7365
MLP K=2: binary_accuracy: 0.8007 - precision: 0.4916 - recall: 0.7197 - auc: 0.7700
MLP K=3: binary_accuracy: 0.7922 - precision: 0.4763 - recall: 0.6844 - auc: 0.7513
MLP K=4: binary_accuracy: 0.7573 - precision: 0.4273 - recall: 0.7279 - auc: 0.7461
XGB K=0: binary_accuracy: 0.8507 - precision: 0.6227 - recall: 0.5905 - auc: 0.7520
XGB K=1: binary_accuracy: 0.8377 - precision: 0.5884 - recall: 0.5524 - auc: 0.7295
XGB K=2: binary_accuracy: 0.8536 - precision: 0.6289 - recall: 0.6041 - auc: 0.7590
XGB K=3: binary_accuracy: 0.8433 - precision: 0.5989 - recall: 0.5891 - auc: 0.7469
XGB K=4: binary_accuracy: 0.8441 - precision: 0.6096 - recall: 0.5524 - auc: 0.7335
RF K=0: binary_accuracy: 0.8849 - precision: 0.8807 - recall: 0.4721 - auc: 0.7283
RF K=1: binary_accuracy: 0.8819 - precision: 0.8658 - recall: 0.4653 - auc: 0.7239
RF K=2: binary_accuracy: 0.8883 - precision: 0.8903 - recall: 0.4857 - auc: 0.7356
RF K=3: binary_accuracy: 0.8817 - precision: 0.8711 - recall: 0.4599 - auc: 0.7217
RF K=4: binary_accuracy: 0.8801 - precision: 0.8832 - recall: 0.4422 - auc: 0.7140
SVM K=0: binary_accuracy: 0.8092 - precision: 0.5073 - recall: 0.6585 - auc: 0.7520
SVM K=1: binary_accuracy: 0.7988 - precision: 0.4870 - recall: 0.6381 - auc: 0.7379
SVM K=2: binary_accuracy: 0.8166 - precision: 0.5216 - recall: 0.6912 - auc: 0.7690
SVM K=3: binary_accuracy: 0.8152 - precision: 0.5200 - recall: 0.6558 - auc: 0.7548
SVM K=4: binary_accuracy: 0.7909 - precision: 0.4737 - recall: 0.6735 - auc: 0.7464
Fall case 4 - with embeddings
MLP K=0: binary_accuracy: 0.8338 - precision: 0.5563 - recall: 0.7197 - auc: 0.7905
MLP K=1: binary_accuracy: 0.8343 - precision: 0.5583 - recall: 0.7102 - auc: 0.7872
MLP K=2: binary_accuracy: 0.8377 - precision: 0.5604 - recall: 0.7701 - auc: 0.8121
MLP K=3: binary_accuracy: 0.8415 - precision: 0.5739 - recall: 0.7184 - auc: 0.7948
MLP K=4: binary_accuracy: 0.8393 - precision: 0.5699 - recall: 0.7102 - auc: 0.7904
XGB K=0: binary_accuracy: 0.8462 - precision: 0.5895 - recall: 0.6898 - auc: 0.7869
XGB K=1: binary_accuracy: 0.8385 - precision: 0.5752 - recall: 0.6503 - auc: 0.7672
XGB K=2: binary_accuracy: 0.8502 - precision: 0.5951 - recall: 0.7197 - auc: 0.8007
XGB K=3: binary_accuracy: 0.8359 - precision: 0.5671 - recall: 0.6612 - auc: 0.7697
XGB K=4: binary_accuracy: 0.8396 - precision: 0.5762 - recall: 0.6639 - auc: 0.7730
RF K=0: binary_accuracy: 0.8764 - precision: 0.7310 - recall: 0.5769 - auc: 0.7628
RF K=1: binary_accuracy: 0.8632 - precision: 0.6968 - recall: 0.5252 - auc: 0.7350
RF K=2: binary_accuracy: 0.8814 - precision: 0.7567 - recall: 0.5755 - auc: 0.7654
RF K=3: binary_accuracy: 0.8653 - precision: 0.6922 - recall: 0.5537 - auc: 0.7471
RF K=4: binary_accuracy: 0.8708 - precision: 0.7194 - recall: 0.5510 - auc: 0.7495
SVM K=0: binary_accuracy: 0.8362 - precision: 0.5614 - recall: 0.7211 - auc: 0.7925
SVM K=1: binary_accuracy: 0.8370 - precision: 0.5669 - recall: 0.6857 - auc: 0.7796
SVM K=2: binary_accuracy: 0.8364 - precision: 0.5592 - recall: 0.7524 - auc: 0.8046
SVM K=3: binary_accuracy: 0.8409 - precision: 0.5735 - recall: 0.7116 - auc: 0.7919
SVM K=4: binary_accuracy: 0.8351 - precision: 0.5606 - recall: 0.7048 - auc: 0.7857
Fall case 5 - with counts
MLP K=0: binary_accuracy: 0.8250 - precision: 0.5374 - recall: 0.7238 - auc: 0.7867
MLP K=1: binary_accuracy: 0.8203 - precision: 0.5289 - recall: 0.6980 - auc: 0.7739
MLP K=2: binary_accuracy: 0.8359 - precision: 0.5600 - recall: 0.7306 - auc: 0.7960
MLP K=3: binary_accuracy: 0.8298 - precision: 0.5495 - recall: 0.6952 - auc: 0.7788
MLP K=4: binary_accuracy: 0.8340 - precision: 0.5583 - recall: 0.7034 - auc: 0.7845
XGB K=0: binary_accuracy: 0.8454 - precision: 0.5863 - recall: 0.6980 - auc: 0.7895
XGB K=1: binary_accuracy: 0.8393 - precision: 0.5727 - recall: 0.6857 - auc: 0.7811
XGB K=2: binary_accuracy: 0.8499 - precision: 0.5915 - recall: 0.7388 - auc: 0.8078
XGB K=3: binary_accuracy: 0.8473 - precision: 0.5888 - recall: 0.7129 - auc: 0.7963
XGB K=4: binary_accuracy: 0.8483 - precision: 0.5946 - recall: 0.6925 - auc: 0.7892
RF K=0: binary_accuracy: 0.8796 - precision: 0.7439 - recall: 0.5810 - auc: 0.7663
RF K=1: binary_accuracy: 0.8700 - precision: 0.7103 - recall: 0.5605 - auc: 0.7527
RF K=2: binary_accuracy: 0.8864 - precision: 0.7703 - recall: 0.5932 - auc: 0.7752
RF K=3: binary_accuracy: 0.8719 - precision: 0.7131 - recall: 0.5714 - auc: 0.7579
RF K=4: binary_accuracy: 0.8748 - precision: 0.7356 - recall: 0.5565 - auc: 0.7541
SVM K=0: binary_accuracy: 0.8441 - precision: 0.5877 - recall: 0.6653 - auc: 0.7763
SVM K=1: binary_accuracy: 0.8362 - precision: 0.5716 - recall: 0.6299 - auc: 0.7579
SVM K=2: binary_accuracy: 0.8489 - precision: 0.5958 - recall: 0.6939 - auc: 0.7901
SVM K=3: binary_accuracy: 0.8311 - precision: 0.5559 - recall: 0.6558 - auc: 0.7646
SVM K=4: binary_accuracy: 0.8391 - precision: 0.5773 - recall: 0.6449 - auc: 0.7654
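Each record in the results files above follows one fixed line format, so per-model means across the K folds can be computed mechanically. A minimal sketch using only the standard library (the two sample lines are copied from the Complete case 1 results; the regex assumes exactly the format shown):

```python
import re
from collections import defaultdict

# Two sample records in the exact format used by the results files above.
lines = [
    "SVM K=0: binary_accuracy: 0.5623 - precision: 0.5385 - recall: 0.5638 - auc: 0.5624",
    "SVM K=1: binary_accuracy: 0.5591 - precision: 0.5440 - recall: 0.4564 - auc: 0.5544",
]

pattern = re.compile(r"(\w+) K=(\d+): binary_accuracy: ([\d.]+) - precision: ([\d.]+)"
                     r" - recall: ([\d.]+) - auc: ([\d.]+)")

# Accumulate sums of accuracy, precision, recall, auc plus a fold count per model.
sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0, 0])
for line in lines:
    m = pattern.match(line)
    model, _, *scores = m.groups()
    s = sums[model]
    for i, v in enumerate(scores):
        s[i] += float(v)
    s[4] += 1

for model, (acc, pre, rec, auc, n) in sums.items():
    print(f"{model}: mean accuracy {acc/n:.4f}, mean AUC {auc/n:.4f}")
    # -> SVM: mean accuracy 0.5607, mean AUC 0.5584
```

The same loop applied to a whole results file gives one summary row per model and case, which is easier to compare than the raw per-fold records.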
#!/usr/bin/env python
import numpy as np
import config as cfg
from tools import file_reader, preprocessor
from sklearn.model_selection import train_test_split
import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, precision_score, recall_score
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
USE_BALANCING = True
def main():
    # Case 1
    print("Case 1 - without columns")
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'complete.csv')
    ex_cols = [str(i)+'Ex' for i in range(1, 10)]
    ats_cols = [str(i)+'Ats' for i in range(1, 11)]
    cat_cols = ex_cols + ats_cols
    X = df.drop(['Complete'], axis=1)
    y = df['Complete']
    X = X.drop(cat_cols, axis=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        stratify=y, random_state=0)
    train_nn(X_train, X_test, y_train, y_test)
    train_xgboost(X_train, X_test, y_train, y_test)
    train_random_forest(X_train, X_test, y_train, y_test)
    train_svm(X_train, X_test, y_train, y_test)

    # Case 2
    print("\nCase 2 - with one-hot-encoding")
    converters = {str(i)+'Ats': str for i in range(1, 11)}
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'complete.csv', converters=converters)
    X = df.drop(['Complete'], axis=1)
    y = df['Complete']
    X = preprocessor.encode_vector_onehot(X, cat_cols)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        stratify=y, random_state=0)
    train_nn(X_train, X_test, y_train, y_test)
    train_xgboost(X_train, X_test, y_train, y_test)
    train_random_forest(X_train, X_test, y_train, y_test)
    train_svm(X_train, X_test, y_train, y_test)

    # Case 3
    print("\nCase 3 - with catboost-encoding")
    converters = {str(i)+'Ats': str for i in range(1, 11)}
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'complete.csv', converters=converters)
    X = df.drop(['Complete'], axis=1)
    y = df['Complete']
    X = preprocessor.encode_vector_catboost(X, y, cat_cols)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        stratify=y, random_state=0)
    train_nn(X_train, X_test, y_train, y_test)
    train_xgboost(X_train, X_test, y_train, y_test)
    train_random_forest(X_train, X_test, y_train, y_test)
    train_svm(X_train, X_test, y_train, y_test)

    # Case 4
    print("\nCase 4 - with embeddings")
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'complete_with_embeddings.csv')
    X = df.drop(['Complete'], axis=1)
    y = df['Complete']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        stratify=y, random_state=0)
    train_nn(X_train, X_test, y_train, y_test)
    train_xgboost(X_train, X_test, y_train, y_test)
    train_random_forest(X_train, X_test, y_train, y_test)
    train_svm(X_train, X_test, y_train, y_test)

    # Case 5
    print("\nCase 5 - with counts")
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'complete_with_count.csv')
    X = df.drop(['Complete'], axis=1)
    y = df['Complete']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        stratify=y, random_state=0)
    train_nn(X_train, X_test, y_train, y_test)
    train_xgboost(X_train, X_test, y_train, y_test)
    train_random_forest(X_train, X_test, y_train, y_test)
    train_svm(X_train, X_test, y_train, y_test)

def train_nn(X_train, X_test, y_train, y_test):
    if USE_BALANCING:
        neg, pos = np.bincount(y_train)
        class_weight = preprocessor.get_class_weights(neg, pos)
    else:
        class_weight = None
    scaler = StandardScaler()
    X_train_sc = scaler.fit_transform(X_train)
    X_test_sc = scaler.transform(X_test)
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu',
                              input_shape=(X_train_sc.shape[-1],)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    metrics = [
        tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall'),
        tf.keras.metrics.AUC(name='auc'),
    ]
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # 'lr' is a deprecated alias
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=metrics)
    EPOCHS = 20
    if USE_BALANCING:
        model.fit(X_train_sc, y_train, epochs=EPOCHS,
                  class_weight=class_weight,
                  verbose=0)
    else:
        model.fit(X_train_sc, y_train, epochs=EPOCHS,
                  verbose=0)
    results = model.evaluate(X_test_sc, y_test, verbose=0)
    print("TF Results:")
    for name, value in zip(model.metrics_names, results):
        print(name, ': ', round(value, 3))
    print()

def train_xgboost(X_train, X_test, y_train, y_test):
    if USE_BALANCING:
        neg, pos = np.bincount(y_train)
        scale_pos_weight = neg / pos
        params = {"objective": "binary:logistic",
                  "scale_pos_weight": scale_pos_weight,
                  "seed": 0}
    else:
        params = {"objective": "binary:logistic",
                  "seed": 0}
    model = xgb.XGBClassifier(**params)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    predictions = [round(value) for value in y_pred]
    print(f"XGB Accuracy: {round(accuracy_score(y_test, predictions), 3)}")
    print(f"XGB ROC AUC score: {round(roc_auc_score(y_test, model.predict_proba(X_test)[:,1]), 3)}")
    print(f"XGB Pre. score: {round(precision_score(y_test, predictions), 3)}")
    print(f"XGB Recall score: {round(recall_score(y_test, predictions), 3)}")

def train_random_forest(X_train, X_test, y_train, y_test):
    if USE_BALANCING:
        model = RandomForestClassifier(random_state=0,
                                       class_weight="balanced")
    else:
        model = RandomForestClassifier(random_state=0)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print(f"RF Accuracy: {round(accuracy_score(y_test, predictions), 3)}")
    print(f"RF ROC AUC score: {round(roc_auc_score(y_test, model.predict_proba(X_test)[:,1]), 3)}")
    print(f"RF Pre. score: {round(precision_score(y_test, predictions), 3)}")
    print(f"RF Recall score: {round(recall_score(y_test, predictions), 3)}")

def train_svm(X_train, X_test, y_train, y_test):
    if USE_BALANCING:
        model = SVC(random_state=0,
                    class_weight="balanced",
                    probability=True)
    else:
        model = SVC(random_state=0,
                    probability=True)
    scaler = StandardScaler()
    X_train_sc = scaler.fit_transform(X_train)
    X_test_sc = scaler.transform(X_test)
    model.fit(X_train_sc, y_train)
    predictions = model.predict(X_test_sc)
    print(f"SVM Accuracy: {round(accuracy_score(y_test, predictions), 3)}")
    print(f"SVM ROC AUC score: {round(roc_auc_score(y_test, model.predict_proba(X_test_sc)[:,1]), 3)}")
    print(f"SVM Pre. score: {round(precision_score(y_test, predictions), 3)}")
    print(f"SVM Recall score: {round(recall_score(y_test, predictions), 3)}")
    # Binary task, so use BinaryAccuracy; SparseCategoricalAccuracy expects
    # multi-class outputs and would give meaningless numbers here.
    train_metrics = [tf.keras.metrics.BinaryAccuracy()]
    for metric in train_metrics:
        metric.update_state(y_test, predictions)
    print_scores("SVM", train_metrics)

def print_scores(classifier_name, metrics):
    scores = " - ".join("{}: {:.4f}".format(m.name, m.result()) for m in metrics)
    print("{} - {}\n".format(classifier_name, scores))

if __name__ == '__main__':
    main()
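The scripts delegate class balancing to `preprocessor.get_class_weights`, a project helper whose body is not shown here. A plausible minimal sketch, following the standard Keras recipe for imbalanced binary data (the exact formula the project uses is an assumption):

```python
def get_class_weights(neg, pos):
    """Weight each class inversely to its frequency, scaled by total/2 so the
    overall loss magnitude stays roughly unchanged (standard Keras recipe)."""
    total = neg + pos
    weight_for_0 = (1 / neg) * (total / 2.0)
    weight_for_1 = (1 / pos) * (total / 2.0)
    return {0: weight_for_0, 1: weight_for_1}

# Example: 700 negatives vs 300 positives -> the minority class is upweighted.
weights = get_class_weights(700, 300)
print(weights)  # {0: 0.714..., 1: 1.666...}
```

With these weights, a misclassified positive contributes about 2.3x as much to the loss as a misclassified negative, which is what `class_weight=` in `model.fit` consumes above.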
#!/usr/bin/env python
import numpy as np
import config as cfg
from tools import file_reader, preprocessor
from sklearn.model_selection import train_test_split
import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, precision_score, recall_score
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from pathlib import Path
USE_BALANCING = True
NUM_ITERATIONS = 5
CASE = "Fall"
OUTPUT_FILENAME = f"{CASE} baseline results.txt"
METRICS = [tf.keras.metrics.BinaryAccuracy(),
tf.keras.metrics.Precision(),
tf.keras.metrics.Recall(),
tf.keras.metrics.AUC()]
EX_COLS = [str(i)+'Ex' for i in range(1,10)]
ATS_COLS = [str(i)+'Ats' for i in range(1,11)]
def load_complete():
    ex = {str(i)+'Ex': str for i in range(1, 10)}
    ats = {str(i)+'Ats': str for i in range(1, 11)}
    converters = {**ex, **ats}
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR,
                              'complete.csv',
                              converters=converters)
    return df

def load_fall():
    converters = {str(i)+'Ats': str for i in range(1, 11)}
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR,
                              'fall.csv',
                              converters=converters)
    return df

def main():
    # Case 1
    text = f"{CASE} case 1 - without columns"
    with open(Path.joinpath(cfg.REPORTS_DIR, OUTPUT_FILENAME), "w+") as text_file:
        text_file.write(text)
    if CASE == "Complete":
        df = load_complete()
        X = df.drop(['Complete'], axis=1)
        y = df['Complete']
        cat_cols = EX_COLS + ATS_COLS
        X = X.drop(cat_cols, axis=1)
    else:
        df = load_fall()
        X = df.drop(['Fall'], axis=1)
        y = df['Fall']
        cat_cols = ATS_COLS
        X = X.drop(cat_cols, axis=1)
    train_nn(X, y)
    train_xgboost(X, y)
    train_random_forest(X, y)
    train_svm(X, y)

    # Case 2
    text = f"{CASE} case 2 - with one-hot-encoding"
    with open(Path.joinpath(cfg.REPORTS_DIR, OUTPUT_FILENAME), "a") as text_file:
        text_file.write("\n\n")
        text_file.write(text)
    if CASE == "Complete":
        df = load_complete()
        X = df.drop(['Complete'], axis=1)
        y = df['Complete']
        cat_cols = EX_COLS + ATS_COLS
    else:
        df = load_fall()
        X = df.drop(['Fall'], axis=1)
        y = df['Fall']
        cat_cols = ATS_COLS
    X = preprocessor.encode_vector_onehot(X, cat_cols)
    train_nn(X, y)
    train_xgboost(X, y)
    train_random_forest(X, y)
    train_svm(X, y)

    # Case 3
    text = f"{CASE} case 3 - with catboost-encoding"
    with open(Path.joinpath(cfg.REPORTS_DIR, OUTPUT_FILENAME), "a") as text_file:
        text_file.write("\n\n")
        text_file.write(text)
    if CASE == "Complete":
        df = load_complete()
        X = df.drop(['Complete'], axis=1)
        y = df['Complete']
        cat_cols = EX_COLS + ATS_COLS
    else:
        df = load_fall()
        X = df.drop(['Fall'], axis=1)
        y = df['Fall']
        cat_cols = ATS_COLS
    X = preprocessor.encode_vector_catboost(X, y, cat_cols)
    train_nn(X, y)
    train_xgboost(X, y)
    train_random_forest(X, y)
    train_svm(X, y)

    # Case 4
    text = f"{CASE} case 4 - with embeddings"
    with open(Path.joinpath(cfg.REPORTS_DIR, OUTPUT_FILENAME), "a") as text_file:
        text_file.write("\n\n")
        text_file.write(text)
    if CASE == "Complete":
        df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'complete_with_embeddings.csv')
        X = df.drop(['Complete'], axis=1)
        y = df['Complete']
    else:
        df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'fall_with_embeddings.csv')
        X = df.drop(['Fall'], axis=1)
        y = df['Fall']
    train_nn(X, y)
    train_xgboost(X, y)
    train_random_forest(X, y)
    train_svm(X, y)

    # Case 5
    text = f"{CASE} case 5 - with counts"
    with open(Path.joinpath(cfg.REPORTS_DIR, OUTPUT_FILENAME), "a") as text_file:
        text_file.write("\n\n")
        text_file.write(text)
    if CASE == "Complete":
        df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'complete_with_count.csv')
        X = df.drop(['Complete'], axis=1)
        y = df['Complete']
    else:
        df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'fall_with_count.csv')
        X = df.drop(['Fall'], axis=1)
        y = df['Fall']
    train_nn(X, y)
    train_xgboost(X, y)
    train_random_forest(X, y)
    train_svm(X, y)

def train_nn(X, y):
    for k in range(NUM_ITERATIONS):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                            stratify=y, random_state=k)
        if USE_BALANCING:
            neg, pos = np.bincount(y_train)
            class_weight = preprocessor.get_class_weights(neg, pos)
        else:
            class_weight = None
        scaler = StandardScaler()
        X_train_sc = scaler.fit_transform(X_train)
        X_test_sc = scaler.transform(X_test)
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation='relu',
                                  input_shape=(X_train_sc.shape[-1],)),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(1, activation='sigmoid')
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # 'lr' is a deprecated alias
            loss=tf.keras.losses.BinaryCrossentropy())
        if USE_BALANCING:
            model.fit(X_train_sc, y_train, epochs=20,
                      class_weight=class_weight,
                      verbose=0)
        else:
            model.fit(X_train_sc, y_train, epochs=20,
                      verbose=0)
        y_pred = model.predict(X_test_sc)
        y_test = np.array(y_test).reshape(-1, 1)
        y_pred = (y_pred > 0.5).reshape(-1, 1)
        make_and_print_scores("MLP", k, y_test, y_pred)

def train_xgboost(X, y):
    for k in range(NUM_ITERATIONS):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                            stratify=y, random_state=k)
        if USE_BALANCING:
            neg, pos = np.bincount(y_train)
            scale_pos_weight = neg / pos
            params = {"objective": "binary:logistic",
                      "scale_pos_weight": scale_pos_weight,
                      "seed": 0}
        else:
            params = {"objective": "binary:logistic",
                      "seed": 0}
        model = xgb.XGBClassifier(**params)
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        y_test = np.array(y_test).reshape(-1, 1)
        y_pred = predictions.reshape(-1, 1)
        make_and_print_scores("XGB", k, y_test, y_pred)

def train_random_forest(X, y):
    for k in range(NUM_ITERATIONS):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                            stratify=y, random_state=k)
        if USE_BALANCING:
            model = RandomForestClassifier(random_state=0,
                                           class_weight="balanced")
        else:
            model = RandomForestClassifier(random_state=0)
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        y_test = np.array(y_test).reshape(-1, 1)
        y_pred = predictions.reshape(-1, 1)
        make_and_print_scores("RF", k, y_test, y_pred)

def train_svm(X, y):
    for k in range(NUM_ITERATIONS):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                            stratify=y, random_state=k)
        if USE_BALANCING:
            model = SVC(random_state=0,
                        class_weight="balanced",
                        probability=True)
        else:
            model = SVC(random_state=0,
                        probability=True)
        scaler = StandardScaler()
        X_train_sc = scaler.fit_transform(X_train)
        X_test_sc = scaler.transform(X_test)
        model.fit(X_train_sc, y_train)
        predictions = model.predict(X_test_sc)
        y_test = np.array(y_test).reshape(-1, 1)
        y_pred = predictions.reshape(-1, 1)
        make_and_print_scores("SVM", k, y_test, y_pred)

def make_and_print_scores(classifier_name, k, y_test, y_pred):
    for metric in METRICS:
        metric.update_state(y_test, y_pred)
    scores = " - ".join("{}: {:.4f}".format(m.name, m.result()) for m in METRICS)
    # Write '\n', not '\r': a carriage return does not start a new line in the
    # output file, which is what left each case's records on one long line.
    text = "\n{} K={}: {}".format(classifier_name, k, scores)
    with open(Path.joinpath(cfg.REPORTS_DIR, OUTPUT_FILENAME), "a") as text_file:
        text_file.write(text)
    for metric in METRICS:
        metric.reset_states()

if __name__ == '__main__':
    main()
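`preprocessor.encode_vector_catboost` is another project helper not shown in this commit. CatBoost-style ("ordered") target encoding, which it presumably wraps, replaces each category with a running target mean over the preceding rows plus a prior, so no row sees its own label. A minimal sketch with illustrative names (the project's actual implementation and prior are assumptions):

```python
def ordered_target_encode(categories, targets, prior=0.5):
    """Encode each value as (sum of targets seen so far for its category + prior)
    / (count seen so far + 1); a row never uses its own target, avoiding leakage."""
    sums, counts, encoded = {}, {}, []
    for cat, t in zip(categories, targets):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded.append((s + prior) / (c + 1))
        sums[cat] = s + t
        counts[cat] = c + 1
    return encoded

cats = ['a', 'a', 'b', 'a']
ys = [1, 0, 1, 1]
print(ordered_target_encode(cats, ys))  # [0.5, 0.75, 0.5, 0.5]
```

Note the first occurrence of every category gets the prior alone, and later occurrences drift toward that category's observed target rate; production implementations additionally average over several random row orderings.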
#!/usr/bin/env python
import numpy as np
import config as cfg
from tools import file_reader, preprocessor
from sklearn.model_selection import train_test_split
import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, precision_score, recall_score
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
USE_BALANCING = True
def main():
    # Case 1
    print("Case 1 - without columns")
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'fall.csv')
    ats_cols = [str(i)+'Ats' for i in range(1, 11)]
    X = df.drop(['Fall'], axis=1)
    y = df['Fall']
    X = X.drop(ats_cols, axis=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        stratify=y, random_state=0)
    train_nn(X_train, X_test, y_train, y_test)
    train_xgboost(X_train, X_test, y_train, y_test)
    train_random_forest(X_train, X_test, y_train, y_test)
    train_svm(X_train, X_test, y_train, y_test)

    # Case 2
    print("\nCase 2 - with one-hot-encoding")
    converters = {str(i)+'Ats': str for i in range(1, 11)}
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'fall.csv', converters=converters)
    X = df.drop(['Fall'], axis=1)
    y = df['Fall']
    X = preprocessor.encode_vector_onehot(X, ats_cols)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        stratify=y, random_state=0)
    train_nn(X_train, X_test, y_train, y_test)
    train_xgboost(X_train, X_test, y_train, y_test)
    train_random_forest(X_train, X_test, y_train, y_test)
    train_svm(X_train, X_test, y_train, y_test)

    # Case 3
    print("\nCase 3 - with catboost-encoding")
    converters = {str(i)+'Ats': str for i in range(1, 11)}
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'fall.csv', converters=converters)
    X = df.drop(['Fall'], axis=1)
    y = df['Fall']
    X = preprocessor.encode_vector_catboost(X, y, ats_cols)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        stratify=y, random_state=0)
    train_nn(X_train, X_test, y_train, y_test)
    train_xgboost(X_train, X_test, y_train, y_test)
    train_random_forest(X_train, X_test, y_train, y_test)
    train_svm(X_train, X_test, y_train, y_test)

    # Case 4
    print("\nCase 4 - with embeddings")
    df = file_reader.read_csv(cfg.PROCESSED_DATA_DIR, 'fall_with_embeddings.csv')
    X = df.drop(['Fall'], axis=1)
    y = df['Fall']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        stratify=y, random_state=0)