Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge
Patrick Doetsch, Christian Buck, Pavlo Golik, Niklas Hoppe, Michael Kramp, Johannes Laudenberg, Christian Oberdörfer, Pascal Steingrube, Jens Forster, Arne Mauser
Proceedings of KDD-Cup 2009 Competition, PMLR 7:77-88, 2009.
Abstract
In this work, we describe our approach to the “Small Challenge” of the KDD cup 2009, a classification task with incomplete data. Preprocessing, feature extraction and model selection are documented in detail. We suggest a criterion based on the number of missing values to select a suitable imputation method for each feature. Logistic Model Trees (LMT) are extended with a split criterion optimizing the Area under the ROC Curve (AUC), which was the requested evaluation criterion. By stacking boosted decision stumps and LMT we achieved the best result for the “Small Challenge” without making use of additional data from other feature sets, resulting in an AUC score of 0.8081. We also present results of an AUC optimizing model combination that scored only slightly worse with an AUC score of 0.8074.
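The abstract's central idea is replacing the usual decision-tree split criterion with one that directly optimizes AUC, the competition's evaluation metric. The following sketch illustrates the general technique: each candidate threshold is scored by the AUC of the two-leaf classifier it induces (each leaf predicts its positive rate). This is an illustrative reconstruction, not the paper's actual implementation; `auc` and `best_auc_split` are hypothetical names.

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_auc_split(feature, labels):
    """Score each threshold on a numeric feature by the AUC of the
    resulting two-leaf stump, where each leaf outputs its positive rate.
    Returns (best AUC, best threshold)."""
    best_score, best_t = 0.5, None
    for t in sorted(set(feature))[:-1]:
        left = [y for x, y in zip(feature, labels) if x <= t]
        right = [y for x, y in zip(feature, labels) if x > t]
        rate = {True: sum(left) / len(left),
                False: sum(right) / len(right)}
        scores = [rate[x <= t] for x in feature]
        a = auc(scores, labels)
        if a > best_score:
            best_score, best_t = a, t
    return best_score, best_t

# A perfectly separable toy feature: the split at 2 yields AUC 1.0.
print(best_auc_split([1, 2, 3, 4], [0, 0, 1, 1]))  # → (1.0, 2)
```

An impurity-based criterion like Gini would often pick the same split here, but the two can disagree on ranking-sensitive data, which is why optimizing AUC directly mattered for this competition.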
Cite this Paper
BibTeX
@InProceedings{pmlr-v7-doetsch09,
  title     = {Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge},
  author    = {Doetsch, Patrick and Buck, Christian and Golik, Pavlo and Hoppe, Niklas and Kramp, Michael and Laudenberg, Johannes and Oberdörfer, Christian and Steingrube, Pascal and Forster, Jens and Mauser, Arne},
  booktitle = {Proceedings of KDD-Cup 2009 Competition},
  pages     = {77--88},
  year      = {2009},
  editor    = {Dror, Gideon and Boullé, Marc and Guyon, Isabelle and Lemaire, Vincent and Vogel, David},
  volume    = {7},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v7/doetsch09/doetsch09.pdf},
  url       = {https://proceedings.mlr.press/v7/doetsch09.html},
  abstract  = {In this work, we describe our approach to the “Small Challenge” of the KDD cup 2009, a classification task with incomplete data. Preprocessing, feature extraction and model selection are documented in detail. We suggest a criterion based on the number of missing values to select a suitable imputation method for each feature. Logistic Model Trees (LMT) are extended with a split criterion optimizing the Area under the ROC Curve (AUC), which was the requested evaluation criterion. By stacking boosted decision stumps and LMT we achieved the best result for the “Small Challenge” without making use of additional data from other feature sets, resulting in an AUC score of 0.8081. We also present results of an AUC optimizing model combination that scored only slightly worse with an AUC score of 0.8074.}
}
Endnote
%0 Conference Paper
%T Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge
%A Patrick Doetsch
%A Christian Buck
%A Pavlo Golik
%A Niklas Hoppe
%A Michael Kramp
%A Johannes Laudenberg
%A Christian Oberdörfer
%A Pascal Steingrube
%A Jens Forster
%A Arne Mauser
%B Proceedings of KDD-Cup 2009 Competition
%C Proceedings of Machine Learning Research
%D 2009
%E Gideon Dror
%E Marc Boullé
%E Isabelle Guyon
%E Vincent Lemaire
%E David Vogel
%F pmlr-v7-doetsch09
%I PMLR
%P 77--88
%U https://proceedings.mlr.press/v7/doetsch09.html
%V 7
%X In this work, we describe our approach to the “Small Challenge” of the KDD cup 2009, a classification task with incomplete data. Preprocessing, feature extraction and model selection are documented in detail. We suggest a criterion based on the number of missing values to select a suitable imputation method for each feature. Logistic Model Trees (LMT) are extended with a split criterion optimizing the Area under the ROC Curve (AUC), which was the requested evaluation criterion. By stacking boosted decision stumps and LMT we achieved the best result for the “Small Challenge” without making use of additional data from other feature sets, resulting in an AUC score of 0.8081. We also present results of an AUC optimizing model combination that scored only slightly worse with an AUC score of 0.8074.
RIS
TY  - CPAPER
TI  - Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge
AU  - Patrick Doetsch
AU  - Christian Buck
AU  - Pavlo Golik
AU  - Niklas Hoppe
AU  - Michael Kramp
AU  - Johannes Laudenberg
AU  - Christian Oberdörfer
AU  - Pascal Steingrube
AU  - Jens Forster
AU  - Arne Mauser
BT  - Proceedings of KDD-Cup 2009 Competition
DA  - 2009/12/04
ED  - Gideon Dror
ED  - Marc Boullé
ED  - Isabelle Guyon
ED  - Vincent Lemaire
ED  - David Vogel
ID  - pmlr-v7-doetsch09
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 7
SP  - 77
EP  - 88
L1  - http://proceedings.mlr.press/v7/doetsch09/doetsch09.pdf
UR  - https://proceedings.mlr.press/v7/doetsch09.html
AB  - In this work, we describe our approach to the “Small Challenge” of the KDD cup 2009, a classification task with incomplete data. Preprocessing, feature extraction and model selection are documented in detail. We suggest a criterion based on the number of missing values to select a suitable imputation method for each feature. Logistic Model Trees (LMT) are extended with a split criterion optimizing the Area under the ROC Curve (AUC), which was the requested evaluation criterion. By stacking boosted decision stumps and LMT we achieved the best result for the “Small Challenge” without making use of additional data from other feature sets, resulting in an AUC score of 0.8081. We also present results of an AUC optimizing model combination that scored only slightly worse with an AUC score of 0.8074.
ER  -
APA
Doetsch, P., Buck, C., Golik, P., Hoppe, N., Kramp, M., Laudenberg, J., Oberdörfer, C., Steingrube, P., Forster, J. & Mauser, A. (2009). Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge. Proceedings of KDD-Cup 2009 Competition, in Proceedings of Machine Learning Research 7:77-88. Available from https://proceedings.mlr.press/v7/doetsch09.html.