Abstract—This paper explores the impact of reweighting the minority class of an imbalanced fraud dataset on the performance of an XGBoost binary classifier. Classifier performance is measured here in terms of true positive rate, false positive rate, precision, accuracy, AUC-ROC and AUC-PR. Our results suggest that reweighting the minority class has a significant impact on these performance metrics when the classification threshold is held fixed and the model bias is not corrected. However, this impact becomes insignificant when (1) the classification threshold is held fixed and the bias is corrected, or (2) the target number of predicted positives is held fixed. Since fraud detection often prescribes a target number of cases for special treatment, these findings suggest that reweighting a dataset offers a performance advantage only under very specific conditions for XGBoost-based classifiers. These conclusions can also generalize to problems where certain resampling techniques are used instead of reweighting, since the two approaches tend to converge for sufficiently large datasets.
Index Terms—XGBoost, binary classifiers, class imbalance, reweighting, resampling, bias-variance tradeoff.
The authors are with the MRG Machine Learning Center of Excellence at J.P. Morgan Chase & Co. in Manhattan, NY 10016, USA (e-mail: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org).
Cite: Altan Allawala, Anand Ramteke, and Pavan Wadhwa, "Performance Impact of Minority Class Reweighting on XGBoost-based Anomaly Detection," International Journal of Machine Learning and Computing, vol. 12, no. 4, pp. 143-148, 2022. Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
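The bias correction mentioned in the abstract can be illustrated with a minimal sketch. For a well-calibrated model, multiplying the minority-class weight by a factor w shifts the predicted log-odds of the positive class by log(w), inflating the scores; dividing that factor back out recovers the unweighted probabilities. The weight value and helper names below are hypothetical and not taken from the paper itself.

```python
import math

def reweight_shift(p, w):
    # Probability a calibrated model trained with positive-class weight w
    # would assign, given the unweighted model's probability p:
    # reweighting by w shifts the log-odds by log(w).
    return (w * p) / (w * p + (1.0 - p))

def correct_bias(q, w):
    # Undo the log-odds shift introduced by reweighting, recovering
    # the unweighted probability from the reweighted score q.
    return q / (q + w * (1.0 - q))

p = 0.02   # unweighted score for a rare fraud case (illustrative)
w = 50.0   # minority-class weight, e.g. roughly 1 / prevalence (hypothetical)

q = reweight_shift(p, w)       # reweighting inflates the score (~0.505)
restored = correct_bias(q, w)  # bias correction recovers ~0.02
assert abs(restored - p) < 1e-12
```

This is why, at a fixed threshold, reweighting without correction changes which cases are flagged, while applying the correction (or ranking cases and taking a fixed number of top scores, which is invariant to the monotone shift) neutralizes the effect.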