Abstract—Data imbalance is a common problem when applying
machine learning to real-world tasks, especially image
classification. With recent advances in machine learning,
particularly deep learning, research in this area is drawing
increasing attention from both academia and industry. To
address the imbalanced data problem, this paper adopts a hybrid
(algorithm- and data-level) approach that combines data
manipulation with a weighted loss function. We propose
Ripple-SMOTE, a novel oversampling method that generates
synthetic data during preprocessing. A deep neural network is
trained with the weighted loss function so that it does not
treat all classes equally. We also fine-tune a pre-trained
model to improve classification accuracy. We report evaluation
results on imbalanced data sets based on MNIST, the CUReT
texture data set, and a malware data set, and show that our
approach significantly improves performance in imbalanced data
cases and outperforms conventional approaches, especially in
handling minority classes.
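As a rough illustration of the weighted-loss idea mentioned above, the following minimal sketch computes a class-weighted cross-entropy in plain Python. The abstract does not state the paper's exact weighting scheme, so the inverse-frequency weights used here (n_samples / (n_classes * class_count)) are an assumption, and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical. It does not implement Ripple-SMOTE.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p[y], so rare classes contribute more."""
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        # Clamp the probability to avoid log(0).
        total += -w * math.log(max(p[y], 1e-12))
        weight_sum += w
    return total / weight_sum


# Toy imbalanced set: nine samples of class 0, one of class 1.
labels = [0] * 9 + [1]
# A classifier that always favors the majority class.
probs = [[0.9, 0.1] for _ in labels]

weights = inverse_frequency_weights(labels)
uniform = {0: 1.0, 1: 1.0}

plain_loss = weighted_cross_entropy(probs, labels, uniform)
weighted_loss = weighted_cross_entropy(probs, labels, weights)
```

In this toy case the weighted loss is larger than the unweighted one, because the single misclassified minority sample is given more influence than each majority sample.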
Index Terms—Deep neural network, imbalanced data,
oversampling.
The authors are with the Graduate School of Computer and Information
Sciences, Hosei University, Tokyo, Japan (e-mail:
rheza.harliman.5q@stu.hosei.ac.jp).
Cite: Rheza Harliman and Kaoru Uchida, "Data- and Algorithm-Hybrid Approach for Imbalanced Data Problems in Deep Neural Network," International Journal of Machine Learning and Computing, vol. 8, no. 3, pp. 208-213, 2018.