Abstract—Discriminative approaches to hand pose estimation from depth images usually require densely annotated data to train a supervised network. Generative methods, in turn, depend on temporal information to generate candidate poses, and their optimization can become trapped in local minima. In contrast to these methods, we propose a hybrid two-stage deep predictive neural network that performs predictive coding of image sequences of hand poses in order to capture the latent features underlying a given image. First, we train a deep convolutional neural network (CNN) for direct regression of hand joint positions. Second, we add an unsupervised error term as part of the recurrent architecture connected to the predictive coding portion. The error regression term (ERT) ensures minimal residual errors of the estimated values, while the predictive coding portion allows the network to be trained on image sequences without supervision, so no dense annotation of data is required. We conduct experiments on two challenging public datasets, ICVL and NYU. On the ICVL dataset, our approach improves accuracy over current state-of-the-art methods, with an average joint error of 7.5 mm. We also achieve an average joint error of 12.2 mm on the NYU dataset, which is the smallest error among all state-of-the-art approaches.
Index Terms—Deep learning, hand pose estimation, joint regression, predictive neural networks.
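The abstract reports results as an average joint error in millimetres without defining the metric. The following minimal sketch assumes the common convention, namely the mean Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and frames. The function name `mean_joint_error` and the `(frames, joints, 3)` array layout are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def mean_joint_error(pred, gt):
    """Mean Euclidean distance between predicted and true joints.

    Assumed layout (hypothetical): arrays of shape (frames, joints, 3),
    with coordinates in millimetres.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    # Per-joint Euclidean distance, shape (frames, joints).
    dists = np.linalg.norm(pred - gt, axis=-1)
    # Average over both joints and frames.
    return float(dists.mean())


# Toy check: every joint is offset by (3, 4, 0) mm, so each distance is 5 mm.
gt = np.zeros((2, 4, 3))
pred = gt + np.array([3.0, 4.0, 0.0])
error_mm = mean_joint_error(pred, gt)
```

Under this assumed definition, a reported value such as 7.5 mm would correspond to the mean of the per-joint distances over the whole test set.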
J. Banzi is with the Department of Electronic Engineering and Information Science, School of Information Science and Technology, University of Science and Technology of China, 230026, Hefei City, Anhui Province, P.R. China (e-mail: firstname.lastname@example.org).
I. Bulugu is with the Department of Electronics and Tel. Engineering, University of Dar es Salaam, Tanzania (e-mail: email@example.com).
Z. Ye is with the Department of Electronic Engineering, University of Science and Technology of China, No. 96, 230026, Hefei City, Anhui Province, P.R. China (e-mail: firstname.lastname@example.org).
Cite: Jamal Banzi, Isack Bulugu, and Zhongfu Ye, "Deep Predictive Neural Network: Unsupervised Learning for Hand Pose Estimation," International Journal of Machine Learning and Computing, vol. 9, no. 4, pp. 432-439, 2019.
Copyright © 2019 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).