Abstract—This paper proposes a new method to improve hand joint regression in 3D hand pose estimation. Existing methods regress all joints together from a depth map. This can cause misallocation of some hand joints and misuse of hand depth information, and it makes accurate estimation of 3D coordinates difficult. In the proposed method, joint regression is performed in stages: highly flexible joints, such as fingertip joints, are regressed first, followed by less flexible joints, to avoid the errors that arise when all joints are estimated together. In practice, fingertip joints incur higher estimation errors than all other joints. We therefore perform fingertip joint localization (2D joint estimation) after obtaining rough pose estimates from the pose estimator, in order to locate the fingertip joint positions. We then use these 2D joint estimates to generate the depth coordinates of the pose estimator. To further ensure the accuracy of the absolute pose hypothesis, we integrate a robust implicit shape-based hand detector with the deep regression pose estimator into one pipeline through a shared convolutional layer. Finally, a shared convolutional layer converts the 2D joint locations to 3D poses. Consequently, our system can accurately estimate hand pose based on the prior knowledge of a well-detected human hand and properly located joint positions. Experiments were carried out on three publicly available datasets: ICVL, NYU, and MSRA. The proposed hand pose estimation system attains an accuracy of 96.4% at a threshold of 40 mm on the ICVL dataset, 92% on MSRA, and 89% on NYU, illustrating the effectiveness of the proposed system over many state-of-the-art approaches.
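The abstract reports accuracy at a 40 mm threshold without defining the metric. The sketch below assumes one common reading in depth-based hand pose evaluation: the fraction of frames whose worst per-joint Euclidean error is within the threshold. This reading is an assumption, not the authors' stated protocol. The function names and the toy data are hypothetical and are used only for illustration.

```python
import math


def max_joint_error_mm(pred_joints, gt_joints):
    """Largest Euclidean distance (mm) between predicted and true 3D joints in one frame."""
    return max(math.dist(p, g) for p, g in zip(pred_joints, gt_joints))


def accuracy_at_threshold(pred_frames, gt_frames, threshold_mm=40.0):
    """Fraction of frames in which every joint lies within threshold_mm of the ground truth.

    Assumed definition: a frame counts as correct only if its worst joint
    error does not exceed the threshold.
    """
    if not pred_frames or len(pred_frames) != len(gt_frames):
        raise ValueError("pred_frames and gt_frames must be non-empty and of equal length")
    hits = sum(
        1
        for pred, gt in zip(pred_frames, gt_frames)
        if max_joint_error_mm(pred, gt) <= threshold_mm
    )
    return hits / len(pred_frames)


# Toy data: two frames, two joints each, coordinates in millimetres.
gt_frames = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
]
pred_frames = [
    [(0.0, 0.0, 30.0), (10.0, 0.0, 0.0)],  # worst joint error: 30 mm
    [(0.0, 0.0, 50.0), (10.0, 0.0, 0.0)],  # worst joint error: 50 mm
]

toy_accuracy = accuracy_at_threshold(pred_frames, gt_frames, threshold_mm=40.0)
```

Under this reading, a figure such as 96.4% on ICVL would mean that 96.4% of test frames have all joints within 40 mm of the ground truth. A per-joint variant, which counts individual joints rather than whole frames, would give a higher number on the same predictions.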
Index Terms—Deep learning, human-computer interaction, image analysis.
Stanley L. Tito and Aloys N. Mvuma are with Mbeya University of Science and Technology, Mbeya, Tanzania (e-mail: firstname.lastname@example.org, email@example.com).
Jamal F. Banzi is with the Sokoine University of Agriculture, Morogoro, Tanzania (e-mail: firstname.lastname@example.org).
Cite: Stanley L. Tito, Jamal F. Banzi, and Aloys N. Mvuma, "A Deep Regression Network with Key-joints Localization for Accurate Hand Pose Estimation," International Journal of Machine Learning and Computing, vol. 12, no. 6, pp. 318-327, 2022.
Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).