Abstract—For most hearing-impaired individuals in Vietnam, Vietnamese Sign Language (VSL) is the only means of communication. Consequently, a growing number of studies address the automatic translation of VSL to bridge communication between hearing-impaired and hearing people. However, automatic VSL recognition in video poses many challenges arising from camera orientation, hand position and movement, inter-hand relations, and other factors. In this paper, we present several feature extraction approaches for VSL recognition, including spatial and scene-based features. Instead of relying on a static image, we specifically capture motion information between frames in a video sequence. For the recognition task, besides traditional sign language recognition methods such as SVM, we propose using a deep learning technique to model the dependence between frames in a video sequence. We collected two VSL datasets on the topic of family relatives (VSL-WRF), covering words such as father, mother, uncle, and aunt. The first contains 12 Vietnamese words whose signs change only slightly between frames, while the second contains 15 words whose gestures involve the relative position of body parts and the orientation of motion. Moreover, a data augmentation technique is proposed to capture more information about hand movement and hand position. The experiments achieved satisfactory results, with accuracies of 88.5% (traditional SVM) and 95.83% (deep learning). These results indicate that deep learning combined with data augmentation provides more information about the orientation or movement of the hand and can improve the performance of a VSL recognition system.
Index Terms—Vietnamese sign language (VSL), VSL recognition, local descriptors, spatial feature, scene-based feature, motion-based feature, deep learning.
Anh H. Vo and Van-Huy. Pham are with the Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam (e-mail: email@example.com, firstname.lastname@example.org).
Bao T. Nguyen is with the Faculty of Information Technology, University of Education and Technology, Ho Chi Minh City, Vietnam (e-mail: email@example.com).
Cite: Anh H. Vo, Van-Huy. Pham, and Bao T. Nguyen, "Deep Learning for Vietnamese Sign Language Recognition in Video Sequence," International Journal of Machine Learning and Computing, vol. 9, no. 4, pp. 440-445, 2019.
Copyright © 2019 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).