Abstract—Human emotions play a very important role in
communication. Emotional speech recognition research brings
human–machine communication closer to human-to-human
communication. This paper evaluates a Vietnamese emotional
corpus using ANOVA and the T-test, and then uses deep
convolutional neural networks to recognize four basic emotions
in Vietnamese speech from this corpus: neutrality,
sadness, anger, and happiness. Five sets of feature
parameters were used as inputs to the deep convolutional neural
network, for which mel spectral images were taken and
particular attention was paid to the fundamental frequency, F0,
and its variants. Experiments were conducted for these five
parameter sets in four cases, defined by whether the test is
content-dependent or content-independent and speaker-dependent
or speaker-independent.
The highest average recognition accuracy was 97.86%, obtained
under speaker-dependent and content-dependent conditions.
The experimental results also show that F0 and its variants
contribute significantly to improving the accuracy of
Vietnamese emotion recognition.
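As an illustration of the kind of statistical evaluation named above, the following minimal sketch computes a one-way ANOVA F statistic and a Welch T statistic for one acoustic feature grouped by emotion. It is not the authors' procedure: the function names `one_way_anova_f` and `welch_t` are hypothetical, and the mean-F0 values are invented placeholders, not data from the corpus.

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def welch_t(a, b):
    """Return Welch's T statistic for two samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical per-utterance mean F0 values in Hz (placeholders only).
f0_by_emotion = {
    "neutrality": [182.0, 190.5, 176.3, 185.1],
    "sadness": [170.2, 165.8, 174.9, 168.0],
    "anger": [231.4, 240.0, 225.7, 236.2],
    "happiness": [219.9, 228.3, 214.6, 222.1],
}

f_stat = one_way_anova_f(list(f0_by_emotion.values()))
t_stat = welch_t(f0_by_emotion["anger"], f0_by_emotion["sadness"])
print(f"ANOVA F = {f_stat:.2f}, Welch T (anger vs. sadness) = {t_stat:.2f}")
```

A large F statistic suggests that the feature differs across the emotion classes as a whole, while the T statistic compares one pair of classes. Turning either statistic into a p-value requires the corresponding F or T distribution, which is omitted here.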
Index Terms—Corpus, deep convolutional neural network,
emotion, T-test, ANOVA, recognition, fundamental frequency,
mel spectrum, Vietnamese.
Thuy Dao Thi Le is with Hanoi University of Science and Technology,
Vietnam (e-mail: thuydtl@soict.hust.edu.vn).
Cite: Thuy Dao Thi Le, Loan Trinh Van, and Quang Nguyen Hong, "Deep Convolutional Neural Networks for Emotion Recognition of Vietnamese," International Journal of Machine Learning and Computing, vol. 10, no. 5, pp. 692-699, 2020.
Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).