IJMLC 2015 Vol.5(5): 384-387 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2015.V5.538

Structured Vectors for Chinese Word Representations

Changliang Li, Bo Xu, Xiuying Wang, Gaowei Wu, Guanhua Tian, and Wendong Ge

Abstract—Word representations have been a key reason for the success of many NLP tasks. Much work has focused on improving how word representations are learned, and most approaches treat the word as an atomic unit. However, in some languages, such as Chinese, some words cannot be recognized correctly, which corrupts the ability of word embeddings to capture semantic information. This paper addresses this shortcoming by proposing structured embeddings for word representations. Our method uses sub-word and atomic unit embeddings to represent word embeddings. We build structured vectors for Chinese word representations based on this method and evaluate them on SemEval-2012 Task 4: Measuring Chinese word similarity. The results show that our method is remarkably effective in capturing semantic information and outperforms the previous best performance by a large margin. Our method can be extended to languages that do not have a trivial word segmentation process.

Index Terms—Word embeddings, word segmentation, semantic information.

The authors are with the Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, 100190, Beijing, China (e-mail: changliang.li@ia.ac.cn, xubo@ia.ac.cn, xiuying.wang@ia.ac.cn, gaowei.wu@ia.ac.cn, guanhua.tian@ia.ac.cn, wending.ge@ia.ac.cn).

Cite: Changliang Li, Bo Xu, Xiuying Wang, Gaowei Wu, Guanhua Tian, and Wendong Ge, "Structured Vectors for Chinese Word Representations," International Journal of Machine Learning and Computing, vol. 5, no. 5, pp. 384-387, 2015.

General Information

  • E-ISSN: 2972-368X
  • Abbreviated Title: Int. J. Mach. Learn.
  • Frequency: Quarterly
  • DOI: 10.18178/IJML
  • Editor-in-Chief: Dr. Lin Huang
  • Executive Editor: Ms. Cherry L. Chen
  • Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library, CNKI.
  • E-mail: ijml@ejournal.net

