IJMLC 2015 Vol. 5(3): 165-171 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2015.V5.502

Unsupervised Cross-Language Classification with Stratified Sampling-Based Cluster Ensemble

Wenli Gui, Liping Jing, Liu Yang, and Jian Yu

Abstract—Many real-world data sets comprise multiple representations, or views, so learning from multi-view data is important in many applications. In unsupervised cross-language classification, documents in different languages share the same set of categories. To solve this cross-language clustering problem, we propose a novel Stratified Sampling-based Cluster Ensemble method, which makes two main contributions. First, it generates several data components from the cross-language document set via stratified sampling, so that the correlation between multiple views is taken into account. Second, it uses a link-based consensus function to combine the component clustering results, so that the relationship between components is effectively utilized. A series of experiments on a real cross-language document set shows that the proposed method outperforms state-of-the-art multi-view clustering methods.

Index Terms—Unsupervised cross-language classification, multi-view clustering, clustering ensemble, stratified sampling.
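
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: each language view is treated as one stratum of features, every component keeps a fixed fraction of features from every view, a plain k-means serves as the base clusterer, and a simple co-association vote stands in for the link-based consensus function, whose details are not given here. All function and parameter names (`stratified_feature_sample`, `ensemble_cluster`, `frac`, `threshold`) are hypothetical.

```python
import random


def stratified_feature_sample(view_sizes, frac, rng):
    """Pick a fraction of feature indices from every view (one stratum per view)."""
    chosen, offset = [], 0
    for size in view_sizes:
        take = max(1, round(frac * size))
        chosen.extend(offset + j for j in sorted(rng.sample(range(size), take)))
        offset += size
    return chosen


def kmeans(rows, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        for i, r in enumerate(rows):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centers[c])),
            )
        for c in range(k):
            members = [rows[i] for i in range(len(rows)) if labels[i] == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(runs):
    """Fraction of component clusterings that put documents i and j together."""
    n = len(runs[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(runs)
    return co


def consensus(co, threshold=0.5):
    """Assumed stand-in consensus: connected components of the thresholded graph."""
    n = len(co)
    labels, current = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and co[u][v] > threshold:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def ensemble_cluster(X, view_sizes, k, n_components=10, frac=0.5, seed=0):
    """Cluster each stratified component, then merge the results by voting."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_components):
        cols = stratified_feature_sample(view_sizes, frac, rng)
        rows = [[x[c] for c in cols] for x in X]
        runs.append(kmeans(rows, k, rng))
    return consensus(co_association(runs))


# Toy data: 8 documents, two views of 4 features each, two obvious groups.
group_a = [[0.1 * i + 0.01 * j for j in range(8)] for i in range(4)]
group_b = [[10 + 0.1 * i + 0.01 * j for j in range(8)] for i in range(4)]
toy_labels = ensemble_cluster(group_a + group_b, view_sizes=[4, 4], k=2)
```

Because every component draws features from both views, each base clustering sees information from every language, which is one way to read the abstract's point about accounting for the correlation between views.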

Wenli Gui, Liping Jing, and Jian Yu are with Beijing Key Lab of Traffic Data Analysis and Mining, the School of Computer Science and Information Technology, Beijing Jiaotong University, Beijing 100044 China (e-mail: 13125158@bjtu.edu.cn, lpjing@bjtu.edu.cn, jyu@bjtu.edu.cn).
Liu Yang is with Beijing Key Lab of Traffic Data Analysis and Mining, the School of Computer Science and Information Technology, Beijing Jiaotong University, Beijing 100044 China, and College of Mathematics and Computer Science, Hebei University, Baoding, Hebei, China (e-mail: 11112091@bjtu.edu.cn).


Cite: Wenli Gui, Liping Jing, Liu Yang, and Jian Yu, "Unsupervised Cross-Language Classification with Stratified Sampling-Based Cluster Ensemble," International Journal of Machine Learning and Computing, vol. 5, no. 3, pp. 165-171, 2015.
