Abstract—Many real-world data sets comprise multiple
representations, or views, and learning from such multi-view
data is important in many applications. In unsupervised
cross-language classification, documents in different
languages share the same set of categories. To solve the
cross-language clustering problem, we propose a novel
Stratified Sampling-based Cluster Ensemble method, which makes
two main contributions. First, it generates several data
components from the cross-language document set via
stratified sampling, so that the correlation between
multiple views is taken into account. Second, it uses a
link-based consensus function to combine the component
clustering results, so that the relationships between
components are exploited. Experiments on a real
cross-language document set show that the proposed method
outperforms state-of-the-art multi-view clustering methods.
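The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how strata are defined, so `strata` is a hypothetical per-document label, and a plain co-association matrix stands in for the link-based consensus function, which is a simpler technique. All function and parameter names are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_components(strata, n_components, fraction, seed=0):
    """Draw index samples that take the same fraction from every stratum.

    `strata` is a hypothetical list giving one stratum label per document.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, label in enumerate(strata):
        by_stratum[label].append(idx)

    components = []
    for _ in range(n_components):
        sample = []
        for members in by_stratum.values():
            # Keep at least one document per stratum in every component.
            k = max(1, round(fraction * len(members)))
            sample.extend(rng.sample(members, k))
        components.append(sorted(sample))
    return components


def co_association(n_items, components, labelings):
    """Fraction of shared components in which two items share a cluster.

    A simplified stand-in for a link-based consensus function.
    `labelings[c][p]` is the cluster label of the document `components[c][p]`.
    """
    together = [[0] * n_items for _ in range(n_items)]
    seen = [[0] * n_items for _ in range(n_items)]
    for comp, labels in zip(components, labelings):
        for a, i in enumerate(comp):
            for b, j in enumerate(comp):
                seen[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / seen[i][j] if seen[i][j] else 0.0
         for j in range(n_items)]
        for i in range(n_items)
    ]
```

Each component would be clustered separately with any base clustering algorithm, and a final partition could then be obtained by clustering the rows of the resulting matrix. Pairs of documents that never appear in the same component receive a similarity of 0.0 in this sketch, which is a design simplification.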
Index Terms—Unsupervised cross-language classification,
multi-view clustering, clustering ensemble, stratified sampling.
Wenli Gui, Liping Jing, and Jian Yu are with the Beijing Key Lab of Traffic
Data Analysis and Mining, the School of Computer Science and Information
Technology, Beijing Jiaotong University, Beijing 100044 China (e-mail:
13125158@bjtu.edu.cn, lpjing@bjtu.edu.cn, jyu@bjtu.edu.cn).
Liu Yang is with the Beijing Key Lab of Traffic Data Analysis and Mining,
the School of Computer Science and Information Technology, Beijing
Jiaotong University, Beijing 100044 China, and the College of Mathematics and
Computer Science, Hebei University, Baoding, Hebei, China (e-mail:
11112091@bjtu.edu.cn).
Cite: Wenli Gui, Liping Jing, Liu Yang, and Jian Yu, "Unsupervised Cross-Language Classification with Stratified Sampling-Based Cluster Ensemble," International Journal of Machine Learning and Computing, vol. 5, no. 3, pp. 165-171, 2015.