Abstract—Erythropoiesis is the lineage in which haematopoietic
stem cells (HSCs) differentiate into red blood cells. During their
development, HSCs undergo global gene expression changes that
reflect the needs of the current developmental stage. Clustering
is an effective way to identify sets of genes that share similar
expression patterns across the developmental stages.
Unsupervised clustering aims to highlight these co-regulated
genes without prior knowledge of their full interactions. In this
study, we apply k-means
clustering to a gene expression microarray dataset that measures
the expression levels of human genes at four erythropoiesis
stages. Eight clusters were identified. One cluster of 450 genes
(C4) is more active toward the maturation stages and is involved
in cell division and DNA replication, processes that are vital
during development. Another cluster of 234 genes (C7) is involved
in autophagy (cellular self-consumption and degradation), which is
known to be involved in enucleation (expulsion of the nucleus from
the cell).
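
As an illustration of the approach named above, the following minimal sketch runs k-means on toy four-stage expression profiles and reports the within-cluster sum of squares (WCSS) for several values of k, which is the quantity inspected in the elbow method. It is not the pipeline used in this study: the synthetic data, the function names (toy_profiles, kmeans), the number of toy patterns, and the deterministic farthest-point initialisation are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_profiles(seed=1, per_pattern=20):
    """Hypothetical data: three noisy patterns over four stages."""
    rng = random.Random(seed)
    patterns = [
        [-1.5, -0.5, 0.5, 1.5],   # expression rising toward maturation
        [1.5, 0.5, -0.5, -1.5],   # expression falling toward maturation
        [0.0, 0.0, 0.0, 0.0],     # flat expression
    ]
    return [[v + rng.gauss(0, 0.1) for v in pat]
            for pat in patterns for _ in range(per_pattern)]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns (labels, centroids, wcss)."""
    # Assumed initialisation: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break

    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


profiles = toy_profiles()

# Elbow method: compute WCSS for a range of k and look for the value of k
# after which further increases give only a small reduction.
wcss_by_k = {k: kmeans(profiles, k)[2] for k in range(1, 7)}
for k, w in wcss_by_k.items():
    print(f"k={k}  WCSS={w:.2f}")

labels3, centroids3, _ = kmeans(profiles, 3)
```

On real microarray data, each profile would typically be normalised per gene before clustering so that genes are grouped by the shape of their expression pattern rather than by absolute level; that preprocessing step is likewise an assumption here and is not described in the abstract.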
Index Terms—Clustering, elbow method, erythropoiesis, k-means.
Heba Saadeh and Basima Elshqeirat are with the Department of
Computer Science, University of Jordan, Amman, Jordan (e-mail:
heba.saadeh@ju.edu.jo, b.shoqurat@ju.edu.jo).
Reem Q. Al Fayez is with the Department of Computer Information
Systems, University of Jordan, Amman, Jordan (e-mail:
r.alfayez@ju.edu.jo).
Cite: Heba Saadeh, Reem Q. Al Fayez, and Basima Elshqeirat, "Application of K-Means Clustering to Identify Similar Gene Expression Patterns during Erythroid Development," International Journal of Machine Learning and Computing, vol. 10, no. 3, pp. 452-457, 2020.
Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).