A Bi-directional Hierarchical Clustering (BHC) for Peak Matching of Large Mass Spectrometry Data Sets

Home > Archive > 2021 > Volume 11 Number 6 (Nov. 2021) >

IJMLC 2021 Vol.11(6): 373-379 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2021.11.6.1064

Nazanin Zounemat Kermani, Xian Yang, Yike Guo, James McKenzie, and Zoltan Takats

Abstract—Preprocessing of mass spectrometry (MS) data is a crucial step in every MS study: it makes the data comparable and manageable and makes the study more reproducible. An essential but often overlooked part of this process is peak matching. Although existing clustering methods have been applied to peak matching, their use has been limited. For example, hierarchical agglomerative clustering (HAC) for matching mass/charge signals has been restricted to small-scale MS data sets because of its computational complexity. In this paper, we reintroduce bi-directional hierarchical agglomerative clustering (BHC) as a scalable and accurate peak matching technique. BHC reduces the computational complexity of hierarchical agglomerative clustering for peak matching to O(R log R). BHC was benchmarked against existing peak matching techniques. Finally, we propose a parallelization framework that significantly reduces the computation time of the peak matching method.
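To make the peak matching task concrete, the following is a minimal sketch of clustering one-dimensional m/z values. It is not the BHC algorithm, which the abstract does not specify. It is a simple stand-in: sort the values and start a new cluster wherever the gap between neighbours exceeds a tolerance, which in one dimension gives the same groups as single-linkage agglomerative clustering cut at that tolerance. The function name `match_peaks`, the absolute tolerance `tol`, and the sample values are assumptions for illustration. The sort dominates the cost, so the sketch runs in O(R log R) if R is taken to be the number of peaks.

```python
from typing import List


def match_peaks(mz_values: List[float], tol: float) -> List[List[float]]:
    """Group m/z values pooled from several spectra into matched peaks.

    Illustrative stand-in, not the authors' BHC: neighbouring sorted
    values whose gap is at most `tol` are placed in the same cluster.
    `tol` is a hypothetical absolute tolerance in m/z units.
    """
    if not mz_values:
        return []

    ordered = sorted(mz_values)  # O(R log R); dominates the running time
    clusters = [[ordered[0]]]

    for mz in ordered[1:]:
        if mz - clusters[-1][-1] <= tol:
            # Close enough to the previous value: same matched peak.
            clusters[-1].append(mz)
        else:
            # Gap larger than the tolerance: start a new matched peak.
            clusters.append([mz])

    return clusters


# Made-up m/z values pooled from a few spectra.
peaks = [100.01, 250.5, 100.02, 250.52, 400.0]
groups = match_peaks(peaks, tol=0.05)
```

Each inner list in `groups` holds the m/z values treated as the same peak across spectra. A real pipeline would more likely use a relative (ppm) tolerance and keep track of which spectrum each value came from.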

Index Terms—Mass spectrometry data preprocessing, peak matching, hierarchical agglomerative clustering, parallel computing.

Nazanin Zounemat Kermani, Xian Yang, and Yike Guo are with the Department of Computing, Data Science Institute, Imperial College London.
James McKenzie and Zoltan Takats are with the Faculty of Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London (e-mail: n.kermani@imperial.ac.uk).


Cite: Nazanin Zounemat Kermani, Xian Yang, Yike Guo, James McKenzie, and Zoltan Takats, "A Bi-directional Hierarchical Clustering (BHC) for Peak Matching of Large Mass Spectrometry Data Sets," International Journal of Machine Learning and Computing, vol. 11, no. 6, pp. 373-379, 2021.

Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


