Title: | A detailed comparison of analysis processes for MCC-IMS data in disease classification: Automated methods can replace manual peak annotations |
Author(s): | Horsch S; Kopczynski D; Kuthe E; Baumbach JI; Rahmann S; Rahnenfuhrer J; |
Address: | "Department of Statistics, TU Dortmund University, Dortmund, Germany. Bioinformatics, Computer Science XI, TU Dortmund University, Dortmund, Germany. Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Essen, Germany. Faculty of Applied Chemistry, Reutlingen University, Reutlingen, Germany" |
DOI: | 10.1371/journal.pone.0184321 |
ISSN/ISBN: | 1932-6203 (Electronic) 1932-6203 (Linking) |
Abstract: | "MOTIVATION: Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column-ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. METHOD: We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. RESULTS: The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step, and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high-throughput use of the technology." |
Keywords: | "Automation, Laboratory/*methods; Breath Tests/instrumentation/methods; *Data Curation; Humans; *Models, Theoretical; *Spectrum Analysis/instrumentation/methods" |
Notes: | "Medline. Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jorg Ingo; Rahmann, Sven; Rahnenfuhrer, Jorg. eng. Comparative Study. 2017/09/15. PLoS One. 2017 Sep 14; 12(9):e0184321. doi: 10.1371/journal.pone.0184321. eCollection 2017" |
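
The abstract names LM (Local Maxima) as one of the best-performing peak identification steps. The following is a minimal illustrative sketch of the general local-maxima idea on a small two-dimensional intensity grid. It is not the authors' implementation: the function name `local_maxima`, the 8-neighbour rule, the fixed intensity threshold, and the toy `spectrum` values are all assumptions made for illustration.

```python
def local_maxima(matrix, threshold):
    """Return (row, col, intensity) for every cell that exceeds `threshold`
    and is strictly greater than all of its (up to 8) neighbours.

    Hypothetical helper for illustration only; real MCC-IMS peak detection
    would also involve denoising and baseline correction.
    """
    peaks = []
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    for r in range(n_rows):
        for c in range(n_cols):
            value = matrix[r][c]
            if value <= threshold:
                continue  # too weak to count as a peak candidate
            # Collect the neighbouring cells, clipped at the grid border.
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


# Toy intensity grid (made-up numbers, not measured data).
spectrum = [
    [0, 1, 0, 0, 0],
    [1, 9, 2, 0, 0],
    [0, 2, 1, 3, 1],
    [0, 0, 3, 7, 2],
    [0, 0, 1, 2, 1],
]

peaks = local_maxima(spectrum, threshold=5)
print(peaks)
```

In a full pipeline of the kind the abstract describes, peak positions found this way in each measurement would then be clustered across measurements and passed to a classifier; those later steps are not shown here.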