Jarosław Chilimoniuk, Krystyna Grzesiak, Jakub Kała, Dominik Nowakowski, Adam Krętowski, Rafał Kolenda, Michał Ciborowski, Michał Burdukiewicz (2023). imputomics: web server and R package for missing values imputation in metabolomics data, Bioinformatics, doi:10.1093/bioinformatics/btae098.
If you have any questions, suggestions, or comments, contact Michał Burdukiewicz.
We want to thank the Clinical Research Centre (Medical University of Białystok) members for fruitful discussions. K.G. wants to acknowledge grant no. 2021/43/O/ST6/02805 (National Science Centre). M.C. acknowledges grant no. B.SUB.23.533 (Medical University of Białystok). The study was supported by the Ministry of Education and Science funds within the project 'Excellence Initiative - Research University'. We also acknowledge the Center for Artificial Intelligence at the Medical University of Białystok (funded by the Ministry of Health of the Republic of Poland).
The references below are the sources of the missing value imputation algorithms included in the web server.
Armitage EG, Godzien J, Alonso-Herranz V, López-Gonzálvez Á, Barbas C (2015). “Missing Value Imputation Strategies for Metabolomics Data: General.” ELECTROPHORESIS, 36(24), 3050-3060. ISSN 01730835, doi:10.1002/elps.201500352 https://doi.org/10.1002/elps.201500352.
Borowski J, Fic P (2022). “NADIA: NA Data Imputation Algorithms.”
van Buuren S, Groothuis-Oudshoorn K (2011). “Mice: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software, 45, 1-67. ISSN 1548-7660, doi:10.18637/jss.v045.i03 https://doi.org/10.18637/jss.v045.i03.
Chen LS, Prentice RL, Wang P (2014). “A Penalized EM Algorithm Incorporating Missing Data Mechanism for Gaussian Parameter Estimation.” Biometrics, 70(2), 312-322. ISSN 0006-341X, doi:10.1111/biom.12149 https://doi.org/10.1111/biom.12149.
Davis TJ, Firzli TR, Higgins Keppler EA, Richardson M, Bean HD (2022). “Addressing Missing Data in GC × GC Metabolomics: Identifying Missingness Type and Evaluating the Impact of Imputation Methods on Experimental Replication.” Analytical Chemistry, 94(31), 10912-10920. ISSN 0003-2700, doi:10.1021/acs.analchem.1c04093 https://doi.org/10.1021/acs.analchem.1c04093.
Dekermanjian J, Shaddox E, Nandy D, Ghosh D, Kechris K (2023). “MAI: Mechanism-Aware Imputation.” Bioconductor version: Release (3.16). doi:10.18129/B9.bioc.MAI https://doi.org/10.18129/B9.bioc.MAI.
Dekermanjian JP, Shaddox E, Nandy D, Ghosh D, Kechris K (2022). “Mechanism-Aware Imputation: A Two-Step Approach in Handling Missing Values in Metabolomics.” BMC Bioinformatics, 23(1), 179. ISSN 1471-2105, doi:10.1186/s12859-022-04659-1 https://doi.org/10.1186/s12859-022-04659-1.
Di Guida R, Engel J, Allwood JW, Weber RJM, Jones MR, Sommer U, Viant MR, Dunn WB (2016). “Non-Targeted UHPLC-MS Metabolomic Data Processing Methods: A Comparative Investigation of Normalisation, Missing Value Imputation, Transformation and Scaling.” Metabolomics, 12(5), 93. ISSN 1573-3882, 1573-3890, doi:10.1007/s11306-016-1030-9 https://doi.org/10.1007/s11306-016-1030-9.
Faquih T (2022). “Tofaquih/Imputation_of_untargeted_metabolites: Official Release v1.4.” Zenodo. doi:10.5281/zenodo.6347808 https://doi.org/10.5281/zenodo.6347808.
Faquih T, van Smeden M, Luo J, le Cessie S, Kastenmüller G, Krumsiek J, Noordam R, van Heemst D, Rosendaal FR, van Hylckama Vlieg A, Willems van Dijk K, Mook-Kanamori DO (2020). “A Workflow for Missing Values Imputation of Untargeted Metabolomics Data.” Metabolites, 10(12), 486. ISSN 2218-1989, doi:10.3390/metabo10120486 https://doi.org/10.3390/metabo10120486.
Fouad KM, Ismail MM, Azar AT, Arafa MM (2021). “Advanced Methods for Missing Values Imputation Based on Similarity Learning.” PeerJ Computer Science, 7, e619. ISSN 2376-5992, doi:10.7717/peerj-cs.619 https://doi.org/10.7717/peerj-cs.619.
Hastie T, Tibshirani R, Narasimhan B, Chu G (2023). “Impute: Imputation for Microarray Data.” Bioconductor version: Release (3.16). doi:10.18129/B9.bioc.impute https://doi.org/10.18129/B9.bioc.impute.
Honaker J, King G, Blackwell M (2011). “Amelia II: A Program for Missing Data.” Journal of Statistical Software, 45, 1-47. ISSN 1548-7660, doi:10.18637/jss.v045.i07 https://doi.org/10.18637/jss.v045.i07.
Jäger S, Allhorn A, Bießmann F (2021). “A Benchmark for Data Imputation Methods.” Frontiers in Big Data, 4, 693674. ISSN 2624-909X, doi:10.3389/fdata.2021.693674 https://doi.org/10.3389/fdata.2021.693674.
Jin Z, Kang J, Yu T (2018). “Missing Value Imputation for LC-MS Metabolomics Data by Incorporating Metabolic Network and Adduct Ion Relations.” Bioinformatics, 34(9), 1555-1561. ISSN 1367-4803, 1367-4811, doi:10.1093/bioinformatics/btx816 https://doi.org/10.1093/bioinformatics/btx816.
Josse J, Husson F (2016). “missMDA: A Package for Handling Missing Values in Multivariate Data Analysis.” Journal of Statistical Software, 70, 1-31. ISSN 1548-7660, doi:10.18637/jss.v070.i01 https://doi.org/10.18637/jss.v070.i01.
Harrell FE Jr, Dupont C (2023). “Hmisc: Harrell Miscellaneous.”
Kokla M, Virtanen J, Kolehmainen M, Paananen J, Hanhineva K (2019). “Random Forest-Based Imputation Outperforms Other Methods for Imputing LC-MS Metabolomics Data: A Comparative Study.” BMC Bioinformatics, 20(1), 492. ISSN 1471-2105, doi:10.1186/s12859-019-3110-0 https://doi.org/10.1186/s12859-019-3110-0.
Kowarik A, Templ M (2016). “Imputation with the R Package VIM.” Journal of Statistical Software, 74, 1-16. ISSN 1548-7660, doi:10.18637/jss.v074.i07 https://doi.org/10.18637/jss.v074.i07.
Kumar N, Hoque MA, Sugimoto M (2021). “Kernel Weighted Least Square Approach for Imputing Missing Values of Metabolomics Data.” Scientific Reports, 11(1), 11108. ISSN 2045-2322, doi:10.1038/s41598-021-90654-0 https://doi.org/10.1038/s41598-021-90654-0.
Kumar N, Hoque MA, Shahjaman M, Islam SS, Mollah MNH (2018). “A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis.” Current Bioinformatics, 14(1), 43-52. ISSN 15748936, doi:10.2174/1574893612666171121154655 https://doi.org/10.2174/1574893612666171121154655.
Lazar C, Burger T, Wieczorek S (2022). “imputeLCMD: A Collection of Methods for Left-Censored Missing Data Imputation.”
Lee JY, Styczynski MP (2018). “NS-kNN: A Modified k-Nearest Neighbors Approach for Imputing Metabolomics Data.” Metabolomics, 14(12), 153. ISSN 1573-3882, 1573-3890, doi:10.1007/s11306-018-1451-8 https://doi.org/10.1007/s11306-018-1451-8.
Li Q, Fisher K, Meng W, Fang B, Welsh E, Haura EB, Koomen JM, Eschrich SA, Fridley BL, Chen YA (2020). “GMSimpute: A Generalized Two-Step Lasso Approach to Impute Missing Values in Label-Free Mass Spectrum Analysis.” Bioinformatics, 36(1), 257-263. ISSN 1367-4803, 1367-4811, doi:10.1093/bioinformatics/btz488 https://doi.org/10.1093/bioinformatics/btz488.
Liaw A, Wiener M (2002). “Classification and Regression by randomForest.” R News, 2(3), 18-22.
Ma W, Kim S, Chowdhury S, Li Z, Yang M, Yoo S, Petralia F, Jacobsen J, Li JJ, Ge X, Li K, Yu T, Calinawan AP, Edwards N, Payne SH, Boutros PC, Rodriguez H, Stolovitzky G, Zhu J, Kang J, Fenyo D, Saez-Rodriguez J, Wang P (2021). “DreamAI: Algorithm for the Imputation of Proteomics Data.” doi:10.1101/2020.07.21.214205 https://doi.org/10.1101/2020.07.21.214205.
Hastie T, Mazumder R (2021). “softImpute: Matrix Completion via Iterative Soft-Thresholded SVD.”
Miller HA, Emam R, Lynch CM, Bockhorst S, Frieboes HB (2021). “Discrepancies in Metabolomic Biomarker Identification from Patient-Derived Lung Cancer Revealed by Combined Variation in Data Pre-Treatment and Imputation Methods.” Metabolomics, 17(4), 37. ISSN 1573-3882, 1573-3890, doi:10.1007/s11306-021-01787-2 https://doi.org/10.1007/s11306-021-01787-2.
NishithPaul (2021). “NishithPaul/missingImputation.”
Razzaghi T, Roderick O, Safro I, Marko N (2016). “Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values.” PLOS ONE, 11(5), e0155119. ISSN 1932-6203, doi:10.1371/journal.pone.0155119 https://doi.org/10.1371/journal.pone.0155119.
Samad MD, Abrar S, Diawara N (2022). “Missing Value Estimation Using Clustering and Deep Learning within Multiple Imputation Framework.” Knowledge-Based Systems, 249, 108968. ISSN 09507051, doi:10.1016/j.knosys.2022.108968 https://doi.org/10.1016/j.knosys.2022.108968.
Shah J, Brock GN, Gaskins J (2019). “BayesMetab: Treatment of Missing Values in Metabolomic Studies Using a Bayesian Modeling Approach.” BMC Bioinformatics, 20(S24), 673. ISSN 1471-2105, doi:10.1186/s12859-019-3250-2 https://doi.org/10.1186/s12859-019-3250-2.
Shah JS, Rai SN, DeFilippis AP, Hill BG, Bhatnagar A, Brock GN (2017). “Distribution Based Nearest Neighbor Imputation for Truncated High Dimensional Data with Applications to Pre-Clinical and Clinical Metabolomics Studies.” BMC Bioinformatics, 18(1), 114. ISSN 1471-2105, doi:10.1186/s12859-017-1547-6 https://doi.org/10.1186/s12859-017-1547-6.
Shahjaman M, Rahman MR, Islam T, Auwul MR, Moni MA, Mollah MNH (2021). “rMisbeta: A Robust Missing Value Imputation Approach in Transcriptomics and Metabolomics Data.” Computers in Biology and Medicine, 138, 104911. ISSN 00104825, doi:10.1016/j.compbiomed.2021.104911 https://doi.org/10.1016/j.compbiomed.2021.104911.
Stacklies W, Redestig H, Scholz M, Walther D, Selbig J (2007). “pcaMethods—a Bioconductor Package Providing PCA Methods for Incomplete Data.” Bioinformatics, 23(9), 1164-1167. ISSN 1367-4803, doi:10.1093/bioinformatics/btm069 https://doi.org/10.1093/bioinformatics/btm069.
Stekhoven DJ, Bühlmann P (2012). “MissForest—Non-Parametric Missing Value Imputation for Mixed-Type Data.” Bioinformatics, 28(1), 112-118. ISSN 1367-4803, doi:10.1093/bioinformatics/btr597 https://doi.org/10.1093/bioinformatics/btr597.
Su Y, Gelman A, Hill J, Yajima M (2011). “Multiple Imputation with Diagnostics (Mi) in R: Opening Windows into the Black Box.” Journal of Statistical Software, 45, 1-31. ISSN 1548-7660, doi:10.18637/jss.v045.i02 https://doi.org/10.18637/jss.v045.i02.
Taylor S, Ponzini M, Wilson M, Kim K (2022). “Comparison of Imputation and Imputation-Free Methods for Statistical Analysis of Mass Spectrometry Data with Missing Data.” Briefings in Bioinformatics, 23(1), bbab353. ISSN 1467-5463, 1477-4054, doi:10.1093/bib/bbab353 https://doi.org/10.1093/bib/bbab353.
Taylor SL, Ruhaak LR, Kelly K, Weiss RH, Kim K (2016). “Effects of Imputation on Correlation: Implications for Analysis of Mass Spectrometry Data from Multiple Biological Matrices.” Briefings in Bioinformatics, bbw010. ISSN 1467-5463, 1477-4054, doi:10.1093/bib/bbw010 https://doi.org/10.1093/bib/bbw010.
Varga TV, Westergaard D (2020). “missCompare: Intuitive Missing Data Imputation Framework.”
Wei R, Wang J, Jia E, Chen T, Ni Y, Jia W (2018). “GSimp: A Gibbs Sampler Based Left-Censored Missing Value Imputation Approach for Metabolomics Studies.” PLOS Computational Biology, 14(1), e1005973. ISSN 1553-7358, doi:10.1371/journal.pcbi.1005973 https://doi.org/10.1371/journal.pcbi.1005973.
Wei R, Wang J, Su M, Jia E, Chen S, Chen T, Ni Y (2018). “Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.” Scientific Reports, 8(1), 663. ISSN 2045-2322, doi:10.1038/s41598-017-19120-0 https://doi.org/10.1038/s41598-017-19120-0.
Wilson MD, Ponzini MD, Taylor SL, Kim K (2022). “Imputation of Missing Values for Multi-Biospecimen Metabolomics Studies: Bias and Effects on Statistical Validity.” Metabolites, 12(7), 671. ISSN 2218-1989, doi:10.3390/metabo12070671 https://doi.org/10.3390/metabo12070671.
Xu D, Hu PJ, Huang T, Fang X, Hsu C (2020). “A Deep Learning–Based, Unsupervised Method to Impute Missing Values in Electronic Health Records for Improved Patient Management.” Journal of Biomedical Informatics, 111, 103576. ISSN 15320464, doi:10.1016/j.jbi.2020.103576 https://doi.org/10.1016/j.jbi.2020.103576.
Xu J, Wang Y, Xu X, Cheng K, Raftery D, Dong J (2021). “NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data.” Molecules, 26(19), 5787. ISSN 1420-3049, doi:10.3390/molecules26195787 https://doi.org/10.3390/molecules26195787.