Processing Metabolomics and Proteomics Data with Open Software: A Practical Guide
CHAPTER 2: Mass Spectrometry Data Operations and Workflows
Published: 16 Mar 2020
Special Collection: 2020 ebook collection
Magnus Palmblad, 2020. "Mass Spectrometry Data Operations and Workflows", in Processing Metabolomics and Proteomics Data with Open Software: A Practical Guide, ed. Robert Winkler.
Mass spectrometry is an extremely versatile analytical technique. It follows that the number of applicable data analysis operations, and the number of ways in which they can be combined into workflows, is enormous. However, many workflows in proteomics and metabolomics, especially those starting from raw data, share a number of core operations. Thanks to community efforts within ELIXIR, the European distributed infrastructure for life-science information, we now have semantic tools (ontologies) to describe scientific domains, bioinformatics analysis operations, and mass spectrometry data types and formats. The terminology used in this chapter follows the preferred terms of the EDAM ontology [1].
To give an idea of the number of free and open-source software packages available, this chapter sometimes refers to the number of entries in the ELIXIR bio.tools registry (http://bio.tools) [2]. bio.tools is a community-based effort that collects information on software in the bioinformatics domain. Several hundred software tools for the analysis of mass spectrometry data in proteomics and metabolomics were recently added from http://ms-utils.org and other resources. As of August 2019, bio.tools contained systematic, functional annotations of close to 13 000 software packages. This makes it by far the richest collection of bioinformatics software descriptions of any kind, and a uniquely powerful resource for building bioinformatics data analysis workflows, including workflows for mass spectrometry data in the omics domains.
This chapter briefly discusses common operations in such workflows and gives a few examples of open software for performing them. The emphasis is on tools that perform a single task and perform it well, rather than on monolithic software, libraries or toolboxes covering a wide range of functions.
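Functional annotations of this kind are what allow a registry to be searched by task rather than by tool name. The Python sketch below illustrates the idea with a tiny in-memory registry. Each entry lists the operations a tool performs and the data formats it reads and writes, and a workflow is assembled by looking up, for each operation in turn, a tool that reads the output format of the previous step. The tool names, the simplified entry structure and the operation labels (loosely modelled on EDAM-style terms) are all hypothetical; this is not the bio.tools data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEntry:
    """Hypothetical, simplified registry entry (not the bio.tools schema)."""
    name: str
    operations: frozenset  # operation labels the tool is annotated with
    inputs: frozenset      # data formats the tool can read
    output: str            # data format the tool writes


# Illustrative entries only: tool names and annotations are invented.
REGISTRY = [
    ToolEntry("converter_x", frozenset({"Format conversion"}), frozenset({"Thermo RAW"}), "mzML"),
    ToolEntry("picker_y", frozenset({"Peak detection"}), frozenset({"mzML"}), "featureXML"),
    ToolEntry("picker_z", frozenset({"Peak detection"}), frozenset({"mzXML"}), "featureXML"),
]


def find_tools(registry, operation, input_format):
    """Return the names of tools annotated with `operation` that read `input_format`."""
    return sorted(
        e.name for e in registry
        if operation in e.operations and input_format in e.inputs
    )


def assemble_workflow(registry, operations, start_format):
    """For each operation in order, pick a tool that reads the current format.

    Returns a list of (operation, tool name) pairs, or raises LookupError
    if some step has no compatible tool.
    """
    plan = []
    current_format = start_format
    for op in operations:
        candidates = [e for e in registry
                      if op in e.operations and current_format in e.inputs]
        if not candidates:
            raise LookupError(f"no tool for {op!r} reading {current_format!r}")
        chosen = min(candidates, key=lambda e: e.name)  # deterministic tie-break
        plan.append((op, chosen.name))
        current_format = chosen.output  # the next step must read this format
    return plan


if __name__ == "__main__":
    print(find_tools(REGISTRY, "Peak detection", "mzML"))  # ['picker_y']
    print(assemble_workflow(REGISTRY, ["Format conversion", "Peak detection"], "Thermo RAW"))
    # [('Format conversion', 'converter_x'), ('Peak detection', 'picker_y')]
```

Picking the alphabetically first candidate is an arbitrary tie-breaker chosen here only to make the result deterministic; in practice the user would choose among the compatible tools.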
Comprehensive toolboxes are, however, covered elsewhere in this book: dedicated chapters describe the open-source OpenMS [3,4], MZmine [5]/MZmine 2 [6], XCMS [7,8] and the Trans-Proteomic Pipeline [9,10], each of which covers most of the operations described below. In many cases, individual modules and operations can be run independently from the command line, under Windows or Linux. This makes them well suited for integration into automated workflows, for containerization, and for deployment in the cloud for large-scale analyses. Table 2.1 summarizes the operations described here and gives a few examples of free and open software that can perform them. Operations specific to mass spectrometry imaging, such as co-registration and region-of-interest determination, are not included, but the operations that are included apply equally to mass spectrometry imaging data.
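Because many of these modules run from the command line, a workflow can be expressed as a sequence of single-task programs, each reading the file written by the previous one. The Python sketch below builds such a chain and either prints it (dry run) or executes it with `subprocess.run`. It assumes two hypothetical programs, `convert_raw` and `detect_peaks`, that take an input path and an output path as positional arguments; real tools each have their own options. It is meant only to show the chaining pattern, not to replace a workflow management system.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(steps, input_file):
    """Chain single-task command-line tools so that each reads the previous output.

    `steps` is a list of (program, output_suffix) pairs. The argument layout
    `program INPUT OUTPUT` is a hypothetical convention, not that of any real tool.
    """
    source = Path(input_file)
    current = source
    commands = []
    for i, (program, suffix) in enumerate(steps, start=1):
        # Number intermediate files so that a step never overwrites its own input.
        output = source.with_name(f"{source.stem}.{i}{suffix}")
        commands.append([program, str(current), str(output)])
        current = output
    return commands


def run_workflow(steps, input_file, dry_run=True):
    """Print (dry run) or execute the chained commands, stopping at the first failure."""
    commands = build_commands(steps, input_file)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    # Hypothetical tool names standing in for a format converter and a peak picker.
    run_workflow([("convert_raw", ".mzML"), ("detect_peaks", ".featureXML")], "sample.raw")
    # convert_raw sample.raw sample.1.mzML
    # detect_peaks sample.1.mzML sample.2.featureXML
```

Numbering the intermediate files (`sample.1.mzML`, `sample.2.featureXML`) is a design choice made here so that a step whose input and output share a format, such as a filtering step, does not overwrite its own input.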