Processing Metabolomics and Proteomics Data with Open Software: A Practical Guide
CHAPTER 16: Python in Proteomics
Published: 16 Mar 2020
Special Collection: 2020 ebook collection
Hannes L. Röst, 2020. "Python in Proteomics", Processing Metabolomics and Proteomics Data with Open Software: A Practical Guide, Robert Winkler
Python is a versatile scripting language that is widely used in industry and academia. In bioinformatics, multiple packages support data analysis with Python, ranging from biological sequence analysis with Biopython[1], through structural modeling and visualization with packages like PyMOL and PyRosetta[2], to numerical computation and advanced plotting with NumPy/SciPy[3]. In the proteomics community, Python began to be widely used around 2012, when several mature Python packages were published, including pymzML[4], Pyteomics[5] and pyOpenMS[6]. This has led to steadily growing interest in Python within the proteomics and mass spectrometry community: the number of publications referencing or using Python has risen eightfold since 2012 (compared with the same time period before 2012), and multiple open-source Python packages now support mass spectrometric data analysis and processing[4,5,7–14].

Computing and data analysis in mass spectrometry are very diverse and in many cases must be tailored to a specific experiment. Often, multiple analysis steps (identification, quantification, post-translational modification analysis, filtering, false discovery rate (FDR) analysis, etc.) have to be combined in an analysis pipeline, which requires a high degree of flexibility. This is where Python truly shines, thanks to its flexibility, its visualization capabilities and the ability to extend computation with a large number of powerful libraries. Python can be used to quickly prototype software and to combine existing libraries into powerful analysis workflows, avoiding the trap of reinventing the wheel for every new project.
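As a hedged illustration of how quickly a single pipeline step such as FDR filtering can be prototyped, the sketch below implements a simple target-decoy q-value calculation in plain Python. It is not taken from any of the cited packages: the `PSM` record and the `assign_q_values` and `filter_at_fdr` functions are hypothetical names, scores are assumed to be "higher is better", and the FDR is estimated as decoys/targets, which is only one of several estimators used in practice.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (illustrative only)."""
    peptide: str
    score: float      # assumption: a higher score means a better match
    is_decoy: bool


def assign_q_values(psms):
    """Return (PSM, q-value) pairs ranked by descending score.

    FDR at each rank is estimated as decoys / targets (one simple variant).
    Tied scores are not treated specially; they keep their input order.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # max(..., 1) avoids division by zero before the first target
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR taken from the bottom of the list
    q_values = fdrs[:]
    for i in range(len(q_values) - 2, -1, -1):
        q_values[i] = min(q_values[i], q_values[i + 1])
    return list(zip(ranked, q_values))


def filter_at_fdr(psms, threshold=0.01):
    """Keep target PSMs whose q-value is at or below the threshold."""
    return [psm for psm, q in assign_q_values(psms)
            if not psm.is_decoy and q <= threshold]


if __name__ == "__main__":
    # Toy values chosen only to exercise the code
    toy = [
        PSM("PEPTIDEK", 95.0, False),
        PSM("SAMPLER", 90.0, False),
        PSM("DECOYK", 85.0, True),
        PSM("LESSGOODR", 80.0, False),
    ]
    print([p.peptide for p in filter_at_fdr(toy, 0.01)])  # ['PEPTIDEK', 'SAMPLER']
```

In a real workflow the PSMs would come from a search engine result file read with one of the packages mentioned above, and an established implementation would usually be preferred over a hand-written one; the point here is only that such a step fits in a few dozen readable lines.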