OpenMS
An open-source framework for mass spectrometry and TOPP – The OpenMS Proteomics Pipeline
TOPPASWorkflows
TOPPAS ships with OpenMS/TOPP and can be downloaded at Downloads -> OpenMS/TOPP
Installation instructions for all supported platforms can be found here.
In addition you can find a TOPPAS tutorial at the site.
On this page we list some TOPPAS workflows which we find useful and worth sharing. Most of them are tested and were successfully used in projects. To download one of the workflows to your computer, just click the link. You will also find these workflows if you open TOPPAS and click on File -> Online Repository.
Quantitation and Identification
iTRAQ quantitation and identification workflow.
Assumptions:
You have recorded iTRAQ data in HCD/CID mode on an Orbitrap hybrid instrument.
What it does:
Peptide quantitation using the ITRAQAnalyzer tool and FDR-controlled identification using three different search engines, with HCD and CID spectra searched separately (for better FDR).
How to run it:
-- install X!Tandem and OMSSA
-- make sure X!Tandem and OMSSA executables are in your PATH, or point the 4 adapter nodes to the executables
-- enter the input mzML files in Node #1 and a FASTA database name in Node #8. You also need the preprocessed .phr version of the FASTA file in the same directory (see the OMSSAAdapter documentation).
-- enter the database and server details for both MascotAdapterOnline nodes
-- check the parameter settings in every node, especially 'IDMapper'
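The PATH requirement from the steps above can be checked with a short script before the workflow is opened. This is a minimal sketch and not part of OpenMS. The executable names are assumptions: the OMSSA binary is usually called "omssacl", while the name of the X!Tandem binary varies by platform and build, so adjust the list to your installation.

```python
import shutil


def find_missing_executables(names):
    """Return the names that cannot be resolved through the PATH variable."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to your installation.
    candidates = ["omssacl", "tandem"]
    missing = find_missing_executables(candidates)
    if missing:
        print("Not found in PATH:", ", ".join(missing))
    else:
        print("All search engine executables were found in PATH.")
```

If an executable is reported as missing, either add its directory to PATH or set the full path in the corresponding adapter node.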
SILAC quantitation of QToF data
This analysis pipeline is currently in use at the Universitaetsklinikum Hamburg-Eppendorf. It combines identification using two search engines (left) with peptide quantitation (right). For questions, please contact
Marcel Kwiatkowski mkwiatkowski@hotmail.de
Lars Nilse l.nilse@dunelm.org.uk
QToF Premier (Micromass/Waters)
nLC-ESI-QTOF-MS/MS
input [1]: peak lists from Waters software in mzML format
input [2]: raw files in mzML format
output [22]: SILAC peptide pairs with sequence annotation (final result) as csv
output [18]: final IDs in csv
output [12]: SILAC peptide pairs as consensusXML (for TOPPView)
Note that the MS1 data are smoothed in [8] before the SILAC peptide pairs are detected in [9].
SILAC analysis pipeline for Thermo Q Exactive data with Mascot identification
input [1]: mzML
input [3]: fasta database
output [12]: protein abundances
SILAC peptide pairs are detected in [4]. Sequences are identified in [2, 5-9]. Note that the MS2 spectra are first sorted and peak picked before being submitted to the Mascot server.
Lars Nilse l.nilse@dunelm.org.uk
SILAC analysis pipeline for Thermo Orbi XL data with Mascot identification
input [1]: mzML
input [3]: fasta database
output [10]: protein abundances
SILAC peptide pairs are detected in [4]. Sequences are identified in [2, 5-7].
Lars Nilse l.nilse@dunelm.org.uk
SILAC analysis pipeline for Thermo Orbi XL data with identification from TPP
input [1]: mzML
input [3]: pepXML from TPP
input [4]: protXML from TPP
output [9]: protein abundances
SILAC peptide pairs are detected in [2]. Protein group information in [4] will be incorporated in the final result.
Lars Nilse l.nilse@dunelm.org.uk
SILAC analysis pipeline for Thermo Orbi XL data with identification from TPP reporting proteins present as light and heavy, light only as well as heavy only
input [1]: mzML in profile mode
input [2]: pep.xml from TPP ID pipeline
input [3]: prot.xml from TPP ID pipeline
output [25]: abundances for light only proteins
output [23]: abundances for heavy only proteins
output [9]: abundances for proteins present as light and heavy
The outputs of [10] and [4] contain singlets and doublets, respectively. The output of [15] contains singlets that are not part of the doublets. The outputs of [22] and [18] contain singlets with light and heavy annotations, respectively.
Lars Nilse l.nilse@dunelm.org.uk
A labelfree ID and quantification pipeline.
Input node #1 should contain your input files (containing both MS1 and MS2 spectra).
Input node #4 should contain the database you want to search against. It should be a target-decoy-database. Otherwise, FDR filtering will not work. Moreover, you must specify the decoy string appended or prepended to the accessions of decoy sequences in your database in the parameters of PeptideIndexer.
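Whether a FASTA file actually is a target-decoy database, and which decoy string it uses, can be checked quickly before the workflow is run. The following is a minimal sketch and not an OpenMS tool. It assumes that the accession is the first whitespace-delimited token after ">", and the decoy string "DECOY_" in the example is hypothetical; use whatever your database generator wrote.

```python
def count_targets_and_decoys(fasta_lines, decoy_string, position="prefix"):
    """Count target and decoy entries based on the FASTA header lines.

    The accession is assumed to be the first whitespace-delimited token
    after '>'. 'position' states whether the decoy string is prepended
    ("prefix") or appended ("suffix") to the accession.
    """
    targets = 0
    decoys = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        accession = fields[0] if fields else ""
        if position == "prefix":
            is_decoy = accession.startswith(decoy_string)
        else:
            is_decoy = accession.endswith(decoy_string)
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    return targets, decoys


# Toy example with a hypothetical decoy prefix.
example = [
    ">P12345 example protein",
    "MKTAYIAK",
    ">DECOY_P12345 reversed example protein",
    "KAIYATKM",
]
print(count_targets_and_decoys(example, "DECOY_"))
```

A decoy count of zero suggests that either the database contains no decoys or the decoy string or its position does not match, in which case the FDR filtering would not work as intended.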
The entire left branch of the workflow ending in IDMapper performs a ConsensusID using several search engines and subsequent FDR filtering.
In order for this to run, you must first download OMSSA and X!Tandem and set the path to the executables in the parameters of the respective nodes. If you have a Mascot server, set the server address and login credentials in the parameters of MascotAdapterOnline. If you do not have (or do not want to use) any of these three search engines, you can also just remove them from the workflow and use only one or two engines.
IDMapper maps the FDR-filtered IDs to the MS1 features found by FeatureFinderCentroided. The annotated featureXML files are then aligned using MapAlignerPoseClustering and finally linked using FeatureLinkerUnlabeledQT.
For subsequent analyses, the resulting ConsensusXML files can be exported to a text format using TextExporter or used for protein inference by means of the ProteinQuantifier tool, for example.
This pipeline has been successfully applied to time-course data of the human platelet proteome acquired on an Orbitrap Velos. Be sure to adapt the settings of the individual tools (RT and mass tolerance, considered charge states, ...) to fit your dataset.
Workflow for label-free quantification as published in Weisser et al., J. Proteome Res. (2013) (DOI: 10.1021/pr300992u). Updated for OpenMS 1.10.
Inputs (n samples):
1: raw data in profile mode (n mzML files)
2: corresponding peptide/protein identification data (n idXML files)
Outputs:
7: annotated feature maps (n featureXML files)
11: RT transformation descriptions (n trafoXML files)
12: annotated consensus map (1 consensusXML file)
14: table of peptide abundances (1 CSV file)
15: table of protein abundances (1 CSV file)
Parameters (compare Supplemental Table S2 in the publication):
FeatureFinderCentroided: set "algorithm:mass_trace:mz_tolerance" and "algorithm:isotopic_pattern:mz_tolerance" according to your instrument
IDFilter: adjust "score:pep" according to your ID results and desired FDR
MapAlignerIdentification: increase "algorithm:min_run_occur" for a more reliable alignment (e.g. to half the number of samples)
A simple identification workflow.
This workflow requires OMSSA to be installed on your machine. The path to the OMSSA executable ("omssacl") must be set in the parameters of the OMSSAAdapter node.
Node #1 accepts mzML files containing MS2 spectra.
Node #2 provides the database and is set to "recycling mode" to allow the database to be reused when there is more than one input file in node #1.
OMSSAAdapter calls OMSSA which performs the actual search.
PeptideIndexer annotates for each search result whether it is a target or a decoy hit.
FalseDiscoveryRate computes q-values for the IDs.
Finally, IDFilter selects only those IDs with a q-value of less than 0.01.
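The last three steps can be illustrated with a small target-decoy calculation. This is a sketch of the common textbook estimate (number of decoys divided by number of targets at each score threshold, made monotone from the bottom of the list), not the implementation used by FalseDiscoveryRate, which may use a different formula and offers further options. It assumes that a higher score means a better hit.

```python
def q_values(hits):
    """Compute q-values for (score, is_decoy) pairs.

    Assumes that a higher score is better. Returns a list of
    (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    fdrs = []
    targets = 0
    decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting all hits down to this rank.
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at this rank or any lower rank.
    q_reversed = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_reversed.append(running_min)
    q = q_reversed[::-1]
    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]


def filter_targets(hits, threshold=0.01):
    """Keep target hits whose q-value is below the threshold."""
    return [(s, qv) for s, d, qv in q_values(hits) if not d and qv < threshold]


# Toy data: (score, is_decoy)
hits = [(10, False), (9, False), (8, True), (7, False), (6, True)]
print(filter_targets(hits))
```

In this toy data set only the two best-scoring target hits pass the threshold of 0.01, because every hit ranked at or below the first decoy has a q-value of at least 1/3.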
This workflow performs a simple quantification, assuming you have already performed an ID.
Node #1 expects mzML files. Node #2 finds the features in these maps and passes on featureXML files.
Corresponding peptide identifications in idXML format are expected in input node #3. They are mapped to the corresponding featureXML files by the IDMapper.
The Collect node waits for all processing rounds to finish, then runs FeatureLinkerUnlabeled once, with a list of all annotated featureXML files as input, which creates a single consensusXML output file.
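The mapping step performed by IDMapper can be pictured as a tolerance-based match between identifications and features. The following is a simplified sketch under stated assumptions, not the IDMapper algorithm: it compares only a single RT and m/z position per feature, whereas the actual tool offers more options. The dictionary keys and the tolerance values are hypothetical.

```python
def map_ids_to_features(features, peptide_ids, rt_tol=5.0, mz_tol=0.01):
    """Attach peptide sequences to features that lie within both tolerances.

    features:    list of dicts with the keys 'rt' and 'mz'
    peptide_ids: list of dicts with the keys 'rt', 'mz' and 'sequence'
    Returns one list of matching sequences per feature, in feature order.
    """
    annotations = [[] for _ in features]
    for pid in peptide_ids:
        for index, feature in enumerate(features):
            rt_ok = abs(pid["rt"] - feature["rt"]) <= rt_tol
            mz_ok = abs(pid["mz"] - feature["mz"]) <= mz_tol
            if rt_ok and mz_ok:
                annotations[index].append(pid["sequence"])
    return annotations


# Toy data (hypothetical values): one ID matches the first feature,
# the other ID matches no feature.
features = [{"rt": 100.0, "mz": 500.25}, {"rt": 200.0, "mz": 600.30}]
peptide_ids = [
    {"rt": 102.0, "mz": 500.255, "sequence": "PEPTIDEK"},
    {"rt": 300.0, "mz": 700.0, "sequence": "NOMATCHR"},
]
print(map_ids_to_features(features, peptide_ids))
```

Identifications that match no feature stay unassigned in this sketch, which is one reason to check the tolerance settings of IDMapper against your data.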
Data Preprocessing
Basic peak picking pipeline using PeakPickerWavelet (use PeakPickerHiRes instead if you have high-resolution data as it is much faster).
The pipeline uses NoiseFilterSGolay (assuming low-resolution data) and BaselineFilter (for MALDI data).
Depending on your data, either of the two filters may be removed or reconfigured with other parameters.
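To give an idea of what Savitzky-Golay smoothing does to a profile spectrum, here is a minimal sketch with a fixed 5-point window and a quadratic fit, using the standard coefficients (-3, 12, 17, 12, -3)/35. It is an illustration only and not the NoiseFilterSGolay implementation; in the actual tool the window length and polynomial order are configurable, and edge handling may differ.

```python
def savitzky_golay_5(intensities):
    """Smooth intensities with a 5-point quadratic Savitzky-Golay filter.

    The two points at each end are left unchanged in this sketch.
    """
    coeffs = (-3, 12, 17, 12, -3)
    smoothed = list(intensities)
    for i in range(2, len(intensities) - 2):
        window = intensities[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(coeffs, window)) / 35
    return smoothed


# A single-point spike is spread over its neighbours and reduced in height.
print(savitzky_golay_5([0, 0, 0, 35, 0, 0, 0]))
```

A linear or quadratic trend passes through this filter unchanged, while isolated spikes are damped, which is why the window length should be chosen in relation to the peak width of your data.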
Experimental Pipelines
Note: The following pipelines are experimental, i.e., they will most probably only work with the most recent development version of OpenMS. If you are interested in testing them, please consider providing feedback so that we can further improve OpenMS and TOPP.
A labelfree ID and quantification pipeline.
Input node #1 should contain your input files (containing both MS1 and MS2 spectra).
Input node #2 should contain the database you want to search against. It should be a target-decoy-database. Otherwise, FDR filtering will not work. Moreover, you must specify the decoy string appended or prepended to the accessions of decoy sequences in your database in the parameters of PeptideIndexer.
In order for this to run, you must first download OMSSA and set the path to the executable in the parameters of the OMSSAAdapter node.
IDMapper maps the FDR-filtered IDs to the MS1 features found by FeatureFinderCentroided. The annotated featureXML files are then aligned using MapAlignerPoseClustering and finally linked using FeatureLinkerUnlabeledQT.
Protein inference is performed by the ProteinQuantifier tool.
The final results are exported to mzTab.
(Note: This pipeline has been successfully applied to data acquired on an Orbitrap Velos. Be sure to adapt the settings of the individual tools (RT and mass tolerance, considered charge states, ...) to fit your dataset.)