OpenMS

An open-source framework for mass spectrometry and TOPP – The OpenMS Proteomics Pipeline

TOPPASWorkflows

TOPPAS ships with OpenMS/TOPP and can be downloaded at Downloads -> OpenMS/TOPP.
Installation instructions for all supported platforms can be found here.
In addition, you can find a TOPPAS tutorial on the site.

On this page we list some TOPPAS workflows that we find useful and worth sharing. Most of them have been tested and used successfully in projects. To download a workflow to your computer, just click its link. You will also find these workflows if you open TOPPAS and click on File -> Online Repository.

Quantitation and Identification

File: iTRAQ_Quantation_and_ID.toppas Description:

iTRAQ quantitation and identification workflow.

Assumptions:

You have recorded iTRAQ data in HCD/CID mode on an Orbitrap hybrid instrument.

What it does:

Peptide quantitation using the ITRAQAnalyzer tool and FDR-controlled identification using three different search engines, with HCD and CID spectra searched separately (for better FDR).

How to run it:

-- Install X!Tandem and OMSSA.

-- Make sure the X!Tandem and OMSSA executables are in your PATH, or point the 4 adapter nodes to the executables.

-- Enter input mzML files in Node #1, and a FASTA database name in Node #8. You also need the preprocessed .phr version of the FASTA file in the same directory (see the OMSSAAdapter documentation).

-- Enter database and server details for both MascotAdapterOnline nodes.

-- Check the parameter settings in every node, especially 'IDMapper'.
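Whether the search engine executables are visible on your PATH can be checked before starting the workflow. The following is a minimal Python sketch and not part of TOPPAS: the helper find_missing is hypothetical, "omssacl" is the OMSSA executable name, and "tandem" is only an assumed name for the X!Tandem binary, which may differ on your platform.

```python
import shutil


def find_missing(executables, which=shutil.which):
    """Return the names that cannot be resolved on the current PATH."""
    return [name for name in executables if which(name) is None]


if __name__ == "__main__":
    # "tandem" is an assumed executable name; adjust it to your installation.
    missing = find_missing(["omssacl", "tandem"])
    if missing:
        print("Not on PATH (set the path in the adapter nodes instead):", missing)
    else:
        print("All search engine executables were found on PATH.")
```

If an executable is reported as missing, setting its full path in the corresponding adapter node is the alternative described above.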

File: SILACquantitationQToF.toppas Description:

SILAC quantitation of QToF data


This analysis pipeline is currently in use at the Universitaetsklinikum Hamburg-Eppendorf. It combines identification using two search engines (left) with peptide quantitation (right). For questions, please contact


Marcel Kwiatkowski mkwiatkowski@hotmail.de

Lars Nilse l.nilse@dunelm.org.uk


QToF Premier (Micromass/Waters)

nLC-ESI-QTOF-MS/MS


input [1]: peak lists from Waters software in mzML format

input [2]: raw files in mzML format


output [22]: SILAC peptide pairs with sequence annotation (final result) as csv


output [18]: final IDs in csv

output [12]: SILAC peptide pairs as consensusXML (for TOPPView)


Note that the MS1 data are smoothed in [8] before the SILAC peptide pairs are detected in [9].

File: SILACquantitationExQ_MascotIdentification.toppas Description:

SILAC analysis pipeline for Thermo Q Exactive data with Mascot identification


input [1]: mzML

input [3]: fasta database


output [12]: protein abundances


SILAC peptide pairs are detected in [4]. Sequences are identified in [2, 5-9]. Note that the MS2 spectra are first sorted and peak picked before being submitted to the Mascot server.


Lars Nilse l.nilse@dunelm.org.uk



File: SILACquantitationOrbiXL_MascotIdentification.toppas Description:

SILAC analysis pipeline for Thermo Orbi XL data with Mascot identification


input [1]: mzML

input [3]: fasta database


output [10]: protein abundances


SILAC peptide pairs are detected in [4]. Sequences are identified in [2, 5-7].


Lars Nilse l.nilse@dunelm.org.uk



File: SILACquantitationOrbiXL_TPPidentification.toppas Description:

SILAC analysis pipeline for Thermo Orbi XL data with identification from TPP


input [1]: mzML

input [3]: pepXML from TPP

input [4]: protXML from TPP


output [9]: protein abundances


SILAC peptide pairs are detected in [2]. Protein group information in [4] will be incorporated in the final result.


Lars Nilse l.nilse@dunelm.org.uk



File: SILACquantitationOrbiXL_TPPidentification_plusLightOnlyHeavyOnly.toppas Description:

SILAC analysis pipeline for Thermo Orbi XL data with identification from TPP, reporting proteins present as both light and heavy, as light only, or as heavy only


input [1]: mzML in profile mode

input [2]: pep.xml from TPP ID pipeline

input [3]: prot.xml from TPP ID pipeline


output [25]: abundances for light only proteins

output [23]: abundances for heavy only proteins

output [9]: abundances for proteins present as light and heavy


The outputs of [10] and [4] contain singlets and doublets, respectively. The output of [15] contains singlets that are not part of the doublets. The outputs of [22] and [18] contain singlets with light and heavy annotations, respectively.


Lars Nilse l.nilse@dunelm.org.uk

File: Labelfree_Quant_and_ID.toppas Description:

A labelfree ID and quantification pipeline.

Input node #1 should contain your input files (containing both MS1 and MS2 spectra).

Input node #4 should contain the database you want to search against. It should be a target-decoy database; otherwise, FDR filtering will not work. Moreover, in the parameters of PeptideIndexer you must specify the decoy string that is appended or prepended to the accessions of the decoy sequences in your database.

The entire left branch of the workflow ending in IDMapper performs a ConsensusID using several search engines and subsequent FDR filtering.

In order for this to run, you must first download OMSSA and X!Tandem and set the path to the executables in the parameters of the respective nodes. If you have a Mascot server, set the server address and login credentials in the parameters of MascotAdapterOnline. If you do not have (or do not want to use) any of these three search engines, you can also just remove them from the workflow and use only one or two engines.

IDMapper maps the FDR-filtered IDs to the MS1 features found by FeatureFinderCentroided. The annotated featureXML files are then aligned using MapAlignerPoseClustering and finally linked using FeatureLinkerUnlabeledQT.

For subsequent analyses, the resulting ConsensusXML files can be exported to a text format using TextExporter or used for protein inference by means of the ProteinQuantifier tool, for example.

This pipeline has been successfully applied to time-course data of the human platelet proteome acquired on an Orbitrap Velos. Be sure to adapt the settings of the individual tools (RT and mass tolerance, considered charge states, ...) to fit your dataset.
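Because FDR filtering depends on the decoy string matching the database, it can help to check the FASTA headers before running the workflow. The following is a minimal Python sketch, independent of OpenMS: the function count_decoys is hypothetical, and the decoy string "DECOY_" is only an assumed example, so substitute whatever your database actually uses.

```python
def count_decoys(fasta_lines, decoy_string="DECOY_", position="prefix"):
    """Count target and decoy entries by inspecting FASTA header accessions.

    The accession is taken to be the first whitespace-delimited token
    after the '>' character. 'position' is "prefix" or "suffix".
    """
    targets = decoys = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        accession = tokens[0] if tokens else ""
        if position == "prefix":
            is_decoy = accession.startswith(decoy_string)
        else:
            is_decoy = accession.endswith(decoy_string)
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    return targets, decoys


if __name__ == "__main__":
    example = [
        ">sp|P00001|EXAMPLE some description",
        "MKVLAAGIVG",
        ">DECOY_sp|P00001|EXAMPLE",
        "GVIGAALVKM",
    ]
    print(count_decoys(example))
```

A decoy count of zero suggests that the decoy string or its position does not match the database, in which case FDR filtering cannot work as intended.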

File: label-free_JPR_1.10.toppas Description:

Workflow for label-free quantification as published in Weisser et al., J. Proteome Res. (2013) (DOI: 10.1021/pr300992u). Updated for OpenMS 1.10.

Inputs (n samples):

1: raw data in profile mode (n mzML files)

2: corresponding peptide/protein identification data (n idXML files)

Outputs:

7: annotated feature maps (n featureXML files)

11: RT transformation descriptions (n trafoXML files)

12: annotated consensus map (1 consensusXML file)

14: table of peptide abundances (1 CSV file)

15: table of protein abundances (1 CSV file)

Parameters (compare Supplemental Table S2 in the publication):

FeatureFinderCentroided: set "algorithm:mass_trace:mz_tolerance" and "algorithm:isotopic_pattern:mz_tolerance" according to your instrument

IDFilter: adjust "score:pep" according to your ID results and desired FDR

MapAlignerIdentification: increase "algorithm:min_run_occur" for a more reliable alignment (e.g. to half the number of samples)

File: Ecoli_Identification.toppas Description:

A simple identification workflow.


This workflow requires OMSSA to be installed on your machine. The path to the OMSSA executable ("omssacl") must be set in the parameters of the OMSSAAdapter node.


Node #1 accepts mzML files containing MS2 spectra.


Node #2 provides the database and is set to "recycling mode" to allow the database to be reused when there is more than one input file in node #1.


OMSSAAdapter calls OMSSA which performs the actual search.


PeptideIndexer annotates for each search result whether it is a target or a decoy hit.


FalseDiscoveryRate computes q-values for the IDs.


Finally, IDFilter selects only those IDs with a q-value of less than 0.01.
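To illustrate the idea behind the last three steps, here is a minimal Python sketch of a common target-decoy q-value calculation. It is not the FalseDiscoveryRate tool's implementation, and the exact formula used by OpenMS may differ. The function names and the assumption that higher scores are better are choices made for this example only.

```python
def q_values(hits):
    """Compute q-values for (score, is_decoy) pairs; higher score = better.

    FDR at each rank is estimated as decoys / targets among all hits
    scoring at least as well; the q-value is the smallest FDR at which
    the hit would still be accepted.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Running minimum from the worst-scoring hit upwards.
    qvals = []
    best = float("inf")
    for fdr in reversed(fdrs):
        best = min(best, fdr)
        qvals.append(best)
    qvals.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvals)]


def filter_targets(hits, threshold=0.01):
    """Keep target hits whose q-value is below the threshold."""
    return [(s, q) for s, is_decoy, q in q_values(hits) if not is_decoy and q < threshold]


if __name__ == "__main__":
    example = [(10, False), (9, False), (8, True), (7, False), (6, False)]
    print(filter_targets(example))
```

In this toy example only the two hits scoring above the first decoy pass the 0.01 threshold, which mirrors the role of IDFilter at the end of the workflow.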

File: BSA_Quantitation.toppas Description:

This workflow performs a simple quantification, assuming you have already performed an ID.


Node #1 expects mzML files. Node #2 finds the features in these maps and passes on featureXML files.


Corresponding peptide identifications in idXML format are expected in input node #3. They are mapped to the corresponding featureXML files by the IDMapper.


The Collect node waits for all processing rounds to finish, then runs FeatureLinkerUnlabeled once, with a list of all annotated featureXML files as input, which creates a single consensusXML output file.
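The difference between per-file processing rounds and the single collected step can be sketched in Python. This is a conceptual illustration only, not how TOPPAS is implemented: run_workflow and the three callables passed to it are hypothetical stand-ins for the feature finder, IDMapper and FeatureLinkerUnlabeled nodes.

```python
def run_workflow(mzml_files, idxml_files, find_features, map_ids, link_features):
    """Process each input pair in its own round, then link all results once."""
    annotated = []
    # One processing round per input file.
    for mzml, idxml in zip(mzml_files, idxml_files):
        features = find_features(mzml)              # stand-in for node #2
        annotated.append(map_ids(features, idxml))  # stand-in for IDMapper
    # Collect step: the linker is called a single time with the full list.
    return link_features(annotated)


if __name__ == "__main__":
    # Trivial stand-ins that only manipulate file names.
    result = run_workflow(
        ["a.mzML", "b.mzML"],
        ["a.idXML", "b.idXML"],
        find_features=lambda mzml: mzml.replace(".mzML", ".featureXML"),
        map_ids=lambda features, ids: (features, ids),
        link_features=lambda maps: {"consensus_of": maps},
    )
    print(result)
```

The point of the sketch is that the linking function sees all annotated maps at once, whereas the earlier steps each see only one file per round.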

File: MetaboliteQuantitationPipeline.toppas Description:

Quantitation workflow for metabolomics datasets.


Input: centroid mzML files

Output: tabular csv file of the consensus map


Possible application: Prepare dataset for statistical analysis (e.g., biomarker discovery)


In this workflow, metabolite signals are extracted with the FeatureFinderMetabo tool from each individual mzML input. The resulting individual feature maps are then merged into a single consensus map which is finally exported to a tabular csv file. This tabular dataset can be easily imported into statistics frameworks such as R.



Caveats:


1) The FeatureFinderMetabo tool works on centroid MS data only. Please prepend the PeakPickerHiRes tool if you want to process profile mode data directly.


2) Here, we assume that retention time shifts between runs are negligible and thus employ feature linking only. If you expect high retention time variation, please insert the MapAlignerPoseClustering tool before FeatureLinkerUnlabeledQT.

 

Data Preprocessing

File: peakpicker_tutorial.toppas Description:

Basic peak picking pipeline using PeakPickerWavelet (use PeakPickerHiRes instead if you have high-resolution data as it is much faster).


The pipeline uses NoiseFilterSGolay (assuming low-res data) and BaselineFilter (for MALDI data).


Depending on your data, either of the two filters may be removed or reconfigured with other parameters.

 

Experimental Pipelines

Note: The following pipelines are experimental, i.e., they will most likely work only with a recent development version of OpenMS. If you are interested in testing them, please consider providing feedback so that we can further improve OpenMS and TOPP.

File: Labelfree_Quant_and_ID_mzTab.toppas Description:

A labelfree ID and quantification pipeline.


Input node #1 should contain your input files (containing both MS1 and MS2 spectra).


Input node #2 should contain the database you want to search against. It should be a target-decoy database; otherwise, FDR filtering will not work. Moreover, in the parameters of PeptideIndexer you must specify the decoy string that is appended or prepended to the accessions of the decoy sequences in your database.


In order for this to run, you must first download OMSSA and set the path to its executable in the parameters of the respective node.


IDMapper maps the FDR-filtered IDs to the MS1 features found by FeatureFinderCentroided. The annotated featureXML files are then aligned using MapAlignerPoseClustering and finally linked using FeatureLinkerUnlabeledQT.


Protein inference is performed by the ProteinQuantifier tool.

The final results are exported to mzTab.


(Note: This pipeline has been successfully applied to data acquired on an Orbitrap Velos. Be sure to adapt the settings of the individual tools (RT and mass tolerance, considered charge states, ...) to fit your dataset.)