
OpenMS - An open-source framework for mass spectrometry

OpenMS is an open-source C++ library for LC/MS data management and analysis. It offers an infrastructure for the development of mass-spectrometry-related software. OpenMS is free software available under the LGPL.

OpenMS will be compatible with the upcoming Proteomics Standards Initiative (PSI) formats for MS data. We also try hard to keep the code documentation up to date and to make OpenMS easy to use.

This page gives an introduction to the map concept behind OpenMS and describes the main parts of the project.


The map concept of the OpenMS project

Differential analysis of proteome expression levels has been developing rapidly in recent years. While 2D gel based techniques are still the standard in the field, HPLC/MS-based approaches have gained considerable interest due to their greater potential for full automation. Consequently, this approach is being developed both in industrial settings and in larger biomedical research settings.

The acceptance and the impact of these techniques are, however, limited by the ability to efficiently handle and analyze the tremendous volume of data they produce, which is the inevitable flip side of automation. Overcoming these problems, and thus realizing the full potential of the method, can only be achieved by new developments in algorithmics, data management, and software engineering. These developments have to take place in tight integration with the method development in HPLC/MS, which is currently a very rapidly evolving field.

Differential analysis methods can be subdivided into three steps: the separation of the proteins or peptides; relative quantification using mass spectrometry, which requires a preceding matching of corresponding peptides; and the identification of the peptide sequence using MS or MS/MS.

Map transformations

OpenMS map concept

The data produced by combining multi-dimensional LC with subsequent MS can be viewed as a set of multidimensional discrete points. In the simplest case (one-dimensional HPLC followed by an MS measurement), each data point is described by retention time, m/z and intensity (e.g. ion count). The collection of all these data points is called a raw LC-MS map.

Frequently the data is denoised, baseline-corrected and smoothed by the instrument software. OpenMS also contains several signal processing algorithms for this task if you require more fine-grained control over raw data preprocessing.

The analysis of this raw data is done through several data reduction steps. In our view these steps correspond to a series of map transformations (see image on the right).
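As a rough illustration of this view, the sketch below represents a raw map as a plain list of points. The names `RawPoint`, `raw_map` and `scans` are hypothetical and chosen for this example only; they are not OpenMS classes, and the numbers are made up.

```python
from typing import NamedTuple


class RawPoint(NamedTuple):
    """One raw data point of an LC-MS map (illustrative type, not an OpenMS class)."""
    rt: float         # retention time, e.g. in seconds
    mz: float         # mass-to-charge ratio
    intensity: float  # e.g. ion count


# A raw map is simply the collection of all data points from all scans.
raw_map = [
    RawPoint(rt=100.0, mz=500.25, intensity=1200.0),
    RawPoint(rt=100.0, mz=500.26, intensity=3400.0),
    RawPoint(rt=101.5, mz=500.26, intensity=2900.0),
    RawPoint(rt=101.5, mz=750.40, intensity=800.0),
]


def scans(points):
    """Group raw points by retention time, i.e. recover the individual spectra."""
    by_rt = {}
    for p in points:
        by_rt.setdefault(p.rt, []).append(p)
    return by_rt


spectra = scans(raw_map)
```

Grouping by retention time recovers the individual scans; here `spectra` holds two scans with two points each.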

'Raw map' to 'Peak map'

The first step in a typical LC-MS data analysis workflow is called peak picking or centroiding. The result is what we call a peak or stick map. A peak is an object that contains summary information about a small set of raw data points, usually a local maximum of the ion count (intensity) in the LC-MS map.

Each peak is described by retention time, mass-to-charge ratio and intensity. Summary statistics of the raw data points represented by each peak can be estimated as well.
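To make the idea concrete, here is a minimal sketch of centroiding within a single scan. It assumes a simple local-maximum rule with a three-point intensity-weighted centroid; this is a deliberately simplified stand-in, not the peak picking algorithm used in OpenMS. The function name `pick_peaks` and the example values are hypothetical.

```python
def pick_peaks(mz, intensity, min_intensity=0.0):
    """Return (centroid_mz, summed_intensity) for each local maximum in one scan.

    Simplified illustration: a peak is any point that is higher than its left
    neighbour and at least as high as its right neighbour.
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_max = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if is_max and intensity[i] >= min_intensity:
            window = range(i - 1, i + 2)  # the maximum and its two neighbours
            total = sum(intensity[j] for j in window)
            centroid = sum(mz[j] * intensity[j] for j in window) / total
            peaks.append((centroid, total))
    return peaks


# One scan with a single symmetric signal around m/z 500.02 (made-up values).
mz = [500.00, 500.01, 500.02, 500.03, 500.04]
intensity = [10.0, 50.0, 100.0, 50.0, 10.0]
peaks = pick_peaks(mz, intensity, min_intensity=20.0)
```

Five raw data points are reduced to one stick that carries a centroid position and a summed intensity.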

'Peak map' to 'Feature map'

After peak picking, you might want to detect and quantify the peptides in your map. To do so, all sticks (or raw data points) that are produced by a single peptide charge variant have to be collected and grouped into a feature.

Together, the two steps correspond to a significant data reduction: typically, a cloud of about 1000 raw data points that corresponds to the elution of a peptide is reduced to one representative multi-dimensional point.

Features have summary coordinates (centroid retention time, average monoisotopic peak, summed intensities, etc.) and additionally contain information about the sticks or raw data points they are built from (isotope pattern, quality of fit, etc.).
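The reduction from a group of peaks to one feature can be sketched as follows. The example assumes the peaks of one peptide charge variant have already been collected, and it takes the lowest m/z in the group as the monoisotopic position, which is a simplification. The `Feature` class and `summarize` function are hypothetical names, not part of OpenMS.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Summary of one peptide charge variant (illustrative, not an OpenMS class)."""
    rt: float         # intensity-weighted centroid retention time
    mz: float         # m/z of the lowest-mass peak, taken as monoisotopic here
    intensity: float  # summed intensity of all member peaks
    n_peaks: int      # number of peaks collapsed into this feature


def summarize(peaks):
    """Collapse peaks given as (rt, mz, intensity) tuples into one Feature."""
    total = sum(p[2] for p in peaks)
    rt = sum(p[0] * p[2] for p in peaks) / total
    mono_mz = min(p[1] for p in peaks)
    return Feature(rt=rt, mz=mono_mz, intensity=total, n_peaks=len(peaks))


# Two isotope peaks (spacing 0.5, i.e. charge 2) seen in two scans; made-up values.
peaks = [
    (100.0, 500.25, 100.0),
    (100.0, 500.75, 60.0),
    (102.0, 500.25, 300.0),
    (102.0, 500.75, 180.0),
]
feature = summarize(peaks)
```

In a real workflow the feature would also keep references to its member peaks, so that the isotope pattern and the quality of fit remain available.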


Parts of the OpenMS project

OpenMS covers a wide range of functionality needed to develop software for the analysis of data from high-throughput protein separation and mass spectrometry.

 

Signal Processing

High-throughput analysis of proteins using mass spectrometry requires efficient signal processing that reduces the amount of data with minimal loss of information. Peaks have to be detected, and important properties such as their centroid position, height and area have to be determined.
For peak picking, we propose a wavelet-based scheme for the processing of mass spectrometry data that is able to cope with the difficulties posed by applications in proteomics.
Often several preprocessing steps are applied to raw data before the actual peak picking. OpenMS also supports baseline reduction and noise filtering of raw data.
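The two preprocessing steps can be illustrated with very simple filters. The sketch below assumes a moving average for noise filtering and a running minimum as a crude baseline estimate; these were picked for brevity and are not claimed to be the filters implemented in OpenMS. Function names and values are hypothetical.

```python
def moving_average(values, half_width=1):
    """Smooth a signal by averaging each point with its neighbours."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def subtract_baseline(values, half_width=2):
    """Subtract the running minimum of a sliding window as a crude baseline."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(values[i] - min(values[lo:hi]))
    return out


# A single signal sitting on a constant offset of about 5 (made-up values).
signal = [5.0, 5.0, 6.0, 20.0, 6.0, 5.0, 5.0]
smoothed = moving_average(signal)
corrected = subtract_baseline(signal)
```

After baseline subtraction the flat regions drop to zero while the signal at index 3 keeps most of its height.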

raw data with picked peaks
 

Feature Finding

After the first data reduction by peak picking, our approach is to reduce the amount of data even further.
This is done by determining all peaks belonging to the same 'feature' (a chemical entity, for instance a peptide charge variant) and fitting a theoretical model to them. A quality value is assigned to each feature for each dimension in which it is measured, e.g. its elution profile (retention time) and isotope distribution (mass-to-charge).
The features are then used for label-free and isotope-labeled quantitation.

raw data map and feature map
 

Visualization

OpenMS can visualize HPLC-MS data in several different views. Single scans can be displayed in a 2-dimensional plot. The more complex maps can be displayed in a 2D view (bird's-eye view with color-coded intensities) and even in a 3D view. The representations of the data are highly configurable and can thus be reused in many applications.
The images on the right are screenshots of 'TOPPView', an MS data viewer that is part of TOPP.

SpecView
 

Map Mapping

Many proteomics experiments consist of several HPLC-MS runs. The runs have to be aligned in order to correct for problems in chromatography. Map mapping is the process whereby maps of peaks or features from different measurements are superimposed.
The first step is to find a transformation that moves features from one map close to corresponding features in the other one. A standard geometric hashing approach is sufficient in most cases.
In the second step we determine a combinatorial matching and extract groups of corresponding features for differential analysis.
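The two steps can be sketched in a much simplified form. Instead of geometric hashing, the example below estimates a constant retention time shift as the median difference between features with similar m/z, and instead of an optimal combinatorial matching it pairs features greedily. Both substitutions are assumptions made to keep the example short, and all names, tolerances and values are hypothetical.

```python
from statistics import median


def estimate_rt_shift(map_a, map_b, mz_tol=0.01):
    """Step 1: median RT difference (b - a) over feature pairs with similar m/z."""
    diffs = [rt_b - rt_a
             for rt_a, mz_a in map_a
             for rt_b, mz_b in map_b
             if abs(mz_a - mz_b) <= mz_tol]
    return median(diffs)


def match_features(map_a, map_b, shift, rt_tol=5.0, mz_tol=0.01):
    """Step 2: greedily pair each feature of map_a with the closest unused one in map_b."""
    used, pairs = set(), []
    for i, (rt_a, mz_a) in enumerate(map_a):
        best, best_dist = None, None
        for j, (rt_b, mz_b) in enumerate(map_b):
            if j in used or abs(mz_a - mz_b) > mz_tol:
                continue
            dist = abs(rt_a + shift - rt_b)  # RT distance after applying the shift
            if dist <= rt_tol and (best_dist is None or dist < best_dist):
                best, best_dist = j, dist
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs


# Features given as (rt, mz); the second run elutes roughly 12 s later.
map_a = [(100.0, 500.25), (200.0, 620.30), (300.0, 710.40)]
map_b = [(112.0, 500.25), (211.0, 620.30), (313.0, 710.40), (400.0, 900.00)]
shift = estimate_rt_shift(map_a, map_b)
pairs = match_features(map_a, map_b, shift)
```

The resulting index pairs are the groups of corresponding features that a differential analysis would compare; the fourth feature of `map_b` has no partner and stays unmatched.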

feature matching
 

Identification

The identification of peptides and proteins is one of the main tasks in proteomics research. OpenMS can read and write the data formats of the most popular identification engines, e.g. Sequest, Mascot, OMSSA, X!Tandem. Data structures that store the identification results for further analysis are available.
OpenMS can, for example, filter the identification results or compute consensus results from several identification engines. Additionally, the identifications can be validated using retention time prediction.
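Filtering and consensus building can be illustrated with a small sketch. It assumes every engine reports at most one hit per spectrum as a `(spectrum_id, peptide, score)` tuple, that scores are comparable and higher is better, and that the consensus is a simple majority vote. Real engines use different score scales, so this only shows the idea; the engine names, function names and data are hypothetical.

```python
from collections import Counter


def filter_hits(hits, min_score):
    """Keep only hits whose score reaches the threshold."""
    return [h for h in hits if h[2] >= min_score]


def consensus(results_by_engine, min_votes=2):
    """Per spectrum, return the peptide reported by at least min_votes engines."""
    votes = {}
    for hits in results_by_engine.values():
        for spectrum_id, peptide, _score in hits:
            votes.setdefault(spectrum_id, Counter())[peptide] += 1
    agreed = {}
    for spectrum_id, counter in votes.items():
        peptide, n = counter.most_common(1)[0]
        if n >= min_votes:
            agreed[spectrum_id] = peptide
    return agreed


# Made-up results from three hypothetical engines.
results = {
    "engine_a": [("s1", "PEPTIDEK", 0.90), ("s2", "LVNELTEFAK", 0.40)],
    "engine_b": [("s1", "PEPTIDEK", 0.80), ("s2", "AAAGK", 0.70)],
    "engine_c": [("s1", "PEPTIDER", 0.95), ("s2", "AAAGK", 0.90)],
}
filtered = {engine: filter_hits(hits, 0.5) for engine, hits in results.items()}
agreed = consensus(filtered)
```

Here the low-scoring hit for spectrum `s2` is filtered out first, and each spectrum is then assigned the peptide that two of the three engines agree on.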

Identification
 

Database Support

High-throughput technologies in modern proteomics produce large amounts of data that need to be annotated, analysed and stored. This volume of data makes database support an essential feature of nearly every proteomics application.
OpenMS offers database support through the Qt SQL module, which allows the use of a variety of different SQL databases.
The database model of OpenMS conforms as far as possible to the HUPO PSI-OM model.

database