The map concept of the OpenMS project
Differential analysis of proteome expression levels has developed
rapidly in recent years. While 2D-gel-based techniques are still the
standard in the field, HPLC/MS-based approaches have gained considerable
interest because of their greater potential for full automation. Consequently,
these approaches are being developed not only in industrial settings
but also in larger biomedical research settings.
The acceptance and the impact of these techniques are, however, limited by
the ability to efficiently handle and analyze the tremendous volume of data
they produce, which is the inevitable flip side of automation.
Overcoming these problems, and thus realizing the full potential of the method,
can only be achieved by new developments in algorithmics, data management, and
software engineering. These developments have to take place in tight
integration with the method development in HPLC/MS, which is currently a
very rapidly evolving field.
Differential analysis methods can be subdivided into the
separation of the proteins or peptides;
relative quantification using mass spectrometry, which
requires a preceding matching of corresponding peptides;
and the identification of the peptide sequence using MS or MS/MS.
Map transformations
The data produced by the combination of multi-dimensional LC and
subsequent MS can be viewed as a set of discrete multi-dimensional
points. In the simplest case (one-dimensional HPLC followed by an MS measurement),
each data point is described by retention time, m/z, and
intensity (e.g. ion count). The collection of
all these data points is called a raw LC-MS map.
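As a minimal sketch of this data model, a raw map can be pictured as a flat collection of (retention time, m/z, intensity) triples. The names RawPoint and raw_map below are illustrative only and are not OpenMS classes, and the values are made up.

```python
from typing import NamedTuple


class RawPoint(NamedTuple):
    """One raw data point of an LC-MS map (hypothetical type, not an OpenMS class)."""
    rt: float         # retention time
    mz: float         # mass-to-charge ratio
    intensity: float  # e.g. ion count


# A tiny made-up raw map: two scans with three data points each.
raw_map = [
    RawPoint(rt=100.0, mz=500.20, intensity=10.0),
    RawPoint(rt=100.0, mz=500.25, intensity=80.0),
    RawPoint(rt=100.0, mz=500.30, intensity=12.0),
    RawPoint(rt=101.0, mz=500.20, intensity=15.0),
    RawPoint(rt=101.0, mz=500.25, intensity=120.0),
    RawPoint(rt=101.0, mz=500.30, intensity=18.0),
]

# Group the points by retention time to recover the individual scans (spectra).
scans = {}
for point in raw_map:
    scans.setdefault(point.rt, []).append(point)
```

Grouping by retention time shows that the same set of points can be read either as one map or as a sequence of spectra.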
Frequently, the data is denoised, baseline-corrected, and smoothed by the instrument software.
OpenMS also contains several signal processing algorithms for these tasks
if you require more fine-grained control over raw data preprocessing.
The analysis of this raw data is done through several data reduction steps.
In our view, these steps correspond to a series of map transformations (see image on the right).
'Raw map' to 'peak map'
The first step in a typical LC-MS data analysis workflow is called peak picking or centroiding.
The result is what we call a peak or stick map. A peak is an object that contains summary information
about a small set of raw data points, usually a local maximum of the ion count (intensity)
in the LC-MS map.
Each peak is described by retention time, mass-to-charge ratio and intensity.
Summary statistics of the raw data points represented by each peak can be
estimated as well.
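To make the idea concrete, here is a deliberately simple sketch that picks local intensity maxima in a single spectrum. It is not the peak picking algorithm used by OpenMS; the function name pick_peaks, the threshold parameter, and the sample values are assumptions made for illustration.

```python
def pick_peaks(mz, intensity, threshold=0.0):
    """Return (m/z, intensity) pairs at local intensity maxima above threshold.

    Illustrative only: a point counts as a peak if it is higher than its left
    neighbor and at least as high as its right neighbor.
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        if (intensity[i] > threshold
                and intensity[i] > intensity[i - 1]
                and intensity[i] >= intensity[i + 1]):
            peaks.append((mz[i], intensity[i]))
    return peaks


# One made-up raw spectrum containing two signal humps.
mz = [500.20, 500.25, 500.30, 500.35, 500.40, 500.45, 500.50]
intensity = [5.0, 40.0, 90.0, 35.0, 8.0, 60.0, 7.0]

peaks = pick_peaks(mz, intensity, threshold=10.0)
```

Seven raw data points are reduced to two sticks here. A real peak picker would typically also fit a peak shape to estimate the summary statistics mentioned above.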
'Peak map' to 'feature map'
After peak picking, you might want to detect and quantify the peptides in your
map. To do so, we need to collect all sticks (or raw data points)
that are produced by a single peptide charge variant and group them into a feature.
The two steps together correspond to a significant data reduction.
Typically, a cloud of about 1000 raw data points
that corresponds to the elution of a peptide is reduced to one representative
multi-dimensional point.
Features have summary coordinates (centroid retention time, average
monoisotopic peak, summed intensities, etc.) and additionally contain
information about their constituent sticks or raw data points (isotope pattern,
quality of fit, etc.).
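The summary coordinates of a feature can be sketched as follows, assuming the peaks belonging to one peptide charge variant have already been collected. The Peak and Feature classes and the build_feature function are hypothetical and do not mirror the OpenMS API. As a simplification, the lowest m/z among the peaks stands in for the monoisotopic position.

```python
from dataclasses import dataclass, field


@dataclass
class Peak:
    rt: float
    mz: float
    intensity: float


@dataclass
class Feature:
    rt: float         # intensity-weighted centroid retention time
    mz: float         # lowest m/z among the peaks (assumed monoisotopic)
    intensity: float  # summed intensity of all contained peaks
    peaks: list = field(default_factory=list)  # the contained sticks


def build_feature(peaks):
    """Collapse the peaks of one peptide charge variant into a single feature."""
    total = sum(p.intensity for p in peaks)
    centroid_rt = sum(p.rt * p.intensity for p in peaks) / total
    mono_mz = min(p.mz for p in peaks)
    return Feature(rt=centroid_rt, mz=mono_mz, intensity=total, peaks=list(peaks))


# Made-up peaks of a doubly charged peptide (isotopes 0.5 m/z apart) in two scans.
peaks = [
    Peak(rt=100.0, mz=500.25, intensity=100.0),
    Peak(rt=100.0, mz=500.75, intensity=60.0),
    Peak(rt=102.0, mz=500.25, intensity=200.0),
    Peak(rt=102.0, mz=500.75, intensity=40.0),
]

feature = build_feature(peaks)
```

The feature keeps a reference to its peaks, so per-feature details such as the isotope pattern can still be derived after the reduction.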