Alignment of LC-MS Proteomics Datasets Using Internal Anchors
October 21, 2010
2:30 p.m.
Alan Dabney
Abstract
Liquid chromatography-mass spectrometry (LC-MS) is a modern tool for extracting quantitative proteomic information from complex biological samples. A single sample gives rise to hundreds of thousands of data points, each characterized by (among other things) scan number, mass-to-charge ratio (m/z), and peak intensity; scan number is a function of how long it takes a peptide to travel through the liquid chromatography column. All three of these quantities can vary systematically within and between samples because of technical aspects of the experiment. “LC-MS alignment” refers to the “lining up” of these quantities across samples. This is generally a black-box endeavor, with no objective measures available for assessing alignment efficacy. We exploit additional information obtained by commonly used hybrid LC-MS/MS-MS instruments to identify “anchor” LC-MS features: features that can be identified, and hence aligned, across samples with high confidence. We then use the anchors as the basis for a simple nearest-neighbor alignment algorithm. Post-alignment similarity of the anchors allows for interpretable, objective assessment of alignment efficacy, on the basis of which we demonstrate our algorithm’s superior performance relative to existing black-box alignment algorithms.
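The abstract does not spell out the algorithm, so the following is only an illustrative sketch of what an anchor-based nearest-neighbor alignment might look like, not the speaker's method. It assumes that each feature is reduced to a (scan number, m/z) pair, that anchors are already matched one-to-one between a sample and a reference, and that neighbors are ranked by scan-number distance alone. The function names `align_features` and `anchor_residual` and the parameter `k` are hypothetical.

```python
from statistics import mean


def align_features(features, sample_anchors, reference_anchors, k=2):
    """Shift each (scan, mz) feature by the mean offset of its k nearest anchors.

    Assumption: sample_anchors[i] and reference_anchors[i] are the same
    identified peptide observed in the sample and in the reference.
    """
    # Offset that moves each sample anchor onto its reference counterpart.
    offsets = [(r[0] - s[0], r[1] - s[1])
               for s, r in zip(sample_anchors, reference_anchors)]
    aligned = []
    for scan, mz in features:
        # Rank anchors by scan-number distance only (a simplifying assumption).
        nearest = sorted(range(len(sample_anchors)),
                         key=lambda i: abs(sample_anchors[i][0] - scan))[:k]
        d_scan = mean(offsets[i][0] for i in nearest)
        d_mz = mean(offsets[i][1] for i in nearest)
        aligned.append((scan + d_scan, mz + d_mz))
    return aligned


def anchor_residual(anchors, reference_anchors):
    """Mean absolute scan-number difference between matched anchors."""
    return mean(abs(a[0] - r[0]) for a, r in zip(anchors, reference_anchors))


# Toy data, invented purely for illustration.
sample_anchors = [(100, 500.0), (200, 600.0), (300, 700.0)]
reference_anchors = [(105, 500.0), (210, 600.0), (315, 700.0)]

aligned_feature = align_features([(110, 550.0)], sample_anchors, reference_anchors)
residual_before = anchor_residual(sample_anchors, reference_anchors)
```

In this toy setup the residual over anchors gives a directly interpretable number (scans of disagreement) before and after alignment. A fair assessment would hold each anchor out of its own neighbor set when aligning it; that step is omitted here for brevity.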