B.Sc. topic proposal in high-throughput proteomics


  • Programming: ****
  • Math: **
  • Biology: *


LC-MS data are extremely complex, and whole-cell lysates are never fully sequenced with today's state-of-the-art technology. Comparison of LC-MS runs can be significantly enhanced by map alignment, in which unidentified features are assigned a peptide sequence using identifications (IDs) acquired in another LC-MS run. Since liquid chromatography (LC) is potentially unstable between runs, a retention time (RT) correction is usually required before IDs can be transferred successfully with an accurate mass and time approach. To reduce the number of user-defined parameters while maximizing robustness, the alignment should not depend on a single reference file.
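The idea of RT correction followed by ID transfer can be illustrated with a minimal Python sketch. It is not the OpenMS implementation: the linear RT model, the function names `fit_linear_rt` and `transfer_ids`, the tuple layouts, and the default tolerances (10 ppm, 30 s) are all assumptions chosen for illustration. Real LC drift is often non-linear and would need a more flexible model.

```python
from statistics import mean


def fit_linear_rt(pairs):
    """Least-squares fit of rt_donor = a * rt_acceptor + b.

    pairs: list of (rt_acceptor, rt_donor) for peptides identified in both runs.
    A linear model is an assumption made for this sketch only.
    """
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct retention times")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return a, b


def transfer_ids(identified, unidentified, a, b, ppm_tol=10.0, rt_tol=30.0):
    """Assign donor-run IDs to unidentified acceptor-run features.

    identified:   list of (sequence, mz, rt) from the donor run.
    unidentified: list of (mz, rt) from the acceptor run.
    The acceptor RT is first mapped onto the donor RT scale; a feature
    receives the ID closest in RT among those within both tolerances,
    or None if no ID qualifies.
    """
    result = []
    for mz, rt in unidentified:
        rt_corr = a * rt + b
        best = None
        for seq, id_mz, id_rt in identified:
            ppm = abs(mz - id_mz) / id_mz * 1e6
            d_rt = abs(rt_corr - id_rt)
            if ppm <= ppm_tol and d_rt <= rt_tol:
                if best is None or d_rt < best[1]:
                    best = (seq, d_rt)
        result.append(best[0] if best else None)
    return result
```

For example, fitting on the shared IDs `[(100.0, 110.0), (200.0, 220.0), (300.0, 330.0)]` gives a slope of about 1.1 and an intercept of about 0, so an unidentified feature at RT 150 is looked up near RT 165 in the donor run.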


A guide-tree-based multiple alignment should be implemented within the OpenMS software framework. The algorithm must feature a robust metric for estimating the initial distance matrix (e.g. the percentage of overlapping IDs, the standard deviation of the RT differences of matching IDs, or a combination of both). To demonstrate the quality of the implemented solution, it should be compared against a current implementation that requires a reference, using benchmark metrics such as 1) the standard deviation of aligned pairs, possibly with cross-validation (align on a subset of IDs, test on the held-out set), 2) the number of transferable IDs, and 3) the number and size distribution of consensus features (fewer but larger clusters should be preferred).
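As a rough illustration of how a distance matrix and guide tree could fit together, the following Python sketch derives pairwise distances from ID overlap and merges runs by average linkage (UPGMA-style). The choice of the Jaccard index as the overlap measure, the average-linkage rule, and all function names are assumptions made for this sketch; the proposal leaves the actual metric and tree construction open. `residual_stdev` shows one possible reading of benchmark metric 1.

```python
from itertools import combinations
from statistics import pstdev


def overlap_distance(ids_a, ids_b):
    """1 - Jaccard index of two sets of peptide IDs (1.0 if both are empty)."""
    union = ids_a | ids_b
    if not union:
        return 1.0
    return 1.0 - len(ids_a & ids_b) / len(union)


def build_guide_tree(runs):
    """Build a guide tree as nested tuples, e.g. ("C", ("A", "B")).

    runs: dict mapping a run name to the set of peptide sequences identified
    in that run. Cluster distance is the average of all pairwise run
    distances between the two clusters (average linkage).
    """
    if not runs:
        raise ValueError("at least one run is required")
    base = {
        frozenset((a, b)): overlap_distance(runs[a], runs[b])
        for a, b in combinations(runs, 2)
    }

    def dist(m1, m2):
        total = sum(base[frozenset((x, y))] for x in m1 for y in m2)
        return total / (len(m1) * len(m2))

    # Each node is (subtree, list of member run names).
    nodes = [(name, [name]) for name in sorted(runs)]
    while len(nodes) > 1:
        i, j = min(
            combinations(range(len(nodes)), 2),
            key=lambda ij: dist(nodes[ij[0]][1], nodes[ij[1]][1]),
        )
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


def residual_stdev(aligned_pairs):
    """Population standard deviation of RT differences of aligned ID pairs."""
    return pstdev(rt_a - rt_b for rt_a, rt_b in aligned_pairs)
```

The alignment would then proceed along the tree from the leaves upward, aligning the most similar runs first, so that no single run has to be chosen as the reference.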

Implementations of high quality can be integrated into the official release of the OpenMS software.


Expertise in C++ or a closely related language (e.g. Java) and in object-oriented programming is strongly advised. Basic knowledge of LC-MS is desirable but can be acquired to a sufficient level within the first few days.


Topic revision: r1 - 25 Apr 2017, ChrisBielow