
Project Management in Software Development: OpenMS 2011

Wiki page for the lab course Project Management in Software Development, OpenMS 2011 (http://www.openms.de/).

Schedule

01.04.2011 (10:00) Preliminary meeting
04.04. - 08.04.2011 C++ course
11.04. - 12.04.2011 OpenMS coding tutorial
12.04.2011 Project assignment
21.04.2011 Project presentation
30.05.2011 Submission
31.05.2011 Final presentation

Preliminary meeting

  • Project descriptions
  • Templates for presentations and the final report (LaTeX): see below
  • Homework:
    • Install Visual Studio 2008, g++, Qt Creator, or Xcode
    • Read the papers %CITE{SturmEtal08,KohlbacherEtal07}% [everyone] plus the literature for your individual project
    • Set up VPN/WLAN access

OpenMS Tutorial

Monday

  • Introduction to SVN, CMake, Visual Studio/Qt Creator/Xcode
  • Installing OpenMS from source
  • Library Structure
    • Coding Convention
    • FAQ
    • Where is what
  • OpenMS External Project HowTo

Tuesday

  • Coding Tutorials
  • TOPPView
  • TOPPAS
  • Project Assignment

Homework

  • Prepare the talk for the project presentation (21.04.)
  • Plan the project (milestones)

Project presentation

Talk
  • Theory
  • Approach
  • Interface/Interaction
  • Milestone plan

Projects

A detailed description of the projects will be added shortly.

Fractional Averagines

Fractional averagines are already implemented in NitPick %CITE{RenardEtal08}%. The task is to reimplement them and benchmark the result.
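A benchmark needs a baseline to compare against. The sketch below computes the classic averagine isotope pattern for a given peptide mass: it scales Senko's averagine unit to the mass, rounds atom counts to integers, and convolves per-element isotope distributions binned by nominal mass shift. How the fractional variant departs from this baseline should be taken from the NitPick paper; the constants (averagine unit, IUPAC natural abundances) and all function names are assumptions of this sketch, not OpenMS or NitPick code.

```python
# Baseline (integer-rounded) averagine isotope pattern: an illustrative sketch.
# Assumed: Senko's averagine unit and IUPAC natural isotope abundances.

AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254  # average mass of one averagine unit (Da)

# Abundances indexed by nominal mass shift relative to the lightest isotope.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, max_peaks):
    """Convolve two isotope distributions, keeping only the first max_peaks bins."""
    out = [0.0] * max_peaks
    for i, pa in enumerate(a[:max_peaks]):
        for j, pb in enumerate(b[: max_peaks - i]):
            out[i + j] += pa * pb
    return out


def element_power(dist, n, max_peaks):
    """Isotope distribution of n atoms of one element (exponentiation by squaring)."""
    result = [1.0] + [0.0] * (max_peaks - 1)
    base = dist
    while n > 0:
        if n & 1:
            result = convolve(result, base, max_peaks)
        base = convolve(base, base, max_peaks)
        n >>= 1
    return result


def averagine_pattern(mass, max_peaks=6):
    """Relative isotope peak intensities for a peptide of the given average mass.

    Atom counts are rounded to integers (an assumption of this baseline).
    """
    units = mass / AVERAGINE_MASS
    pattern = [1.0] + [0.0] * (max_peaks - 1)
    for element, per_unit in AVERAGINE.items():
        count = round(per_unit * units)
        pattern = convolve(pattern, element_power(ISOTOPES[element], count, max_peaks), max_peaks)
    total = sum(pattern)
    return [p / total for p in pattern]  # renormalise the truncated pattern
```

For example, the monoisotopic peak is the most intense one in `averagine_pattern(1000.0)`, while at 3000 Da the maximum has moved to a heavier isotope peak.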

Deviations from Averagines

The averagine model is the most common means of assessing whether a signal originates from a peptide. However, the model is not perfect: for exotic peptides (e.g., sulfur-rich ones) the deviation is quite large. But how large? Using the averagine model, compute a theoretical digest of a protein database and determine the distribution of deviations from the model, given knowledge of each peptide's sequence. Determine thresholds that are sensible in practice and identify which peptides are affected. Does anything change when a detectability filter is applied?
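The core measurement is a per-peptide comparison between the true elemental composition and the composition the averagine model predicts for the same mass. A minimal sketch, assuming Senko's averagine unit (C4.9384 H7.7583 N1.3577 O1.4773 S0.0417, 111.1254 Da), standard average atomic masses, unmodified residues, and a strict trypsin rule (cleave after K/R, not before P, no missed cleavages); all function names are hypothetical.

```python
import re

# Residue compositions (C, H, N, O, S) = free amino acid minus H2O.
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}
ELEMENTS = ("C", "H", "N", "O", "S")
AVERAGE_ATOMIC_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254  # Da per averagine unit


def tryptic_digest(protein):
    """Strict trypsin rule: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def composition(peptide):
    """Elemental composition of an unmodified peptide (residues plus one H2O)."""
    counts = {"C": 0, "H": 2, "N": 0, "O": 1, "S": 0}
    for aa in peptide:
        for element, n in zip(ELEMENTS, RESIDUES[aa]):
            counts[element] += n
    return counts


def averagine_deviation(peptide):
    """Actual minus averagine-predicted atom count per element at the peptide's average mass."""
    actual = composition(peptide)
    mass = sum(AVERAGE_ATOMIC_MASS[e] * n for e, n in actual.items())
    units = mass / AVERAGINE_MASS
    return {e: actual[e] - AVERAGINE[e] * units for e in ELEMENTS}
```

Collecting, say, `averagine_deviation(p)["S"]` over every peptide `p` of a digested database gives the distribution the project asks about; thresholds could then be read off its quantiles, before and after applying a detectability filter.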

Simulation of SILAC Labeling

MSSimulator is a simulator for mass spectrometry measurements integrated into OpenMS. One of its abilities is the simulation of labeled experiments. The SILAC %CITE{OngEtal02}% technique is currently supported, but only in a very elementary way. The aim of this project is to extend the labeling module to make it more realistic (e.g., multi-channel SILAC).
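A multi-channel simulation needs, at minimum, the mass shift of each labeled peptide per channel and some model of incomplete label incorporation. The sketch below assumes a common three-channel design (medium: Lys4/Arg6, heavy: Lys8/Arg10) with shifts derived from isotope mass differences; note that the original SILAC paper used deuterated leucine, so the Lys/Arg layout is an assumption chosen because it suits tryptic workflows. The independent per-residue incorporation model and all names are hypothetical and not part of the MSSimulator API.

```python
# Monoisotopic mass differences between heavy and light isotopes (Da).
C13 = 13.0033548 - 12.0        # 13C vs 12C
N15 = 15.0001089 - 14.0030740  # 15N vs 14N
H2 = 2.0141018 - 1.0078250     # 2H (deuterium) vs 1H
PROTON = 1.007276              # proton mass (Da)

# Assumed 3-plex design: mass shift of one fully labeled residue per channel.
CHANNELS = {
    "light": {},
    "medium": {"K": 4 * H2, "R": 6 * C13},                      # Lys4, Arg6
    "heavy": {"K": 6 * C13 + 2 * N15, "R": 6 * C13 + 4 * N15},  # Lys8, Arg10
}


def silac_shift(peptide, channel):
    """Mass shift of a peptide whose K/R residues are all labeled in `channel`."""
    labels = CHANNELS[channel]
    return sum(labels.get(aa, 0.0) for aa in peptide)


def channel_mz(light_mass, peptide, channel, charge):
    """m/z of the [M+zH]z+ ion, given the unlabeled monoisotopic mass."""
    return (light_mass + silac_shift(peptide, channel) + charge * PROTON) / charge


def fully_labeled_fraction(peptide, efficiency):
    """Fraction of copies with every K/R labeled, assuming each residue is
    labeled independently with probability `efficiency`."""
    sites = sum(1 for aa in peptide if aa in "KR")
    return efficiency ** sites
```

Partially labeled copies (the remaining fraction) would then appear as additional, lower-intensity features between the channels, which is one way the simulation could become more realistic.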

Simulation of ICPL Labeling

MSSimulator is a simulator for mass spectrometry measurements integrated into OpenMS. One of its abilities is the simulation of labeled experiments. ICPL labeling %CITE{SchmidtEtal05}% is currently not supported and will therefore be implemented in this project.
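Since ICPL tags free amino groups of intact proteins, the label sites of a peptide depend on where it came from in the protein. The sketch below assumes complete labeling of all lysine side chains and of the protein N-terminus before digestion, so peptide N-termini created by digestion carry no label. The nicotinoyl label mass (C6H3NO) and the four variants ICPL0/4/6/10 (with 2H4 and/or 13C6 substitutions) are assumptions to be checked against the literature; the function names are hypothetical, not MSSimulator code.

```python
# Monoisotopic isotope mass differences (Da).
C13 = 13.0033548 - 12.0
H2 = 2.0141018 - 1.0078250

# Light ICPL label: nicotinoyl group C6H3NO added per labeled amino group.
ICPL0 = 6 * 12.0 + 3 * 1.0078250 + 14.0030740 + 15.9949146

# Assumed label variants and their per-site mass shifts.
ICPL_CHANNELS = {
    "ICPL0": ICPL0,
    "ICPL4": ICPL0 + 4 * H2,             # 2H4
    "ICPL6": ICPL0 + 6 * C13,            # 13C6
    "ICPL10": ICPL0 + 6 * C13 + 4 * H2,  # 13C6 2H4
}


def icpl_label_sites(peptide, is_protein_n_term):
    """Labeled amino groups in a peptide: lysine side chains plus, for the
    protein's N-terminal peptide only, the alpha-amino group."""
    return peptide.count("K") + (1 if is_protein_n_term else 0)


def icpl_shift(peptide, channel, is_protein_n_term=False):
    """Total mass shift of a completely labeled peptide in the given channel."""
    return icpl_label_sites(peptide, is_protein_n_term) * ICPL_CHANNELS[channel]
```

Peptides without lysine that do not come from the protein N-terminus receive no label at all, which the simulator would presumably need to handle, since such peptides cannot be quantified by ICPL.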

Robust Estimation of Peak FWHM

Peak picking is one of the most important signal processing steps in the analysis of MS data, and a very competitive solution %CITE{LangeEtal07}% is available in OpenMS. To make this algorithm fool-proof, its key parameter, the FWHM (full width at half maximum) of the peaks, must be set properly. Usually the user sets it based on the experimental setup (instrument type and resolution). However, the FWHM can also be estimated from the data itself, which would make peak picking essentially fully automatic for medium- to low-resolution data. In this project, we will develop an algorithm that estimates the FWHM from the data and assess its performance on simulated data.
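One simple starting point, sketched below under stated assumptions: measure each peak's width at half its apex intensity, interpolating linearly between neighbouring samples, then take the median over many peaks so that overlapping or noisy peaks do not dominate. This is not the OpenMS peak picker's method, and the function names are hypothetical. The Gaussian conversion FWHM = 2√(2 ln 2)·σ ≈ 2.3548·σ, used here to build simulated test peaks, holds only for Gaussian peak shapes.

```python
import math
from statistics import median

# FWHM of a Gaussian peak in units of its standard deviation sigma.
SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))


def peak_fwhm(mz, intensity):
    """Width of one sampled peak at half its apex intensity.

    Both half-maximum crossings are found by linear interpolation between
    neighbouring samples. Returns None if the signal does not fall to half
    maximum on both sides of the apex within the window.
    """
    apex = max(range(len(intensity)), key=intensity.__getitem__)
    half = intensity[apex] / 2.0
    if half <= 0.0:
        return None

    left = apex
    while left > 0 and intensity[left] > half:
        left -= 1
    right = apex
    while right < len(intensity) - 1 and intensity[right] > half:
        right += 1
    if intensity[left] > half or intensity[right] > half:
        return None

    def crossing(i, j):
        # m/z at which the straight line through samples i and j equals `half`.
        return mz[i] + (half - intensity[i]) * (mz[j] - mz[i]) / (intensity[j] - intensity[i])

    return crossing(right - 1, right) - crossing(left, left + 1)


def robust_fwhm(peaks):
    """Median FWHM over many (mz, intensity) peak windows; failed peaks are skipped."""
    widths = [w for w in (peak_fwhm(mz, inten) for mz, inten in peaks) if w is not None]
    return median(widths) if widths else None
```

Because the FWHM usually varies with m/z (for example, it grows with m/z at constant resolving power), a practical estimator would likely fit widths per m/z range or as a function of m/z rather than report a single median.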

Referenzen

%STARTBIBTEX{abstracts = "off"}%

@article{KohlbacherEtal07,
  author = {Kohlbacher, Oliver and Reinert, Knut and Gr\"{o}pl, Clemens and Lange, Eva and Pfeifer, Nico and Schulz-Trieglaff, Ole and Sturm, Marc},
  title = {{TOPP--the OpenMS proteomics pipeline}},
  journal = {Bioinformatics},
  volume = {23},
  number = {2},
  pages = {e191--e197},
  year = {2007},
  doi = {10.1093/bioinformatics/btl299},
  pmid = {17237091},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/17237091}
}

@article{LangeEtal07,
  author = {Lange, Eva and Gr\"{o}pl, Clemens and Schulz-Trieglaff, Ole and Leinenbach, Andreas and Huber, Christian and Reinert, Knut},
  title = {{A geometric approach for the alignment of liquid chromatography-mass spectrometry data}},
  journal = {Bioinformatics},
  volume = {23},
  number = {13},
  pages = {i273--i281},
  year = {2007},
  doi = {10.1093/bioinformatics/btm209},
  pmid = {17646306},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/17646306}
}

@article{OngEtal02,
  author = {Ong, Shao-En and Blagoev, Blagoy and Kratchmarova, Irina and Kristensen, Dan Bach and Steen, Hanno and Pandey, Akhilesh and Mann, Matthias},
  title = {{Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics}},
  journal = {Molecular \& Cellular Proteomics},
  volume = {1},
  number = {5},
  pages = {376--386},
  year = {2002},
  pmid = {12118079},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/12118079}
}

@article{RenardEtal08,
  author = {Renard, Bernhard Y. and Kirchner, Marc and Steen, Hanno and Steen, Judith A. J. and Hamprecht, Fred A.},
  title = {{NITPICK: peak identification for mass spectrometry data}},
  journal = {BMC Bioinformatics},
  volume = {9},
  pages = {355},
  year = {2008},
  doi = {10.1186/1471-2105-9-355},
  pmid = {18755032},
  url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2655099\&tool=pmcentrez\&rendertype=abstract}
}

@article{SchmidtEtal05,
  author = {Schmidt, Alexander and Kellermann, Josef and Lottspeich, Friedrich},
  title = {{A novel strategy for quantitative proteomics using isotope-coded protein labels}},
  journal = {Proteomics},
  volume = {5},
  number = {1},
  pages = {4--15},
  year = {2005},
  pmid = {15602776},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/15602776}
}

@article{SturmEtal08,
  author = {Sturm, Marc and Bertsch, Andreas and Gr\"{o}pl, Clemens and Hildebrandt, Andreas and Hussong, Rene and Lange, Eva and Pfeifer, Nico and Schulz-Trieglaff, Ole and Zerck, Alexandra and Reinert, Knut and Kohlbacher, Oliver},
  title = {{OpenMS - an open-source software framework for mass spectrometry}},
  journal = {BMC Bioinformatics},
  volume = {9},
  pages = {163},
  year = {2008},
  doi = {10.1186/1471-2105-9-163},
  pmid = {18366760},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/18366760}
}

%STOPBIBTEX%

Templates for the final report and presentations

Form for the regular progress meeting


This site is powered by Foswiki. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.