
Solving the Multi-read assignment problem

Description

This thesis should provide new ideas for solving the multi-read assignment problem for NGS data. Kececioglu [1] and Tammi [2] address a similar problem in the context of sequence assembly. Based on their work, methods should be developed that use the microheterogeneity in the multiple hit locations to group reads together and subsequently assign them to the correct genomic location (e.g. using partial overlap and mate pair information).

Literature

  • [1] Kececioglu, Yu: Separating repeats in DNA sequence assembly.
  • [2] Tammi et al.: Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs. Bioinformatics 18, 2002, 379-388.
  • [3] Stephan Aiche: Separation of repeats in shotgun assembly data, MSc thesis.

Work Plan


This master thesis has a duration of 6 months, from 01.12.2010 until 31.05.2011.

Month 1 - Set up the evaluation framework. This includes
  • Generation of simulated test cases
  • Mapping of reads with RazerS
  • Computing a consensus alignment
  • Computing the correct location of the multi-reads (first a dummy implementation, i.e. random assignment or assignment to the position with the fewest errors)
  • Computing and displaying the goodness of the solution
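
The two dummy strategies and a simple goodness measure could be sketched as follows. This is only an illustration, not the planned SeqAn implementation: the hit representation (read id mapped to a list of `(position, errors)` pairs), the function names, and the use of plain accuracy as the goodness measure are all assumptions made for this example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical input format: read id -> candidate hits as (position, errors).
Hits = Dict[str, List[Tuple[int, int]]]


def assign_random(hits: Hits, seed: int = 0) -> Dict[str, int]:
    """Baseline 1: pick one candidate position per multi-read at random."""
    rng = random.Random(seed)
    return {read: rng.choice(cands)[0] for read, cands in hits.items()}


def assign_fewest_errors(hits: Hits) -> Dict[str, int]:
    """Baseline 2: pick the candidate with the fewest errors.

    Ties are broken by the smaller position (an arbitrary choice).
    """
    return {
        read: min(cands, key=lambda c: (c[1], c[0]))[0]
        for read, cands in hits.items()
    }


def accuracy(assignment: Dict[str, int], truth: Dict[str, int]) -> float:
    """Fraction of reads assigned to their true (simulated) origin."""
    if not truth:
        return 0.0
    correct = sum(1 for read, pos in truth.items() if assignment.get(read) == pos)
    return correct / len(truth)
```

With simulated test cases the true origin of every read is known, so any later assignment strategy can be compared against these two baselines using the same `accuracy` call.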



Month 2 - Implement different variants of multi-read assignment strategies

Month 3 - Integrate into SeqAn, use real-world data, and run performance tests

Month 4 - Enhance the implementations and begin writing the thesis

Month 5 & 6 - Finish the work and write up the master thesis

Weekly Report

Thesis structure

  • Introduction
    • Shotgun sequencing
    • Repeats in sequence assembly
    • Various origins in read mapping applications
      • All origins are present in the reference [A]
      • Some origin(s) are present in the reference ("hidden repeat") [B]
      • No origins are present in the reference [C] [?]
  • The solution framework
    • Identifying locations of interest for separation
      • Based on unexpectedly high coverage [should work for A+B]
      • Based on the target regions of multi-mapped reads [should work for A]
      • Method for [C]?
    • Identification of valid separating columns in those locations
      • Kececioglu model
      • Tammi model
    • Classification of reads based on those columns by Kececioglu ILP
    • [ Integration ]
  • Implementation [?]
  • Evaluation
    • [...]
  • Conclusion / Discussion
    • [...]

To review

Integration of the local classification

Separation based on one or more DNPs yields a local classification: the read occurrences at the DNP site are classified against each other.

Procedure:

  1. Classify the read occurrences locally
  2. Check whether reads that already have a global class occur within the local classes
    1. NO
      The next free global class ID is requested and assigned to all read occurrences
    2. YES
      1. If only one global class occurs in the local class:
        • All read occurrences are assigned to this global class
      2. If several global classes occur:
        • All classes that are in conflict with each other are ignored
        • If several global classes remain, they are merged
        • The remaining global class is assigned to all read occurrences that have no assignment yet
  3. The global classes that were assigned to the respective local classes are marked pairwise as being in conflict
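
The procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: the class name `GlobalClasses`, the use of plain strings as read-occurrence identifiers, and keeping the smallest ID when merging are all choices made for this example. The notes do not say what happens when every global class in a local class is in conflict; the sketch then leaves that local class without a global class.

```python
from itertools import combinations


class GlobalClasses:
    """Tracks global class IDs of read occurrences and conflicts between classes."""

    def __init__(self):
        self.class_of = {}      # read occurrence -> global class ID
        self.conflicts = set()  # frozenset({a, b}) for conflicting global classes
        self.next_id = 0

    def _merge(self, ids):
        """Merge the given global classes into the smallest ID (assumption)."""
        keep = min(ids)
        for occ, gid in self.class_of.items():
            if gid in ids:
                self.class_of[occ] = keep
        renamed = {frozenset(keep if g in ids else g for g in pair)
                   for pair in self.conflicts}
        self.conflicts = {pair for pair in renamed if len(pair) == 2}
        return keep

    def integrate(self, local_classes):
        """Integrate the local classes of one DNP site (steps 2 and 3)."""
        chosen = []
        for local in local_classes:
            present = {self.class_of[o] for o in local if o in self.class_of}
            if not present:
                # 2.1: no global class yet, take the next free ID
                gid, targets = self.next_id, list(local)
                self.next_id += 1
            elif len(present) == 1:
                # 2.2.1: exactly one global class, assign it to all occurrences
                gid, targets = next(iter(present)), list(local)
            else:
                # 2.2.2: ignore classes that conflict with another present class
                remaining = {g for g in present
                             if not any(frozenset((g, h)) in self.conflicts
                                        for h in present if h != g)}
                if not remaining:
                    chosen.append(None)  # assumption: leave unresolved
                    continue
                gid = self._merge(remaining)
                # keep IDs chosen earlier in this call valid after the merge
                chosen = [gid if c in remaining else c for c in chosen]
                targets = [o for o in local if o not in self.class_of]
            for occ in targets:
                self.class_of[occ] = gid
            chosen.append(gid)
        # 3: the classes assigned at this site are pairwise in conflict
        distinct = sorted({g for g in chosen if g is not None})
        for a, b in combinations(distinct, 2):
            self.conflicts.add(frozenset((a, b)))
        return chosen
```

Calling `integrate` once per DNP site, each time with that site's local classes, propagates the separation along reads that span several sites; two classes created at different sites are merged as soon as a local class links them without a recorded conflict.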