Original goal:

  1. Download the reads of a fairly well-finished genome -> human chr. 21
  2. Assemble them using two or more standard assemblers -> e.g. Celera Assembler, MIRA
  3. Compare the results using layout software (-> OSLay) and genome comparison programs


Assembling:

  • Downloaded reads (FASTQ) and alignments (BAM) from the 1000 Genomes Project
  • Converted BAM to SAM and selected the names of the reads that mapped to chr. 21
  • Created new FASTQ files containing only the selected reads
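
The read selection above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: it assumes the BAM file has already been converted to SAM text, that the reference is named "21" in the RNAME column (it may be "chr21" in other files), and that FASTQ read names match the SAM read names exactly (pair suffixes such as /1 and /2 would need extra handling). The function names are hypothetical.

```python
def read_names_on_chromosome(sam_lines, chromosome="21"):
    """Collect the names of reads whose SAM RNAME field equals `chromosome`."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: QNAME is column 1, RNAME is column 3
        if len(fields) > 2 and fields[2] == chromosome:
            names.add(fields[0])
    return names


def filter_fastq(fastq_lines, wanted):
    """Yield 4-line FASTQ records whose read name is in `wanted`."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Drop the leading '@' and any description after the first space
            name = record[0][1:].split()[0]
            if name in wanted:
                yield record
            record = []
```

Both functions accept any iterable of lines, so open file handles can be passed directly and the files never have to be loaded into memory at once.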

  • Installed the assemblers wgs (Celera Assembler) and MIRA
  • Created frg files from the FASTQ files (required by wgs)
  • Problems running wgs

Plan B:

  • Simulated contigs from chr. 21
  • Cut the sequence into pieces
  • Changed the order and orientation of the contigs
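
A minimal sketch of such a simulation, under stated assumptions: the cut positions are given by the caller, each piece is reverse-complemented with probability 0.5, and the contig names ("contig_0", ...) are hypothetical. The notes do not say how the original simulation chose orientation or order, so this is only one way to do it.

```python
import random

# Translation table for complementing DNA (N stays N)
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_contigs(sequence, cut_positions, seed=0):
    """Cut `sequence` at `cut_positions`, flip some pieces, shuffle the order."""
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    bounds = [0] + sorted(cut_positions) + [len(sequence)]
    pieces = [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
    contigs = []
    for index, piece in enumerate(pieces):
        if rng.random() < 0.5:  # change orientation of roughly half the pieces
            piece = reverse_complement(piece)
        contigs.append((f"contig_{index}", piece))
    rng.shuffle(contigs)  # change the order of the contigs
    return contigs
```

Because each name keeps the index of the piece in the original sequence, the true layout stays recoverable for checking the result later.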

Calculating local alignments between contigs:

  • Megablast
  • Problems:
    • FASTA identifiers must not be changed!

OSLay:

  • The complete chr. 21 (~34 MB) is too large as input for OSLay

Plan B:

  • Used a segment of chr. 21 (~210 KB)

1st round:

  • Assembly A: Sequence divided by 100
  • Assembly B: Sequence divided by 19
  • Ran OSLay and adjusted parameters (-> assemblies are from the same sequence)
  • Results: still 4 / 5 supercontigs -> too many similar contig borders in the simulated assemblies
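
One way to inspect how similar the contig borders of two simulated assemblies are is sketched below. It assumes that "divided by N" means N pieces of near-equal length, which is only one possible reading of the notes. The function names and the tolerance parameter are hypothetical and not part of OSLay.

```python
def equal_cut_positions(length, parts):
    """Internal cut positions splitting `length` bases into `parts` near-equal pieces."""
    return [round(i * length / parts) for i in range(1, parts)]


def count_close_borders(cuts_a, cuts_b, tolerance):
    """Count cuts in A that lie within `tolerance` bases of some cut in B."""
    return sum(
        1 for a in cuts_a
        if any(abs(a - b) <= tolerance for b in cuts_b)
    )


# Example call with the approximate segment length from the notes (~210 KB);
# the tolerance of 500 bases is an arbitrary illustrative value.
cuts_a = equal_cut_positions(210_000, 100)
cuts_b = equal_cut_positions(210_000, 19)
close = count_close_borders(cuts_a, cuts_b, tolerance=500)
```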

2nd round:

  • Created contigs with random length
  • Assembly A: contigs of length 500 - 5000 bp
  • Assembly B: contigs of length 1 - 200 KB
  • Ran OSLay and adjusted parameters
  • Same parameters as in the 1st round -> better results
  • Results: 3 / 3 supercontigs (1 huge, 2 small)
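
Random contig lengths like those in the 2nd round could be generated roughly as follows. This is a sketch under assumptions: lengths are drawn uniformly from the given range, the last piece simply takes whatever remains (so it may be shorter than the minimum), and the segment length of 210,000 bases is the approximate value from above. The function name is hypothetical.

```python
import random


def random_cut_positions(length, min_len, max_len, seed=0):
    """Cut positions so that each piece is min_len..max_len bases long.

    The final piece takes the remainder and may be shorter than min_len.
    """
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    cuts, position = [], 0
    while True:
        position += rng.randint(min_len, max_len)  # both bounds inclusive
        if position >= length:
            break
        cuts.append(position)
    return cuts


# Assembly A: contigs of 500 - 5000 bp; assembly B: contigs of 1 - 200 KB
cuts_a = random_cut_positions(210_000, 500, 5_000, seed=1)
cuts_b = random_cut_positions(210_000, 1_000, 200_000, seed=2)
```

Using different seeds for the two assemblies makes it unlikely that their contig borders coincide, which was the problem in the 1st round.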
Topic revision: r4 - 19 Jul 2010, SabrinaKrakau
 