Original goal:
- Download the reads from a fairly well finished genome → Human chr.21
- and assemble it using two or more standard assemblers → e.g. Celera Assembler, Mira
- Compare the results using layout software (→ OSLay) and genome comparison programs
Assembling:
- Downloaded reads (fastq) and alignments (BAM) from 1000 Genomes project
- Converted BAM to SAM and selected the names of reads that mapped to chr. 21
- Created new fastqs containing only selected reads
- Installed the assemblers wgs (Celera Assembler) and Mira
- Created frg-files from fastq-files (necessary for wgs)
- Problems running wgs
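The read selection above can be sketched in a few lines of Python. This is only an illustration, not the scripts actually used: the function names are hypothetical, the reference name (e.g. "21") depends on how the BAM header names the chromosome, and the FASTQ files are assumed to use plain 4-line records with an optional /1 or /2 mate suffix.

```python
def read_names_on_reference(sam_lines, reference):
    """Collect names of mapped reads whose RNAME column equals `reference`."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:               # not a valid alignment line
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4:                    # FLAG bit 0x4: read is unmapped
            continue
        if rname == reference:
            names.add(qname)
    return names


def filter_fastq(fastq_lines, names):
    """Yield 4-line FASTQ records whose read name is in `names`."""
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            # Header looks like '@name optional description'
            name = record[0][1:].split()[0]
            # Assumption: mates are marked with a trailing /1 or /2
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            if name in names:
                yield "".join(record)
            record = []
```

Both functions accept any iterable of lines, so they can be fed open file handles (for example the text output of a BAM to SAM conversion) without loading everything into memory.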
Plan B:
- Simulated contigs from chr. 21
- Cut the sequence into pieces
- Changed the order and orientation of the contigs
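A minimal sketch of such a contig simulation, assuming equal-sized pieces and a random choice of orientation per piece. The function names and the contig naming scheme (contig_<index>_fwd / contig_<index>_rev) are made up for illustration and are not taken from the original scripts.

```python
import random

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_contigs(seq, n_pieces, seed=0):
    """Cut `seq` into n_pieces nearly equal parts, flip some, shuffle the order."""
    rng = random.Random(seed)
    # Piece boundaries; integer arithmetic spreads the remainder evenly
    bounds = [i * len(seq) // n_pieces for i in range(n_pieces + 1)]
    pieces = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    contigs = []
    for idx, piece in enumerate(pieces, start=1):
        if rng.random() < 0.5:
            # Reversed contig: store the reverse complement
            contigs.append((f"contig_{idx}_rev", reverse_complement(piece)))
        else:
            contigs.append((f"contig_{idx}_fwd", piece))
    rng.shuffle(contigs)                  # change the order of the contigs
    return contigs
```

Keeping the original index and orientation in the contig name makes it possible to check afterwards whether a layout tool restored the true order.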
Calculating local alignments between contigs:
- Megablast
- Problems:
- FASTA identifiers must not be changed!
OSLay:
- Input of chr. 21 (~34 MB) is too large for OSLay
Plan B:
- Used segment of chr. 21 (~210 KB)
1st round:
- Assembly A: sequence divided into 100 equal parts
- Assembly B: sequence divided into 19 equal parts
- Ran OSLay and adjusted parameters (→ assemblies are from the same sequence)
- Results: there are still 4 / 5 supercontigs → too many similar contig borders in simulated assemblies
2nd round:
- Created contigs with random length
- Assembly A: contigs of length 500 - 5000 bp
- Assembly B: contigs of length 1 - 200 kb
- Ran OSLay and adjusted parameters
- Same parameters as in the 1st round → better results
- Results: 3 / 3 supercontigs (1 huge, 2 small)
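The random-length cutting of the 2nd round could look roughly like the sketch below. It assumes lengths drawn uniformly between a minimum and a maximum, which the notes do not specify, and the function name is hypothetical. The last piece simply takes whatever remains and may therefore be shorter than the minimum.

```python
import random


def cut_random_lengths(seq, min_len, max_len, seed=0):
    """Cut `seq` into consecutive pieces with lengths in [min_len, max_len]."""
    rng = random.Random(seed)
    pieces = []
    pos = 0
    while pos < len(seq):
        length = rng.randint(min_len, max_len)   # both bounds inclusive
        pieces.append(seq[pos:pos + length])     # slice is clipped at the end
        pos += length
    return pieces
```

For example, cut_random_lengths(segment, 500, 5000) would correspond to assembly A and cut_random_lengths(segment, 1000, 200000) to assembly B, where segment stands for the chr. 21 segment as a string. Using different seeds for the two assemblies keeps their contig borders independent.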