Reannotation of the genome of Carsonella ruddii using non-collinear methods

Layout of project

Project progress

Identification of the species set

  • I will only use genomes from Gammaproteobacteria. The set of species will be extended later.

  • For validation purposes, I will analyze the species used in the 2007 paper by Moya et al.:
    • U00096.2 (E. coli)
    • Four different strains of Buchnera aphidicola:
      • BA000003
      • AE013218
      • AE016826
      • CP000263
    • In the original paper, the authors do not consider any of the species' plasmids.
  • Set of species so far (all sequences were downloaded on 01.06.10):
    • Carsonella ruddii PV: 160 kb genome, 213 genes (Gammaproteobacteria) (AP009180.1)
    • Buchnera aphidicola BCc (Cc) (+ a plasmid): 450 kb (Gammaproteobacteria) (CP000263.1)
    • Candidatus Blochmannia floridanus: 705 kb, 631 genes (Gammaproteobacteria) (BX248583.1)
    • Wigglesworthia glossinidia (+ a plasmid): 698 kb, 651 genes (Gammaproteobacteria) (BA000021.3)
    • Baumannia cicadellinicola str. Hc: 686 kb, 651 genes (Gammaproteobacteria) (CP000238.1)

  • To infer the phylogenetic tree of these species, I intend to use their 16S rRNA sequences.
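Once the 16S rRNA sequences are aligned, a tree can be built from their pairwise distances. The sketch below uses UPGMA on p-distances as a quick, simpler stand-in for an ML tree: it is a distance method, not maximum likelihood. It assumes the sequences are already aligned to equal length with '-' for gaps, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    # No comparable sites: treat the pair as maximally distant.
    return diffs / compared if compared else 1.0


def upgma(seqs: dict[str, str]) -> str:
    """Return a Newick string (topology only, no branch lengths) built by UPGMA."""
    # cluster id -> (Newick label, number of sequences in the cluster)
    clusters = {name: (name, 1) for name in seqs}
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}
    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        pair = min((frozenset(p) for p in combinations(clusters, 2)),
                   key=dist.__getitem__)
        a, b = sorted(pair)
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        new = f"_c{next_id}"
        next_id += 1
        # Size-weighted average distance from the merged cluster to the others.
        for other in clusters:
            dist[frozenset((new, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[new] = (f"({label_a},{label_b})", size_a + size_b)
    (label, _), = clusters.values()
    return label + ";"


print(upgma({"A": "AAAA", "B": "AAAT", "C": "TTTT"}))  # (C,(A,B));
```

The merge order of such a tree is the order in which a progressive aligner would join the sequences. The tree format that S-LAGAN expects should be checked in its own documentation.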

Results of the 16S rRNA comparison

  • I have extracted the 16S rRNA sequences from all species. Next:
  • I will use the maximum-likelihood (ML) tree to guide a progressive alignment (e.g. with S-LAGAN).
REINERT: Yes. Use S-LAGAN or SuperMap. Use a program that works; do not spend too much time getting others to work. What are the next intended steps? Think about one manageable outcome. You certainly cannot re-evaluate the complete annotation.

Next steps

  • Parse the currently annotated version of the Carsonella genome and the ABA result.
  • Statistical evaluation of the result with R:
    • Do I find the same genes? Do I find other/new genes? Is the number of genes found significantly different?
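The comparison questions above become concrete once both annotations are parsed into gene coordinates. Here is a minimal sketch. It assumes genes are given as 1-based inclusive (start, end) intervals and ignores strand. A gene counts as "found" when a prediction covers at least 80% of it and vice versa; that threshold and all names are hypothetical choices.

```python
def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of shared bases between two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def reciprocal_match(a: tuple[int, int], b: tuple[int, int],
                     min_frac: float = 0.8) -> bool:
    """True if the overlap covers at least min_frac of both intervals."""
    ov = overlap(a, b)
    return (ov >= min_frac * (a[1] - a[0] + 1)
            and ov >= min_frac * (b[1] - b[0] + 1))


def compare_annotations(reference, predicted, min_frac=0.8):
    """Split genes into: found again, missed (reference only), new (predicted only)."""
    found, missed = [], []
    matched_pred = set()
    for ref in reference:
        # Quadratic scan; fine for a genome with a few hundred genes.
        hits = [i for i, p in enumerate(predicted)
                if reciprocal_match(ref, p, min_frac)]
        if hits:
            found.append(ref)
            matched_pred.update(hits)
        else:
            missed.append(ref)
    new = [p for i, p in enumerate(predicted) if i not in matched_pred]
    return found, missed, new


found, missed, new = compare_annotations(
    reference=[(1, 1000), (2000, 2600), (5000, 5300)],
    predicted=[(10, 1000), (2100, 2500), (7000, 7500)],
)
print(len(found), len(missed), len(new))  # 1 2 2
```

The three resulting counts are the starting point for the planned statistical evaluation in R.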

Problems and Questions

  • Identifying a proper set of species appears to be not quite simple. C. ruddii is listed under "unclassified Gammaproteobacteria". I was told that "normally" one can infer a phylogenetic tree from the 16S rRNA sequence, which every bacterium must have in order to survive. (todo: check for a publication)
  • Can anyone name an alignment program that uses ABA (A-Bruijn Alignment)? I discovered AliWABA, but the web service they provide is not available.
BIRTE: The authors provide an implementation for download. The link is given on page 2 of the paper:

IVAN: Thx. I've downloaded and installed it successfully.

  • SuperMap:
    • What is the CHAOS format? Does it stand for CHAins Of Seeds?
    • What format does the scoring file need to be in?
    • SuperMap needs a GPDB (Genome Profile DataBase) config file, but their website appears to be down right now.

  • ABA Problems:
    • I understand the output now, but it still looks strange.
    • Here (out1.pdf) and here is the output of the example run on two small sequences (chloroplast sequences of two plants). The nodes contain the positions in each of the relevant sequences. The problem is that I only have two sequences here, yet the output contains numbers ranging from 0 to 4...
    • I don't understand why some edges are colored. Do they represent a strongly supported "path"?
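To reason about what the nodes contain, it may help to look at the general A-Bruijn idea. Positions that are aligned to each other are glued into one vertex, and consecutive positions of each sequence are joined by an edge. A vertex therefore lists every sequence position glued into it. This is a minimal sketch of that concept only. The input format (a list of aligned position pairs) and all names are hypothetical, and it is not ABA's implementation or output format. The real tool does further processing that this sketch omits.

```python
from collections import defaultdict


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def a_bruijn_graph(lengths, aligned_pairs):
    """Glue aligned positions into vertices.

    lengths: {seq_id: sequence length}
    aligned_pairs: iterable of ((seq_id, pos), (seq_id, pos)) that are aligned
    Returns (vertices, edges):
      vertices: {representative: sorted list of glued (seq_id, pos)}
      edges: set of (u, v) for consecutive positions within each sequence
    """
    parent = {(s, i): (s, i) for s, n in lengths.items() for i in range(n)}
    for u, v in aligned_pairs:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[rv] = ru  # glue the two classes of positions
    vertices = defaultdict(list)
    for node in parent:
        vertices[find(parent, node)].append(node)
    edges = set()
    for s, n in lengths.items():
        for i in range(n - 1):
            edges.add((find(parent, (s, i)), find(parent, (s, i + 1))))
    return {k: sorted(v) for k, v in vertices.items()}, edges


# Two fully aligned sequences of length 3 collapse into a single path of 3 vertices.
vertices, edges = a_bruijn_graph({0: 3, 1: 3}, [((0, i), (1, i)) for i in range(3)])
print(len(vertices), len(edges))  # 3 2
```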


