This is the progress report of Enrico Siragusa for Fall 2010.
Accomplishments up to Fall 2010
- Became familiar with SeqAn.
- Read most of the literature on seed design.
Goals for Fall 2010
Literature
Reading
- Finish reading the seed design literature, including the Noe and Burkhardt PhD theses.
- Read the motif finding literature, including the Lim Master's thesis.
Writing
- Write an essay on the state of the art of seed design. A current version can be found here.
Research
Seed Properties Computation
- Prove that the threshold T_F(m,k) admits a PTAS.
- Extend the greedy algorithm for T_F(m,k) to compute the undetected similarities U_F(m,k) and the singly detected similarities S_F(m,k,l).
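For reference, the quantity T_F(m,k) can be checked on small instances by exhaustive enumeration. The sketch below is not the greedy algorithm mentioned above. It assumes the usual filtering definition, which this report does not spell out: a shape is a tuple of offsets starting at 0, a placement is a hit if none of its positions is a mismatch, and the threshold of a family is the minimum total number of hits over all ways of placing k mismatches in an alignment of length m. The function names are hypothetical.

```python
from itertools import combinations


def count_hits(shape, m, mismatches):
    """Count placements of `shape` in a length-m alignment that avoid all mismatches.

    `shape` is a tuple of offsets whose smallest element is assumed to be 0.
    """
    span = max(shape) + 1
    return sum(
        1
        for i in range(m - span + 1)
        if all(i + offset not in mismatches for offset in shape)
    )


def threshold(family, m, k):
    """Brute-force threshold (assumed definition): minimum total number of hits
    of the shapes in `family` over every placement of k mismatches among m positions."""
    return min(
        sum(count_hits(shape, m, set(errors)) for shape in family)
        for errors in combinations(range(m), k)
    )


# Contiguous 3-gram, m = 10, k = 1: agrees with the q-gram lemma, (10 - 3 + 1) - 1 * 3.
print(threshold([(0, 1, 2)], 10, 1))  # → 5
```

The enumeration visits all C(m,k) mismatch placements, so it is only useful as a correctness check for faster methods on short alignments.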
Search Space Analysis
- Find bounding criteria for single seeds, such as: U' contained in U implies T_U'(m,k) ≥ T_U(m,k).
- Evaluate the PatternHunterII greedy method for seed families.
- Propose alternative heuristics.
Model Design
- Design a seed design framework that optimizes the specificity of a seed family for a given sensitivity threshold.
Motif Finding
- Design a motif finder based on the seed design framework.
SeqAn
Development
- Prototype a seed design module.
Maintenance
- Become familiar with the motif finding module.
- Fix the motif finding module.