The analysis of metagenomic mixtures (e.g., figuring out what type of infections are hiding in a drop of blood or figuring out which creatures are living in a handful of lake water) is a fundamentally combinatorial challenge: each piece of observed evidence (evidence can come from high-throughput sequencing, mass spectrometry, or any other platform) can potentially support myriad hypotheses: Perhaps a small piece of protein in your blood could come from an infection, but maybe that small bit of protein was produced by your body? Or maybe the fragment of protein could also come from the sweater of the technician who analyzed the sample? Not only are the possible explanations manifold, we must also consider all possible combinations (i.e., perhaps most of that particular protein in your blood came from a bacterial pathogen, but some small amounts are also produced by the human body).

There are often so many possible joint results that trying each of them would require more steps than the number of particles in the observable universe. Analyzing these mixtures is an exciting and highly difficult mathematical challenge. Not only will a better understanding lead to hospitals that can diagnose our illnesses from a single drop of blood, the dynamic programming algorithms that can break down this combinatorial problem also promise to be exciting in their own right.

The Computational Metagenomics group plays with lots of new algorithms and ideas for solving this puzzle.