StructuralAndCopyNumberVariationV1

Method for calling copy number variants from read depth in genome resequencing data, without case/control pairs. Implemented in a pipeline at MPI-MG for Stefan Haas, 7.2010-11.2010.

1. Count reads in fixed size windows across N samples.

2. Normalize read counts per sample by total number of reads mapped.

3. Compute median of normalized read counts of N samples at each window. This acts as a background for read depth.

4. For windows with sufficient background depth, compute log_2((sample + pseudocount) / background), where sample is the normalized read count of the sample in that window.

5. Fill a matrix X with log ratios, such that N columns represent samples, M rows represent windows of the genome.

6. Create a matrix Z from X by normalizing each column, using a robust estimate of the standard deviation and an estimate of the central peak.

7. Calculate the local false discovery rate for the values in Z using the locfdr R package. Methods are described in Efron, B. (2004) "Large-scale simultaneous hypothesis testing: the choice of a null hypothesis", Journal of the American Statistical Association, 99, pp. 96–104.

8. Iteratively improve the estimates of the null prior per window using the methods in Efron, B. and Zhang, N. (2010) "False Discovery Rates and Copy Number Variation", http://stat.stanford.edu/~ckirby/brad/papers/2010_FDRsandCNV.pdf

9. For each sample, build regions along chromosomes comprising windows with local FDR < 0.05.

10. Group potential CNVs using a tolerance on the start and end positions.

11. Write out .gff files per sample. Annotate CNVs with mean true discovery rate over windows, frequency over samples, mean normalized read count over windows.
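
Steps 2-6 and 9 can be sketched in a few lines of Python. This is only an illustration, not the MPI-MG pipeline code. All function and parameter names are hypothetical. It assumes that the window counts of a sample sum to its total mapped reads, that the central peak is estimated by the column median, and that the robust standard deviation is 1.4826 times the median absolute deviation; the page does not say which estimators the pipeline used. The local FDR values of steps 7-8 come from locfdr in R and are taken here as given input.

```python
import math
from statistics import median


def normalize_counts(counts):
    """Step 2: divide each sample's window counts by its total read count.

    counts is a list of N samples, each a list of M window counts.
    Assumption: the window counts of a sample sum to its mapped reads.
    """
    return [[c / sum(sample) for c in sample] for sample in counts]


def background(norm):
    """Step 3: median of the normalized counts over samples, per window."""
    return [median(window) for window in zip(*norm)]


def log_ratios(norm, bg, pseudocount, min_background):
    """Steps 4-5: matrix X, rows = kept windows, columns = samples.

    min_background must be > 0 so the division is always defined.
    Returns the indices of the kept windows together with X.
    """
    kept, X = [], []
    for w, b in enumerate(bg):
        if b >= min_background:
            kept.append(w)
            X.append([math.log2((s[w] + pseudocount) / b) for s in norm])
    return kept, X


def robust_z(X):
    """Step 6: normalize each column of X (one column per sample).

    Assumption: center = column median, scale = 1.4826 * MAD.
    """
    z_columns = []
    for col in zip(*X):
        center = median(col)
        scale = 1.4826 * median(abs(v - center) for v in col)
        if scale == 0:
            raise ValueError("zero MAD: column has too little spread")
        z_columns.append([(v - center) / scale for v in col])
    return [list(row) for row in zip(*z_columns)]


def build_regions(fdr, window_size, threshold=0.05):
    """Step 9: merge runs of adjacent windows with local FDR < threshold.

    fdr holds one value per consecutive window of one chromosome in one
    sample. Regions are returned as 0-based, half-open (start, end) pairs.
    """
    regions, start = [], None
    for i, f in enumerate(fdr):
        if f < threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start * window_size, i * window_size))
            start = None
    if start is not None:
        regions.append((start * window_size, len(fdr) * window_size))
    return regions
```

For example, build_regions([0.5, 0.01, 0.02, 0.9, 0.03], 1000) merges the second and third windows into one region and reports the last window as a second region. Windows dropped in step 4 for low background would need to be treated as gaps before calling build_regions; the sketch does not handle that.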
