
PSMB_Seqan_2013_NGS_Quality_Control_Details_Features

Our application will read FASTQ files and collect statistical data about their content. The input itself will not be modified. A summary document will be created to display the results.

The application covers three areas of functionality: input, data collection, and formatting of the results output. Each feature is classified as Must-Have or Nice-To-Have. Implementation and testing of each feature are assigned to Antje (A) or Daniel (D), as marked after each item.

Input
Data Collection
Summary Document Generation
Application Options
Bonus Bonus Bonus List
Milestones

Input

The application will be able to:

read FASTQ files (Must-Have, A)
read BAM files (Nice-To-Have, D)
read compressed FASTQ files (Nice-To-Have, D)
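A minimal Python sketch of the Must-Have FASTQ input, including the Nice-To-Have compressed case, could look like the following. It assumes the common four-line FASTQ layout (header, sequence, "+" line, qualities; no line-wrapped sequences) and chooses gzip decompression purely from a ".gz" file suffix. The names `FastqRecord`, `parse_fastq` and `open_fastq` are hypothetical and are not part of SeqAn or of the planned application; BAM input is not covered.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # header line without the leading '@'
    sequence: str   # bases, e.g. "ACGTN"
    quality: str    # ASCII-encoded qualities, same length as sequence


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ input (assumes no line wrapping)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        try:
            seq = next(it)
            plus = next(it)
            qual = next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' separator, got {plus!r}")
        if len(seq) != len(qual):
            raise ValueError(f"length mismatch in record {header[1:]!r}")
        yield FastqRecord(header[1:], seq, qual)


def open_fastq(path: str) -> TextIO:
    """Open plain or gzip-compressed FASTQ as text, chosen by the '.gz' suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")
```

Usage: `with open_fastq("reads.fastq.gz") as handle:` followed by a loop over `parse_fastq(handle)`. Keeping the parser separate from file opening lets tests feed it in-memory text instead of files.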

Data Collection

The following data will be collected:

input filename and format (Must-Have, A)
quality scoring system used (Must-Have, A)
conversion between quality scoring systems (Must-Have, D)
total number of sequences (Must-Have, A)
overall average quality score across all bases of all sequences (Must-Have, A)
overall GC percentage across all bases of all sequences (Must-Have, A)
per read and per position (Must-Have, A):
- basic quality distribution data: median, mean, quantiles (10, 25, 75, 90)
- distribution of [A, C, G, T]
- GC content (percent)
- N content
for all reads (Must-Have, A):
- distribution of mean read qualities (A)
- sequence length distribution (A)
overall sequence metrics:
- duplicated sequences (Nice-To-Have, A)
- k-mer distribution (Must-Have, D)
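The scoring-system detection, the conversion between scoring systems and the per-position statistics listed above can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: only the two common ASCII offsets, Phred+33 and Phred+64, are considered, and the offset is guessed from the smallest quality character. That heuristic can misclassify Phred+33 data that contains no low scores, which is one reason to offer a forced scoring system as an option. Quantiles use linear interpolation between order statistics; all function and column names are hypothetical.

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple


def detect_offset(qualities: Iterable[str]) -> int:
    """Guess the ASCII offset of the quality encoding: 33 or 64.

    Heuristic (assumption): characters below '@' (ASCII 64) only occur in
    Phred+33 data, so input whose lowest character is '@' or above is taken
    to be Phred+64. Phred+33 input without low scores is misclassified.
    """
    lowest = min((min(q) for q in qualities if q), default=None)
    if lowest is None:
        raise ValueError("no quality strings to inspect")
    return 33 if ord(lowest) < 64 else 64


def convert_quality(qual: str, from_offset: int, to_offset: int) -> str:
    """Re-encode a quality string, e.g. from Phred+64 to Phred+33."""
    return "".join(chr(ord(c) - from_offset + to_offset) for c in qual)


def quantile(sorted_vals: List[int], p: float) -> float:
    """Quantile of a non-empty sorted list by linear interpolation."""
    pos = (len(sorted_vals) - 1) * p
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def per_position_stats(reads: Iterable[Tuple[str, str]],
                       offset: int) -> List[Dict[str, float]]:
    """Quality and base-composition summaries for each read position.

    `reads` yields (sequence, quality) pairs; positions are reported 1-based.
    Reads of different lengths are fine: position i only counts the reads
    that are long enough to reach it.
    """
    scores: List[List[int]] = []   # scores[i]: Phred scores seen at position i
    bases: List[Counter] = []      # bases[i]: base counts at position i
    for seq, qual in reads:
        for i, (base, ch) in enumerate(zip(seq, qual)):
            if i == len(scores):   # first read to reach this position
                scores.append([])
                bases.append(Counter())
            scores[i].append(ord(ch) - offset)
            bases[i][base.upper()] += 1

    rows = []
    for i, (vals, counts) in enumerate(zip(scores, bases)):
        vals.sort()
        total = sum(counts.values())
        row = {"position": i + 1, "mean": mean(vals), "median": median(vals)}
        for pct in (10, 25, 75, 90):
            row[f"q{pct}"] = quantile(vals, pct / 100)
        for b in "ACGTN":
            row[f"pct_{b}"] = 100.0 * counts[b] / total
        row["pct_GC"] = row["pct_G"] + row["pct_C"]
        rows.append(row)
    return rows
```

Keeping every score in a list is simple but memory-hungry for large files. Since Phred scores take only a few dozen distinct values, a per-position histogram would bound memory while still giving exact quantiles.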

Summary Document Generation

The primary application output will be a tab-separated values (TSV) text file. (A)
The TSV file can be read by an accompanying R script, which generates graphics from the data; these graphics are displayed in an HTML document. (D)
Generation of the summary document itself will be performed by a secondary application. (D)
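A minimal sketch of writing such a tab-separated file with Python's standard `csv` module follows; the `write_tsv` helper and the column names are hypothetical. A header line followed by one tab-separated row per record is the layout that R's `read.delim` expects by default (tab separator, first line as header).

```python
import csv
import sys
from typing import Dict, Sequence, TextIO


def write_tsv(rows: Sequence[Dict[str, object]], out: TextIO) -> None:
    """Write dict rows as a header line plus one tab-separated line per row.

    All rows are assumed to share the keys of the first row, which become
    the column names.
    """
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical per-position columns, for illustration only.
    example = [
        {"position": 1, "mean": 38.5, "pct_GC": 48.0},
        {"position": 2, "mean": 37.0, "pct_GC": 51.5},
    ]
    write_tsv(example, sys.stdout)
```

On the R side, `read.delim("stats.tsv")` (hypothetical file name) would then return a data frame with the columns `position`, `mean` and `pct_GC`.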

Application Options

force a specific quality scoring system instead of detecting it
set the k-mer length
quick analysis (randomly choose a subset of reads to analyze)
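The k-mer length and quick-analysis options might work roughly as below. The plan only says that a random subset of reads is analyzed; reservoir sampling (Algorithm R) is an assumed choice here because it draws a uniform sample of fixed size in a single pass, without knowing the number of reads in advance. Skipping k-mers that contain N is likewise an assumption. `reservoir_sample` and `count_kmers` are hypothetical names.

```python
import random
from collections import Counter
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], size: int,
                     seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of up to `size` items in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for n, item in enumerate(items):
        if n < size:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, n)        # inclusive: n + 1 items seen so far
            if j < size:                 # keep with probability size / (n + 1)
                sample[j] = item
    return sample


def count_kmers(sequences: Iterable[str], k: int) -> Counter:
    """Count overlapping k-mers, skipping any k-mer that contains 'N'."""
    if k < 1:
        raise ValueError("k must be positive")
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts
```

Because the sample is drawn from a stream, the same function works on plain and compressed input alike; a fixed `seed` makes a quick analysis reproducible.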

Bonus Bonus Bonus List

integration with Galaxy or KNIME
a single script that starts all the other steps
a graphical user interface

Milestones

1st week: testing and implementation of a minimal functional version that runs through all steps
2nd week: testing and implementation of all basic statistics (A) and k-mer content (D)
3rd week: testing and implementation of sequence duplication (A) and output refinement (D)
4th week: buffer for surprises; testing and implementation of Nice-To-Have features (A+D)

Topic revision: r3 - 25 Apr 2013, dkersting


Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.