Next-Generation Sequencing (NGS) makes it possible to sequence millions of short reads in a single run. In many NGS pipelines, a matching alignment in a reference genome must be found for each of these reads. This process is called read mapping. Some reads cannot be aligned unambiguously to a single position. Some read mappers therefore use mapping quality scores to indicate the reliability of the alignments. These scores are assigned to individual alignments and do not directly indicate ambiguous regions in the genome. Moreover, some read mappers, especially those that do not output subsequent matches, do not calculate mapping quality scores directly.
In this thesis we present a novel approach to estimating mapping quality scores without using any information about subsequent alignments. Our approach calculates the mapping quality scores of perfect sequences extracted from the genome. For every position in the genome, the average of these mapping quality scores is saved, as is the score at the starting position of the perfect sequence. The highest score covered by an alignment is used to calculate its mapping quality, which is then used to annotate the result file of a read mapper.
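The precalculation and annotation steps described above can be illustrated with a minimal sketch. It is one possible reading of the description, not the implementation used in the thesis: the function names, the fixed read length, and the assumption that a score is already available for the perfect read starting at each position are all hypothetical.

```python
from typing import List


def per_position_average(start_scores: List[float], read_len: int) -> List[float]:
    """Average, for each genome position, the scores of all perfect reads covering it.

    start_scores[i] is the (assumed, precomputed) mapping quality score of the
    perfect read that starts at position i and spans read_len bases.
    """
    if read_len < 1:
        raise ValueError("read_len must be at least 1")
    if not start_scores:
        return []
    genome_len = len(start_scores) + read_len - 1
    totals = [0.0] * genome_len
    counts = [0] * genome_len
    for start, score in enumerate(start_scores):
        for pos in range(start, start + read_len):
            totals[pos] += score
            counts[pos] += 1
    # Every position is covered by at least one read, so counts are never zero.
    return [t / c for t, c in zip(totals, counts)]


def annotate_alignment(position_scores: List[float], start: int, length: int) -> float:
    """Return the highest precalculated score covered by an alignment."""
    covered = position_scores[start:start + length]
    if not covered:
        raise ValueError("alignment does not cover any scored position")
    return max(covered)


# Toy example: three perfect reads of length 2; the middle one maps ambiguously.
averages = per_position_average([60, 0, 60], read_len=2)
estimate = annotate_alignment(averages, start=1, length=2)
```

Because the per-position scores depend only on the genome in this sketch, they could be computed once and reused to annotate the output of any read mapper.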
We evaluate the results by comparing them with those of a read mapper that calculates mapping quality scores and with directly calculated mapping quality scores. Our results indicate that the number of errors and the base call qualities have only a very small influence on the mapping quality score of an alignment. Because this influence is small, we can precalculate mapping quality scores for a given genome.
Additionally, for a given genome, we use the different read mapping results to find simulated single nucleotide polymorphisms (SNPs). In our experiment, the read mapper results without mapping quality scores yield better results than those with annotated mapping quality scores.