Lösungen zu Übung 1

Solution for Exercise 2 only so far.

Excercise 2

Find all occurrences!

t = ACACTCCCCCGACAACC
p = ACAACC occurs only at position 6 (startposition).



Why does the algorithm show such a poor performance in this example? How can (for this example) the number of comparisons be reduced? (Algorithm?)

The pattern contains a C next to last character which results in a shift-distance of 1 for the character C.
Hence everytime we encounter a C in the text we can shift the window only one to the right. In general the performance will be bad if characters that occur on the right end in the pattern occur frequently in the text.
In this case the number of comparisions can be reduced if we used more of the information we receive from them.
When Horspool trys to match ….TC against ….CC it could safely shift the window past the T because it does not occur in p. Proceeding like this would save 12 of 21 comparisions.



Prove that the safe shift computed by the Horspool is indeed safe (no occurence of the pattern in the text is missed).

We can distinguish only 2 cases:
  1. the last character in the window does not occur in the pattern
  2. otherwise
In the first case we can safely shift the window by the length of the pattern, which is exactly what Horspool does.
For characters occuring in the pattern Horspool takes the distance of the rightmost occurence to the end of the pattern. Thus we will shift the search window until the former last character matches again for the first (!) time at a certain position in the window. This implies that for shifts smaller than the one computed by Horspool the character would mean a missmatch. For that reason we cannot have missed an occurrence of the pattern.
I'm curious about a shorter proof, maybe a bit more mathematical?
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback