Computing Graph-Theoretic Similarity Scores Efficiently
Similarity measures are useful for many applications, such as recommender systems and search engines. Graph-theoretic similarity measures try to determine similarities by analyzing the relations between objects. SimRank is a popular such similarity measure with an intuitively sensible notion of similarity that is strongly inspired by PageRank. However, SimRank suffers from the fact that the algorithm originally suggested for its computation has an efficiency of Θ(n4). It is not computationally feasible to apply such an algorithm to a large dataset, and hence this thesis explores various alternative algorithms. It is derived how memorizing subresults can improve the efficiency to O(n3). Furthermore, an alternative graph-theoretic similarity measure, called CoSimRank, is discussed. CoSimRank has the advantage that individual similarity scores can be computed in quadratic time. Finally, the performance of the various algorithms introduced is evaluated using data from MovieLens, a collection of large real-world datasets for recommender systems.