Laura Heinrich-Litan:

Exact L_infinity Nearest Neighbor Search in High Dimensions

Abstract

In this thesis we consider the nearest-neighbor problem, which is defined as follows: given a fixed set P of n data points in some metric space X, build a data structure such that for each query point q, a data point from P closest to q can be found efficiently. The underlying metric space is usually the d-dimensional real space R^d together with one of the L_p-metrics, 1 <= p <= ∞. In many applications, the dimension d of the search space is quite high and can reach several hundred or even several thousand. Running times and storage requirements exponential in d are therefore prohibitive in these cases. Because of their exponential dependence on the dimension, none of the known techniques for the exact nearest-neighbor problem is competitive in high dimensions with the brute-force method, which simply computes the distance from q to each point in P and selects the minimum.
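As a point of reference, the brute-force baseline described above can be sketched in a few lines for the L_∞-distance. The function names linf_distance and brute_force_nn are illustrative and do not come from the thesis.

```python
from typing import Sequence, Tuple


def linf_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """L_infinity distance: the largest coordinate-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(p, q))


def brute_force_nn(points: Sequence[Sequence[float]],
                   q: Sequence[float]) -> Tuple[int, float]:
    """Return (index, distance) of a point in `points` closest to q.

    Every coordinate of every point is inspected, so the cost is O(n * d).
    Ties are resolved in favor of the earliest point.
    """
    best_index, best_dist = -1, float("inf")
    for i, p in enumerate(points):
        dist = linf_distance(p, q)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist
```

This O(nd) scan is the yardstick against which the running times quoted in the abstract are measured.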
This thesis presents algorithms for solving the high-dimensional exact nearest-neighbor problem with respect to the L_∞-distance. We analyze the average-case situation in which the data points are chosen independently at random under the uniform distribution. The algorithms improve considerably on the brute-force method, and they are simple and easy to implement.
In Chapter 2 we consider query algorithms that need no preprocessing and require storage only for the point set P. Their average running time is O(n + nd/ln(n)).
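The abstract does not spell out the Chapter 2 algorithms. Purely as an illustration of why an L_∞ query without preprocessing need not read all d coordinates of every point, the sketch below uses early abandoning: a point is dropped as soon as one coordinate difference reaches the best distance found so far. This is a generic pruning idea and is not claimed to be the thesis's method; the name early_abandon_nn is hypothetical.

```python
from typing import Sequence, Tuple


def early_abandon_nn(points: Sequence[Sequence[float]],
                     q: Sequence[float]) -> Tuple[int, float]:
    """Exact L_infinity nearest neighbor with per-point early abandoning.

    Uses no preprocessing and no storage beyond the point set itself.
    The worst case is still O(n * d), but a point is discarded as soon as
    a single coordinate rules it out.
    """
    best_index, best_dist = -1, float("inf")
    for i, p in enumerate(points):
        dist = 0.0
        for a, b in zip(p, q):
            dist = max(dist, abs(a - b))
            if dist >= best_dist:
                break  # this point cannot beat the current best
        else:
            # All coordinates were checked and dist < best_dist held throughout.
            best_index, best_dist = i, dist
    return best_index, best_dist
```

Because the L_∞-distance is a maximum over coordinates, one large coordinate difference is enough to reject a candidate, which is what makes this kind of pruning natural for this metric.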
In Chapter 3 we present two strategies that speed up the search by using preprocessing. The query algorithm introduced in Section 3.1.2 requires linear storage and has an expected running time of O(n ln(d/ln(n) + 1) + n). The data structure developed in Section 3.2 is based on a preprocessed partition of the data set into sequences that are monotone with respect to some of the dimensions. The query algorithm has an expected running time of O(√d · n^(1-1/√d) · ln(n)) for dimensions d < (ln(n)/ln(ln(n)))².
Chapter 4 presents several generalizations, in particular to the important problem of finding the k nearest neighbors of a query point. We generalize the analysis of the considered algorithms to other "well-behaved" probability distributions. Furthermore, we develop extensions of the algorithms that work efficiently in the external-memory model of computation.
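To make the k-nearest-neighbors problem concrete, here is a minimal brute-force sketch under the L_∞-distance. It only illustrates the problem statement and is not one of the thesis's algorithms; the name brute_force_knn is hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def brute_force_knn(points: Sequence[Sequence[float]],
                    q: Sequence[float],
                    k: int) -> List[Tuple[float, int]]:
    """Return the k points closest to q as (distance, index) pairs, nearest first.

    Distances are L_infinity distances; ties are broken by the smaller index.
    """
    distances = (
        (max(abs(a - b) for a, b in zip(p, q)), i)
        for i, p in enumerate(points)
    )
    return heapq.nsmallest(k, distances)
```

With k = 1 this reduces to the ordinary nearest-neighbor query.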
In Chapter 5 we present a method which provides tradeoffs between the space complexity of the data structure and the time complexity of the query algorithm.

Supervisor

Prof. Dr. Helmut Alt

Degree

PhD

Submission date

04.11.2002

Author's homepage

Laura Heinrich-Litan

Downloads

http://www.diss.fu-berlin.de/diss/receive/FUDISS_thesis_000000000944

Fachbereich Mathematik und Informatik

Theoretische Informatik
