Darko Dimitrov

Geometric Applications of Principal Component Analysis

Betreuer: Klaus Kriegel
Abschluss: PhD
Abgabedatum: 08.12.2008
Homepage des Autors:

Kurzbeschreibung

Bounding boxes are used in many applications for simplification of point sets or complex shapes. For example, in computer graphics, bounding boxes are used to maintain hierarchical data structures for fast rendering of a scene or for collision detection. Additional applications include those in shape analysis and shape simplification, or in statistics, for storing and performing range-search queries on a large database of samples. 

A frequently used heuristic for computing a bounding box of a set of points is based on principal component analysis. The principal components of the point set define the axes of the bounding box. Once the axis directions are given, the dimension of the bounding box is easily found by the extreme values of the projection of the points on the corresponding axis. Computing a PCA bounding box of a discrete point set in $\mathbb{R}^d$ depends linearly on the number of points. The popularity of this heuristic, besides its speed, lies in its easy implementation and in the fact that usually PCA bounding boxes are tight-fitting.

In this thesis we investigate the quality of the PCA bounding boxes. We give bounds on the worst case ratio of the volume of the PCA bounding box and the volume of the minimum volume bounding box. We present examples of point sets in the plane, where the worst case ratio tends to infinity. In these examples some dense point clusters have a big influence on the directions of the principal components, and the resulting PCA bounding boxes have much larger volumes than the minimal ones. To avoid the influence of such non-uniform distributions of the point sets, we consider PCA bounding boxes for continuous sets, especially for the convex hulls of point sets, obtaining several variants of continuous PCA. For those variants, we give lower bounds in arbitrary dimension, and upper bounds in $\mathbb{R}^2$ and $\mathbb{R}^3$. To obtain the lower bounds, we exploit a relation between the perfect reflective symmetry and the principal components of point sets. Each of the upper bounds in $\mathbb{R}^2$ and $\mathbb{R}^3$ is obtained from two parameterized bounds. The first bound is general for all bounding boxes, while to obtain the second bound, we exploit some of the properties of PCA, combining them with ideas from discrete geometry and integral calculus.

The relation between the perfect reflective symmetry and the principal components of point sets, leads to a straightforward algorithm for computing the planes of symmetry of perfect and approximate reflective symmetric point sets. For the same purpose, we
present an algorithm based on geometric hashing.

Downloads