
Peter Meier:

Boundary-Preserving Training Set Reduction for k-Nearest-Neighbor Classification

Abstract

The k-nearest-neighbor rule (k-NN) is a simple and widely used classification algorithm, but it is often applied to large, high-dimensional training sets, which poses significant efficiency challenges. While many data structures have been developed to speed up k-NN queries, training set reduction can improve query time further by shrinking the training set itself. Two main reduction paradigms exist: boundary-approximating methods, which allow the decision boundaries to change, and boundary-preserving methods, which guarantee that they remain exact. Recent work by Eppstein and Flores-Velazco introduced a boundary-preserving reduction algorithm for 1-NN based on the inversion method; it works in arbitrary dimension but does not extend to higher values of k. This thesis surveys existing training set reduction techniques, presents the inversion method in detail, and proposes a new extension to 3-NN. The extended algorithm preserves the exact classification boundaries but does not achieve optimal size reduction. We evaluate two implementations of the inversion method, comparing training set size, classification error, and runtime performance. The experiments show that the inversion-method approach is considerably more complex than boundary-approximating alternatives but in many cases offers slightly higher accuracy.
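For readers unfamiliar with the classifier whose training set the thesis reduces, the k-NN rule can be sketched in a few lines; this is a generic brute-force illustration (all names and the toy data are ours, not from the thesis):

```python
from collections import Counter

def knn_classify(train, query, k=1):
    """Classify `query` by majority vote among its k nearest
    training points under squared Euclidean distance.
    `train` is a list of (point, label) pairs."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    # Sort the whole training set by distance; real systems use
    # spatial data structures or a reduced training set instead.
    neighbors = sorted(train, key=lambda pl: dist2(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy two-class training set: 'A' near the origin, 'B' near (10, 10).
train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((10, 10), "B"), ((9, 10), "B"), ((10, 9), "B")]
print(knn_classify(train, (0.5, 0.5), k=3))  # prints "A"
```

Boundary-preserving reduction aims to remove points from `train` without changing the label this function returns for any possible `query`.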

Degree
Bachelor of Science (B.Sc.)
Submission date
17.10.2025