Springe direkt zu Inhalt

Katy Sauerteig:

A Survey On String Attractors

Kurzbeschreibung

Given a string T, a string attractor Γ is a set of positions such that every substring of T has an occurrence T[i. . . j] overlapping with a positions in Γ. Due to its usefulness as a measure of repetitiveness of T, the string attractor has gained widespread academic interest. In particular, the size of the smallest attractor γ(T) bounds most known dictionary compressors, which has lead to a large diversity of literature studying the string attractor’s properties, its variants, as well as its uses in building data-structures that allow for fast queries in compressed space.

On the other hand, an alternate measure known as δ that lower-bounds γ has recently seen a lot of interest as a possible alternative to γ in some contexts. In particular, while γ is NP-complete to compute, δ can be found in linear time, and furthermore, unlike γ, δ is monotone, and much recent research on compressed data-structures has favored denoting runtime and space bounds via δ.

The goal of this paper is to provide a general overview of the current literature on attractors. In doing so, we discuss the attractor and its many variants, therein demonstrating some of its most notable properties. To be specific, we demonstrate the bounding relationships between γ and several dictionary compressors, namely the Lempel Ziv parsing, the run-length Borrows-Wheeler transform and bidirectional macro schemes, as well as how to build a O(γ log(n /γ) )-size macro scheme given a minimum attractor. Furthermore we outline the reductions proving NP-hardness for k >=2 for the more general k-attractor variant, as well as a different powerful reduction to set-cover that serves as the basis for many fast algorithms concerning attractors. Lastly we briefly discuss the metric δ and its implications on the future relevance of string attractors.

Abschluss
Bachelor of Science (B.Sc.)
Abgabedatum
12.12.2025