A common task in document digitization for information retrieval is separating text from non-text elements. This paper presents a new approach to pattern recognition. An arbitrary number of statistical and structural features are combined into a rating tree, which is an adapted decision tree. Such a tree is trained on character patterns to distinguish text elements from non-text elements. First experiments in a binarization application show promising results: a significant reduction of false positives without producing false negatives.
Object Recognition Using Summed Features Classifier
Published by Springer in the Lecture Notes in Artificial Intelligence series, LNCS 7267; presented at the 11th International Conference ICAISC 2012, Zakopane, Poland, April 29-May 3, 2012, Proceedings, Part I.
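
The abstract describes the rating tree only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that each node tests a single feature against a threshold, that each branch contributes a numeric rating, and that the ratings along the path are summed and compared with a cutoff. The names `RatingNode`, `rate`, `is_text`, the feature names, and all thresholds and ratings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RatingNode:
    """One node of a hypothetical rating tree (assumed structure)."""
    feature: str          # name of the feature tested at this node
    threshold: float      # split point for that feature
    rating_low: float     # rating added when value <= threshold
    rating_high: float    # rating added when value > threshold
    low: Optional["RatingNode"] = None    # subtree for value <= threshold
    high: Optional["RatingNode"] = None   # subtree for value > threshold


def rate(node: Optional[RatingNode], features: Dict[str, float]) -> float:
    """Sum the ratings collected along the path taken through the tree."""
    total = 0.0
    while node is not None:
        if features[node.feature] <= node.threshold:
            total += node.rating_low
            node = node.low
        else:
            total += node.rating_high
            node = node.high
    return total


def is_text(root: RatingNode, features: Dict[str, float],
            min_rating: float = 1.0) -> bool:
    """Classify a pattern as text if its summed rating reaches the cutoff."""
    return rate(root, features) >= min_rating


# Illustrative tree: feature names, thresholds and ratings are made up.
tree = RatingNode(
    feature="aspect_ratio", threshold=3.0, rating_low=1.0, rating_high=-1.0,
    low=RatingNode(feature="hole_count", threshold=2.0,
                   rating_low=0.5, rating_high=-0.5),
)

char_like = {"aspect_ratio": 0.8, "hole_count": 1.0}   # compact, few holes
line_like = {"aspect_ratio": 12.0, "hole_count": 0.0}  # long, thin stroke

print(is_text(tree, char_like), is_text(tree, line_like))
```

In this sketch, further statistical or structural features can be added as extra nodes without changing the traversal, which is one way an arbitrary number of features could be combined.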