The purpose of this module is to separate text from the non-text components of
a page, such as halftones, graphics, math, and tables. First, a dual RAST
segmenter is applied to segment the page into zones. Then each zone is
classified into one of the following classes:

text
math
table
logo
drawing
halftone
ruling
noise
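
The two-stage flow described above (segment the page into zones, then label
each zone) can be sketched roughly as follows. This is only an illustration in
Python, not the module's actual interface: Zone, split_text_zones, and the
classify callable are hypothetical names, and the zones are assumed to come
from a segmenter that is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# The eight zone classes listed above.
ZONE_CLASSES = (
    "text", "math", "table", "logo",
    "drawing", "halftone", "ruling", "noise",
)


@dataclass(frozen=True)
class Zone:
    """Hypothetical zone: a bounding box in pixel coordinates."""
    x0: int
    y0: int
    x1: int
    y1: int


def split_text_zones(
    zones: Iterable[Zone],
    classify: Callable[[Zone], str],
) -> Tuple[List[Zone], List[Zone]]:
    """Label every zone and separate text zones from all other zones.

    `classify` stands in for the zone classifier (an assumption for this
    sketch); it must return one of ZONE_CLASSES.
    """
    text_zones: List[Zone] = []
    non_text_zones: List[Zone] = []
    for zone in zones:
        label = classify(zone)
        if label not in ZONE_CLASSES:
            raise ValueError(f"unknown zone class: {label!r}")
        if label == "text":
            text_zones.append(zone)
        else:
            non_text_zones.append(zone)
    return text_zones, non_text_zones
```

Keeping the classifier as a plain callable means the same loop works with any
labelling function, including a stub used for testing.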

Zone classification is done using a logistic regression classifier. A file
(log-reg-training-file.txt) containing the coefficients of the logistic
regression classifier, obtained by training it on the UW-III dataset, is
included. Since the UW-III dataset consists of images scanned at 300 dpi, the
system works best on documents scanned at 300 dpi. For more information about
the algorithm, please refer to:

D. Keysers, F. Shafait, T.M. Breuel. "Document Image Zone Classification - A
Simple High-Performance Approach", VISAPP 2007, pages 44-51.
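
As a rough illustration of how a logistic regression classifier turns a zone's
feature vector into a class label, here is a minimal Python sketch. It assumes
a multinomial (softmax) formulation with one weight vector and one bias per
class. The function names and the coefficient values in the usage note are
made up for this example; the actual features and the layout of
log-reg-training-file.txt are not reproduced here.

```python
import math
from typing import Dict, Sequence


def class_probabilities(
    features: Sequence[float],
    weights: Dict[str, Sequence[float]],
    biases: Dict[str, float],
) -> Dict[str, float]:
    """Return softmax probabilities over the classes in `weights`.

    Assumed model (not taken from the module):
    score(c) = bias[c] + dot(weights[c], features).
    """
    scores: Dict[str, float] = {}
    for label, w in weights.items():
        if len(w) != len(features):
            raise ValueError(f"weight length mismatch for class {label!r}")
        scores[label] = biases[label] + sum(wi * xi for wi, xi in zip(w, features))

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def most_likely_class(probabilities: Dict[str, float]) -> str:
    """Pick the class with the highest probability."""
    return max(probabilities, key=probabilities.get)
```

For example, with the invented coefficients
weights = {"text": [1.0, 0.0], "halftone": [0.0, 1.0]} and zero biases, the
feature vector [2.0, 0.5] scores higher for "text" than for "halftone", so
most_likely_class returns "text".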
 
