Structure Weight Calculation
Each sentence is assigned a weight based on structural information about the
document. This information is supplied by the Layout Structure
Extraction Module. Of particular interest are sentences
- at the beginning of the text
- at the end of the text
- at the beginning of a paragraph
- at the end of a paragraph
- that are emphasized in any way
- marked as headings
- containing heading words (stopwords excluded)
The weight values are parameterized (positional weights additionally have a
decreasing factor), so that they can be adjusted to any type of text.
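The following is a minimal sketch of how such a weighting could look, not the module's actual implementation. All names (Sentence, WeightParams, structure_weights), the default values, the small stopword list, the additive combination of the criteria, and the geometric form of the decreasing factor are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical stopword list; a real system would use a language-specific one.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


@dataclass(frozen=True)
class Sentence:
    text: str
    index_in_paragraph: int   # 0-based position inside its paragraph
    paragraph_length: int     # number of sentences in that paragraph
    emphasized: bool = False
    is_heading: bool = False


@dataclass(frozen=True)
class WeightParams:
    # Assumed default values; meant to be tuned per type of text.
    text_begin: float = 1.0
    text_end: float = 0.5
    paragraph_begin: float = 0.5
    paragraph_end: float = 0.25
    emphasis: float = 0.5
    heading: float = 1.0
    heading_word: float = 0.25
    decay: float = 0.5        # assumed decreasing factor for positional weights


def content_words(text):
    """Lowercased words of the text without punctuation and stopwords."""
    words = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def structure_weights(sentences, params=WeightParams()):
    """Return one structure weight per sentence, in document order."""
    n = len(sentences)
    heading_words = set()
    for s in sentences:
        if s.is_heading:
            heading_words |= content_words(s.text)

    weights = []
    for i, s in enumerate(sentences):
        dist_par_end = s.paragraph_length - 1 - s.index_in_paragraph
        # Positional weights shrink geometrically with the distance
        # from the respective boundary (assumption).
        w = params.text_begin * params.decay ** i
        w += params.text_end * params.decay ** (n - 1 - i)
        w += params.paragraph_begin * params.decay ** s.index_in_paragraph
        w += params.paragraph_end * params.decay ** dist_par_end
        if s.emphasized:
            w += params.emphasis
        if s.is_heading:
            w += params.heading
        else:
            # One bonus per distinct heading word found in the sentence.
            shared = content_words(s.text) & heading_words
            w += params.heading_word * len(shared)
        weights.append(w)
    return weights
```

A different type of text can be handled by passing other parameters, for example structure_weights(sentences, WeightParams(heading=2.0, decay=0.8)) to favor headings and let positional weights fall off more slowly.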
Example-output [xml]
Pipeline Overview