Preprocessing
The Preprocessing section lets the administrator set the parameters that control how an image is prepared and preprocessed before analysis and recognition, including the parameters that tune page layout analysis, synthesis of the recognized text, and table analysis.
Layout: tells the OCR engine how the input page should be analysed and how the OCR result should be formatted (default: Autodetect)
Options:
- Autodetect: the analysis layout is detected automatically
- Format with spaces: the result is formatted with spaces instead of rich formatting (indents, tabs, etc.)
- Single column: the analysis procedure assumes that the page contains only one column of text
- Plain text: the analysis procedure turns off Single column mode and uses spaces to format the text
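As an illustration of what formatting with spaces means, the sketch below positions words on a line using space characters only, with no tabs or indents. The function name and the (column, text) input are hypothetical and chosen for this example; they are not part of the OCR engine, whose internal formatting logic is not described here.

```python
def format_with_spaces(words):
    """Lay out (column, text) pairs on one line, padding with spaces only.

    Hypothetical helper for illustration; not the OCR engine's own code.
    """
    line = ""
    for column, text in sorted(words):
        if line and len(line) >= column:
            # The previous word already reaches this column: keep one space.
            line += " "
        else:
            # Pad with spaces up to the column where the word starts.
            line += " " * (column - len(line))
        line += text
    return line


# Three words placed at character columns 0, 12 and 20.
print(format_with_spaces([(0, "Item"), (12, "Qty"), (20, "Price")]))
```

Rich formatting would express the same gaps with tab stops or indents; here the gaps are plain runs of spaces.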
Deskew: specifies whether the skew angle of the image is corrected during preparation (default: Checked)
Invert: check this option if you want the OCR engine to invert the colors of the prepared image (default: Unchecked)
Remove noise: tells the OCR engine to remove small specks of noise from the prepared image (default: Checked)
Detect page orientation: the page orientation is detected during layout analysis, and if it differs from normal, the OCR engine automatically rotates the image (default: Checked)
Detect inverted image: detects whether the image is inverted (white text against a black background). The text color is detected during layout analysis, and if it differs from normal (dark text on a light background), the OCR engine automatically inverts the image (default: Checked)
Remove texture: removes background noise from the temporary image used for recognition. While the Remove noise option removes only small random dots, the Remove texture option tells the OCR engine to remove regular background noise even if it is relatively large (default: Unchecked)
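To make the Invert and Detect inverted image options more concrete, here is a minimal sketch that works on an 8-bit grayscale image stored as nested lists. The majority-of-dark-pixels test and the threshold of 128 are assumptions made for this example; the OCR engine detects text color during layout analysis, and its actual method is not documented here.

```python
def looks_inverted(pixels, threshold=128):
    """Guess whether an image is light text on a dark background.

    Assumed heuristic for illustration: the image is treated as inverted
    when more than half of its pixels are darker than the threshold.
    """
    flat = [p for row in pixels for p in row]
    dark = sum(1 for p in flat if p < threshold)
    return dark > len(flat) / 2


def invert(pixels):
    """Invert an 8-bit grayscale image: 0 becomes 255 and vice versa."""
    return [[255 - p for p in row] for row in pixels]


# A tiny "page": two white text pixels on a black background.
page = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]

if looks_inverted(page):
    page = invert(page)  # now dark text on a light background
```

After the inversion the same test no longer fires, because most pixels are now light.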
Tables: each of the following options, when checked, gives the OCR engine a hint about the structure of the tables on the page, which makes the table layout easier to analyse
Options:
- No merged cells in table: tells the OCR engine that the tables contain no merged cells (default: Unchecked)
- One line of text per cell: tells the OCR engine that each table cell contains a single line of text (default: Unchecked)
- Black lines as separators: tells the OCR engine that the tables contain no hidden separators. The table's rows and columns are determined from the explicit separators only. This means that a table with only vertical separators has a single row, and a table with only horizontal separators has a single column (default: Unchecked)
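The effect of Black lines as separators on the table grid can be sketched as follows. The function name and its inputs (positions of the interior separator lines, excluding the table border) are hypothetical and serve only to illustrate the rule stated above.

```python
def grid_from_separators(horizontal, vertical):
    """Return (rows, columns) for a table split only by explicit lines.

    horizontal and vertical are the positions of the interior separator
    lines; the outer table border is not included. Hypothetical helper.
    """
    # Each distinct interior line adds one more row or column.
    rows = len(set(horizontal)) + 1
    columns = len(set(vertical)) + 1
    return rows, columns


# Only vertical separators: a single row with three columns.
print(grid_from_separators([], [100, 200]))

# Only horizontal separators: two rows in a single column.
print(grid_from_separators([50], []))
```

With the option unchecked, the engine may also infer hidden separators from the text layout, which this sketch does not model.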