HTML format
The HTML format section provides parameters for tuning the export of recognized text to HTML format.
Code page: specifies the code page to which the recognized text is exported. If this property is set to (Automatic), the code page is selected automatically based on the Code page type property value (default: (Automatic))
Text encoding: specifies the encoding of the output file in HTML format (default: Auto)
Options:
- ASCII: ASCII encoding, one byte per symbol.
- Unicode UTF8: UTF-8 is a variable-length encoding of Unicode. ASCII text remains unchanged as a single byte; other text is converted to a 2-byte sequence (most non-ASCII Latin, Greek, Cyrillic, Hebrew, and Arabic characters), a 3-byte sequence (Chinese, Japanese, Korean, and others), or a 4-byte sequence (characters outside the Basic Multilingual Plane).
- Unicode UTF16: native Unicode format in which each symbol is represented by a two-byte sequence (characters outside the Basic Multilingual Plane take two such sequences).
- Auto: encoding is selected automatically.
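The byte counts described for the Unicode options can be checked with a short script. This is a minimal Python sketch that is independent of the OCR product; the sample characters and the helper name `encoded_lengths` are chosen here purely for illustration. It also includes one character outside the Basic Multilingual Plane, which takes four bytes in both UTF-8 and UTF-16.

```python
# Sample characters from the scripts named in the option descriptions.
SAMPLES = {
    "ASCII letter": "A",
    "Cyrillic letter": "Ж",
    "Hebrew letter": "א",
    "CJK ideograph": "中",
    "Emoji (outside the BMP)": "😀",
}


def encoded_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    utf8 = len(ch.encode("utf-8"))
    # "utf-16-le" is used so that no 2-byte byte order mark is prepended.
    utf16 = len(ch.encode("utf-16-le"))
    return utf8, utf16


for label, ch in SAMPLES.items():
    utf8, utf16 = encoded_lengths(ch)
    print(f"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

For documents that are mostly ASCII or Latin text, UTF-8 usually yields the smaller file; for mostly Chinese, Japanese, or Korean text, UTF-16 is often more compact.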
HTML Format: specifies the HTML standard used for the exported file (default: Full (requires IE 4.0 or higher))
Options:
- Full (requires IE 4.0 or higher): full format using the HTML 4.0 standard. It supports all types of document layout retention and uses a built-in style sheet (CSS). It requires Internet Explorer 4.0 or later.
- Simple (compatible with all browsers): simple format using the HTML 3.2 standard. Almost all browsers support this format (Netscape Navigator, Internet Explorer 3.0 and later). The document layout is retained only approximately: first-line indents and indents in tables are not retained.
Layout: specifies the degree to which the OCR engine retains the page layout (default: Retain full page layout)
Options:
- Remove all formatting: only paragraphs are retained in the recognized text, using the <p> tag.
- Retain font and font size: paragraphs and fonts of the recognized text are retained in the output HTML file. The <p> tag is used.
- Retain full page layout: the full layout of the source page is retained using a table.
Use line as page break: if this property is checked and several pages are exported in HTML format, an <HR> tag is inserted between pages, which makes the browser draw a horizontal rule (default: Checked)
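The effect of the Use line as page break option can be illustrated as follows. This is a minimal Python sketch, not the product's actual implementation: the function `join_pages` and its parameter names are hypothetical, and each page is assumed to be already available as an HTML fragment.

```python
def join_pages(pages: list[str], use_line_as_page_break: bool = True) -> str:
    """Concatenate per-page HTML fragments into one document body.

    Hypothetical helper: when the option is on, an <hr> tag is placed
    between consecutive pages (never before the first or after the last).
    """
    separator = "\n<hr>\n" if use_line_as_page_break else "\n"
    return separator.join(pages)


pages = ["<p>Text of page 1</p>", "<p>Text of page 2</p>"]
print(join_pages(pages))
print(join_pages(pages, use_line_as_page_break=False))
```

With the option checked, the browser draws a horizontal rule at every page boundary; with it unchecked, the pages run together as continuous text.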
Keep line breaks: specifies whether the original line breaks in the recognized text are retained during export to HTML format (default: Checked)
Retain text color: specifies whether the original text and background colors are retained during export of the recognized text to HTML format (default: Checked)
Use Unicode: specifies whether Unicode is used when writing the recognized text to the HTML file (default: Unchecked)
Keep pictures: specifies whether pictures are kept during export to HTML format (default: Checked)
Picture format: specifies the image format to be used during export to HTML; images are saved to separate files (default: Automatic)
Options:
- Automatic: the format is selected automatically.
- JPEG Color: color JPEG format.
- JPEG Gray: gray JPEG format.
- PNG Color: color PNG format.
- PNG Gray: gray PNG format.
- PNG Black And White: black and white PNG format.
Picture resolution: specifies the resolution (dpi) used when exporting pictures to HTML format (default: 200)
JPEG quality: specifies the JPEG quality, in percent, for color pictures saved during export to HTML format (default: 50)
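The defaults listed in this section can be summarized in one place. This is a minimal Python sketch under stated assumptions: the class `HtmlExportSettings` and all of its field names are hypothetical and do not correspond to any documented API of the product. Only the default values are taken from the descriptions above. The range checks are also assumptions (a percentage is taken to mean 0 to 100, and a resolution to be positive).

```python
from dataclasses import dataclass


@dataclass
class HtmlExportSettings:
    """Hypothetical container for the HTML export properties.

    Field names are illustrative only; defaults mirror this section.
    """
    code_page: str = "(Automatic)"
    text_encoding: str = "Auto"
    html_format: str = "Full (requires IE 4.0 or higher)"
    layout: str = "Retain full page layout"
    use_line_as_page_break: bool = True
    keep_line_breaks: bool = True
    retain_text_color: bool = True
    use_unicode: bool = False
    keep_pictures: bool = True
    picture_format: str = "Automatic"
    picture_resolution_dpi: int = 200
    jpeg_quality_percent: int = 50

    def __post_init__(self) -> None:
        # Assumption: quality is a percentage, so 0..100 is treated as valid.
        if not 0 <= self.jpeg_quality_percent <= 100:
            raise ValueError("jpeg_quality_percent must be between 0 and 100")
        # Assumption: a resolution must be a positive number of dots per inch.
        if self.picture_resolution_dpi <= 0:
            raise ValueError("picture_resolution_dpi must be positive")


defaults = HtmlExportSettings()
print(defaults.picture_format, defaults.picture_resolution_dpi, defaults.jpeg_quality_percent)
```

Overriding a single value, for example `HtmlExportSettings(jpeg_quality_percent=80)`, leaves every other property at its documented default.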