TXT format

This section of the IMiS/OCR Server preferences contains the parameters that control how recognized text is exported in TXT format. The default value of each property is given in its description below.

Code page: this property specifies the code page in which the recognized text is exported. If this property is set to (Automatic), the code page is selected automatically based on the value of the Code page type property (default: (Automatic)).

Text encoding: this property specifies the encoding of the output file in TXT format (default: Auto).
Options:

ASCII: ASCII encoding, one byte per symbol.
Unicode UTF8: UTF-8 is a variable-length Unicode encoding. ASCII text remains unchanged at one byte per character. Other characters are encoded as a 2-byte sequence (accented Latin, Greek, Cyrillic, Hebrew, Arabic, and others), a 3-byte sequence (Chinese, Japanese, Korean, and others), or a 4-byte sequence (characters outside the Basic Multilingual Plane).
Unicode UTF16: native Unicode format in which most characters are represented by a two-byte sequence. Characters outside the Basic Multilingual Plane take four bytes (a surrogate pair).
Auto: encoding is selected automatically.
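
The storage cost of each encoding can be checked with a short Python sketch. It is only an illustration that uses Python's standard codecs, not IMiS/OCR Server code. The helper name encoded_sizes and the sample characters are assumptions chosen for the example.

```python
# Illustration only: byte length of sample characters in each encoding.
# The helper name and the sample characters are hypothetical.

def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes needed to store `ch` in UTF-8 and UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # "utf-16-le" omits the byte order mark, so only the character is counted.
        "utf-16": len(ch.encode("utf-16-le")),
    }

for ch in ("A", "Ж", "中"):
    print(repr(ch), encoded_sizes(ch))

# ASCII covers only code points 0-127; other characters cannot be encoded.
try:
    "Ж".encode("ascii")
except UnicodeEncodeError:
    print("'Ж' cannot be represented in ASCII")
```

For text that is mostly Latin, UTF-8 usually gives the smaller file. For text that is mostly Chinese, Japanese, or Korean, UTF-16 is often more compact.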

Use page break character as page separator: this option specifies whether a page break (form feed) character (0x0C, decimal 12) is inserted between pages when multiple pages are exported to TXT format (default: Unchecked).
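
When this option is checked, a program that reads the exported file can recover the individual pages by splitting on the separator. The sketch below assumes the separator is the standard form feed control character (U+000C, decimal 12). If the server writes a different code, change the constant. The function name split_pages and the sample text are hypothetical.

```python
# Assumption: the page separator is the form feed character, U+000C (decimal 12).
FORM_FEED = "\f"

def split_pages(text: str) -> list:
    """Split exported text into pages on the form feed character."""
    return text.split(FORM_FEED)

# Hypothetical two-page export used only to exercise the function.
sample = "Page one text\n" + FORM_FEED + "Page two text\n"
pages = split_pages(sample)
print(len(pages))
```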

Use blank line as paragraph separator: this option specifies whether an empty line is inserted between paragraphs as a paragraph separator (default: Unchecked).

Keep line breaks: this option specifies whether the original line breaks in the recognized text are retained during export to TXT format (default: Checked).
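
The two layout options above interact. The following Python sketch shows one plausible reading of how they shape the output, not the server's actual implementation. It assumes that lines are joined with a space when line breaks are not kept, and that paragraphs are separated by a single newline when the blank-line option is off. The function render_paragraphs and its parameters are hypothetical.

```python
# Sketch under stated assumptions; not the IMiS/OCR Server algorithm.

def render_paragraphs(paragraphs, keep_line_breaks=True, blank_line_separator=False):
    """Join paragraphs (each a list of recognized lines) into one text."""
    # Assumption: without "Keep line breaks", lines flow together with a space.
    line_joiner = "\n" if keep_line_breaks else " "
    # Assumption: an empty line between paragraphs means two newlines in a row.
    paragraph_joiner = "\n\n" if blank_line_separator else "\n"
    return paragraph_joiner.join(line_joiner.join(lines) for lines in paragraphs)

paragraphs = [["first line", "second line"], ["next paragraph"]]
print(render_paragraphs(paragraphs))
print(render_paragraphs(paragraphs, keep_line_breaks=False, blank_line_separator=True))
```

With the defaults, paragraph boundaries cannot be told apart from ordinary line breaks in the output, which is the case the blank-line option addresses.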

Append to the end of file: this option specifies whether the exported text is appended to the end of the output file if that file already exists (default: Unchecked).
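
The difference between the two settings corresponds to the usual append and write file modes. The Python sketch below illustrates this under the assumption that an existing file is overwritten when the option is unchecked, which the description does not state. The function export_text and the file name result.txt are hypothetical.

```python
import os
import tempfile

def export_text(path, text, append=False):
    """Write text to path, appending if requested, otherwise overwriting."""
    # Assumption: with the option unchecked, an existing file is replaced.
    mode = "a" if append else "w"
    with open(path, mode, encoding="utf-8", newline="") as f:
        f.write(text)

# Demonstrate both modes on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "result.txt")
    export_text(target, "first export\n")
    export_text(target, "second export\n", append=True)
    with open(target, encoding="utf-8", newline="") as f:
        combined = f.read()

print(repr(combined))
```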

Append EOF: this option specifies whether an EOF symbol is inserted at the end of the file (default: Unchecked).
