Optical character recognition

Optical character recognition (OCR) is available only in the IMiS/Scan++ and IMiS/View++ editions. This chapter describes how to configure OCR and how to use it in the IMiS client.

Configuring OCR

OCR is configured in the OCR category of the Preferences dialog (see OCR category for more details). Press the Configure button in the OCR settings group box. The OCR settings dialog appears. Here you can select the output format of the OCR result: Rich text, PDF, HTML, Text, DBF, MS Excel, XML or Clipboard. Each format can be further configured by pressing the Format Settings... button. In the Options section you can set formatting, recognition and document preferences.

On the Formatting tab you can set:

the layout preference of the recognized text
whether or not to keep images
error highlighting level of the recognized text

You have three layout choices: retain the page layout, retain only the font, or remove all formatting. The error highlighting level sets how characters that the OCR engine is unsure it recognized correctly are marked.
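The idea behind an error highlighting level can be sketched in a few lines of Python. This is only an illustration, not how the IMiS/OCR engine works internally: the level names, the confidence thresholds and the `highlight_uncertain` function are all hypothetical, and uncertain characters are simply wrapped in square brackets.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a highlighting level to a confidence threshold.
# These names and values are illustrative only, not actual IMiS/OCR settings.
THRESHOLDS: Dict[str, float] = {"none": 0.0, "low": 0.5, "high": 0.8}


def highlight_uncertain(chars: List[Tuple[str, float]], level: str) -> str:
    """Wrap every character whose confidence is below the threshold in [ ]."""
    threshold = THRESHOLDS[level]
    out = []
    for ch, confidence in chars:
        out.append(f"[{ch}]" if confidence < threshold else ch)
    return "".join(out)


# Each recognized character is paired with an assumed confidence in 0..1.
recognized = [("w", 0.95), ("i", 0.45), ("t", 0.9), ("c", 0.7), ("h", 0.99)]
print(highlight_uncertain(recognized, "none"))  # witch
print(highlight_uncertain(recognized, "low"))   # w[i]tch
print(highlight_uncertain(recognized, "high"))  # w[i]t[c]h
```

A higher level marks more characters, which makes proofreading safer but noisier.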

On the Recognition tab you can set the following preferences:

recognition language used in the OCR process
document type preference
text type preference
whether or not to launch the output file after recognition is done

The recognition language is crucial for the OCR engine: if it is not selected correctly, characters that are not in the selected language's character set are not recognized or may be substituted with wrong characters. The document type determines how the input page is analysed and how the OCR result is formatted. The text type sets the default text type/mode of the input file.

On the Document tab you can set:

image preprocessing preferences
tables preferences
barcode detection
selection of pages to be recognized

In the Image preprocessing section you can set the OCR engine to detect inverted images (white text on a black background), incorrect page orientation, or garbage (excess dots smaller than a certain size) during preprocessing, and then to automatically invert the image, rotate it, or remove the garbage, respectively. Tables can be processed with one line of text per cell or with no merged cells.
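To make two of these preprocessing steps concrete, here is a minimal Python sketch of inverted-image detection and garbage (speck) removal on a tiny black-and-white image. It is not the IMiS/OCR implementation: the function names, the "more black than white" heuristic and the `min_size` parameter are assumptions chosen for illustration, and orientation detection is left out.

```python
from typing import List

Image = List[List[int]]  # rectangular grid: 1 = black pixel, 0 = white pixel


def is_inverted(img: Image) -> bool:
    """Assumed heuristic: more black than white pixels means inverted."""
    total = sum(len(row) for row in img)
    black = sum(sum(row) for row in img)
    return black * 2 > total


def invert(img: Image) -> Image:
    """Swap black and white pixels."""
    return [[1 - p for p in row] for row in img]


def remove_specks(img: Image, min_size: int) -> Image:
    """Clear 4-connected groups of black pixels smaller than min_size."""
    height = len(img)
    width = len(img[0]) if img else 0
    out = [row[:] for row in img]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Flood fill to collect one connected component.
            stack = [(y, x)]
            seen[y][x] = True
            component = []
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and img[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


def preprocess(img: Image, min_size: int = 3) -> Image:
    """Invert the page if it looks inverted, then drop small specks."""
    if is_inverted(img):
        img = invert(img)
    return remove_specks(img, min_size)


# White "text" (a 3-pixel stroke) and a 1-pixel speck on a black background.
page = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
]
for row in preprocess(page):
    print(row)
```

After preprocessing, the stroke is black on white and the single-pixel speck is gone, because it is smaller than the assumed `min_size` of 3 pixels.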

Using OCR

After you have set up the OCR module, you can send a region or a selection of pages to it by clicking the OCR command in the Edit menu or the corresponding toolbar button. If nothing is selected, you are asked whether you want to send the whole document to the OCR module. You can see OCR in progress in the following figure.

Note: The IMiS/Scan++ and IMiS/View++ clients use the IMiS/OCR client to perform OCR. See the IMiS/OCR Client manual for more information on OCR usage.
