1. Delete OCR Page Breaks

If the OCR engine recognizes line breaks, they are usually included in the OCR text.

However, this does not apply if the OCR text is retrieved afterwards from the internally available IW-OcrDoc object:
In that case only the word list is searched, and all words belonging to a range are returned, separated by blanks.
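
The effect of a word-list lookup can be illustrated with a minimal sketch. The `Word` class and the `text_in_range` function are hypothetical stand-ins invented for this illustration; they are not the actual IW-OcrDoc interface, which is not described here. The sketch only shows why line breaks are lost when words are collected by position and joined with blanks.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for one entry of an OCR word list."""
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


def text_in_range(words, left, top, right, bottom):
    """Return all words positioned inside the rectangle, joined by blanks."""
    hits = [w.text for w in words
            if left <= w.x <= right and top <= w.y <= bottom]
    return " ".join(hits)


# Two printed lines; the word list itself carries no line-break marker.
words = [
    Word("Invoice", 10, 10), Word("No.", 60, 10),
    Word("4711", 10, 30),
]
result = text_in_range(words, 0, 0, 100, 50)
print(result)  # Invoice No. 4711
```

Because the words are joined with a single blank, the result contains no "\n", even though the text spans two lines on the page.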

So: Whether line breaks are present depends on how the application retrieves the OCR text.

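  • If it is retrieved by direct OCR, i.e. without the IW-OcrDoc object (by a direct ABBYY run), the breaks are present.

  • If it is retrieved from the IW-OcrDoc object, the breaks are not present; the words are separated by blanks instead.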

ADJUSTMENT as of the next release:

If OCR text from an area is assigned to an index using the area properties dialog box, line breaks are replaced by spaces.
Removing line breaks in general would not be useful.

What else can be done:

If OCR results are used in scripts, you can make the appropriate replacements in the script:

Example:

If the script variable "str" contains a string that also contains line breaks, these can be removed with

  • str.replace("\n", " ")

That is, instead of the expression str, just use str.replace("\n", " ").
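
The replacement can be sketched a little more defensively. The example below is written in Python purely for illustration; this text does not say which script language the application uses, and the function name `remove_line_breaks` is an assumption. It also covers Windows-style "\r\n" breaks, which a plain replacement of "\n" would leave behind as stray "\r" characters.

```python
import re


def remove_line_breaks(text: str) -> str:
    # Replace each line break (CR+LF, lone CR, or lone LF) with one space.
    # CR+LF is listed first so that it yields a single space, not two.
    return re.sub(r"\r\n|\r|\n", " ", text)


ocr_text = "Invoice No.\r\n4711\nTotal"
cleaned = remove_line_breaks(ocr_text)
print(cleaned)  # Invoice No. 4711 Total
```

Note that the behavior of `replace` differs between languages: in Python, `str.replace` replaces every occurrence, whereas in JavaScript a string pattern replaces only the first match, so a global regular expression such as `/\r?\n/g` would be needed there. Which case applies depends on the script engine in use.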