Scanning Best Practices

Overview

DoxTek recommends several best practices to ensure the highest possible scan quality. The practices listed below can be implemented within your solution. Not all of them apply in every situation, but investing time in the scan process leads to a higher level of accuracy throughout your entire document processing solution.

When scanning, we recommend the following settings:

DPI: 300 dpi

Color Configuration: Black and white

However, scanning with these settings alone can still leave you with an image that is hard to process with Optical Character Recognition (OCR), leading to less reliable data. This makes it more difficult to extract data from scanned images or to perform search operations on them.
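To give a sense of what the recommended settings produce, the following minimal sketch computes the pixel dimensions of a page scanned at 300 dpi and converts a grayscale image to black and white with a fixed threshold. The function names, the representation of an image as rows of 0-255 values, and the threshold of 128 are assumptions made for illustration; they are not part of any DoxTek or Kofax product.

```python
def pixel_dimensions(width_in, height_in, dpi=300):
    """Return (width, height) in pixels for a page of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def to_black_and_white(gray, threshold=128):
    """Convert a grayscale image (rows of 0-255 values) to black (0) and white (255).

    The threshold of 128 is an arbitrary midpoint chosen for illustration.
    """
    return [[0 if value < threshold else 255 for value in row] for row in gray]


# An 8.5 x 11 inch page at the recommended 300 dpi.
letter_size = pixel_dimensions(8.5, 11)

# A tiny 2 x 3 grayscale sample: dark values become black, light values white.
sample = to_black_and_white([[10, 200, 127], [128, 90, 255]])
```

In practice the scanner driver or capture software applies these settings; the sketch only shows what they mean at the pixel level.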

Details

Consider implementing some of the following, whether through your scanner solution or manually as you import or scan your files:

Deskew

When a document is placed in or on the scanner without being flush against the edges, the scanned image comes out crooked. Deskewing rotates the image back to its proper orientation.

Example Document: Thin paper (like a receipt) that is difficult to align with scanner edges.

img1.PNG
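One common way to detect skew is a projection-profile search: try a range of candidate angles and keep the one at which the ink lines up most sharply into horizontal rows. The sketch below is a simplified version of that idea, assuming a black-and-white image stored as rows of 0 (paper) and 1 (ink). The function name, the search range of 5 degrees, and the half-degree step are hypothetical choices, and this is not necessarily how your scanner software implements deskew.

```python
import math


def estimate_skew_degrees(image, max_angle=5.0, step=0.5):
    """Estimate the skew angle of a binary image (rows of 0/1, 1 = ink).

    For each candidate angle, every ink pixel is shifted vertically as if the
    page were sheared back by that angle, and the ink is counted per row.
    The angle that concentrates ink into the fewest rows (largest sum of
    squared row counts) is returned. Positive angles mean the text runs
    downward to the right in image coordinates.
    """
    points = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    best_angle, best_score = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        slope = math.tan(math.radians(angle))
        counts = {}
        for x, y in points:
            row_index = round(y - x * slope)
            counts[row_index] = counts.get(row_index, 0) + 1
        score = sum(count * count for count in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Once the angle is known, the image would be rotated by the opposite amount to straighten it. That rotation step is left to the scanner solution or an imaging library and is not shown here.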

Cropping

Especially with smaller documents, the scanned image can include a black background that is irrelevant. This background should be removed so that it does not distract from the actual content of the document.

Example Document: A small form that is not the typical 8.5 x 11 inch page

img2.PNG
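As a rough illustration, cropping can be thought of as finding the bounding box of everything that is not background. The sketch below assumes a grayscale image stored as rows of 0-255 values in which the scanner background is near black; the function name and the cutoff of 40 are assumptions made for this example.

```python
def crop_to_page(gray, background_max=40):
    """Crop a grayscale image to the bounding box of non-background pixels.

    Pixels with a value at or below `background_max` are treated as the dark
    scanner background. If no lighter pixel is found, the image is returned
    unchanged.
    """
    if not gray or not gray[0]:
        return gray
    rows = [y for y, row in enumerate(gray)
            if any(value > background_max for value in row)]
    cols = [x for x in range(len(gray[0]))
            if any(row[x] > background_max for row in gray)]
    if not rows or not cols:
        return gray
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in gray[top:bottom + 1]]
```

A skewed page leaves wedges of background inside the bounding box, so cropping generally works best after deskewing.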

Deshade

Some documents contain shading (especially with tables) to assist the human eye in distinguishing between different rows. However, this shading leads to decreased image quality once scanned and should be removed.

Example Document: Invoice with a numerical table

img3.PNG
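A very simple approximation of shade removal is to push every pixel that is lighter than the text to pure white. The sketch below assumes a grayscale image stored as rows of 0-255 values, with text darker than the shading; the function name and the cutoff of 100 are hypothetical and would need tuning against your own documents.

```python
def remove_shading(gray, text_max=100):
    """Turn light shading white while leaving dark text untouched.

    Any pixel lighter than `text_max` is assumed to be shading or paper and
    is set to white (255). Darker pixels are assumed to be text and are kept.
    """
    return [[value if value <= text_max else 255 for value in row]
            for row in gray]
```

This only works when the shading is clearly lighter than the print. Dark shading or faint text needs a more careful approach, which is one reason to test settings against real samples.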

Despeckle

An image may contain speckles, dots, or other small marks that detract from the actual data of the document.

Example Document: A document that has accidental speckles

img4.PNG
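One way to picture despeckling is as removing every isolated group of ink pixels that is too small to be part of a character. The sketch below does this with a flood fill over a black-and-white image stored as rows of 0 (paper) and 1 (ink). The function name and the minimum size of 3 pixels are assumptions; too large a value would also erase periods, commas, and the dots on the letter i.

```python
def despeckle(image, min_size=3):
    """Remove connected groups of ink pixels smaller than `min_size`.

    `image` is a list of rows of 0/1 values (1 = ink). Pixels that touch
    horizontally, vertically, or diagonally belong to the same group.
    A cleaned copy is returned; the input is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Collect every ink pixel connected to (y, x).
            component, stack = [], [(y, x)]
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                        if image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            # Erase the group if it is too small to be real content.
            if len(component) < min_size:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result
```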

Edge Enhancement

The print on a document is often not crisp enough, which makes it hard for a computer to differentiate between individual characters. As seen below, OCR could easily confuse an ‘n’ for an ‘m’ when there is no clear break between characters.

Example Document: An older document, or a document that was printed with too little ink

img5.PNG
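Edge enhancement is commonly implemented as a sharpening filter that exaggerates the difference between a pixel and its neighbors. The sketch below applies a standard 3 x 3 sharpening kernel (center weight 5, the four direct neighbors weighted -1) to a grayscale image stored as rows of 0-255 values. The function name is hypothetical, and scanner software may use a different filter.

```python
def sharpen(gray):
    """Sharpen a grayscale image with a 3 x 3 kernel.

    Each interior pixel becomes 5 * itself minus its four direct neighbors,
    clamped to the 0-255 range. Flat areas are unchanged because the kernel
    weights sum to 1. Border pixels are copied as they are.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    result = [row[:] for row in gray]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            value = (5 * gray[y][x]
                     - gray[y - 1][x] - gray[y + 1][x]
                     - gray[y][x - 1] - gray[y][x + 1])
            result[y][x] = max(0, min(255, value))
    return result
```

On a soft edge that steps from 100 to 150, the interior pixels on either side of the step become 50 and 200, so the boundary between a character and the paper is more pronounced. The same filter also amplifies noise, so it is usually worth despeckling as well.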

Horizontal Line Removal

When scanning documents that contain tables, it is often helpful to remove lines that exist only for human readability, since OCR does not need them. See the section for Vertical Line Removal below.

Example Document: Invoice

img6.PNG

Vertical Line Removal

When scanning documents that contain tables, it is often helpful to remove lines that exist only for human readability, since OCR does not need them. See the section for Horizontal Line Removal above.

Example Document: Invoice

img7.PNG
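Both kinds of line removal can be sketched with the same idea: a table line shows up as an unbroken run of ink that is far longer than any stroke of a character. The sketch below assumes a black-and-white image stored as rows of 0 (paper) and 1 (ink). The function names and the default minimum run length of 50 pixels are assumptions for illustration only.

```python
def remove_horizontal_lines(image, min_length=50):
    """Erase horizontal runs of ink that are at least `min_length` pixels long."""
    result = []
    for row in image:
        new_row = row[:]
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                # Runs this long are assumed to be table lines, not text.
                if x - start >= min_length:
                    new_row[start:x] = [0] * (x - start)
            else:
                x += 1
        result.append(new_row)
    return result


def remove_vertical_lines(image, min_length=50):
    """Erase vertical runs of ink by transposing and reusing the row logic."""
    transposed = [list(column) for column in zip(*image)]
    cleaned = remove_horizontal_lines(transposed, min_length)
    return [list(row) for row in zip(*cleaned)]
```

Note that this simple version also erases the part of any character that a line passes through, and it will miss lines on a skewed page, which is one more reason to deskew first.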

Destreak

Stray lines or streaks within a scanned document can cause OCR to misread characters. For example, a lowercase ‘l’ could be read as a ‘T’ if a perpendicular streak crosses near the top of the ‘l’.

Example Document: Poorly scanned document or a document that had some debris (dirt, hair, etc.) on it at scan time.

img8.PNG

Watermarks & Logos

Certain documents, such as university transcripts, commonly carry logos or watermarks. Ensure that these marks are removed so that they do not interfere with the data of the document. Although a grayed-out logo or watermark may not interfere with a human’s ability to read the document, it can affect how the document’s data is extracted during compression time.

Example Document: The draft version of a legal document

img9.PNG

General Guidelines

Keep in mind that each of these settings has its own advantages and disadvantages. Test the different settings and measure their impact on the documents being processed by comparing the anticipated results with the extracted results. If you have any further questions regarding best practices, please reach out to your first line of support for additional information.
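The comparison of anticipated and extracted results can be made concrete with a simple field-level accuracy score. The sketch below is one hypothetical way to do it: the function name, the field names, and the sample values are all invented for illustration and do not come from any DoxTek or Kofax tool.

```python
def field_accuracy(expected, extracted):
    """Return the fraction of expected fields whose extracted value matches.

    Both arguments are dictionaries mapping field names to string values.
    Leading and trailing whitespace is ignored; a missing field counts as
    a mismatch.
    """
    if not expected:
        return 0.0
    correct = sum(
        1 for name, value in expected.items()
        if extracted.get(name, "").strip() == value.strip()
    )
    return correct / len(expected)


# Hypothetical anticipated values for one test invoice.
expected = {
    "invoice_number": "INV-1001",
    "date": "2020-01-15",
    "vendor": "Acme",
    "total": "250.00",
}

# Hypothetical extraction result in which OCR read a zero as the letter O.
extracted = {
    "invoice_number": "INV-1001",
    "date": "2020-01-15",
    "vendor": "Acme ",
    "total": "25O.00",
}

score = field_accuracy(expected, extracted)
```

Running the same set of test documents through each combination of settings and comparing the resulting scores gives a repeatable way to decide which settings help and which hurt.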

Additional Information

DoxTek Support

Kofax Support
