What is OCR?
OCR (optical character recognition, also called text recognition) converts an already digitized image into text characters: an image containing words is translated into machine-readable characters (letters, numbers, etc.). For ways to improve the results, see: Tips & Tricks for Better Recognition Rates
Attention: Discussions between specialist departments or with our customers often suffer from misunderstandings about terminology (see the difference between OCR, iOCR and AI).
OCR - Basis for Process Automation
OCR is a technology that converts scanned paper documents, PDF files or digital photos into documents that computers and software (such as Microsoft Word or general ledger software) can edit. Even line items can be extracted; you can read about that in this blog entry: Record Line Items with OCR
So if you have a document on paper or as an image, for example an invoice, a purchase order or a contract that someone sent you as a PDF attachment, a scanner alone is not sufficient to work with the relevant information from these documents. The scanner only takes an image of the document, and this image consists of a collection of pixels. To process the information from scanned documents, digital images or image PDFs further, you need OCR software that recognizes characters in the digital images and combines them into words, numbers and whole sentences. In this way the software turns an image into a character string, a text. The online encyclopedia Wikipedia also explains this very well: Optical character recognition - Wikipedia. What is still missing at this point is the semantic meaning of the text and the numbers (e.g. which number is the gross total), which you need in order to automate your processes without a "human in the loop".
How does an OCR system work?
Let's see how OCR software works. First, the OCR application analyzes the structure of the document. It divides a page into structural elements such as blocks of text, tables, and images. The text blocks are then split into lines, the lines into words, and finally the words into individual characters. Once each character has been isolated, the program compares it to a series of sample images and calculates the probability of a match (e.g. the character is an "A" with a probability of 89%). The OCR software then decides on the most likely character.
An OCR system can also be configured for multiple languages. The more languages it has to cover, the more difficult the task becomes, and the recognition quality can decrease.
In addition, OCR text recognition often offers dictionary support for different languages. This allows the OCR to be optimized for a specific domain, e.g. invoice receipt in accounting.
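The comparison step can be illustrated with a minimal sketch. It assumes tiny 3x3 binary bitmaps and a simple pixel-agreement score; the names `SAMPLES`, `similarity` and `recognize` are made up for this illustration. Real OCR engines work on much larger images and typically use learned features, so the score here is a plain similarity value rather than a calibrated probability.

```python
# Toy 3x3 binary glyphs ("1" = ink, "0" = background); purely illustrative.
SAMPLES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}

def similarity(glyph, sample):
    """Fraction of pixels that agree between two equally sized bitmaps."""
    pairs = [(g, s) for g_row, s_row in zip(glyph, sample) for g, s in zip(g_row, s_row)]
    return sum(g == s for g, s in pairs) / len(pairs)

def recognize(glyph, samples=SAMPLES):
    """Score the glyph against every sample and return (best_character, all_scores)."""
    scores = {char: similarity(glyph, sample) for char, sample in samples.items()}
    best = max(scores, key=scores.get)
    return best, scores

# A slightly damaged "T": the lowest pixel of the stem is missing.
noisy_t = ("111", "010", "000")
best, scores = recognize(noisy_t)
print(best, round(scores[best], 2))
```

The damaged glyph still agrees with the "T" sample in 8 of 9 pixels, which is more than for any other sample, so "T" is chosen as the most likely character.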
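As a rough sketch of how dictionary support can help, the snippet below snaps OCR tokens to the closest entry in a small domain vocabulary using Python's standard `difflib` module. The word list `ACCOUNTING_TERMS`, the function `correct_word` and the cutoff value are assumptions chosen for illustration; they do not describe how any particular OCR product implements its dictionaries.

```python
import difflib

# Hypothetical domain vocabulary for accounting documents.
ACCOUNTING_TERMS = ["invoice", "total", "amount", "net", "gross", "vat", "date", "number"]

def correct_word(word, vocabulary=ACCOUNTING_TERMS, cutoff=0.75):
    """Replace an OCR token with the closest vocabulary term if one is similar enough.

    Tokens without a sufficiently close match are returned unchanged.
    """
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

tokens = ["lnvoice", "tota1", "Vienna"]
corrected = [correct_word(t) for t in tokens]
print(corrected)
```

The cutoff is the important design choice: if it is too low, ordinary words such as names get "corrected" into vocabulary terms; if it is too high, genuine OCR errors slip through.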
Image Quality is crucial for Automation with OCR
Converting an image into a document only takes a few seconds. In a first step, you get a text and its meta information such as text size, font and position without manual effort.
This information makes an image searchable and editable. For automation, however, you also need the semantic meaning of the text. OCR and automated text recognition are therefore the cornerstones for automating your processes: the characters, words and numbers, together with their meta information, form an important data source for the algorithms and AI models built on them, which assign semantics to the jumble of letters.
Our BLU DELTA AI invoice capture uses the OCR results to extract valuable information for the subsequent processes (e.g. accounts payable) automatically and without further manual effort. The customer not only receives character strings, words and numbers, but also their meaning.
As mentioned earlier, OCR software determines the probability of how closely a character matches a given sample. This probability varies with image quality. Blurry images, text on a colored background or simply poorly scanned documents can have a major impact on recognition quality. We see in our regular BLU DELTA benchmarks (quality measurement with AI) that the photo and scan quality is crucial for the subsequent processes.
An "8" quickly becomes a "6" or a "B". A "tilted" letter, however, has no effect on our automation. Modern NLP (Natural Language Processing) approaches, such as those we use at BLU DELTA, reduce the impact of such isolated errors.
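To make the role of the meta information more concrete, here is a small sketch of what an OCR result could look like as data, and how position alone already allows a very simple semantic rule. The `OcrWord` fields and the `value_right_of` helper are hypothetical names for this illustration; the rule is deliberately naive and only stands in for the far more capable algorithms and AI models mentioned above.

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    text: str
    x: float           # left edge of the bounding box, in pixels
    y: float           # top edge of the bounding box, in pixels
    font_size: float   # estimated font size in points
    confidence: float  # match score reported by the OCR engine, between 0 and 1

def value_right_of(label, words, line_tolerance=5.0):
    """Return the text of the nearest word to the right of `label` on the same line."""
    anchors = [w for w in words if w.text.lower().rstrip(":") == label.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    candidates = [
        w for w in words
        if abs(w.y - anchor.y) <= line_tolerance and w.x > anchor.x
    ]
    return min(candidates, key=lambda w: w.x).text if candidates else None

words = [
    OcrWord("Net:", 400, 700, 10, 0.98),
    OcrWord("100.00", 480, 700, 10, 0.95),
    OcrWord("Total:", 400, 720, 10, 0.97),
    OcrWord("120.00", 480, 721, 10, 0.93),
]
total = value_right_of("total", words)
print(total)
```

A rule like this breaks as soon as the layout changes, for example when the amount is printed below its label, which is one reason why fixed positional rules do not generalize across invoice layouts.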
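One simple way to catch some of these confusions uses context: if a field is expected to be an amount, letters that merely look like digits can be mapped back. The sketch below is a rule-based stand-in and not the NLP approach used at BLU DELTA; the confusion table `DIGIT_LOOKALIKES` and the function `repair_amount` are assumptions made for this example.

```python
import re

# Hypothetical confusion table: letters that are easily mistaken for digits.
DIGIT_LOOKALIKES = str.maketrans(
    {"B": "8", "O": "0", "o": "0", "I": "1", "l": "1", "S": "5"}
)

# Accepted shape of an amount in this sketch: digits, optionally 1-2 decimals.
AMOUNT_PATTERN = re.compile(r"\d+(\.\d{1,2})?")

def repair_amount(raw):
    """Map look-alike letters to digits in a field that is expected to be numeric.

    Returns the repaired string, or None if the result is still not a valid amount.
    """
    candidate = raw.strip().translate(DIGIT_LOOKALIKES)
    return candidate if AMOUNT_PATTERN.fullmatch(candidate) else None

print(repair_amount("1B4.5O"))
print(repair_amount("Total"))
```

Note the limits of this approach: it repairs letter-for-digit confusions such as "B" for "8", but a digit misread as another digit (an "8" read as a "6") still yields a valid-looking amount and can only be detected with additional context, for example by checking whether net amount plus tax equals the gross total.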
Up to 30% higher automation rate
Because of poor scan and image quality, we see differences of up to 30% in the automation rates of document capture among our customers. In terms of input quality, a distinction is made between digital photos, scans and PDF text. These differences are also one reason why we at BLU DELTA offer an automation rate prediction for invoice capture.
Digital Photo and OCR
Normally, pictures taken with mobile devices have the following problems:
- Uneven illumination
- Wrong perspective
- Additional areas outside the page boundaries
OCR software can correct these problems to a certain extent. Nevertheless, digital photos represent the greatest challenge for automation due to the points mentioned above. So-called cam scanners or similar mobile OCR scanner apps and/or image optimization can improve the quality in advance.
Scan and OCR
Professional scanners already offer a good basis for the automated processing and capture of documents. If possible, scan your documents in black and white (which makes lossless compression possible) and at 300 dpi. Small fonts down to 9 pt can then still be recognized easily.
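The relationship between scan resolution and font size can be checked with a little arithmetic. A typographic point is 1/72 of an inch, so the nominal height of a font in pixels is its size in points divided by 72 and multiplied by the resolution in dpi. The helper `font_height_px` below is just an illustrative name, and the result refers to the nominal font size; the actual letter shapes are somewhat smaller.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch

def font_height_px(font_size_pt, dpi):
    """Nominal height of a font in pixels at a given scan resolution."""
    return font_size_pt / POINTS_PER_INCH * dpi

for dpi in (150, 300):
    print(dpi, font_height_px(9, dpi))
```

At 300 dpi a 9 pt font spans 37.5 pixels, while at 150 dpi only 18.75 pixels remain, which leaves the OCR with far less detail per character.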
PDF text and OCR
PDF text gives the best results. The actual OCR process is usually omitted here. The PDF document already contains the characters in digital form and the subsequent process "only" has to recognize the semantics. Documents in pure PDF text format achieve overall recognition rates of more than 90% with the BLU DELTA AI. If possible, you should therefore ensure that you receive unstructured or semi-structured documents as PDF text from your document sources.
However, PDF text documents often also contain images that carry text information. In this case, the advantage is reduced, because the text in those images still has to be recognized by OCR.
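The three input situations described here can be summarized as a small routing decision. The sketch assumes a PDF library has already extracted, per page, the embedded text and the number of embedded images; the `Page` structure, the function `processing_route` and the route names are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass

@dataclass
class Page:
    text_layer: str   # characters already embedded in the PDF page (may be empty)
    image_count: int  # number of raster images embedded in the page

def processing_route(page):
    """Decide whether OCR is needed for a page."""
    has_text = bool(page.text_layer.strip())
    if has_text and page.image_count == 0:
        return "text-only"  # pure PDF text: the OCR step can be skipped
    if has_text:
        return "text+ocr"   # PDF text plus images that may contain further text
    return "ocr"            # image PDF or scan: everything goes through OCR

pages = [
    Page("Invoice 2024-001 Total 120.00", 0),
    Page("Invoice 2024-002", 2),
    Page("", 1),
]
print([processing_route(p) for p in pages])
```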
OCR in Accounting
Especially in (accounts payable) accounting, the term OCR is often equated with capturing information from invoices. Technically, however, this is a separate process. The BLU DELTA AI contains a component for text recognition (OCR) and, building on it, AI models that capture the semantic relationships.
You are welcome to test our BLU DELTA invoice capturing as API or SDK for free.
BLU DELTA is a product for the automated capture of financial documents. Partners, as well as our customers’ finance departments, accounts payable clerks and tax consultants, can use BLU DELTA AI and Cloud to immediately relieve their employees of the time-consuming and mostly manual entry of documents.
Blumatix Intelligence GmbH aims to make strenuous everyday work easier with artificial intelligence and to create added value for everyone from shared intelligence.
Author: Christian Weiler is a former General Manager of a global IT company based in Seattle/US. Since 2016, Christian Weiler has been increasingly active in various roles in the field of artificial intelligence and has strengthened the management team of Blumatix Intelligence GmbH since 2018.