OCR and DeepOCR in Comparison

In the first part “OCR and DeepOCR text recognition in comparison” we compare traditional OCR technologies with DeepOCR. In the second section, we go into detail about the performance of three well-known DeepOCR open source alternatives.

The AI stew is simmering in the IT kitchen, not only in computer vision in general but especially in OCR text recognition. Many established software companies are struggling to bring their outdated software up to the latest AI standards, at least in their marketing brochures.

Classic text recognition algorithms had been optimized to their limits by around 2015, and where they are still in use today they show little further progress. Since around 2015, AI has been opening up new possibilities that are now being used in many places on the market. Under the term DeepOCR in particular, new products promise a new level of quality, and some of them are even available as open source.

IMPORTANT: This article is about pure text recognition at character and possibly word level. In the financial sector, the term OCR has become overloaded for historical reasons and is often incorrectly used to mean document or invoice capture. To provide clarity here, we have written a separate article on the subject of OCR, iOCR and AI.

As a provider of an iOCR (BLU DELTA AI), we have to keep an eye on the OCR market and would like to make our results available.

Aim of the OCR text recognition test

For this reason, in May 2022 we compared a small but select group of OCR text recognition products. The goal was to get an indication of whether deep learning had brought any qualitative movement to the market. We created a benchmark of digits and numbers: 89 numbers consisting of 570 characters in total served as ground truth.

Note: Numbers were used because recognition errors in them can rarely, if ever, be corrected in the downstream process of an iOCR. If a letter or two is misrecognized, similarity measures (or NLP models) can be used to infer the correct word; this is not possible with numbers.
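To illustrate the note above, here is a minimal sketch of similarity-based correction using Python's standard `difflib` module. The vocabulary, the helper name `correct_token` and the cutoff of 0.8 are assumptions made for this example and are not part of the benchmark. A misread word can be snapped back to a known word, while a misread number has no vocabulary entry to fall back on and passes through unchanged.

```python
import difflib


def correct_token(token, vocabulary, cutoff=0.8):
    """Return the closest vocabulary word, or the token itself if none is close enough."""
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


# Hypothetical vocabulary of words expected on an invoice.
vocabulary = ["invoice", "total", "amount"]

# A word with one misread character is close to exactly one known word.
print(correct_token("lnvoice", vocabulary))

# A number with one misread digit is still a plausible number;
# nothing in the vocabulary can tell us which digit is wrong.
print(correct_token("1284.50", vocabulary))
```

This is why a single wrong digit in an amount is far more costly than a single wrong letter in a word.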

The Measurement (Benchmark Setup)

About 66% of the numbers in the benchmark came from documents with good image quality and about 33% from documents with poor image quality. All of the data came from original invoices and cash register receipts of the kind frequently found in practice. The sample was not drawn at random but was biased towards poor image quality compared to typical invoices and receipts. The products were tested off the shelf, without any training.

The OCR text recognition was measured using two indicators:

Exact Match: The recognized number must match the ground truth exactly.
Levenshtein Distance: How similar the recognized value is to the ground truth; serves as a measure of recognition quality at character level.
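The two indicators can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the benchmark. In particular, the article does not state how the Levenshtein distance was turned into a percentage, so normalizing by the length of the longer string is an assumption, as are the function names and the sample values.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(recognized: str, truth: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(recognized), len(truth))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(recognized, truth) / longest


def score(pairs):
    """pairs: list of (recognized, ground_truth). Returns (exact rate, mean similarity)."""
    exact = sum(r == t for r, t in pairs) / len(pairs)
    mean_similarity = sum(similarity(r, t) for r, t in pairs) / len(pairs)
    return exact, mean_similarity


# Made-up sample values, not taken from the benchmark.
sample = [("1234.50", "1234.50"), ("1284.50", "1234.50"), ("17,99", "17.99")]
exact, mean_similarity = score(sample)
print(f"Exact: {exact:.0%}  Levenshtein: {mean_similarity:.2%}")
```

The sketch also shows why the two columns can diverge: a single wrong character makes the exact match fail completely but lowers the character-level similarity only slightly.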

We would like to point out that there are many criteria by which an OCR product can be assessed. In our case, only the recognition rate for numbers and digits was measured.

The results:

OCR product	Exact Match	Levenshtein
Google OCR	92%	95.70%
Paddle OCR V2.5	64%	92.26%
AbbyyFineReader15	71%	86.47%
Omnipage Ultimate V19.2	62%	82%
Tesseract 5 OCR	57%	73%
OCR.space	50%	74%
Onlineocr.net	42%	69%

Our Conclusion on OCR text recognition

It's no secret that Google has risen to become the qualitative market leader in OCR text recognition in recent years thanks to the latest AI architectures. But for the use case of business documents and scanning, a real open source challenger from Asia seems to be emerging: PaddleOCR.

It is noticeable that both Google and Paddle use deep learning (DeepOCR) and clearly outperformed the competition on documents with poor image quality. Both are likely to have achieved a lot here in a very short time through deep learning and data. PaddleOCR can also be trained with your own data and improved accordingly. We assume that, with training, PaddleOCR can achieve a performance similar to Google's.

Additional notes about the test:

MMOCR would also have been an open source candidate, but it could not read our format.
PaddleOCR had a bug (one space inserted after each comma) that we had to fix before we could use the results.
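The article does not say how the comma bug was fixed. One possible workaround is a post-processing step on the recognized text, sketched below under the assumption that the artifact is whitespace inserted after a comma that sits between two digits. The function name and the sample strings are hypothetical.

```python
import re

# Assumption: only commas between digits are affected, so ordinary
# prose such as "Vienna, Austria" must be left untouched.
_COMMA_GAP = re.compile(r"(?<=\d),\s+(?=\d)")


def strip_space_after_comma(text: str) -> str:
    """Remove whitespace that follows a comma inside a number."""
    return _COMMA_GAP.sub(",", text)


print(strip_space_after_comma("1, 234, 50"))
print(strip_space_after_comma("Vienna, Austria"))
```

Restricting the pattern to digits on both sides keeps the fix from touching commas in addresses or free text.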

OCR text recognition has lost some of its importance in the field of intelligent OCR due to NLP, but it remains the basis for any AI that recognizes semantics in the downstream process. A downstream AI architecture maps the recognized characters to words, sentences and meaning. Transparent feedback from the customer's interface back to the AI system leads to a lasting learning effect (see also BLU DELTA shared intelligence concept).

In the next few weeks we are planning a pure DeepOCR benchmark of the open source DeepOCR engines MMOCR, PaddleOCR and EasyOCR. Simply subscribe to the newsletter and don't miss anything 😊!

Update!

DeepOCR: PaddleOCR, MMOCR and EasyOCR compared

In our first part "OCR and DeepOCR text recognition in comparison" we compared traditional OCR technologies with DeepOCR. In this section we want to take a closer look at the performance of three well-known DeepOCR open source alternatives.

Basic Information of the Candidates

Paddle OCR v2.5.0.3
- License: Apache License 2.0
- Supported Languages (pre-trained): 80(+)
- URL: https://github.com/PaddlePaddle/PaddleOCR
MM OCR v0.6.1
- License: Apache License 2.0
- Supported Languages (pre-trained): Chinese, English
- URL: https://github.com/open-mmlab/mmocr
Easy OCR v1.5.0
- License: Apache License 2.0
- Supported Languages (pre-trained): 80(+)
- URL: https://github.com/JaidedAI/EasyOCR

The Benchmark

In order to get a representative sample, we randomly selected 400 real invoices from our pool of documents and extracted all available text labels with their bounding boxes. We then heavily blurred each invoice so that nothing except the content of the label's bounding box remained legible. This gave us a ground truth of 2326 images, each with a plain text label.

Figure: Example document with the label "document date"

The DeepOCR Result:

After our first test with digits and numbers, in which Paddle stood out for its accuracy, we would have expected Paddle to be the winner here as well. However, Paddle showed weaknesses, especially with special characters, periods and commas. It was also surprising that Paddle gained relatively little speed from the GPU.

MMOCR, which only supports English in addition to Chinese, should actually be considered out of competition, because our benchmark set consisted primarily of European invoices with a focus on German. In addition, MMOCR did not recognize uppercase letters out of the box.

EasyOCR surprised us positively and is the clear winner for our use case, as long as you can use a GPU for inference.

The big advantage of open source DeepOCR is, of course, that you can train it on your own data for improved text recognition and text capture. If you want to save yourself the training effort, then EasyOCR with a GPU is recommended.

BLU DELTA is a product for the automated capture of financial documents. Partners as well as our customers' finance departments, accounts payable clerks and tax consultants can use BLU DELTA AI and Cloud to immediately relieve their employees of the time-consuming and mostly manual entry of documents.

BLU DELTA is an Artificial Intelligence by Blumatix Intelligence GmbH.

Author: Christian Weiler is a former General Manager of a global IT company based in Seattle/US. Since 2016, Christian Weiler has been increasingly active in various roles in the field of artificial intelligence and has strengthened the management team of Blumatix Intelligence GmbH since 2018.
Contact: c.weiler@blumatix.com