OCR and DeepOCR in Comparison

2 August 2022

The AI stew is simmering in the IT kitchen, not only in computer vision in general but especially in OCR text recognition. Many established software vendors are struggling to bring their aging products up to the latest AI standards, at least in their marketing brochures. Classic text recognition algorithms were optimized to the point of exhaustion by around 2015, and where work on them continues today, it yields little progress. Since around 2015, AI has opened up new possibilities that are now being used in many places on the market. Under the term DeepOCR in particular, new products promise a new level of quality, and some of them are even available as open source.

IMPORTANT: This article is about pure text recognition at the character and, where applicable, word level. In the finance context, the term OCR has become overloaded for historical reasons and is often incorrectly used to mean document or invoice capture. To provide clarity, we have written a separate article on the subject of OCR, iOCR and AI.

As the provider of an iOCR product (BLU DELTA AI), we have to keep an eye on the OCR market and would like to share our results.

OCR text recognition

Aim of the OCR Text Recognition Test

For this reason, in May 2022 we compared a small but select group of OCR text recognition products. The goal was to get an indication of whether deep learning has moved the market in terms of quality. We created a benchmark of digits and numbers: 89 numbers comprising 570 characters served as ground truth.

Note: We used numbers because errors in them can rarely, if ever, be corrected in the downstream iOCR process. If an individual letter is misrecognized, the correct word can often be inferred from similar known words (for example with NLP models); this is not possible with numbers.
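To illustrate the note above, here is a minimal sketch of similarity-based word correction. It is not part of the benchmark; the vocabulary, the function name `correct_word` and the cutoff of 0.8 are assumptions chosen for illustration, and Python's standard `difflib` stands in for whatever similarity or NLP model a real iOCR pipeline would use.

```python
import difflib

# Hypothetical vocabulary of words expected on an invoice (illustrative only).
VOCABULARY = ["invoice", "total", "amount", "payment", "date"]


def correct_word(ocr_word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the input unchanged if none is close enough."""
    matches = difflib.get_close_matches(ocr_word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_word


# A misread letter can be repaired because only a few words are plausible.
print(correct_word("lnvoice"))

# A misread digit yields another perfectly valid number, so there is
# nothing to compare against and the value is returned unchanged.
print(correct_word("1824.50"))
```

The asymmetry is the point: a dictionary constrains words, but every digit sequence is a plausible number, so OCR errors in amounts stay undetected unless other checks (such as sums or checksums) are available.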

The Measurement (Benchmark Setup)

About 66% of the numbers in the benchmark came from documents with good image quality and about 33% from documents with poor image quality. All data came from original invoices and cash register receipts of the kind commonly found in practice. The sample was not drawn at random; compared with typical invoices and receipts, it is biased towards poor image quality. The products were tested off the shelf, without any training.

OCR text recognition was measured using two indicators:

  • Exact Match: The recognized number must match the ground truth exactly.
  • Levenshtein Distance: How similar the recognized value is to the ground truth; serves as a measure of recognition quality at the character level.
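The two indicators can be sketched as follows. This is a hedged illustration, not the evaluation code used for the benchmark: the article does not state how the Levenshtein distance was turned into a percentage, so the normalization used here (one minus the distance divided by the longer string's length, averaged over all numbers) is an assumption, as are the function names and the sample data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def score(predictions, ground_truth):
    """Return (exact match rate, mean normalized Levenshtein similarity)."""
    pairs = list(zip(predictions, ground_truth))
    exact = sum(p == g for p, g in pairs) / len(pairs)
    # Assumed normalization: 1 - distance / length of the longer string.
    similarities = [1 - levenshtein(p, g) / max(len(p), len(g), 1) for p, g in pairs]
    return exact, sum(similarities) / len(similarities)


# Made-up sample: the second number has the letter "O" instead of a zero.
truth = ["1024.50", "2022", "89"]
recognized = ["1024.50", "2O22", "89"]
exact_rate, similarity = score(recognized, truth)
print(f"Exact: {exact_rate:.0%}, Levenshtein similarity: {similarity:.2%}")
```

The sample shows why the two indicators diverge: a single wrong character makes the whole number fail the exact match, while the character-level similarity drops only slightly.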

We would like to point out that many criteria can be relevant when assessing an OCR product. In our case, we looked only at the recognition rate for numbers and digits.

The Results:

OCR text recognition       Exact    Levenshtein
Google OCR                 92%      95.70%
Paddle OCR V2.5            64%      92.26%
AbbyyFineReader15          71%      86.47%
Omnipage Ultimate V19.2    62%      82%
Tesseract 5 OCR            57%      73%
OCR.space                  50%      74%
Onlineocr.net              42%      69%

Our Conclusion on OCR Text Recognition

It is no secret that, thanks to the latest AI architectures, Google has become the quality leader in OCR text recognition in recent years. For the use case of business documents and scans, however, a genuine open source challenger from Asia appears to be emerging: PaddleOCR. Notably, both Google and Paddle use deep learning (DeepOCR), and both appeared to cope much better than the competition with poor image quality. Both are likely to have achieved a great deal in a very short time through deep learning and data. PaddleOCR can also be trained on your own data and improved accordingly. We assume that, with training, PaddleOCR can reach a performance similar to Google's.

Additional notes on the test:

  • MMOCR would also have been an open source candidate, but it could not read our input format.
  • PaddleOCR had a bug (one space inserted after each comma) that we had to fix before we could use the results.
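The article does not describe how the comma issue was fixed. As one possible workaround, the extra space could be removed in a post-processing step, as in the sketch below. The function name and the regular expression are assumptions for illustration; only commas that sit between two digits are touched, so ordinary punctuation in running text is left alone.

```python
import re

# Matches whitespace that follows a comma, but only between two digits.
_COMMA_SPACE = re.compile(r"(?<=\d),\s+(?=\d)")


def strip_space_after_comma(text: str) -> str:
    """Remove whitespace after a comma inside a number, e.g. '12, 50' -> '12,50'."""
    return _COMMA_SPACE.sub(",", text)


print(strip_space_after_comma("1.234, 56"))   # number with decimal comma
print(strip_space_after_comma("Total, due"))  # regular text stays unchanged
```

Normalizing before scoring matters for a number benchmark: without it, every amount containing a comma would fail the exact match because of a formatting artifact rather than a recognition error.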

OCR text recognition has lost some of its importance in the field of intelligent OCR because of NLP, but it remains the basis for any AI that recognizes semantics in the downstream process. A downstream AI architecture takes the recognized characters and maps them to words, sentences and meaning. Transparent feedback from the customer's interface back to the AI system leads to a lasting learning effect (see also the BLU DELTA shared intelligence concept).

In the next few weeks we are planning a pure DeepOCR benchmark of the open source DeepOCR providers MMOCR, PaddleOCR and EasyOCR. Simply subscribe to the newsletter so you don't miss anything 😊!

BLU DELTA is a product for the automated capture of financial documents. Partners, as well as our customers' finance departments, accounts payable clerks and tax consultants, can use BLU DELTA AI and Cloud to immediately relieve their employees of the time-consuming and mostly manual entry of documents.

Blumatix Intelligence GmbH has set itself the goal of making strenuous everyday work easier with artificial intelligence and of always creating added value for everyone from shared intelligence.

Christian Weiler

Author: Christian Weiler is a former General Manager of a global IT company based in Seattle/US. Since 2016, Christian Weiler has been increasingly active in various roles in the field of artificial intelligence and has strengthened the management team of Blumatix Intelligence GmbH since 2018.
Contact: c.weiler@blumatix.com