Aim of the OCR text recognition test
For this reason, in May 2022 we compared a small but select group of OCR text recognition products. The goal was to get an indication of whether there had been any qualitative movement in the deep learning market. We created a benchmark of digits and numbers: 89 numbers comprising 570 characters served as ground truth.
Note: We used numbers because they can rarely, if ever, be corrected in the downstream steps of an intelligent OCR (iOCR) pipeline. If a letter or two is misrecognized, the correct word can still be inferred from similar known words (for example with NLP models); this is not possible with numbers.
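To illustrate the note above, here is a minimal sketch using Python's standard `difflib` module. The vocabulary, the cutoff and the function name `correct_word` are assumptions made for illustration; they are not part of the benchmark.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of words expected on an invoice (illustration only).
VOCABULARY = ["invoice", "date", "total", "amount"]

def correct_word(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the token unchanged if none is close enough."""
    matches = get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token

# A misread letter can be repaired because a similar known word exists.
print(correct_word("lnvoice"))
# A misread digit cannot: there is no vocabulary of "valid" amounts to compare against.
print(correct_word("1O34.50"))
```

The word is repaired because a similar entry exists in the vocabulary, while the amount comes back unchanged. This is why recognition errors in numbers carry through to the final result.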
The Measurement (Benchmark Setup)
About 66% of the numbers in the benchmark came from documents with good image quality and about 33% from documents with poor image quality. All data came from original invoices and cash register receipts of the kind commonly found in practice. The sample was not drawn at random; compared to typical invoices and receipts, it was biased towards poor image quality. The products were tested off-the-shelf, without any training.
The OCR text recognition was measured using two indicators:
- Exact Match: The recognized number must match the ground truth exactly.
- Levenshtein Distance: How similar the recognized value is to the ground truth; serves as a character-level measure of quality.
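The two indicators can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the benchmark. Dividing the summed edit distance by the number of ground-truth characters is our assumption about how to aggregate the distance, and the sample values are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def score(predictions, ground_truth):
    """Return (exact-match rate, summed edit distance per ground-truth character)."""
    pairs = list(zip(predictions, ground_truth))
    exact = sum(p == t for p, t in pairs) / len(pairs)
    edits = sum(levenshtein(p, t) for p, t in pairs)
    chars = sum(len(t) for _, t in pairs)
    return exact, edits / chars

# Made-up sample: the third value has the letter "O" where a zero is expected.
exact_rate, char_error = score(["1.234,50", "2022", "17,9O"],
                               ["1.234,50", "2022", "17,90"])
print(exact_rate, char_error)
```

Exact match penalizes the whole number for a single wrong character. The Levenshtein-based figure shows how close the engine came character by character.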
Note that an OCR engine can be assessed on many criteria. In our case, we considered only the recognition rate for numbers and digits.
Our Conclusion on OCR text recognition
It's no secret that Google has risen to become the qualitative market leader in OCR text recognition in recent years thanks to the latest AI architectures. But for the use case of business documents and scanning, a real open source challenger from Asia seems to be emerging: PaddleOCR.
It is noticeable that both Google and Paddle use deep learning (DeepOCR) and were apparently much better than the competition on documents with poor image quality. Both are likely to have achieved a lot here in a very short time through deep learning and data. PaddleOCR can also be trained with your own data and improved accordingly. It is reasonable to assume that, with training, PaddleOCR can reach a performance similar to Google's.
Additional notes about the test:
- MMOCR would also have been an open source candidate, but it could not read our format.
- PaddleOCR had a bug (one space inserted after each comma) that we had to fix before we could use the results.
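How the comma bug was fixed is not described here. One possible workaround is to clean up the output in post-processing, sketched below. The function name and the restriction to commas between digits are assumptions, chosen so that ordinary text such as "Vienna, Austria" stays untouched.

```python
import re

# Matches a comma followed by whitespace when digits stand on both sides, e.g. "1.234, 50".
_COMMA_GAP = re.compile(r"(?<=\d),\s+(?=\d)")

def strip_space_after_comma(text: str) -> str:
    """Remove whitespace that follows a comma inside a number."""
    return _COMMA_GAP.sub(",", text)

print(strip_space_after_comma("1.234, 50"))        # number is repaired
print(strip_space_after_comma("Vienna, Austria"))  # regular text is left alone
```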
OCR text recognition has lost some of its importance in the field of intelligent OCR because of NLP, but it remains the basis for any AI that recognizes semantics in the subsequent process. A downstream AI architecture maps the recognized characters to words, sentences and meaning. Transparent feedback from the customer's interface back to the AI system leads to a lasting learning effect (see also the BLU DELTA shared intelligence concept).
In the next few weeks we are planning a pure DeepOCR benchmark between the open source DeepOCR providers MMOCR, PaddleOCR and EasyOCR.
DeepOCR: PaddleOCR, MMOCR and EasyOCR compared
In the first part, "OCR and DeepOCR text recognition in comparison", we compared traditional OCR technologies with DeepOCR. In this section we take a closer look at the performance of three well-known open source DeepOCR alternatives.
Basic Information of the Candidates
- Paddle OCR v0.6.1
- MM OCR v22.214.171.124
- Easy OCR v1.5.0
To get a representative sample, we randomly selected 400 real invoices from our pool of documents and extracted all available text labels with their bounding boxes. For each label, we heavily blurred the invoice document so that nothing outside the label's bounding box was legible. This gave us a ground truth of 2326 images, each with a plain text label.
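The masking step can be sketched as follows. This is a simplified, dependency-free illustration on a grayscale image stored as a list of pixel rows. The function name, the box blur and the radius are assumptions; a real pipeline would use an image library and a much stronger blur.

```python
def blur_outside_box(image, box, radius=2):
    """Box-blur a grayscale image (list of rows of ints) everywhere except inside box.

    box is (left, top, right, bottom), with right and bottom exclusive.
    """
    height, width = len(image), len(image[0])
    left, top, right, bottom = box
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if left <= x < right and top <= y < bottom:
                continue  # keep the label region sharp
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            result[y][x] = sum(window) // len(window)  # mean of the neighbourhood
    return result

# Tiny checkerboard "document" with a 2x2 label region in the middle.
document = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]
masked = blur_outside_box(document, box=(2, 2, 4, 4), radius=1)
print(masked[2][2:4], masked[0][0])
```

Pixels inside the bounding box are copied unchanged. Every pixel outside it is replaced by the average of its neighbourhood, which removes fine detail such as character strokes.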
Pic.: Example document with label document date
The DeepOCR Result:
After our first test with digits and numbers, where Paddle stood out for its accuracy, we had expected Paddle to be the winner here as well. However, Paddle showed weaknesses, especially with special characters, periods and commas. It was also surprising that Paddle was only slightly faster on the GPU.
MMOCR, which only supports English in addition to Chinese, should really be considered out of competition: our benchmark set consisted primarily of European invoices, with a focus on German. Additionally, MMOCR did not recognize uppercase letters out-of-the-box.
EasyOCR was a positive surprise and is the clear winner for our use case, provided you can use a GPU for inference.
The big advantage, of course, is that you can train such an OCR engine yourself to improve text recognition and text capture. If you want to save yourself the training effort, EasyOCR with a GPU is recommended.
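A comparison of this kind can be reproduced with a small harness that treats each engine as a plain function from image to text. The sketch below rests on assumptions: `benchmark`, `recognize` and the stand-in engine are hypothetical names, and wrapping a real engine (on CPU or GPU) behind `recognize` is left to the reader.

```python
import time

def benchmark(recognize, samples):
    """Run recognize(image) over (image, label) pairs.

    Returns the exact-match rate and the mean wall-clock seconds per image.
    """
    hits = 0
    start = time.perf_counter()
    for image, label in samples:
        if recognize(image) == label:
            hits += 1
    elapsed = time.perf_counter() - start
    return {"exact_match": hits / len(samples),
            "seconds_per_image": elapsed / len(samples)}

# Stand-in "engine" for demonstration: looks up a canned answer per image name.
canned = {"img_001.png": "12.05.2022", "img_002.png": "Rechnung Nr, 4711"}
samples = [("img_001.png", "12.05.2022"), ("img_002.png", "Rechnung Nr. 4711")]

result = benchmark(lambda image: canned[image], samples)
print(result["exact_match"], result["seconds_per_image"])
```

Running the same samples through each engine, once on CPU and once on GPU, gives accuracy and speed figures that can be compared directly.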