A Google search for comparative values of human error rates quickly revealed one thing: virtually every provider of software for automating receipt capturing and accounting cites human error as a problem, but none gives a concrete figure. This matches our experience from discussions with customers: human error is recognised, but not measured, in invoice capturing.
The situation is different with machines. Here the tolerance for error is low, and the motto is rather: it all starts with the KPI (Key Performance Indicator). However, one must not forget that a machine is either created by humans or, as with artificial intelligence, very often learns from humans. Our BLU DELTA Invoice Capturing Intelligence therefore does not operate detached from the human error rate; on the contrary, that rate is decisive for the quality of our AI product in several respects (see also Benefits of AI).
The human error rate as an anchor point for AI
But how many errors are permissible? In the development of AI systems, the human error rate gives us an optimal point of orientation. If our algorithm has an error rate of 10% while the human error rate is 2%, this is an important indicator of quality and a guide for further optimisation of the AI (see also Andrew Ng's Machine Learning Yearning, chapter 33). Because our AI should, of course, be better than humans!
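The comparison above is what Ng calls "avoidable bias": the gap between the model's error and the human baseline. A minimal sketch (the function name and figures are illustrative, not BLU DELTA internals):

```python
def avoidable_bias(model_error: float, human_error: float) -> float:
    """Gap between the model's error rate and the human baseline.

    A positive gap means the model is still worse than humans, so
    further optimisation is worthwhile (cf. Ng, Machine Learning
    Yearning, ch. 33).
    """
    return model_error - human_error

# Example from the text: model at 10% error, humans at 2%.
gap = avoidable_bias(0.10, 0.02)
print(f"avoidable bias: {gap:.0%}")
```

Once the gap approaches zero, the human baseline stops being a useful target and other quality signals take over.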
Artificial intelligence “learns” from human error
We train our AI with data collected by our data capturers (also known as labellers). The quality and representativeness of the data for the use case play a decisive role: to make the error rate of our AI better than that of humans, we need data that contain as few human errors as possible. At BLU DELTA, we have defined our own rules and developed systems to ensure this.
Our human error rates: 2% and 7%
Receipt capturing is about recognising the correct characteristics of an invoice. The data capturer or accountant reads the characteristic in question (e.g. the gross total amount) from a scan or image of the invoice and enters it into a data template. This data is then recorded in our labelling tool and processed for further measurement or training. The correctness of the data is checked in a multi-stage process, in which difficult or unclear entries are additionally screened.
There are different levels of difficulty. The gross total is often a specially highlighted amount, accompanied by a currency indication and usually found in a unique position on the invoice. For this type of characteristic, we measured an average error rate of 2% for our data capturers. Other characteristics are significantly more difficult: they do not always have to be present, their format is more general, and their position on the invoice is virtually arbitrary. Here we measured an error rate of 7%.
For the measurement of the easy, clearly recognisable characteristics, we analysed 1052 characteristic records from 9 data capturers; for the hard-to-recognise characteristics, we analysed 231 characteristic records from 2 data capturers.
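Such a measurement boils down to counting erroneous records in a labelled sample. A sketch with a normal-approximation 95% confidence interval, which also shows why the larger sample (1052 vs. 231 records) gives a tighter estimate; the error counts below are illustrative values consistent with the rates in the text, not our actual measurement data:

```python
import math

def error_rate_with_ci(errors: int, total: int, z: float = 1.96):
    """Observed error rate plus a normal-approximation 95% interval."""
    p = errors / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), p + half

# Assumed counts matching the published rates: ~2% of 1052, ~7% of 231.
easy = error_rate_with_ci(21, 1052)
hard = error_rate_with_ci(16, 231)
print(f"easy characteristics: {easy[0]:.1%} (95% CI {easy[1]:.1%}-{easy[2]:.1%})")
print(f"hard characteristics: {hard[0]:.1%} (95% CI {hard[1]:.1%}-{hard[2]:.1%})")
```

The smaller sample for hard characteristics yields a noticeably wider interval, which is worth keeping in mind when the 7% figure is used as a benchmark.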
A guide to the quality of BLU DELTA models
From this, a guideline for our BLU DELTA AI models is derived: the error rate should be below the human rate of 2% for clearly recognisable characteristics, and below 7% for hard-to-recognise ones.
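As a simple illustration, the guideline amounts to a per-difficulty threshold check (the dictionary keys and example figures are hypothetical; only the 2% and 7% thresholds come from the text):

```python
# Human error rates measured in the text, used as quality thresholds.
HUMAN_ERROR_RATE = {"easy": 0.02, "hard": 0.07}

def meets_guideline(difficulty: str, model_error: float) -> bool:
    """A model should beat the human error rate for its characteristic class."""
    return model_error < HUMAN_ERROR_RATE[difficulty]

print(meets_guideline("easy", 0.015))  # better than the 2% human baseline
print(meets_guideline("hard", 0.09))   # worse than the 7% human baseline
```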