Today – about three years later – we can look back on the successful development of our product. Our BLU DELTA service is now used by SMEs, partners and major clients, and it saves many finance departments and tax advisors both cost and effort.
The road from prototype to product is a rocky one and is often completely underestimated – we were no exception. Three aspects played a central role in our successful development: data quality, benchmarks and development speed.
Today, everyone knows that data is needed to develop, train and measure the quality of AI models. Companies that have a lot of data relevant to their business (at BLU DELTA we always talk about ‘data treasure’) have a bright future ahead of them – as long as they know how to use this data today.
Of course, AI is only as good as its data. Amazon provided a negative example here (see also the article at mobilegeeks.de).
Data has to be realistic, verified and available in a quantity appropriate to the application. Providing large amounts of high-quality data requires active data management, supported by suitable tools. Our data management framework allows us to assign targeted tasks to data labellers. The data annotated in this way is checked or corrected by the AI on the basis of rules and, as a final step, by an expert. Data is explicitly released only at the end of this process, and only released data may be used for development and training. In this way, the quality of the data, and with it the quality of the artificial intelligence, can be kept high.
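The release process described above can be pictured as a small state machine. The following is a minimal sketch, not the actual BLU DELTA framework: the class and function names, the status values and the plausibility rule (net plus VAT must equal gross) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ANNOTATED = "annotated"        # labelled, not yet checked
    RULE_CHECKED = "rule_checked"  # passed the automated rules
    RELEASED = "released"          # approved by an expert
    REJECTED = "rejected"          # failed a check


@dataclass
class LabelledInvoice:
    # Hypothetical record produced by a data labeller.
    invoice_id: str
    total_gross: float
    total_net: float
    vat: float
    status: Status = Status.ANNOTATED


def rule_check(item: LabelledInvoice, tolerance: float = 0.01) -> LabelledInvoice:
    """Automated check (assumed rule): net + VAT must equal gross."""
    if abs(item.total_net + item.vat - item.total_gross) <= tolerance:
        item.status = Status.RULE_CHECKED
    else:
        item.status = Status.REJECTED
    return item


def expert_review(item: LabelledInvoice, approved: bool) -> LabelledInvoice:
    """Final step: an expert may release only items that passed the rules."""
    if item.status is Status.RULE_CHECKED:
        item.status = Status.RELEASED if approved else Status.REJECTED
    return item


def training_set(items):
    """Only explicitly released data may be used for training."""
    return [i for i in items if i.status is Status.RELEASED]


good = expert_review(rule_check(LabelledInvoice("A-1", 120.0, 100.0, 20.0)), approved=True)
bad = rule_check(LabelledInvoice("A-2", 130.0, 100.0, 20.0))  # amounts do not add up
released_ids = [i.invoice_id for i in training_set([good, bad])]
print(released_ids)
```

The point of the sketch is the gate at the end: whatever happens upstream, training only ever sees items whose status is explicitly set to released.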
Figure 1: High Level BLU DELTA QA KPIs
Benchmarks are used to measure the quality of a model. For this purpose, the output of an AI model is compared with reference data that is representative in both quantity and quality (!). Similar to a continuous integration process in software development, our data management framework measures automatically whenever a model changes or a new training run completes. New benchmarks are added continuously and visualise our progress automatically.
Figure 2: Accuracy Development BLU DELTA
We measure both individual model results (e.g. for invoices, a model for the total gross amount) and the specific added value for customers (e.g. the overall recognition rate across several invoice details for Spanish invoices). The measurements focus on accuracy and false positives. In the financial world, a missing value is usually better than a false one.
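As an illustration of such a measurement, here is a minimal sketch that compares model output with reference data for a single field. It assumes that a "false positive" means a returned value that differs from the reference, and that the model signals a missing value with None; the function name and the sample data are hypothetical.

```python
def benchmark(reference, predicted):
    """Compare predictions with reference values for one field.

    reference: dict mapping document id -> true value
    predicted: dict mapping document id -> predicted value, or None
               when the model returned nothing (missing value)
    """
    if not reference:
        raise ValueError("reference data must not be empty")
    correct = wrong = missing = 0
    for doc_id, truth in reference.items():
        guess = predicted.get(doc_id)
        if guess is None:
            missing += 1   # model abstained
        elif guess == truth:
            correct += 1
        else:
            wrong += 1     # a false value was returned
    total = len(reference)
    return {
        "accuracy": correct / total,
        "false_positive_rate": wrong / total,
        "missing_rate": missing / total,
    }


# Hypothetical reference data for the total gross amount.
reference = {"inv-1": "119.00", "inv-2": "59.50", "inv-3": "10.00", "inv-4": "250.00"}
predicted = {"inv-1": "119.00", "inv-2": "59.50", "inv-3": None, "inv-4": "25.00"}
result = benchmark(reference, predicted)
print(result)
```

Counting missing and wrong values separately reflects the preference stated above: two models with the same accuracy are not equally good if one of them returns more false values.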
Speed of development for complex topics
AI without agility is unthinkable! Misapplications aside, AI is used almost exclusively for solutions in complex problem areas. That means you know what you want, but not how to get there. The classic approach is therefore an agile one: build, measure, learn (see also the Build-Measure-Learn article by Mindtools). The decisive factor is to complete this cycle with high quality in as short a time as possible, i.e. to measure and reflect as early as possible on whether the last implementation step moved in the right direction (towards solving the problem) or whether the course needs to be corrected.
To keep development fast and focused, it is also necessary to judge new models by a single assessment criterion (see also “Single Metric Best Practice”, Machine Learning Yearning by Andrew Ng).
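One way to arrive at a single criterion is to fold the measured quantities into one number. The following is a minimal sketch under stated assumptions: the scoring formula, the penalty weight and the model figures are invented for illustration and are not the metric actually used for BLU DELTA.

```python
def single_score(accuracy, false_positive_rate, penalty=2.0):
    """Collapse two measurements into one number (assumed formula).

    False values are penalised more heavily than missing ones, so a
    model that abstains when unsure can outrank a more "accurate" one.
    """
    return accuracy - penalty * false_positive_rate


# Hypothetical benchmark results: (accuracy, false positive rate).
candidates = {
    "model_v1": (0.90, 0.06),
    "model_v2": (0.88, 0.02),
}

scores = {name: single_score(acc, fpr) for name, (acc, fpr) in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

With one number per model, the decision after each build-measure-learn cycle becomes a simple comparison. Here model_v2 wins despite its lower accuracy, because it returns fewer false values.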
All three topics require a high degree of automation as well as appropriate tool support that is continuously developed and optimised. This is the only way to ensure that, for example, all benchmarks run daily and that new labels are continuously checked for possible release.
These three elements were the key to developing the BLU DELTA AI engine from a prototype into a leading, high-quality AI product for invoice recognition!