What is the BLU DELTA Learn API?

BLU DELTA is an AI-based document capture solution whose models are trained on sample documents. The Learn API offers a REST interface for providing sample documents, with or without Ground Truth data, for more than 10 different document types.

The interface enables customers to improve recognition quality by uploading sample data to the active learning training pipeline controlled by BLU DELTA.

Current Major Version: V1

Get Started

  1. Prerequisite: an API key

  2. Contact your account manager or BLU DELTA

 

Provide Training Data and Ground Truth 

Training data consists of documents (input) and the information you expect to receive from our BLU DELTA AI (output), also known as Ground Truth. The BLU DELTA AI learns and improves its extraction based on these training samples (input and expected output).

BLU DELTA's Learn API lets you upload training data programmatically and automate the training process. For example, if you record the manual corrections made in your workflows and send them back to the Learn API, the AI model will learn and improve over time.

How to provide training data? 

The uploaded zip file can contain documents only, or documents together with their Ground Truth.

By default, the Ground Truth must be provided in the same JSON format as the response of our BLU DELTA API, but with Score set to 1. In addition, only correct and verified values may be provided.

Sample 

For example, to provide the Ground Truth for the Currency and InvoiceDate of our Musterrechnung.pdf, create the following request:

https://learn.bludelta.ai/v1/Package 

Package = zip file with the following files in its root directory:

  • Musterrechnung.pdf 
  • Musterrechnung.json 
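The package layout described above can be produced with a few lines of Python. This is a minimal sketch, not an official client: the function name build_package and its parameters are hypothetical, and it only assumes what the text states, namely that each document and its optional JSON file share a base name and sit in the root of the zip.

```python
import io
import json
import zipfile
from pathlib import Path


def build_package(documents, ground_truth=None):
    """Return the bytes of a zip with every file placed in the archive root.

    documents: iterable of paths to document files (e.g. PDFs).
    ground_truth: optional dict mapping a document file name to its
    Ground Truth dict; it is stored as <base name>.json beside the document.
    """
    ground_truth = ground_truth or {}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for doc in map(Path, documents):
            # arcname=doc.name drops the directory part, so the file lands in the root.
            archive.write(doc, arcname=doc.name)
            if doc.name in ground_truth:
                payload = json.dumps(ground_truth[doc.name], indent=2)
                archive.writestr(doc.stem + ".json", payload)
    return buffer.getvalue()
```

Calling build_package(["Musterrechnung.pdf"]) without the second argument yields a documents-only package.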

Musterrechnung.json looks like this:

{
  "InvoiceDetailTypePredictions": [
    {
      "Type": 64,
      "TypeName": "InvoiceDate",
      "Text": "25.03.2020",
      "Value": "2020-03-25",
      "Score": 1.00,
      "X": 1815,
      "Y": 332,
      "Width": 190,
      "Height": 29
    },
    {
      "Type": 524288,
      "TypeName": "InvoiceCurrency",
      "Text": "EUR",
      "Value": "EUR",
      "Score": 1.00,
      "X": 0,
      "Y": 0,
      "Width": 0,
      "Height": 0
    }
  ]
}

Note:  MUST attributes are in BOLD  

Optional: 

  • Highly recommended attributes: X, Y, Width, Height (if available, the pixel coordinates in the original image or in a PDF-to-PNG conversion).
  • Text attribute: the original text detected by the OCR.
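A Ground Truth file such as the sample can also be generated rather than written by hand. The following is a minimal sketch: the helper ground_truth_entry is hypothetical, and using zeros when no position is known simply mirrors the InvoiceCurrency entry of the sample, which is an assumption about how missing coordinates should be expressed.

```python
import json


def ground_truth_entry(type_id, type_name, value, text=None, box=(0, 0, 0, 0)):
    """Build one entry with Score fixed to 1.0, as Ground Truth requires.

    text: optional original OCR text; left out of the entry when unknown.
    box: optional (x, y, width, height) in pixels; zeros when unknown.
    """
    x, y, width, height = box
    entry = {
        "Type": type_id,
        "TypeName": type_name,
        "Value": value,
        "Score": 1.0,
        "X": x,
        "Y": y,
        "Width": width,
        "Height": height,
    }
    if text is not None:
        entry["Text"] = text
    return entry


# Values taken from the Musterrechnung sample.
ground_truth = {
    "InvoiceDetailTypePredictions": [
        ground_truth_entry(64, "InvoiceDate", "2020-03-25",
                           text="25.03.2020", box=(1815, 332, 190, 29)),
        ground_truth_entry(524288, "InvoiceCurrency", "EUR", text="EUR"),
    ]
}

print(json.dumps(ground_truth, indent=2))
```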

Training Data for Document Classification 

Training data for document type classification does not require a JSON file. However, a zip file may contain only documents of the same type, and the request must set the DocumentType parameter.
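As an illustration, a classification upload could be prepared as shown here. This is a sketch under several assumptions that the text does not confirm: that the package is sent with POST as the raw request body, that DocumentType is passed as a query parameter, that the API key travels in a header named X-ApiKey, and that "Invoice" is a valid type value. Please confirm the actual request format with your account manager before use. The request is only built, not sent.

```python
import urllib.parse
import urllib.request

LEARN_URL = "https://learn.bludelta.ai/v1/Package"


def prepare_upload(package_bytes, api_key, document_type=None):
    """Build (but do not send) an upload request for a zip package."""
    url = LEARN_URL
    if document_type is not None:
        # Assumption: DocumentType is a query parameter.
        url += "?" + urllib.parse.urlencode({"DocumentType": document_type})
    return urllib.request.Request(
        url,
        data=package_bytes,
        method="POST",  # assumption: uploads use POST
        headers={
            "Content-Type": "application/zip",  # assumption: raw zip body
            "X-ApiKey": api_key,  # assumption: hypothetical header name
        },
    )


# "Invoice" is a hypothetical DocumentType value used for illustration.
request = prepare_upload(b"<zip bytes>", "my-api-key", document_type="Invoice")
print(request.get_method(), request.full_url)
```

Once the real request format is confirmed, the prepared request can be sent with urllib.request.urlopen(request).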

Training Data without Labels 

If you only want to provide documents without Ground Truth, upload a zip file that contains the documents but no JSON files.

Optional: Training Data with other formats 

The Learn API includes a data mapping mechanism, so you can also provide data in other formats, e.g. as a CSV file. In this case, however, the mapping must be configured upfront. Please contact your account manager if you want to use this option.

Before you start uploading data, please talk to your account manager or contact our support: bludelta-support@blumatix.com