One of the key aspects of OCR-based invoice data extraction is the ability to automatically detect which words and phrases returned by OCR are meaningful. The solution should return only key invoice data, in a standardised format. Learn how the Naive Bayes algorithm can help you achieve this goal.

There are many tools in the developer’s toolbox when it comes to automatic data extraction. A good example is the TF-IDF (Term Frequency – Inverse Document Frequency) algorithm, which helps the system gauge how important each keyword extracted by OCR is. Here’s how TF-IDF can be used for invoice and receipt recognition.

Invoices are arguably the most common documents related to B2B transactions. They’re used to reclaim VAT, to claim business expenses, and as evidence of taxable income. That’s why every budding entrepreneur should learn which commercial invoice elements are crucial for these purposes.

You might have noticed that most invoices include a sequential number, typically referred to as the document or invoice number. Invoicing software numbers documents by default, but are these numbers really necessary? Here’s what you need to know about invoice numbers.

Data extraction tools help accountants avoid wasting time on manual data entry. To make the most of them, you need to understand what OCR is and how it reads and interprets expense-related documents. Read on as we reveal the intricacies of how arbitrue extracts data […]

Welcome to arbitrue’s blog, your source of news and comment on the latest trends in technologies for accountants and bookkeepers. Read on to learn how arbitrue helps you deal with one of the most mundane and time-consuming tasks: invoice and receipt processing.
