Automate Health Care Information Processing With EMR Data Extraction - Our Workflow
We dive deep into the challenges we face in EMR data extraction and explain the pipelines, techniques, and models we use to solve them.
Invoice processing OCR (optical character recognition) software allows you to automate the process of handling an invoice or receipt, extracting key data, inputting the data into the proper database locations, and sending documents wherever they need to go. Old school automated systems use rule-based engines and pattern matching to try to extract the required data, or require specific templates that are the only invoice formats the system works with.
As you can imagine, rule-based engines, open source OCR tools, pattern matching, and exact invoice formats do not allow you to scale your automated invoice processing OCR technology at all! Any time the invoice format or image quality changes (normally seen with receipts), we have to make adjustments to the system. Although these old school systems are better than manual invoice processing, the automated invoice processing systems available today have been around for a while and don’t get the job done.
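To make the scaling problem concrete, here is a minimal sketch of a pattern-matching extractor. The two regular expressions and both sample invoices are made up for illustration and are not taken from any real system. The rules work on the layout they were written for and return nothing as soon as a vendor words the same fields differently.

```python
import re

# Hypothetical rules written for one vendor's layout,
# e.g. "Invoice No: 48213" and "Total: $1,234.50".
INVOICE_NO = re.compile(r"Invoice No:\s*(\d+)")
TOTAL = re.compile(r"Total:\s*\$([\d,]+\.\d{2})")


def extract_with_rules(text: str) -> dict:
    """Return the fields the hard-coded patterns can find; missing fields are None."""
    number = INVOICE_NO.search(text)
    total = TOTAL.search(text)
    return {
        "invoice_number": number.group(1) if number else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
    }


# Illustrative inputs: the first matches the rules, the second is a new layout.
known_format = "Invoice No: 48213\nTotal: $1,234.50"
new_format = "INV# 48213\nAmount due 1234.50 USD"

print(extract_with_rules(known_format))
print(extract_with_rules(new_format))
```

Every new layout means another pattern to write and maintain, which is the adjustment work described above.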
The accuracy that the most popular OCR tools can achieve just does not work for a business setting. The most accurate OCR API from the large providers is Google Vision, coming in at just 80% accuracy. From there it’s a quick drop-off to 65% for Microsoft Cognitive Services and an abysmal 21% for AWS Rekognition (Source).

On top of that, you need to invest in custom training and deep learning architecture to improve the accuracy for your specific use case. Automated invoice processing is more than just extracting the text from the document. It also means understanding entities and key data points, so we’re going to need something more than the available OCR models.

Depending on what your invoice processing is used for, custom steps need to be built around the OCR and deep learning models. These steps automate the business processes specific to your needs, and they do so in a way that does not affect the performance of the deep learning and OCR models. The models themselves are not built to handle business processes, so more software development is required to reach a workflow that works for you.
We’ve built a custom deep learning based pipeline that allows you to automate your receipt and invoice processing OCR instantly on a huge range of formats with the highest accuracy available. This pipeline becomes a module in your workflow process that allows you to customize exactly how you go from input document to extracted and stored data. Don’t worry about digitisation, standardization, or anything else that slows down your workflow. We allow you to go from unstructured documents to extracted data with labeled entities instantly with our state of the art algorithms.


Our deep learning models work out of the box for a huge range of invoice and receipt formats and have been specifically trained on invoice examples from the 10 leading invoice processing companies.
By focusing on the most popular invoice types and real examples from businesses we’ve been able to build a system that produces high accuracy right out of the box and allows enough flexibility in its architecture to be adjusted for specific business use cases. Our deep learning expertise mixed with an understanding of company SLAs allows us to perfectly design a flexible solution that fits right into business workflows.
Worried your exact invoices or receipts don’t work with the default invoice processing OCR or that the information is too hard to extract from the documents (bad handwriting, poor lighting etc)? Do you have custom fields that aren’t supported by out of the box solutions? As you can imagine by the number of different use cases possible this is pretty common.
Width.ai will fully customize the default invoice processing models with fine tuning on your actual data. By showing the models your exact invoices and the fields you care about you can steer the models towards your specific use case and boost the default state of the art accuracy through the roof! This fine tuning is pretty standard and the customization is something we fully recommend to help you reach the highest accuracy.

While our model supports over 50 of the most common fields right out of the box, we’re well aware that many use cases have extra data fields to cover. Through our fine tuning process you can add any number of fields to extract that show up in your documents. These fields are added to the model output instantly in a few easy steps and can be extracted in JSON or table format.
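As a rough sketch of what JSON and table output could look like, the snippet below takes a list of extracted fields and serializes it both ways using only the Python standard library. The field names, values, and the `field`/`value`/`confidence` record shape are assumptions made for illustration. They are not the actual Width.ai output schema.

```python
import csv
import io
import json

# Hypothetical extraction result; names and values are illustrative only.
extracted = [
    {"field": "invoice_number", "value": "48213", "confidence": 0.98},
    {"field": "total", "value": "1234.50", "confidence": 0.95},
    {"field": "po_reference", "value": "PO-7731", "confidence": 0.81},  # a custom field
]


def to_json(fields: list) -> str:
    """Serialize the extracted fields as a flat JSON object of name -> value."""
    return json.dumps({f["field"]: f["value"] for f in fields}, indent=2)


def to_table(fields: list) -> str:
    """Serialize the same fields as CSV text, one row per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["field", "value", "confidence"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(fields)
    return buffer.getvalue()


print(to_json(extracted))
print(to_table(extracted))
```

In this sketch, adding a custom field is just one more record in the list, so both output formats pick it up without any change to the serializers.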
Our models go way past simple text extraction and OCR for invoice field extraction. Deep learning models added to our pipeline allow us to add reasoning and entity relationships to the equation to extract deeper fields and information. Some of the fields we’ve extracted in a custom setting include:
1. Question and answer pairs in an invoice.
2. Multiple languages in the same document.
3. Handwritten instructions
4. Address
5. Emails
6. Dates and times
7. Document summaries
8. Classification of paper invoices
Eliminating manual data entry and reducing costs are by far the most valuable benefits of invoice processing software with OCR technology. The ability to automatically extract the information you care about from these documents gives you hundreds of paid hours back with the same accuracy as humans or higher.
We’ve built custom integrations to the tools you currently use for this process to allow you to fully automate each step. Grabbing the documents, scanning them in, extracting the data, storing the data, monitoring and alerting - all covered in a pipeline that runs in a few seconds not minutes.
Here’s a 5 step guide to how you can integrate this system into your business and start automating your manual invoicing and document management processes.

The first step in integrating OCR invoice processing is to understand the different requirements for your use case and how they fit together. This can usually be done without a full understanding of what the actual invoice processing module looks like, as it’s more important to understand inputs and output requirements such as:
1. What fields you need to extract
2. How your specific system needs the output (Your CRM, JSON, DB, ERP System)
3. What alerts and confidence metrics you care about
4. How many invoices you process per month
5. How many historical invoices you have (Can be zero!)
By understanding the requirements for what comes into the system and what you expect out of it, it’s much easier to plan the part in the middle! Sometimes this process is as simple as putting on paper the manual steps you currently work through and how each one of them can be automated. For instance, manually plugging the product information from an invoice into QuickBooks becomes extracting fields with Width.ai and automatically updating your backend QuickBooks via API.
Deciding how you want to pass your invoices or receipts into the system based on how you currently store them is a huge part of the workflow design. Although the actual invoice processing OCR models will process a single document at a time, the software can be deployed to batch documents and run much higher volumes at a single time.
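One way to picture batching is below: a minimal sketch in which documents are grouped into fixed-size batches and each batch is handed to a thread pool. The `process_one` callable stands in for whatever single-document extraction call you use. It, the batch size, and the file names are hypothetical and do not represent a real Width.ai API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def process_batches(
    paths: Iterable[str],
    process_one: Callable[[str], dict],
    batch_size: int = 4,
) -> dict:
    """Run the single-document processor over every path, one batch at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for batch in chunked(paths, batch_size):
            # pool.map preserves input order, so results line up with the batch.
            for path, result in zip(batch, pool.map(process_one, batch)):
                results[path] = result
    return results


def fake_processor(path: str) -> dict:
    """Stand-in for a real extraction call; it only echoes the source path."""
    return {"source": path}


paths = [f"invoice_{i}.pdf" for i in range(10)]
results = process_batches(paths, fake_processor)
print(len(results))
```

The single-document model call stays unchanged here. Only the code around it decides how many documents are in flight at once.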

Output processing may seem like a part of the process that isn’t particularly complicated or important, outputs just get passed to our target system right? While it can be as simple as creating a CSV and storing it in a database, oftentimes businesses want an output processing system that provides so much more than that.

Now that you’ve got a high level overview of what you’re looking for the system to accomplish, what goes into it, and what you need out of it, we can take a look at how to structure the invoice processing. First, let’s lay out a few important requirements to note:
1. Do we have fields that are not supported by the default state of the art architecture?
2. Should we finetune the architecture to upgrade the accuracy for our specific use case?
3. Width.ai uses cloud based architecture to deploy your invoice processing models with high runtime speeds.
4. Do we have past examples of invoices or receipts that we can use to quickly boost results?
5. Are we leveraging Width.ai’s custom NLP pipelines in our use case?
Understanding where you want to go with these helps ensure that the production process achieves the best results.
Keeping up with your invoice processing software is way more than just monitoring if the system is online. Modern deep learning based systems offer way more insight into every piece of what is happening 24/7 in your system. You can quickly integrate alerts, confidence scores, and notifications into your workflow.
Receive real time alerts for anything from system downtime to low accuracy results. Alerts can be integrated into a number of different systems to help you stay on top of your powerful OCR invoice processing.

Use in-house built confidence scores designed to help understand computer vision and natural language processing models in production. These scores work to understand the confidence based on known entity reasoning which far outperforms raw OCR confidence.
Notifications are sent right to Slack or email to let you know documents were processed successfully.
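Alerts, confidence scores, and notifications can be tied together with a simple routing rule, sketched below. This is a plain threshold check, not the in-house confidence scoring described above. The `FieldResult` type, the 0.85 threshold, and the status labels are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    """One extracted field with a confidence score between 0 and 1 (hypothetical shape)."""
    name: str
    value: str
    confidence: float


def route_document(fields: list, threshold: float = 0.85) -> dict:
    """Flag a document for review if any field falls below the threshold."""
    low = [f.name for f in fields if f.confidence < threshold]
    if low:
        # This is where an alert would be raised in a real workflow.
        return {"status": "needs_review", "alert": True, "fields": low}
    # This is where a success notification (Slack, email) would be sent.
    return {"status": "processed", "alert": False, "fields": []}


clean = [FieldResult("total", "1234.50", 0.97), FieldResult("date", "2023-01-05", 0.93)]
blurry = [FieldResult("total", "1234.50", 0.97), FieldResult("date", "2023-01-05", 0.62)]

print(route_document(clean))
print(route_document(blurry))
```

The returned dictionary can then be passed to whichever channel you already use for alerts or notifications.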
Our cloud infrastructure allows you to run documents through in seconds not minutes. Deploy our default out of the box model or custom solution with an API that allows you to access the workflow from any web application or backend system.
Deployed alongside the advanced OCR invoice processing solution is an optimization system that automatically improves your pipeline over time with more data. This allows the deep learning models to become more attuned to your specific use case and to keep improving.
You’ve deployed invoice data capture software that allows you to remove manual data entry and reduce costs by over 50%. Once your system is up and running it’s even easier to make changes such as add fields or new integrations for input or output.

The level of customization we offer for our invoice processing software, mixed with the incredible accuracy our models provide, gives you results that other systems struggle to match. Raw OCR based systems don’t give you the accuracy you need when looking to automatically extract data, and prebuilt solutions won’t provide you the customization that real production systems require. Our accuracy at all stages of the process not only outperforms open source tools but also outperforms other prebuilt solutions that don’t allow for customization.

Set up a demo to see how you can use Width.ai invoice processing!
