Radiologists and clinicians diagnose conditions from one or more modalities of biomedical images like X-ray radiography, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), ultrasound, and others.
Diagnosis based on medical images takes time, and complicated cases take longer. You may sometimes need multiple specialists to analyze more than one modality to reach a consensus.
Machine learning in medical imaging can improve decision making and diagnosis time by providing reliable clinical decision support to your busy specialists. Machine learning systems are capable of large-scale analysis and triaging, processing thousands of images in minutes.
Let's look at some approaches, network architectures, and uses of machine learning in medical imaging.
Image classification is the task of applying one or more medical labels to an image based on visual characteristics like colors, textures, objects, and shapes.
A convolutional neural network (CNN) is the most popular machine learning technique for image classification. A CNN consists of layers of convolutional filters. Each filter behaves like a neuron that lights up when it sees a particular texture or shape or other image feature but remains inactive otherwise.
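To make the "filter lights up" idea concrete, here is a minimal NumPy sketch of a single convolutional filter. The vertical-edge kernel and the toy 6x6 images are assumptions chosen for illustration; a trained CNN learns its filter weights from data rather than using hand-picked ones.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning frameworks, this computes a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses, silence everything else."""
    return np.maximum(x, 0.0)

# Hand-picked vertical-edge filter (illustrative assumption).
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

# Toy image: dark left half, bright right half, so one vertical edge.
edge_image = np.zeros((6, 6))
edge_image[:, 3:] = 1.0

# Toy image with no edges at all.
flat_image = np.ones((6, 6))

edge_map = relu(conv2d_valid(edge_image, vertical_edge))
flat_map = relu(conv2d_valid(flat_image, vertical_edge))

print(edge_map.max(), flat_map.max())
```

The filter responds strongly where the dark-to-bright edge falls inside its window and stays at zero on the featureless image.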
Training a CNN on images of a particular class, say heart X-rays, adjusts the convolutional weights of these filters so that they activate strongly whenever visual features unique to heart X-rays are shown. When a heart X-ray is shown, a large number of these filters switch on and the output values indicate that the image is a heart X-ray. This is the basic working of activation and pattern recognition in CNNs.
The outputs from convolutional layers are called feature maps and generating them for an image is called feature extraction. An image's feature maps are important inputs to every biomedical imaging task including classification, object detection, and segmentation.
Plain CNNs that consist of multiple convolutional layers suffer from problems like vanishing gradients and poor generalizability. To address them, better architectures have been developed.
ResNet is one such popular architecture. It uses skip connections that carry a layer's input past one or more convolutional layers and add it to their output, so that a layer's inputs come not just from the previous layer but also from the layers before that. This helps preserve gradients in very deep networks. ResNet is a good choice when you have big data sets for training.
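A skip connection can be sketched in a few lines. The example below is a simplification under stated assumptions: it uses small fully connected layers in place of convolutions, and zero weights to show the extreme case where the learned layers contribute nothing.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def plain_block(x, w1, w2):
    """Two stacked layers with no shortcut."""
    return relu(w2 @ relu(w1 @ x))

def residual_block(x, w1, w2):
    """The same two layers, but the block input is added back before the final activation."""
    return relu(w2 @ relu(w1 @ x) + x)

x = np.array([1.0, 2.0, 3.0, 4.0])
zero_w = np.zeros((4, 4))  # layers that have learned nothing yet

plain_out = plain_block(x, zero_w, zero_w)
residual_out = residual_block(x, zero_w, zero_w)

print(plain_out)     # the signal is lost
print(residual_out)  # the input still passes through the shortcut
```

Because the input always has a direct path to the output, the signal (and, during training, the gradient) is not forced through every layer in between.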
DenseNet is another popular architecture that, within each dense block, connects every convolutional layer to all the layers that follow it. A set of visual features detected in an earlier layer influences all subsequent layers not just indirectly (which is true for plain CNNs too) but also directly. It performs better by reusing features instead of recalculating them.
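The feature reuse in a dense block comes from concatenation. The sketch below is illustrative only: a random 1x1 channel-mixing step stands in for each real convolutional layer, and the sizes (16 input channels, 4 layers, 12 new feature maps per layer) are assumptions.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Each layer receives the concatenation of the block input and all earlier outputs."""
    features = [x]  # list of (channels, height, width) arrays
    for _ in range(num_layers):
        stacked = np.concatenate(features, axis=0)
        in_channels = stacked.shape[0]
        # Stand-in for a conv layer: random 1x1 convolution followed by ReLU.
        weights = rng.standard_normal((growth_rate, in_channels))
        new_maps = np.maximum(np.einsum("oc,chw->ohw", weights, stacked), 0.0)
        features.append(new_maps)
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
out = dense_block(x, num_layers=4, growth_rate=12, rng=rng)

print(out.shape)  # channels grow linearly: 16 + 4 * 12
```

The original input maps appear unchanged at the front of the output, which is what lets later layers reuse them directly.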
Recently, state-of-the-art vision transformer architectures have been favored over CNNs because they can handle global features and long-range dependencies better.
Let's now explore some uses of image classification in clinical practice.
Intracranial hemorrhage is any kind of bleeding inside the cranial space that protects the brain, such as bleeding caused by accidents or violence. There are five types and they are frequently detected in emergency wards from non-contrast computed tomography (CT) radiology scans of the head. As CT scanners are found in most emergency wards, detecting hemorrhages automatically using artificial intelligence can speed up triaging.
The main difficulty in classifying CT scans is that the data is 3D. One solution is to use regular CNNs with 3D convolutions to extract features.
The extracted feature vectors are sent to a fully connected softmax layer for classifying the condition. Multi-class classification identifies one of the five types of hemorrhages. Some architectures opt for five parallel layers, each doing binary classification for one of the five types.
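The difference between one softmax layer and five parallel binary outputs can be shown with plain Python. The class names and the logit values below are placeholders invented for illustration, not values from any study.

```python
import math

# Placeholder names for the five hemorrhage types (illustrative assumption).
TYPES = ["type_1", "type_2", "type_3", "type_4", "type_5"]

def softmax(logits):
    """Turn scores into probabilities that sum to 1 (exactly one class wins)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Independent probability for a single yes/no decision."""
    return 1.0 / (1.0 + math.exp(-z))

logits = [2.0, -1.0, 0.5, 1.8, -3.0]  # made-up network outputs

# Option 1: one softmax layer picks a single type.
multi_class = softmax(logits)
predicted_single = TYPES[multi_class.index(max(multi_class))]

# Option 2: five parallel binary outputs, each thresholded on its own.
multi_label = [sigmoid(z) for z in logits]
predicted_multi = [t for t, p in zip(TYPES, multi_label) if p >= 0.5]

print(predicted_single)
print(predicted_multi)
```

With parallel binary outputs, a scan can be flagged for more than one type at once, which a single softmax layer cannot express.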
Another approach is slicing the 3D data into 2D images and using regular CNNs. But then the 3D spatial context of the condition is lost when the data is sliced. Can this context be retrieved somehow? That's exactly what a study tried by combining a CNN with a recurrent neural network (RNN). The CNN encodes spatial characteristics of each image. If the slices are supplied in the correct sequence, the RNN can encode visual characteristics spread across the sequence of slices.
Both studies reported sensitivity and specificity comparable to experienced radiologists and sometimes better than less experienced radiologists. For intracranial hemorrhage, the best model scored 0.99 on sensitivity, close to the 1.0 of a senior radiologist and better than the average 0.94 of three junior radiologists. These results suggest such systems can be deployed in real-world emergency settings where triage time is critical.
Deep learning's ability to detect bone fractures in large gross anatomical features like hips is impressive. These are whole image classification tasks using convolutional neural networks and since most new ideas in deep vision are first implemented for classification, these tasks benefit whenever a better CNN architecture comes out.
Image classification has already helped find hip fractures. These are usually diagnosed from frontal pelvic X-ray radiographs. However, to avoid a misdiagnosis, patients are advised to get additional scans, which increases costs, delays treatments, and is impractical in remote areas without radiology facilities.
A deep learning system that can accurately detect hip fractures can solve these problems. Since such a system is more attentive to small visual details that people can miss, it will hopefully perform as well as or better than human experts, acting as decision support.
One study used DenseNet CNNs for hip fracture classification. Their training used augmentation operations like small translations, rotations, and shearing to expand the training set. They also preprocessed the images using histogram equalization.
The study reported that:
Object detection is the task of locating one or more objects, belonging to one or more classes, in an image and calculating their bounding boxes.
Convolutional neural networks are preferred for object detection in the medical field too. A detector can use any CNN architecture for feature extraction and appends classification and regression layers to predict object classes and bounding box coordinates. More recently, vision transformers have been tried too.
Some popular CNN-based detection architectures are:
YOLO and SSD are single stage detector architectures that classify and locate multiple objects using a single network. In contrast, R-CNN family architectures contain two sub-networks — one comes up with region proposals that possibly contain objects while the other classifies them and predicts their locations.
You can choose the proven and mature Faster R-CNN for most of your medical detection tasks. It's fast enough to output results in near real time, giving you a more efficient workflow.
It first extracts features using a backbone CNN that you specify. Then a sub-network called the region proposal network (RPN) examines the extracted features and tells the main network where to look for objects by proposing region rectangles with their confidence scores. The RPN is also a CNN that’s fully convolutional without any dense layers and shares layers with the main R-CNN network.
RPN's region proposals are routed through a region of interest (RoI) pooling layer to reshape them before passing to a fully connected layer that predicts the class and coordinates.
Inception-ResNet, which combines ResNet-style residual connections with Inception modules, is a great choice for the backbone CNN. Residual connections allow for very deep networks without running into the vanishing gradient problem. Inception aims for a computationally lighter network than a regular CNN by using 1x1 convolutions to reduce the number of parameters. Their combination gives you a very deep, computationally light network to calculate feature maps.
In busy emergency wards, surgeons and radiologists may focus more on trauma injuries and therefore miss fractures. Artificial intelligence systems that can quickly detect possible fractures in X-ray radiographs can be a big help in such high-pressure environments.
One approach is to use an object detection network to detect the fractures directly. However, because they generally operate at lower resolutions, they are more suitable for detecting large objects rather than inconspicuous fractures hiding in a large image.
A better approach is to use detection to first locate musculoskeletal parts of interest — such as wrists — and then pass those small regions to a second fracture classification network. Since this second network examines only small areas and not the full image, its accuracy will be better.
We’ve already gone over the Faster R-CNN architecture. Let’s look at the Inception-v4 classification network used here.
Inception architectures ease the vanishing gradient issues of very deep neural networks by using inception blocks, in which a large number of convolutional filters run in parallel (stacked horizontally) instead of one after another (stacked vertically). So Inception gains the capacity of a very deep network by going wide instead of deep, with fewer network parameters.
The imaging data is manually labeled by experienced orthopedists using labeling tools like LabelImg.
Synthetic images are generated using augmentation operations like horizontal flipping, random translations, rotations, shearing, and scaling, all within fixed limits. If there’s a chance that the images can come from different X-ray machines, you should use normalization techniques like histogram equalization too.
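Histogram equalization itself is short enough to sketch in NumPy. This is the plain global version for 8-bit grayscale images, written as an illustration; the synthetic low-contrast image is an assumption standing in for a real radiograph.

```python
import numpy as np

def equalize_histogram(image):
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest value present
    total = image.size
    if total == cdf_min:  # constant image: nothing to stretch
        return image.copy()
    # Map each gray level to its rescaled cumulative frequency.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]

# Synthetic low-contrast image: values squeezed into the range 100..140.
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 141, size=(64, 64), dtype=np.uint8)
equalized = equalize_histogram(low_contrast)

print(low_contrast.min(), low_contrast.max())
print(equalized.min(), equalized.max())
```

After equalization the narrow band of gray levels is spread over the full 0 to 255 range, so images from machines with different contrast settings look more alike to the network.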
If you have large datasets after augmentation (thousands of images), you can train the network from scratch. But if you have just a few dozen or a few hundred images, then you should use transfer learning, where a pre-trained Faster R-CNN model is fine-tuned by unfreezing its final layers and retraining them on your X-ray training data. Use best practices like keeping the validation and test subsets separate from the training data. The fracture-classifying Inception model is trained the same way.
The study compared this machine learning method's performance with that of experienced orthopedists and radiologists using metrics like accuracy, sensitivity, specificity, and Youden index. Notably, it found that the system outperformed radiologists and performed on par with orthopedists.
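For reference, all four metrics follow directly from a confusion matrix. The counts in this sketch are made up for illustration and are not the study's results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)  # share of real fractures that were found
    specificity = tn / (tn + fp)  # share of healthy cases correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    youden_j = sensitivity + specificity - 1.0  # 0 = chance level, 1 = perfect
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden_j": youden_j,
    }

# Hypothetical counts: 100 fractured and 100 healthy radiographs.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The Youden index is handy for comparing readers or models because it folds sensitivity and specificity into a single number.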
Object detection can automate routine analysis of dental periapical and bitewing X-ray radiographs such as:
You can opt for a pre-trained Faster R-CNN machine learning model fine-tuned for these tasks using transfer learning.
Dental radiographs are high-resolution images that can be safely downscaled without reducing detection accuracy. However, different X-ray machines produce images with different contrasts, which affects accuracy. To counter that, you should normalize contrasts using image processing techniques like contrast-limited adaptive histogram equalization (CLAHE), which equalizes contrast in local regions while limiting noise amplification.
A Faster R-CNN generalizes better with more data. You should augment training images with additional images using operations like horizontal and vertical flipping, adding random noise, and making random contrast modifications.
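These augmentation operations are simple array manipulations. The sketch below assumes grayscale images scaled to the range 0 to 1; the probabilities, contrast range, and noise level are illustrative choices, not values from the source.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped, contrast-shifted, noisy copy of a [0, 1] grayscale image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]  # vertical flip
    # Random contrast change around the image mean.
    contrast = rng.uniform(0.8, 1.2)
    mean = out.mean()
    out = (out - mean) * contrast + mean
    # Mild Gaussian noise.
    out = out + rng.normal(0.0, 0.02, size=out.shape)
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(42)
radiograph = rng.random((32, 48))  # random stand-in for a real radiograph
augmented = [augment(radiograph, rng) for _ in range(4)]

print(len(augmented), augmented[0].shape)
```

For detection training, remember that a flip must also be applied to the bounding box coordinates, otherwise the labels no longer match the image.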
As radiograph datasets tend to be small, transfer learning is the best approach to train such a deep architecture. Start with a pre-trained model like the Faster R-CNN Inception ResNet V2 that's trained on the COCO dataset. Unfreeze only its final layers and retrain it on your teeth dataset to fine-tune it for dental features. Transfer learning performs well because textures and shapes have already been learnt by the pre-trained model.
Use standard object detection metrics like mean average precision (mAP) and intersection over union (IoU) to evaluate your fine-tuned model. Similar models have reported mean IoU, precision, and recall of 90% and above.
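Intersection over union is easy to compute by hand. This sketch assumes boxes given as (x_min, y_min, x_max, y_max); the two example boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

ground_truth = (10, 10, 50, 50)  # hypothetical labeled tooth
prediction = (30, 30, 70, 70)    # hypothetical model output

print(round(iou(ground_truth, prediction), 4))
```

A detection is usually counted as correct only when its IoU with a ground truth box exceeds a chosen threshold, and mAP is then computed over those matches.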
Image segmentation is a frequently used computer vision task in medical image analysis. It involves isolating regions of medical interest in natural tissues. It's used in every medical field with every modality — breast cancer and lung cancer detection, Alzheimer's disease classification, and nerve detection are just a few examples.
Since regions have irregular shapes, segmentation has to classify — i.e., assign a class label for — every pixel in the image. For example, an oncology MR image can contain regions of healthy tissue, benign lesions, and malignant tumors.
Let's explore two popular segmentation neural networks — U-Net and FC-DenseNet.
U-Net is a popular deep CNN architecture developed by a medical research team for medical image segmentation. Its name comes from depicting its architecture in the shape of a “U,” consisting of:
Since U-Net is a fully convolutional network (FCN) with no dense layers at all, it can accept images of any size. The only purpose of the encoding layers on the left is feature extraction at every resolution to pass to corresponding upscaling layers on the right.
The deconvolution layers on the right iteratively upscale pixel masks by deconvolving features from the previous layer with features from its corresponding encoding layer. The result is a pixel mask that's the same size as the input image.
During training, RGB images are input as rank-4 tensors. The ground truths are segmentation maps for each image where each pixel is labeled with a numeric class index. Since a segment map contains multiple regions, this is a typical multi-class classification at the pixel level and hence uses cross-entropy as the loss function.
However, because you're classifying a large number of pixels, you need to optimize at the aggregate level too so that most pixels match their ground truth labels. For this, the Dice coefficient for set similarity is included in the loss function along with cross-entropy.
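A combined loss of this kind might look like the following NumPy sketch. Summing the two terms with a `dice_weight` factor is an assumption made for illustration; implementations differ in how they weight them.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-7):
    """probs: (H, W, C) softmax outputs; labels: (H, W) integer class indices."""
    h, w = labels.shape
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.mean(np.log(picked + eps)))

def soft_dice_loss(probs, labels, eps=1e-7):
    """1 minus the mean per-class Dice coefficient, computed on probabilities."""
    num_classes = probs.shape[-1]
    one_hot = np.eye(num_classes)[labels]  # (H, W, C)
    intersection = np.sum(probs * one_hot, axis=(0, 1))
    totals = np.sum(probs, axis=(0, 1)) + np.sum(one_hot, axis=(0, 1))
    dice_per_class = (2.0 * intersection + eps) / (totals + eps)
    return float(1.0 - dice_per_class.mean())

def combined_loss(probs, labels, dice_weight=1.0):
    """Pixel-level cross-entropy plus region-level Dice loss."""
    return cross_entropy(probs, labels) + dice_weight * soft_dice_loss(probs, labels)

# Toy 4x4 segmentation map with two classes.
labels = np.array([[0, 0, 1, 1]] * 4)
perfect = np.eye(2)[labels]         # prediction equal to the ground truth
uniform = np.full((4, 4, 2), 0.5)   # prediction that knows nothing

print(combined_loss(perfect, labels))
print(combined_loss(uniform, labels))
```

Cross-entropy scores each pixel on its own, while the Dice term rewards overlap of whole regions, which helps when the region of interest is small compared to the background.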
The effectiveness of segmentation models is evaluated using the Jaccard similarity score between ground truth regions and predicted regions.
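As a quick illustration, the Jaccard score for a single class is the overlap of two binary masks divided by their union. The masks here are toy squares, not real segmentations.

```python
import numpy as np

def jaccard_score(pred_mask, true_mask):
    """Intersection over union of two binary masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:  # both masks empty: treat as a perfect match
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)

true_mask = np.zeros((8, 8), dtype=bool)
true_mask[2:6, 2:6] = True   # 16-pixel ground truth region

pred_mask = np.zeros((8, 8), dtype=bool)
pred_mask[3:7, 3:7] = True   # 16-pixel prediction, shifted by one pixel

score = jaccard_score(pred_mask, true_mask)
dice = 2 * score / (1 + score)  # Dice can be derived from Jaccard
print(round(score, 4), round(dice, 4))
```

Jaccard and Dice rank models the same way; Jaccard simply penalizes partial overlap more heavily.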
FC-DenseNet is another segmentation network that uses DenseNet as a feature extractor. The main intuition behind DenseNet is that directly connecting every layer to every other layer makes the network easier to train and lighter with fewer parameters.
Like U-Net, FC-DenseNet is also a fully convolutional U-shaped architecture with a downscaling path and an upscaling path consisting of dense blocks. Each dense block is a set of convolutional layers where each layer is connected to every other layer.
In the downscaling path, each dense block's input and output feature maps are concatenated. Thus there is a linear growth as well as reuse of feature maps as one moves down. However, in the upsampling path, it's not a good idea to expand the feature maps while the spatial resolution is also expanding. If that happens, the final softmax layer has to contend with an intractable number of features.
But you still want to reuse already calculated feature maps. So in the upscaling path, only the last dense block's feature maps are input to the deconvolution layer. Since full feature maps were already calculated in the downsampling path, they are supplied from the corresponding encoding layer to the deconvolution layer through skip connections. This is where it differs from U-Net which uses multiple deconvolution layers and combines all feature maps at every layer.
Cardiomegaly is an enlarged heart condition that often indicates a more serious cardiovascular disease. Since chest X-ray radiographs are easily available, automated flagging of possible cardiomegaly in chest X-ray radiographs can save triaging time for medical personnel.
One indicator of cardiomegaly is a cardiothoracic ratio (CTR), the ratio of the heart's maximum width to the maximum internal width of the chest, above 0.5 instead of in the normal range of 0.39-0.5 with an average of 0.45.
Cardiomegaly can be detected by segmenting the heart and thoracic cavities and measuring CTR. One study did this using both U-Net and FC-DenseNet and compared their results. They found that while both performed well, U-Net showed better accuracy and precision while FC-DenseNet showed better recall. In other words, FC-DenseNet produced fewer false negatives: it less often labeled people who had cardiomegaly as not having it. As cardiomegaly is an indicator of underlying disease, missing it when it's present is the costlier error. So FC-DenseNet is the safer network from a healthcare point of view.
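Once you have the two segmentation masks, measuring CTR is a small calculation. The sketch below is a simplification: it takes each structure's width as the horizontal extent of its binary mask, and the rectangular masks are synthetic stand-ins for real segmentation output.

```python
import numpy as np

def max_horizontal_width(mask):
    """Horizontal extent of a binary mask in pixels (rightmost minus leftmost column, plus 1)."""
    cols = np.where(mask.any(axis=0))[0]
    if cols.size == 0:
        return 0
    return int(cols[-1] - cols[0] + 1)

def cardiothoracic_ratio(heart_mask, thorax_mask):
    thorax_width = max_horizontal_width(thorax_mask)
    if thorax_width == 0:
        raise ValueError("thorax mask is empty")
    return max_horizontal_width(heart_mask) / thorax_width

# Synthetic masks on a 100x100 image.
thorax_mask = np.zeros((100, 100), dtype=bool)
thorax_mask[10:90, 10:90] = True   # 80 pixels wide

heart_mask = np.zeros((100, 100), dtype=bool)
heart_mask[40:80, 30:74] = True    # 44 pixels wide

ctr = cardiothoracic_ratio(heart_mask, thorax_mask)
flagged = ctr > 0.5  # threshold from the text above
print(round(ctr, 2), flagged)
```

Because the final decision depends on a ratio of widths, small segmentation errors at the left and right borders matter more than errors elsewhere in the masks.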
Since chest radiographs are likely to be a small dataset, you should use data augmentation techniques like slight rotations, shearing, shifting, and zooming to expand the training set with synthetic images. Additionally, since these are soft tissues, you can use elastic deformations to further expand the training set and help your network generalize better.
A stroke lesion is a region of the brain where brain cells are dead due to lack of sufficient blood flow and can cause death or permanent disability. Neurologists detect stroke lesions from 3D magnetic resonance images (MRI) of the brain. MR images can be obtained through multiple modalities:
Often, lesions show up in one or more of these modalities. You can use a volumetric segmentation architecture like 3D U-Net to automatically detect lesions in such multimodal MRIs. Analyzing the 3D voxels directly ensures there's no loss of local information which is a problem when analyzing them as 2D slices.
3D U-Net is just a 3D version of normal U-Net that accepts 3D volumes as inputs and uses 3D convolutions and pooling. For multimodal MRI data, an input set of 3D MRIs is a rank-5 tensor that’s passed through 3D convolutional layers to extract features.
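The rank-5 input can be pictured by stacking volumes with NumPy. The modality names, volume size, and channels-last layout (batch, depth, height, width, modality) are all assumptions for illustration; some frameworks expect the modality axis right after the batch axis instead.

```python
import numpy as np

# Hypothetical modality names and volume size.
modalities = ["modality_a", "modality_b", "modality_c", "modality_d"]
depth, height, width = 16, 32, 32

rng = np.random.default_rng(0)

def load_fake_scan():
    """Stand-in for loading one patient's co-registered 3D volumes, one per modality."""
    return [rng.random((depth, height, width)) for _ in modalities]

def stack_modalities(volumes):
    """Stack per-modality volumes along a trailing channel axis: (D, H, W, C)."""
    return np.stack(volumes, axis=-1)

patient_1 = stack_modalities(load_fake_scan())
patient_2 = stack_modalities(load_fake_scan())

# A training batch adds a leading batch axis: (N, D, H, W, C).
batch = np.stack([patient_1, patient_2], axis=0)
print(batch.ndim, batch.shape)
```

Stacking only makes sense when the volumes of one patient are aligned voxel for voxel, so that each position holds the readings of all modalities for the same piece of tissue.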
One problem you'll face is a class imbalance in the data because most areas across all images will be healthy tissue while only a small set will be damaged lesions. This can be solved by using a dynamically weighted loss function like focal loss, which down-weights examples the network already classifies confidently so that training focuses on hard, misclassified examples.
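The effect of focal loss is easiest to see on single examples. This sketch uses the standard form with a focusing parameter gamma and a weight alpha; the two probabilities are made-up values for an easy and a hard voxel.

```python
import math

def cross_entropy(p_true):
    """Loss for one example, given the predicted probability of its correct class."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Cross-entropy scaled by (1 - p_true) ** gamma, so confident predictions count less."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

easy = 0.95  # confidently correct, e.g. obvious healthy tissue
hard = 0.20  # mostly wrong, e.g. a faint lesion

easy_ratio = focal_loss(easy) / cross_entropy(easy)
hard_ratio = focal_loss(hard) / cross_entropy(hard)

print(round(easy_ratio, 4))  # the easy example keeps a tiny share of its loss
print(round(hard_ratio, 4))  # the hard example keeps most of its loss
```

Setting gamma to 0 turns the function back into ordinary cross-entropy, so gamma controls how strongly the abundant easy voxels are discounted.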
One study used this architecture and loss function on multimodal MRI data and reported Dice similarity as high as 0.84 while scoring high on other metrics like sensitivity and positive predictive value (PPV).
Machine learning in medical imaging is becoming smarter every day, offering you several opportunities to improve operational efficiency in your healthcare company, hospital, or laboratory. Contact us to learn how you can benefit!