A Deep Guide to Automatic Document Classification
Discover the power of automatic document classification using deep learning, computer vision, NLP, and large language models like GPT-3 for automating business document workflows.
Document classification is a common task in business: every document has to pass through some business workflow, and routing it the wrong way can be expensive, especially in regulated industries.
In this article, we explore accurate classification algorithms using the latest innovations in deep learning, computer vision, natural language processing (NLP), and machine learning models.
Document classification is a machine learning task that identifies the class or type of a document. For example, given a large set of scanned documents, your business may need to sort them into invoices, receipts, contracts, pay slips, and expenditure reports. The types of documents are often domain- and task-specific.
A class is determined based on a document’s text content, visual features, or both. Other aspects like metadata, location, or file name may also determine its class.
In the following sections, we’ll dive into various automatic document classification techniques.
Since we’ll refer to transformers often in this article, here’s a short overview of them.
Transformers are a family of deep neural networks designed for sequence-to-sequence tasks (e.g., language translation or question-answering). Using techniques like self-attention and multi-head attention blocks, transformers understand the long-range context in a text (or other data) and scale well during training.
A transformer consists of an encoder network or a decoder network or both. You can train and use them separately or as a single end-to-end network. An encoder generates rich representations, called embeddings, from input sequences. A decoder combines an encoder’s embeddings with output from previous steps to generate the next output sequences.
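To make the attention idea concrete, here’s a minimal NumPy sketch of scaled dot-product self-attention (single head only; the projection matrices `Wq`, `Wk`, and `Wv` are stand-ins for learned weights):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every token attends to every other token
    return softmax(scores) @ V                # weighted mix of value vectors

# Toy run: 4 tokens with 8-dim embeddings and identity projections
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = self_attention(X, np.eye(8), np.eye(8), np.eye(8))
```

Because every token attends to every other token, the `scores` matrix is the reason transformer cost grows quadratically with sequence length, which matters for the long-document discussion later in this article.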
GPT-3 provides a pre-trained model that lets us combine a task-agnostic baseline understanding of language (usable for unsupervised document classification, sentiment analysis, summarization, etc.) with a guided understanding of how to classify text via few-shot learning and fine-tuning. GPT-3 offers a few key benefits for long document classification over other architectures.
We’ll use this model as part of our proven GPT-3 pipeline for classifying long documents (20+ pages). We can also use this few-shot environment to dramatically speed up the manual document classification work required to create training data.
We’ve designed an architecture that allows us to scale our long document classification to longer documents without needing to adjust input size for large variance between different input documents.
The text extraction module is used to extract document text from PDFs, images, and Word documents. While the most common use case is extracting text from unstructured documents where the text simply flows in a natural left-to-right direction, we can fine-tune this module to extract text in a more structured format based on the type of document. Legal documents, invoices, and other documents with tabular formats have special positional structures that should be accounted for when structuring the extracted text.
The goal of this module is to reduce our document text input size by removing information that is not relevant to classification. There are a few reasons we do this:
- GPT-3 prompts have a hard token limit, so shorter inputs leave room for more few-shot examples.
- Fewer tokens per request mean lower cost and latency at runtime.
- Stripping irrelevant text reduces the noise the downstream classification model has to reason over.
This module will be a fine-tuned deep learning algorithm focused on understanding what information in our input text generally has the lowest correlation to the correct class. The more information we remove from the input text, the less future data variance we can cover. The tradeoff is that our downstream classification model can use more few-shot examples and understand our current evaluation dataset better.
Imagine we are reducing 1,600-word chunks to a single sentence theme. It will be harder for future inputs with new data variance to reach the same accuracy considering how much information we had to remove to reach a single sentence in most cases. The tradeoff here is we can include more examples in our classification model and can fit this exact dataset well. I recommend starting much wider in the amount of information we keep in this step when the dataset is small and data variance coverage is low.
Depending on the size of the document after preprocessing, you might still need to chunk it. At a high level, a chunking algorithm breaks the large document text into parts to be classified, which are then combined back together by an output algorithm that creates a single combined classification. Chunking algorithms can be as simple as splitting the document into pieces of a set number of tokens, or as involved as using multiple NLP models to decide where to split the text based on keywords or context.
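A chunker at the simple end of that spectrum can be sketched as follows. Note this splits on whitespace words rather than real GPT-3 BPE tokens, so the counts are approximate; the overlap keeps boundary context from being lost between chunks:

```python
def chunk_text(text, max_tokens=1600, overlap=100):
    """Naive token-count chunker: split whitespace 'tokens' into fixed-size
    chunks, overlapping consecutive chunks so boundary context is preserved."""
    tokens = text.split()
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(" ".join(tokens[start:end]))
        if end == len(tokens):
            break
        start = end - overlap  # back up so chunks share some context
    return chunks
```

A production pipeline would swap the whitespace split for the model’s actual tokenizer and could split on sentence or section boundaries instead of hard token counts.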
Prompt optimization allows us to dynamically build our GPT-3 prompts based on the given input text. The idea is to adjust the prompt language and prompt examples used in our prompt based on the input text. The prompt optimization algorithm chooses these based on a trained understanding of what specific prompt examples from a database lead to us having the highest probability of a successful output from GPT-3. This can be based on a keyword match, semantic match, or similar length.
The benefit of this algorithm is that we can provide GPT-3 with information that is relevant to our input text that better shows GPT-3 how to reach a correct classification for similar text. This method has been shown to improve the results of GPT-3 models up to 30% in some tasks! It makes sense that GPT-3 would be able to better classify a basketball box score if the few-shot examples provided are also box scores instead of a static prompt that has a bunch of different text examples.
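As an illustrative sketch of the semantic-match variant, here a simple bag-of-words cosine similarity stands in for a real embedding model, ranking a database of labeled prompt examples against the input text:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pick_examples(input_text, example_bank, k=3):
    """Rank labeled prompt examples by similarity to the input text and keep
    the top k for the dynamically built prompt."""
    q = Counter(input_text.lower().split())
    ranked = sorted(example_bank,
                    key=lambda ex: cosine(q, Counter(ex["text"].lower().split())),
                    reverse=True)
    return ranked[:k]
```

In practice you would replace the word-count vectors with dense embeddings and learn which example choices actually correlate with correct GPT-3 outputs, as described above.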
Fine-tuning the davinci model is a great way to steer the task-agnostic GPT-3 model toward our document classification task. Fine-tuning does not have a prompt token limit, which means we can provide GPT-3 with a large number of training examples showing how to classify our long documents. Each training example is prompt-sized, which lets us fit much more of our preprocessed document text into each example than we will have in our runtime prompt. It’s best practice to make your fine-tuned examples similar in length to what the model will see at runtime, especially if you’re going to forgo the runtime few-shot examples and rely only on the fine-tuned model.
Fine-tuning won’t provide as much value as prompt optimization until we reach a certain number of training examples per document class. With fine-tuning, no single training example is leveraged heavily by the model when classifying an input document whereas few-shot learning examples in the prompt are the focus of how the model understands completing our task correctly. This is why it's better to use few-shot learning while the dataset has low data variance coverage and doesn’t give deep context for each document class.
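To make the fine-tuning data concrete, here’s a sketch that writes labeled documents into the legacy OpenAI prompt/completion JSONL format used by davinci-era fine-tunes. The `###` separator and the leading space on the completion follow the old fine-tuning guidelines; the field names on the input records are hypothetical:

```python
import json

def build_finetune_records(labeled_docs, separator="\n\n###\n\n"):
    """Convert labeled, preprocessed document chunks into the legacy OpenAI
    prompt/completion JSONL format. The separator marks the end of the prompt;
    the completion starts with a space and ends with a newline stop token."""
    records = []
    for doc in labeled_docs:
        records.append({
            "prompt": doc["text"] + separator,
            "completion": " " + doc["label"] + "\n",
        })
    return "\n".join(json.dumps(r) for r in records)
```

Each line of the resulting file is one training example, so you can stuff far more document text per example than a runtime prompt allows.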
We have two main tasks that help us clean up and create our results for classifying these long documents.
Custom-built confidence metrics allow us to put a model confidence value on the outputs of our GPT-3 model. This is an incredibly useful tool for gaining insight into how the model classified the document, and it allows you to perform real-time tasks such as regenerating poor results, incorporating user feedback, and understanding how well your document classification model performs in production.
Once we’ve generated our classifications for each chunk of the long-form document, we need to combine them into a single document classification. There are a number of ways to do this, and similar to prompt optimization, there isn’t one best fit for every use case. The most common way is to simply choose the class that was chosen for the most chunks, and return a confidence value based on how many chunks chose it out of the entire set. A more complex version weights each chunk’s confidence score by the chunk’s share of the entire document, so larger chunks count for more in the final decision.
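A minimal sketch of that weighted-vote output algorithm (the field names on the per-chunk results are hypothetical):

```python
from collections import defaultdict

def combine_chunk_results(chunk_results):
    """Weighted vote across chunk-level classifications: each chunk's vote is
    weighted by its token count times the model's confidence for that chunk.
    Returns the winning class and a normalized confidence value."""
    weights = defaultdict(float)
    total = 0.0
    for r in chunk_results:
        w = r["num_tokens"] * r["confidence"]
        weights[r["label"]] += w
        total += w
    label = max(weights, key=weights.get)
    return label, weights[label] / total
```

Dropping the `confidence` factor (treating it as 1.0) recovers the simpler majority-by-chunk-count scheme.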
In this section, we explore bidirectional encoder representations from transformers (BERT) for document classification tasks.
BERT is best thought of as an approach for training transformer encoders for language tasks. It proposes special techniques at training time like:
- Masked language modeling (MLM): randomly mask input tokens and train the encoder to predict them from both left and right context.
- Next sentence prediction (NSP): train the encoder to judge whether one sentence actually follows another, teaching it inter-sentence relationships.
These techniques turn an encoder into a versatile pre-trained language representation model that you can quickly fine-tune for specific language tasks.
The BERT research paper also provided two transformer encoders that were trained using this methodology, called BERT-large and BERT-base. Depending on the context, BERT may refer to either the approach or the pre-trained models.
BERT does not propose any new network architecture and just reuses the original transformer encoder architecture. Its capabilities stem from its training strategies. The two pre-trained BERT models use the same architecture but differ in their internals:
- BERT-base: 12 encoder layers, 768-dimensional hidden states, 12 attention heads, about 110 million parameters.
- BERT-large: 24 encoder layers, 1,024-dimensional hidden states, 16 attention heads, about 340 million parameters.
The DocBERT model fine-tunes both BERT models for document classification. It does this by attaching a simple, fully connected, softmax layer that reports the probabilities of classes for an input embedding.
The input to the softmax is the final hidden state corresponding to the [CLS] input token that marks the start of a sequence. This hidden state acts as a latent representation of the input sequence, making it useful for classification tasks.
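The head itself is tiny. As a sketch, mapping the [CLS] hidden state to class probabilities is one matrix multiply plus a softmax; the weights below are random stand-ins for parameters learned during fine-tuning:

```python
import numpy as np

def classification_head(cls_hidden, W, b):
    """Map the final [CLS] hidden state (d_model,) to class probabilities via
    a single fully connected layer followed by softmax."""
    logits = W @ cls_hidden + b          # shape: (num_classes,)
    e = np.exp(logits - logits.max())    # stable softmax
    return e / e.sum()

# Toy run: 768-dim hidden state (BERT-base size), 4 document classes
rng = np.random.default_rng(1)
probs = classification_head(rng.normal(size=768),
                            rng.normal(size=(4, 768)) * 0.01,
                            np.zeros(4))
```

During fine-tuning, gradients flow through this head into the entire encoder, which is what “end-to-end” means in the next paragraph.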
This classification model (transformer encoder + softmax) is fine-tuned end-to-end on training datasets like:
- Reuters-21578 (news topic classification)
- arXiv Academic Paper dataset (AAPD)
- IMDB movie reviews
- Yelp 2014 reviews
Although both BERT models achieved the best F1-scores on all four datasets, their enormous sizes made them expensive and slow for both fine-tuning and inference. The next best model achieved comparable F1-scores (usually within 3-4% of both BERTs) and inferred 40x faster with less than 4 million parameters. The inefficiency of the BERT models is unacceptable for many use cases.
Can BERT’s awesome capability be transferred to a lighter model to achieve both high accuracy and performance? The DocBERT paper explores algorithms like knowledge distillation (KD) to transfer DocBERT-large’s capability to a lightweight bidirectional long short-term memory (BiLSTM) network.
The BiLSTM is first trained normally on a labeled dataset. DocBERT-large is also fine-tuned on the same dataset. Since the latter’s F1-scores are higher, it’s designated as the teacher and the BiLSTM as the student.
Next, a transfer dataset is created, and the class probabilities inferred by DocBERT on it are set as soft targets for the student. The student aims to fine-tune its trained weights on this transfer dataset so that it matches the teacher’s class probabilities with the least error.
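A sketch of a typical distillation objective follows; the exact loss formulation varies by paper, and this version uses the common MSE-on-logits soft-target term, optionally mixed with ordinary cross-entropy on hard labels:

```python
import numpy as np

def distillation_loss(student_logits, teacher_logits, hard_labels=None, alpha=0.5):
    """Knowledge distillation objective (one common formulation): match the
    teacher's logits with an MSE term, optionally mixed with cross-entropy
    on the hard labels. Shapes: (batch, num_classes)."""
    soft = np.mean((student_logits - teacher_logits) ** 2)
    if hard_labels is None:
        return soft
    # cross-entropy on the ground-truth hard labels
    probs = np.exp(student_logits - student_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    hard = -np.mean(np.log(probs[np.arange(len(hard_labels)), hard_labels] + 1e-12))
    return alpha * soft + (1 - alpha) * hard
```

On the unlabeled transfer dataset only the soft term applies, which is exactly why the transfer set can be much larger than the labeled one.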
Using this technique, the KD-BiLSTM improved on its own baseline scores and got close to DocBERT-base’s scores while being 25x smaller and 40x faster than it!
The discussion so far gives the impression that BERT is only for language tasks. But that’s not true, and BERT has been used for tasks that combine computer vision and NLP. One such application is visual document understanding that replicates human understanding of complex documents like invoices, contracts, or court records.
Document classification using both visual and linguistic information is often needed. For example, process automation may have to classify documents to send them to different business workflows.
LayoutLM is a visual document understanding model that combines BERT pre-training with visual aspects of text blocks. Both aspects are combined as embeddings to the BERT encoder. The classification layer attached to it learns to identify the document using both visual and textual aspects, just like people do.
In this section, we explore techniques to overcome the limitations of pre-trained BERT models when processing long documents.
A drawback of all transformer models is that self-attention’s compute and memory costs scale quadratically with sequence length. That’s why the pre-trained BERT models cap their input sequence length at 512 tokens and truncate everything beyond it: longer documents would require quadratically more computational power.
Another drawback is the positional encoding scheme that blends position information into the input embeddings: it’s trained only for sequences under 512 tokens. For longer documents, it would have to be retrained. So, in practice, 512 has become a hard limit of the BERT models.
Long documents like legal agreements or business plans have multiple sections. Reviewers may need high-level labels like “warning” or “safe” to help them focus on the critical sections.
Many text classification tasks like sentiment analysis may apply different labels to different sections in the same document. But this isn’t possible using BERT.
Hierarchical transformers solve this with a simple algorithm:
1. Split the long document into segments short enough for BERT, optionally overlapping.
2. Run each segment through a fine-tuned BERT to get a segment-level representation.
3. Feed the sequence of segment representations to a second, smaller network (an LSTM or another transformer) that produces a document-level representation for classification.
For their experiments, the researchers use the smaller BERT-base model for efficiency. The LSTM variant is a small network that produces 100-dimensional document embeddings; it’s called RoBERT, for recurrence over BERT. The second-transformer variant is similarly small, with just two transformer blocks; it’s called ToBERT, for transformer over BERT.
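A toy sketch of the RoBERT idea follows, with a bare tanh recurrence standing in for the LSTM and random vectors standing in for per-chunk BERT embeddings (all weights here are illustrative):

```python
import numpy as np

def simple_recurrence(chunk_embeddings, Wx, Wh):
    """Stand-in for RoBERT's LSTM: fold a sequence of per-chunk BERT
    embeddings into one document embedding with a tanh recurrence."""
    h = np.zeros(Wh.shape[0])
    for x in chunk_embeddings:          # one step per document chunk
        h = np.tanh(Wx @ x + Wh @ h)
    return h                            # document-level embedding

# Toy run: 5 chunks of 768-dim "BERT" embeddings -> 100-dim document embedding
rng = np.random.default_rng(2)
doc_emb = simple_recurrence(rng.normal(size=(5, 768)),
                            rng.normal(size=(100, 768)) * 0.01,
                            rng.normal(size=(100, 100)) * 0.01)
```

Because the recurrence handles the sequence of chunks, the per-chunk BERT never needs to see more than 512 tokens at once.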
Both RoBERT and ToBERT are fine-tuned on three text classification datasets.
Arranging sequential networks in a hierarchy allows them to overcome their sequence length limits. The BERT-based model scored higher accuracies on some datasets over other support vector machine (SVM) and convolutional models.
Service level and other legal agreements can run into dozens of pages filled with dense legalese. To help reviewers save time, you can summarize their contents.
For higher confidence that nothing critical is being missed out, you can also run topic identification and sentiment analysis on each section and show them as section labels using hierarchical transformers. They help reviewers focus on the most critical portions of such documents.
Real-world document processing can be quite messy. Using a mortgage industry case study, we’ll see the type of problems that crop up in paperwork-intensive industries and explore how document classification solves them.
A loan audit involves reviewing a set of documents called the loan document package. A typical package can have hundreds of scanned pages like land titles, identity documents, income documents, signed declarations, and more. The pages are supposed to be arranged in a particular order to make them easy to process.
But in reality, they are often haphazardly grouped. Identifying and grouping loan documents is a major bottleneck for banks and mortgage companies. So they rely on business process outsourcing to automate some of it and complete the rest manually. But because the documents can be complex, mistakes aren’t uncommon. This raises the costs and time required for processing.
Some semi-automated techniques are in use out there but are largely unsatisfactory. Using document templates for parsing the information can be faulty and laborious. Custom rule-based pipelines fail when they run into edge cases. Once a pipeline makes mistakes, people stop trusting the entire pipeline and revert to manual verification.
The industry needs automated solutions that can robustly and reliably process most documents with little human involvement.
A clever solution to the problem is identifying just the typical starting and ending pages of each document type. They often have very unique layouts that are easily identified. Each document type will have two classes — “[type]-start” and “[type]-end.” If a page isn’t one of these start or end classes, then you just classify it as “other.”
Each scanned page is processed by an optical character recognition (OCR) engine to extract its text. Any unwanted text is discarded.
The page text is then processed by a doc2vec machine learning algorithm to produce a dense feature vector that represents all the text and its patterns on that page.
Using the feature vector, a logistic regression machine learning model infers the class of that page along with confidence scores and other metrics.
Logical rules are applied to the model’s output sequence to check if all pages of a document type are together. A pipeline like this reduces human effort considerably and the remaining edge cases can be easily managed by the outsourced staff.
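The logical rules can be sketched as a small state machine over the predicted page labels, using the “[type]-start” / “[type]-end” / “other” scheme described above:

```python
def check_document_grouping(page_labels):
    """Pairing rule over per-page predictions: every '<type>-start' must be
    closed by a matching '<type>-end' before another document starts.
    Returns (ok, list of (doc_type, start_page, end_page))."""
    docs, open_type, open_idx = [], None, None
    for i, label in enumerate(page_labels):
        if label.endswith("-start"):
            if open_type is not None:
                return False, docs              # previous document never ended
            open_type, open_idx = label[:-len("-start")], i
        elif label.endswith("-end"):
            if open_type != label[:-len("-end")]:
                return False, docs              # end without a matching start
            docs.append((open_type, open_idx, i))
            open_type = None
        # 'other' pages are interior pages of the currently open document
    return open_type is None, docs
```

Sequences that fail the check are the edge cases routed to the outsourced staff for manual review.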
As we’ve seen above, the two main constraints in document classification are a high number of classes (especially relative to the number of examples we have per class) and not having enough training samples to learn the differences between classes. It’s no surprise that these two constraints compound to cause issues in production-level document classification systems.
We’re going to look at a document classification pipeline that allows us to classify documents into a large number of classes in a constrained zero shot data environment.
Data labeling of documents can be an extremely expensive and time-consuming task. Industries that require specific expertise to review documents, such as legal or financial, are even more expensive given the cost of those experts. Labeling documents into a single class out of a large set of classes is especially slow, as many classes in this environment are very similar to each other, and small details in the documents are what drives the variation.
Zero-shot classification is the perfect fit for working around these constraints. True zero-shot classification requires no fine-tuning and no prompt examples to guide the model to a correct output. We provide the available classes, the input document we want to classify, and the prompt instructions for the task. Since we’re not providing prompt examples, the instructions are critical: they are the only information that helps the task-agnostic GPT-3 model understand how to correctly complete our task.
Zero-shot learning methods are also a great way to speed up the data labeling process for a future fine-tuned or few-shot model. Those models will almost certainly be more accurate than the zero-shot solution in the long term. If we know our zero-shot model is 80% accurate, we can use it to put a label on all documents we want to use in the future and have a manual reviewer quickly check them before training. This assisted review is much more efficient than full manual review.
Let’s look at an example of an NLP pipeline that leverages a GPT-3 model to perform zero shot classification.
This pipeline focuses on extracting information from documents in a format that provides us the most contextual information relative to the classes we have available. From there the prompt language and instructions are critical to be able to form relationships between document text and classes with little prior understanding of the relationship.
The key step in this pipeline is extracting our text in a format that gives GPT-3 an idea of what information is important. It doesn’t make much sense to extract headers, body text, abstracts, and other common document fields as the same unstructured text, given that different text clearly has varying value depending on how it’s used in a document. Downstream, we can tag important information in ways that tell GPT-3 that this text was more valuable in the document. This is the same idea as using tags such as <h1> when including marketing copy as a variable in a GPT-3 prompt.
There are a ton of pre-trained architectures you can leverage to extract text from documents in a richer format, which helps zero-shot GPT-3 understand what information is valuable in the provided document. Models trained on datasets like Kleister-NDA extract key entities from legal documents, letting you put tags around key information without needing to fine-tune a model yourself.
If you’re willing to fine-tune this text extraction module for better accuracy, architectures like LayoutLMv2 are perfect for this document understanding task. This architecture integrates a spatially aware self-attention mechanism into the Transformer architecture, allowing the model to understand relative positional relationships among blocks of text.
The goal of this module is to prepare our extracted document text in a format that helps GPT-3 classify it, by reducing the amount of irrelevant information in the document text and applying tags based on the relationships learned in the previous step. The preprocessing tools range from simple stopword removal and grammar fixes to complex summarization algorithms that create large extractive summaries preserving most of the document’s key content.
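The simple end of that preprocessing spectrum might look like this; the stopword list here is a tiny illustrative stand-in, and a real pipeline would use a full list from a library like NLTK or spaCy:

```python
# Tiny, hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}

def light_preprocess(text):
    """Simplest preprocessing pass: drop stopwords and collapse whitespace
    before the text goes into the classification prompt."""
    kept = [w for w in text.split() if w.lower() not in STOPWORDS]
    return " ".join(kept)
```

Every stopword removed is a token of prompt budget freed up for text that actually helps differentiate the classes.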
A good zero shot GPT-3 prompt has a few key features that allow us to turn this task agnostic model into a document classification model.
Task instructions provide GPT-3 with clear direction on how to complete the task. They let us steer the task-agnostic GPT-3 model toward our classification task and supply key information that helps differentiate classes, such as which variables to focus on, what information in the document should be deemed valuable, and any other rules we believe are important.
Prompt language is used to provide GPT-3 context around what text is being used in the prompt. This can be variables, rules, or even tags that structure the information a bit more than you would otherwise have. We write Python code in this step that creates this prompt language and automatically adds it to our prompt when building the layout.
During the development process it’s best practice to split test a number of prompt language combinations with varying levels of granularity. Granularity means how specific you are when explaining what your input is. The risk with more granular prompt language is that it might not be completely correct across our entire data variance.
For example, saying “various text sources” is less granular than saying “from blog posts,” which in turn is less granular than “from blog post titles and abstracts.” But if our dataset contains text from blogs, research articles, and reports, the most granular wording would not line up with the language differences across those sources. GPT-3 would try to apply the same rules across different types of text because our prompt language claimed it was all the same.
The prompt language also includes the classes we want to use for classifying the document text. We can set up a prompt variable that simply lists the available classes in the prompt. I recommend providing a bit of context about what each class entails, considering we’re in a zero-shot environment and it’s already difficult to correlate the input documents to classes. This can be as simple as a short description of the class alongside the keyword. We’ve seen that this extra information can go a long way in classification, and we’ve used it for use cases like classifying products to the Google Product Taxonomy by leaving the upstream categories in when laying out the available classes. It’s much easier to correctly categorize “Apparel & Accessories > Costumes & Accessories > Masks” than just “Masks.”
The text that was extracted and preprocessed from the previous steps is added to our prompt. In some use cases this can be from multiple sources.
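Putting the prompt pieces together, a minimal sketch of the zero-shot prompt builder might look like this; the instruction wording and class descriptions are illustrative, not the exact language we use in production:

```python
def build_zero_shot_prompt(document_text, classes, source_desc="various text sources"):
    """Assemble a zero-shot classification prompt: task instructions, the
    available classes with short descriptions, then the preprocessed input.
    No few-shot examples -- the instructions carry all the task context."""
    class_lines = "\n".join(f"- {name}: {desc}" for name, desc in classes.items())
    return (
        f"Classify the following text from {source_desc} into exactly one "
        f"of these classes:\n{class_lines}\n\n"
        f"Text:\n{document_text}\n\nClass:"
    )

prompt = build_zero_shot_prompt(
    "Payment of $1,200 is due within 30 days of receipt.",
    {"invoice": "a request for payment listing amounts owed",
     "contract": "a signed agreement describing obligations"},
)
```

Ending the prompt with “Class:” gives the model an unambiguous slot to complete, which also makes the output easy to parse downstream.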
Now that we’ve created our zero-shot prompt for high-class document classification, we can run it through GPT-3. If we have a fine-tuned model, we can leverage that instead of the base model. It might sound odd to talk about fine-tuned models when we’re constrained to zero-shot, considering we normally turn to zero-shot when we don’t have enough data for few-shot learning or fine-tuning. But there are a number of ways we can still leverage fine-tuning to increase our accuracy.
Using an existing document classification dataset to create a fine-tuned model can actually increase our accuracy in a different document classification use case with different classes. If the input documents are relevant and can be fed to GPT-3 in the same format we can leverage them as a way to show GPT-3 how to accomplish a similar task. This is a great way to get your zero-shot prompt off the ground and give the task agnostic model more of an understanding of your specific task in a transfer learning type setup.
A newer method proposed by Google fine-tunes language models on a wide range of tasks phrased as instructions and then evaluates them on unseen tasks. The fine-tuning uses a number of different setups (zero-shot, few-shot, chain-of-thought), which allows for better generalization to these unseen tasks: 1,836 different tasks are used for fine-tuning, many of them classification use cases. This is a great way to give a model a better understanding of correlating task-specific instructions to outputs.
In the post processing stage, we can generate confidence metrics focused on understanding how confident GPT-3 is in the class that was chosen. We leverage the logprobs that are generated for each token and a custom algorithm that understands the correlation between logprobs and the model’s confidence in the output.
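As a sketch, one simple confidence signal is the geometric mean of the generated label tokens’ probabilities, computed from the per-token logprobs the API returns; a real system would calibrate this with a trained algorithm, as described above:

```python
import math

def class_confidence(token_logprobs):
    """Turn the per-token logprobs of the generated class label into a single
    confidence score: the geometric mean of the token probabilities."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)
```

Thresholding this score is one way to decide which classifications to regenerate or route to a human reviewer.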
In this article, you explored advanced techniques for document classification, techniques invented to solve the real-world problems most industries face. Width.ai builds custom document processing solutions for use cases just like these that you can leverage internally or as part of your product. Schedule a call today and let’s talk about whether document processing software is right for you. Contact us!
Discover the power of text-guided open-vocabulary segmentation using large language models like GPT-4 & ChatGPT for automating image and video processing tasks.
Learn how CLIPSeg segmentation, in combination with GPT-4 and ChatGPT, can enable diverse applications from medical image diagnosis to remote sensing.
Can GPT-4 make your life as a finance or banking employee easier? Learn how GPT-4 and NLP can be used in finance to increase revenues and streamline workflows.
A deep dive into how we reached SOTA accuracy in product similarity matching through a custom fine-tuning pipeline that refines the CLIP model for image similarity.
Boost your conversions and sales numbers with NLP in sales using OpenAI's GPT-3 and GPT-4. You can use chatbots to improve customer experience and loyalty.
Explore the use of GPT for opinion summarization through innovative pipeline methods, evaluation metrics like ROUGE and BERTScore, and human evaluation insights. Dive into novel entailment-based evaluation tools for a comprehensive understanding of model performance in capturing diverse user opinions.
Come aboard the large language model revolution with our deep dive on AI21 vs. GPT-3 for business use cases like ad copy generation and math proof generation.
A technical guide to using BERT for extractive summarization on lectures that outperforms other NLP models
Discover how prompt based LLMs like GPT-3 & GPT-4 are transforming news summarization with its zero-shot capabilities and adaptability to specialized tasks like keyword-based summarization. Learn about the limitations of current evaluation metrics and the potential future directions in text summarization research.
Discover the PEZ method for learning hard prompts through optimization, a powerful technique that enhances generative models for image generation and language tasks, improves transferability, and enables few-shot learning
Take a look at how Width.ai built 17 generative ai pipelines for use in the Keap.com marketing copy generation product
A deep look at how recurrent feature reasoning outperforms other image inpainting methods for difficult use cases and popular datasets.
See a comparison of GPT-3 vs. GPT-J, a self-hosted, customizable, open-source transformer-based large language model you can use for your business workflows.
Discover how transformer networks are revolutionizing image and video segmentation, and get insights on modern semantic segmentation vs. instance segmentation.
Discover how the state-of-the-art mask-aware transformer produces visually stunning and semantically meaningful images and how it stacks up against Stable Diffusion & DALL-E for large-hole inpainting
Unlock the full potential of spaCy with this guide to building production-grade text classification pipelines for business data.
We compare 12 AI text summarization models through a series of tests to see how BART text summarization holds up against GPT-3, PEGASUS, and more.
Let’s take a look at what intent classification is in conversational ai and how you can build a GPT-3 intent classification model for conversational ai and chatbot pipelines.
Discover the capabilities of zero-shot object detection, which enables anyone to use a model out-of-the-box without any training and generate production-grade results.
What is facial expression recognition and what SOTA models are being used today in production
Get a simple TensorFlow facial recognition model up & running quickly with this tutorial aimed at using it in your personal spaces on smartphones & IoT devices.
Learn what human activity recognition means, how it works, and how it’s implemented in various industries using the latest advances in artificial intelligence.
What is the the SetFit architecture and how does it outperform GPT-3 and other few shot large language models
What is image classification and how we build production level TensorFlow image classification systems for recognizing various products on a retail shelf.
Explore the application of intelligent document processing (IDP) in different industries and dive in-depth on intelligent document pipelines.
How to build an image classification model in PyTorch with a real world use case. How you can perform product recognition with image classification
Let's build a custom CTA generator that you'll actually want to use for your website copy
We’re going to look at how we built a state of the art NLP pipeline for blended summarization and NER to process master service agreements (MDAs) that vary the outputs based on the input document and what is deemed important information.
Get a comprehensive overview of a purchase order vs. invoice, including when businesses use each, what information goes in them, and more.
Learn what Google Shopping categories are used for and how you can automate fitting products to this taxonomy using ai.
Automatically categorize your Shopify store products to the Shopify Product Taxonomy instantly with ai based PIM software
Dive deep into 3-way invoice matching, including how it works, eight benefits for your business, and the problems with doing it manually.
Smart farming using computer vision and deep learning provides the most promising path forward in the slow-moving industry of agriculture.
How we leveraged large language models to build a legal clause rewriting pipeline that generates stronger language and more clarity in legal clauses
Using ai for document information extraction to automate various parts of the loan process.
Apply AI to your favorite sport with this guide. Learn how automated ball tracking can change the game for coaches and players.
Categorize your ecommerce products to the 2021 google product taxonomy tree instantly with our Ai software
Surveying the current landscape of ecommerce automation and how you can use ai to automate huge chunks of your product management.
Classify your product data against an existing product category database or generate categories and tags in seconds using artificial intelligence
Warehouse automation plays a crucial role across your supply chain. Learn about how machine learning and ai software can be integrated into your warehouse automation stack.
4 different NLP methods for summarizing longer input text, including extractive, abstractive, and blended summarization.
Discover an invoice OCR tool that will revolutionize the way you handle invoices. There’s no human intervention needed & a dramatically lower per-invoice cost.
Instead of invoice matching taking upwards of a week, it could take mere seconds with the proper automation solution. Learn more here.
Manual and template-based invoicing are riddled with low accuracy and require constant human intervention. Learn how to systematically eliminate these issues with the right invoice data capture software.
A complete walkthrough guide on how to use visual search in ecommerce stores to create more sales and real examples of companies already using it.
Automating the extraction of data from invoices can reduce the stress of your accountants by finding inaccuracies, digitizing paper invoices, and more.
How you can optimize email marketing campaigns with machine learning-based models that improve conversion & click-through rates.
How you can use machine learning-based data matching to compare data features in a scalable architecture for deduping, record merging, and operational efficiency.
Learn how lifetime value or LTV prediction can improve your marketing strategies. Then, discover the best statistical & machine learning models for your predictions.
A deep understanding of how we use GPT-3 and other NLP processes to build flexible chatbot architectures that can handle negotiation, multiple conversation turns, and multiple sales tactics to increase conversions.
The popular HR company O.C. Tanner, which has been in business since 1927 and has over 1500 employees, was looking to research and design two GPT-3 software products to be used as internal tools with their clients. GPT-3 based products can be difficult to outline and design given the sheer lack of publicly available information around optimizing and improving these systems to a production level.
We’ll compare Tableau vs QlikView in terms of popularity, integrations, ease of use, performance, security, customization, and more.
With a context-aware recommender system, you can plan ways to recreate some of the contextual conditions that persuade them to buy more from you.
We’re going to walk through building a production-level Twitter sentiment analysis classifier using GPT-3 with the popular tweet dataset Sentiment140.
Find out how machine learning in medical imaging is transforming the healthcare world and making it more efficient with three use cases.
Discover ways that machine learning in health care informatics has become indispensable. Review the results of two case studies and consider two key challenges.
Accelerate your growth by pivoting key areas of your business to AI. Your business outcomes will be achieved more quickly & you’ll see benefits you didn’t plan for.
We built a GPT-3 based software solution to automate raw data processing and data classification. Our model handles keyword extraction, named entity recognition, text classification | Case Study
We built a custom GPT-3 pipeline for key topic extraction for an asset management company that can be used across the financial domain | Case Study
How you can use GPT-3 to create higher-order product categorization and product tagging from your ecommerce listings, and how you can create a powerful product taxonomy system with AI.
5 ways you can use product matching software in ecommerce to create real value that raises your sales metrics and improves your workflow operations.
Data mining and machine learning in cybersecurity enable businesses to ensure an acceptable level of data security 24/7 in highly dynamic IT environments. Learn how data security is getting increasingly automated.
Product recognition software has tremendous potential to improve your profits and slash your costs in your retail business. Find out just how useful it is.
Big data has evolved from hype to a crucial part of scaling your organization in every modern industry. Learn more about how big data is transforming organizations and providing business impacts.
Learn how natural language processing can benefit everybody involved in education from individual students and teachers to entire universities and mass testing agencies.
Here’s how automated data capture systems can benefit your business in some key ways and some real-life examples of what it looks like in practice.
Use these powerful AI and machine learning tools to create business intelligence in your marketing that pushes your business understanding and analytics past your competition.
We built a custom ML pipeline to automate information extraction and fine-tuned it for the legal document domain.
In this practical guide, you'll get to know the principles, architectures, and technologies used for building a data lake implementation.
Find out how machine learning in biology is accelerating research and innovation in the areas of cancer treatment, medical devices, and more.
An enterprise data warehouse (EDW) is a repository of big data for an enterprise. It’s almost exclusive to business and houses a very specific type of data.
Dlib is a versatile and widely used facial recognition library, with perhaps an ideal balance of resource usage, accuracy, and latency, suited for real-time face recognition in mobile app development. It has become a common, possibly even essential, library in the facial recognition landscape, and, even in the face of more recent contenders, it is a strong candidate for your computer vision and facial recognition or detection framework.
Learn how to utilize machine learning to get a higher customer retention rate with this step-by-step guide to a churn prediction model.
Machine learning algorithms are helping the oil and gas industry cut costs and improve efficiency. We'll show you how.
We’ll show you the difference between machine learning vs. data mining so you know how to implement them in your organization.
Here’s why you should use deep learning algorithms in your business, along with some real-world examples to help you see the potential.
Beam search is an algorithm used in many NLP and speech recognition models as a final decision-making layer to choose the best output given target variables like maximum probability or the next output character.
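The idea behind beam search can be made concrete with a minimal sketch in Python. Here `next_probs` is a hypothetical toy distribution standing in for a real model's per-step output, and the beam width and length limit are illustrative assumptions:

```python
import math

def next_probs(sequence):
    """Toy next-token distribution; a real model would condition on `sequence`."""
    return {"a": 0.5, "b": 0.3, "<eos>": 0.2}

def beam_search(beam_width=2, max_len=5):
    # Each hypothesis is a (tokens, cumulative log-probability) pair.
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == "<eos>":
                # Finished hypotheses pass through unchanged.
                candidates.append((tokens, score))
                continue
            # Extend each live hypothesis with every possible next token.
            for token, p in next_probs(tokens).items():
                candidates.append((tokens + [token], score + math.log(p)))
        # Prune to the top-k hypotheses by total log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

best_tokens, best_score = beam_search()[0]
print(best_tokens)  # ['a', 'a', 'a', 'a', 'a'] under this toy distribution
```

At each step, only the `beam_width` highest-scoring partial sequences survive, which is how beam search trades the exhaustive cost of full search for a tractable approximation of the maximum-probability output.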
Best Place For was looking for an image recognition-based software solution that could detect and identify different food dishes, drinks, and menu items in images sourced from blogs and Instagram. The images would be pulled from restaurant locations on Instagram, and different menu items would be identified in the images. The software solution had to handle both high- and low-quality images and still perform at the highest production level, while accounting for runtime as well as accuracy.
Deep learning recommendation system architectures make use of multiple simpler approaches in order to remediate the shortcomings of any single approach to extracting, transforming and vectorizing a large corpus of data into a useful recommendation for an end user.
Let's take a look at the architecture used to build neural collaborative filtering algorithms for recommendation systems
GPT-3 is one of the most versatile and transformative components that you can include in your framework, application or service. However, sensational headlines have obscured its wide range of capabilities since its launch. Let’s take a look at the ways that companies and researchers are achieving real-world results with GPT-3, and examine the untapped potential of this 'celebrity AI'.
How to get started with machine learning-based dynamic pricing algorithms for price optimization and revenue management.
Let's take a look at how you can use spaCy, a state-of-the-art natural language processing tool, to build custom software tools for your business that increase ROI and give you data insights your competitors wish they had.
The landscape for AI in ecommerce has changed a lot recently. Some of the most popular products and approaches have been compromised or undermined in a very short time by a new global impetus for privacy reform, and by the way that the COVID-19 pandemic has transformed the nature of retail.
Extremely high-ROI computer vision application examples across different industries.
Building data capture services to collect high-ROI business data with machine learning and AI.
Software packages and inventory data tools that you definitely need for all automated warehouse solutions.
Inventory automation with computer vision: how to use computer vision in online retail to automate backend inventory processes.