A Deep Guide to NLP in Finance
Discover the power of modern NLP and large language models like GPT-4 & ChatGPT for automating document-heavy tasks and workflows in finance and banking.
Modern developments in artificial intelligence (AI), deep learning, and natural language processing (NLP), such as the advent of large language models like GPT-4 and ChatGPT, give the finance industry powerful ways to increase revenue, automate processes, and streamline workflows. This deep dive explores the applications of NLP in finance.
NLP can be useful for a variety of roles and tasks in the finance sector. In the following sections, we help you understand, in depth, how you can apply NLP in finance.
In financial services, analysts regularly have to read long-form financial reports running into hundreds of pages to gain insights. For example, each annual report of Silicon Valley Bank, a bank that went bust recently, contains about 200 pages of explanations and tables. Reading them page by page takes a lot of time and effort. Had the information been more accessible and queryable, external analysts might have noticed an increased probability of bankruptcy and sent out warnings.
For such long-form financial documents, you can use modern NLP models to summarize all the important information while retaining key details verbatim, all within a few seconds. This mix of abstractive and extractive summarization is called blended summarization and is very useful for any use case where key details must be kept unchanged.
The biggest limitation of the GPT models is the cap on the number of tokens they accept. GPT-3 models were limited to about 4,000 sub-word tokens, while GPT-4 models support 8,000-32,000 tokens. Any document that exceeds those limits must be broken into smaller pieces (known as chunks) and sent to GPT separately.
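To make the limit concrete, here's a minimal sketch of checking whether a document fits a context window, using OpenAI's tiktoken tokenizer. The file name and the 8,000-token limit (the base GPT-4 window) are placeholders:

```python
import tiktoken

def fits_in_context(text: str, model: str = "gpt-4", limit: int = 8000) -> bool:
    """Return True if the text tokenizes to fewer tokens than the limit."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text)) < limit

# Hypothetical long-form report pulled from disk.
with open("annual_report.txt") as f:
    report = f.read()

if not fits_in_context(report):
    print("Document exceeds the context window and must be chunked.")
```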
But doing so creates new problems, like loss of context or repeated information, which lead to inaccurate summaries. To solve these issues, we use a custom GPT-based pipeline (shown below) with clever solutions like chunking and prompt optimization.
Let's understand its special components and capabilities.
The chunking algorithm's job is to split long documents with minimal loss of context while retaining their fine-grained, specific details. That means the pipeline should not split apart clauses and paragraphs that supply important context to their neighbors. At the same time, context retention must be balanced against prompt length: the longer the input prompt gets, the higher the risk of the model ignoring too much context and producing a bad summary.
So, given a document, the pipeline decides the optimal size for each chunk and, using custom logic, how much context from the earlier chunk must be included. Next, it builds a chunk-specific custom prompt for GPT that has been found to produce a better-quality summary.
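The production logic is custom, but here's a minimal sketch of the underlying idea: token-limited chunks where a fixed-size tail of each chunk is carried forward as context for the next. The fixed overlap stands in for the custom context-selection logic described above:

```python
import tiktoken

def chunk_with_overlap(text: str, chunk_tokens: int = 3000,
                       overlap_tokens: int = 200,
                       model: str = "gpt-4") -> list[str]:
    """Split text into token-limited chunks, carrying a fixed-size
    tail of the previous chunk forward as context for the next one."""
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + chunk_tokens, len(tokens))
        chunks.append(enc.decode(tokens[start:end]))
        # Step forward, keeping overlap_tokens of trailing context.
        start = end - overlap_tokens if end < len(tokens) else end
    return chunks
```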
The GPT models produce better results when we prefix the actual input chunk in the prompt with examples of input text and ideal summaries. This is called few-shot learning. Additionally, if the prompt examples are also made relevant to the input text, the generated summaries are of much better quality.
The task of pulling in the most relevant examples for each input text is called prompt optimization. For that, we maintain a database of gold reference prompt examples and summaries. Given an input text, we use a semantic similarity algorithm to pick the most relevant prompts and summaries and inject them in the prompt dynamically. This ensures that the few-shot examples in the prompt are highly relevant to the input text and summarization task.
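Here's a minimal sketch of that dynamic selection step, using the sentence-transformers library for semantic similarity. The gold examples and the all-MiniLM-L6-v2 model are stand-ins for the real database and similarity algorithm:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical gold-reference pairs of input text and ideal summary.
gold_examples = [
    ("Net interest income for the fiscal year was $4.5B, an increase of ...",
     "Net interest income rose to $4.5B year over year."),
    ("The company entered into a credit agreement dated March 1 with ...",
     "A credit agreement was signed on March 1 (key terms quoted verbatim)."),
]
gold_vectors = model.encode([text for text, _ in gold_examples], convert_to_tensor=True)

def build_prompt(chunk: str, k: int = 2) -> str:
    """Prefix the input chunk with its k most similar gold examples."""
    query = model.encode(chunk, convert_to_tensor=True)
    hits = util.semantic_search(query, gold_vectors, top_k=k)[0]
    shots = "\n\n".join(
        f"Text: {gold_examples[h['corpus_id']][0]}\n"
        f"Summary: {gold_examples[h['corpus_id']][1]}"
        for h in hits
    )
    return f"{shots}\n\nText: {chunk}\nSummary:"
```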
Blended summarization is achieved by using the right dynamic prompts for each input chunk with the right prompt examples to show the GPT model that we want abstractive summaries for some sections and extractive for others. A database of prompts specific to finance or banking is maintained for this purpose.
An output algorithm recombines the summaries generated for each chunk so that they read as one contiguous summary with good flow rather than feeling choppy.
Finance, accounting, and banking professionals often need information from complex documents to make decisions on investments, loan approvals, trade reports, and other financial operations.
Let's see some practical applications.
Finance professionals may need to analyze Certificate of Incorporation (COI) documents to understand a company's legal structure, capitalization, and key business details for investment, evaluating initial public offerings, or due diligence prior to a merger or acquisition. Reviewing such documents can also help finance professionals to ensure compliance with laws and regulations.
The important information in such documents covers the company's legal structure, capitalization, and other key business details.
Such information can be identified and stored in a database using automated information extraction pipelines that accept digital formats or scans of these documents and accurately extract all the essential information using AI, computer vision, and NLP. We'll explain how it works in a later section.
Another common application is information extraction from invoices and purchase orders so that accounting professionals can automate or optimize routine accounting tasks.
A third application is the automation of your business workflows, such as loan approvals. Such automated pipelines deliver enormous savings on manual labor and costs. They also inject some neutrality in the evaluation process to help prevent fraud and collusion.
Intelligent document processing pipelines enable your business to reliably automate the understanding of complex documents and extraction of information from them instead of using manual labor or unreliable semi-automated, template-based approaches. Regardless of the process you're trying to automate, these pipelines all work the same way with the following five stages.
Raw documents — in different formats, like PDFs or images — are received, stored, and pre-processed to prepare them for the rest of the pipeline.
This is the primary stage, where all the information extraction happens.
The extracted information is run through rule engines that check the fields against business policy rules. If it passes them all, it may be sent to more sophisticated statistical models. For example, a loan application may be sent to a loan default risk model or an interest projection model to estimate its associated business value and risk.
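A rule engine here can be as simple as a list of named predicates over the extracted fields. The sketch below uses hypothetical loan-application fields and policy thresholds:

```python
from typing import Callable

Rule = Callable[[dict], bool]

# Hypothetical business policy rules over extracted loan-application fields.
rules: list[tuple[str, Rule]] = [
    ("applicant_income_present", lambda f: f.get("annual_income", 0) > 0),
    ("loan_within_policy_cap",   lambda f: f.get("loan_amount", 0) <= 500_000),
    ("term_is_supported",        lambda f: f.get("term_months") in (12, 24, 36, 60)),
]

def check_fields(fields: dict) -> list[str]:
    """Return the names of any rules the extracted fields violate."""
    return [name for name, rule in rules if not rule(fields)]

violations = check_fields(
    {"annual_income": 85_000, "loan_amount": 250_000, "term_months": 36}
)
if not violations:
    print("Passed policy checks; forward to the risk models.")
```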
The extracted information is stored in external systems like databases or ERP systems.
The extracted information is supplied to, or fetched by, other business workflows and acted upon. For example, based on loan default risk models or other business data analytics, loan applications are prioritized and sent for manual reviews and approvals by loan officers.
Regulatory compliance is one of the most challenging tasks in finance and banking. Non-compliance, or mistakes in compliance, carries risks like civil penalties, criminal prosecution, and damage to reputation. But compliance isn't simple: regulations are voluminous, constantly changing, and vary across jurisdictions.
From a business perspective, compliance is often seen as a cost center with few benefits. So achieving full compliance with minimal cost and effort is a desired goal of all businesses. But its inherent and emergent complexities make it a difficult goal to achieve in practice.
The volume of documents being put out and constantly changed means that there is always niggling uncertainty over becoming inadvertently non-compliant simply due to ignorance about some minor change in some document. So compliance departments have no choice but to read them all.
AI and NLP can help with that and reduce compliance costs by reducing the time and labor expended on document reading and comprehension. In this section, we'll explain how NLP pipelines can automatically deduce whether a regulatory change is relevant to your business and notify compliance officers about the change in real time. Once notified, they can use question-answering chatbots to seek answers to complex questions about the regulations, a use case we cover later on.
The main question — "Does this clause of this regulation apply to my business?" — can be treated as a semantic similarity problem.
On one side is your business, with various aspects — the jurisdictions it operates in, the products and services it offers, and so on.
On the other side are the laws, regulations, and guidelines of all those jurisdictions with a complex set of conditions to decide who is regulated, when, and how.
So the semantic similarity problem is to take those complex regulatory conditions and match them against the aspects of your business. Most of these aspects will be in your ERP. A secondary semantic similarity problem is to detect whether a regulatory clause has changed between the last time it was fetched and now.
One thing GPT models are very good at is intelligently judging semantic similarity without requiring any hardcoded rules or thresholds like in traditional NLP. We can basically supply the text of the regulatory clauses along with the business aspect values from your ERP and ask GPT if anything matches. Every match is a regulation that's potentially relevant to your business. You can then dig deeper with the help of a question-answering chatbot (covered in the next section).
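A minimal sketch of that matching step, assuming the openai Python client (v1+) and hypothetical clause and profile text:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical regulatory clause and business profile (from your ERP).
clause = "Institutions accepting retail deposits in the EU must report ..."
business_profile = "Jurisdictions: Germany, France. Products: retail deposit accounts."

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0,
    messages=[
        {"role": "system",
         "content": "You judge whether a regulatory clause could apply to a business."},
        {"role": "user",
         "content": (f"Clause:\n{clause}\n\nBusiness profile:\n{business_profile}\n\n"
                     "Does this clause potentially apply to this business? "
                     "Answer YES or NO, then explain briefly.")},
    ],
)
print(response.choices[0].message.content)
```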
Similarly, GPT is used to detect changes between previous regulatory text and current text to find deltas. If there's a delta, task frameworks like LangChain can be integrated with the GPT model to notify the relevant compliance officers in each jurisdiction.
In this section, we explain how GPT chatbots are used for helping employees understand complex financial documents.
Finance professionals often have to read long, dense documents like compliance regulations, annual reports, financial analyses, and investor reports. Likewise, banking professionals have to wade through documents like investment reports, compliance regulations, and corporate loan applications.
Reading them requires concentration and attention to detail because a lot of important information and answers are scattered over hundreds of pages. These professionals may have specific questions in mind, but the structures of those documents force them to expend a lot of brain cycles and time (and time is money) on reading and comprehension to find the answers.
Chatbots dramatically streamline this by digesting thousands of pages of documents in seconds and providing accurate, human-language answers to complex questions about those documents.
Chatbots are not new, but users trust them only when they can give accurate and complete answers to complex questions every time. That's always been a sticking point with traditional chatbots, even those like Google's Dialogflow driven by older AI.
But chatbots backed by the awesome power of large language models like GPT-4 are a different breed altogether. Trained on vast quantities of real-world documents, they achieve unparalleled levels of semantic comprehension, abstraction, and accuracy.
Businesses see a range of benefits from using GPT chatbots.
The example below shows one of our banking chatbots in action, answering complex questions with natural language answers.
Frameworks like Google Dialogflow and Meta's BlenderBot exist to help create AI chatbots. But they have several drawbacks compared to custom GPT chatbots.
A custom architecture can turn a GPT chatbot into one capable of understanding complex internal business documents and answering complex questions about them.
We'll walk you through the high-level steps that go into readying your information desk chatbot.
First, we collect all the useful documents — like regulations, FAQs, and knowledge base articles — that contain information helpful to your employees. This information isn't specific to one user but applicable to everyone.
Potential questions and informational facts are extracted from such content using manual annotation or web scraping. They go into a knowledge bank and provide the details and context required for the answers.
Prompts are key to getting the most out of GPT. When we build GPT chatbots, we must frame the relevant details and questions in particular ways for GPT to interpret them correctly. We can provide a few examples of ideal prompts and answers (few-shot learning) to GPT so that it can dynamically figure out what's expected of it based on the patterns in the examples.
We do this by maintaining a database of gold-standard prompts and answers. When a customer query is received, we dynamically select the most relevant examples from that database and prefix them to the customer's query before asking GPT. This helps GPT interpret the query correctly and return high-quality responses.
Placeholder variables are another important aspect of this phase. Instead of hardcoding details and links, we train GPT to output placeholders. These variables are replaced later with country-specific or department-specific information to provide personalized answers. For example, currencies and policy pages may be different in each country. So we ask GPT to generate placeholders for them instead of hardcoding a currency or page link.
In the example below, GPT outputs a placeholder for the link to a pricing page, which is replaced later in the pipeline.
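A minimal sketch of that replacement step, assuming a simple {{PLACEHOLDER}} convention and hypothetical country-specific values:

```python
import re

# Hypothetical per-country values for the placeholder variables.
substitutions = {
    "US": {"CURRENCY": "USD", "PRICING_PAGE_LINK": "https://example.com/us/pricing"},
    "DE": {"CURRENCY": "EUR", "PRICING_PAGE_LINK": "https://example.com/de/pricing"},
}

def personalize(answer: str, country: str) -> str:
    """Replace {{PLACEHOLDER}} tokens in a GPT answer with country-specific values."""
    values = substitutions[country]
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values.get(m.group(1), m.group(0)), answer)

raw = "Our plans start at 20 {{CURRENCY}}/month. See {{PRICING_PAGE_LINK}} for details."
print(personalize(raw, "DE"))
```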
In addition to few-shot examples and prompt optimization, another step that can potentially improve the quality of answers is fine-tuning the GPT model by supplying a dataset of questions and answers. Fine-tuning essentially creates a custom GPT model, stored on OpenAI's systems, that's available only to your company. It's a good approach if the nature of information, prompt syntax, and answer formats are very different and domain-specific compared to the standard text generated by GPT.
The essential idea here is: given a customer query, look up our database of extracted questions and answers, find the question most similar to the customer query, and return the associated answer.
To implement this idea of finding the most similar question, we convert all questions and queries to math forms called embeddings. They are essentially vectors that encode various linguistic and contextual information as numbers. Once converted to vectors, we can use math techniques like cosine similarity to find a question vector that is similar to a customer query vector. The answer associated with that question vector is then the most relevant answer to the customer's query as well.
For converting questions and queries to vectors, we use a model called Sentence-BERT from the sentence transformers library. It provides excellent results for such similarity tasks. We can further fine-tune it to achieve very high semantic relevance using our domain-specific question-answer datasets.
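Putting the last two steps together, here's a minimal sketch of the lookup, assuming a tiny hypothetical question-answer bank and the all-MiniLM-L6-v2 Sentence-BERT model:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical question-answer pairs from the knowledge bank.
qa_pairs = [
    ("What is the daily wire transfer limit?", "The daily wire transfer limit is ..."),
    ("How do I reset my online banking password?", "To reset your password, ..."),
]
question_vecs = model.encode([q for q, _ in qa_pairs], convert_to_tensor=True)

def best_answer(user_query: str) -> str:
    """Return the answer whose question embedding is closest to the query."""
    query_vec = model.encode(user_query, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, question_vecs)[0]  # cosine similarities
    return qa_pairs[int(scores.argmax())][1]

print(best_answer("What's the maximum I can wire in one day?"))
```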
In the previous step, there are likely to be thousands or even millions of questions and answers. So a system that can store millions of vectors and calculate similarities quickly is necessary. Such systems are called vector databases, and Pinecone and FAISS are some popular options.
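As a sketch of what that looks like with FAISS — using randomly generated stand-in vectors and an inner-product index, which is equivalent to cosine similarity on normalized vectors:

```python
import faiss
import numpy as np

dim = 384  # embedding size of all-MiniLM-L6-v2
index = faiss.IndexFlatIP(dim)  # inner product == cosine on normalized vectors

# Stand-in embeddings for a large bank of extracted questions.
vectors = np.random.rand(100_000, dim).astype("float32")
faiss.normalize_L2(vectors)
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 most similar stored questions
print(ids[0], scores[0])
```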
In this article, we showed how large language models can streamline a variety of tasks in finance and banking. While these models are already very capable, the ability to build custom models and pipelines from them boosts their capabilities even more. Here at Width.ai, we have years of expertise in customizing and fine-tuning large language models and other machine-learning algorithms for multiple industries.
Contact us to find out how we can help bring the power of large language models to your business!