Turbocharge Dialogflow Chatbots With LLMs and RAG
Are Dialogflow chatbots still relevant when LLM-based chatbots are available? Yes. We'll show you how you can combine them to get the best of both worlds.
Large language models like OpenAI's generative pretrained transformers (GPT-4) are finding new uses everyday in various industries. Due to their training on huge volumes of real-world text data, these models are highly capable at natural language processing (NLP) and text-generation tasks. Out of the box, they're able to both interpret and generate varieties of high-quality sales content. In this article, find out how to use GPT-based NLP in sales for increasing your conversions and customers.
How can GPT NLP techniques help people in sales roles? Here's a preview of the possibilities:
Solo entrepreneurs and small business owners can also improve their conversions using these GPT NLP techniques. In the next sections, we dive deeper into some use cases to help your sales professionals understand GPT's capabilities in depth.
Let's start by using GPT chatbots as shopping assistants for e-commerce use cases.
A great shopping experience is when your customers can find exactly what they have in mind. Most shops have a search box and search filters for the purpose. However, this typical search interface makes for some inconvenient and inefficient user experiences, as explained below.
Search filters give customers fine-grained control to drill down their choices. Just look at the number of filters Amazon shows for a customer looking for jeans:
But making decisions with so many checkboxes, sliders, color selectors, and "See more" links is just plain exhausting. Changing your selections or removing a filter means even more steps. The fine-grained control it gives is at the cost of convenience.
The filtered search experience on mobile devices is worse because of the small screen sizes and related problems like the fat finger syndrome which can lead to unintended selection of choices while tapping or scrolling.
During holiday gifting seasons, your customers are likely to buy a larger number of products than normal. They may have additional constraints on their purchases, such as the total price or the range of acceptable delivery dates.
Ideally, your user interface should let your customers specify such concerns too. However, the very nature of user interfaces makes it difficult to implement such special considerations.
Such inconveniences increase search abandonment — your customers simply stop searching and close your website out of frustration. According to a recent survey commissioned by Google Cloud, $2 trillion is lost to search abandonment and 82% of customers go on to avoid websites where they've abandoned searches.
You can easily overcome all these drawbacks by using GPT-3/GPT-4 chatbots. They enable you to simulate a shopping assistant that a customer can talk to naturally and work through complex or abstract shopping ideas.
The chatbot conducts a human-like conversation with your customer. It asks guiding questions to find out what the customer has in mind and progressively shortlists products based on the answers. With the patience and attentiveness of a truly human assistant, the chatbot obeys all the criteria a customer gives, no matter how trivial.
Your customers can chat with it either by typing or by talking into their mobile device. For the latter, text-to-speech services, like OpenAI's Whisper, convert customer speech to text for better experiences on mobile devices, where typing lots of text can be clumsy and error-prone.
In the example below, a GPT shopping chatbot guides a customer looking for jeans. The chatbot's questions are in blue, and the customer's replies are in black.
Notice that the customer is able to quickly specify product-level and cart-level criteria for multiple items. Later, the chatbot increases sales beyond their planned budget.
The image below shows another of our shopping chatbots in action:
The customer asks the bot for suggestions to go with these jeans. The bot provides a helpful answer in natural language just like a human assistant.
The chatbot example above shows some remarkable capabilities like:
All this is possible thanks to our fine-tuned GPT-3/GPT-4 models and our pipeline to integrate downstream data into GPT output. The pipeline architecture looks like this:
Let's take a closer look at what each component does.
This component manages the overall conversation. It examines your message or answer and decides whether the query requires some dynamic information from downstream. If not, it routes the query to the non-dynamic GPT model. Otherwise, it routes the query to the subsystem that handles dynamic information like product details, availability, and prices.
The same model also handles queries for aspects that don't change often, such as general information about product categories or payment options, and generates responses that don't require any downstream information.
This model handles user queries that require dynamic information or downstream APIs by building suitable answers and including relevant dynamic information from your inventory database. Finding the most relevant information for a given query is done with help from the next two components. The idea here is to be able to answer queries such as “how many red shirts do you have in stock?” or “do you sell cotton Dior sweatshirts” with a real-time answer based on your current inventory.
When the chatbot gets a query, the Q&A GPT model must find the products whose details and other real-time context — like prices and discount campaigns — match. These details aren't part of the GPT model and shouldn't be either, due to their dynamic nature. Instead, we inject all these details dynamically in the prompt as context and few-shot examples.
We have empirically observed that if the context and few-shot examples are semantically relevant to the query to a high degree, the quality of GPT's answers also improve drastically. To fetch only the most semantically relevant details, we use a Sentence BERT (SBERT) model. It encodes all these product details and other real-time context documents as embeddings and stores them in a production-grade vector database. A typical shopping site can produce millions to billions of embeddings for its inventory.
When the query is received, we use SBERT to calculate the query embedding. The vector database then uses cosine similarity to find the stored embeddings that are most semantically relevant to the query embedding. These matching details and real-time context are inserted into the prompt before the user query and sent to the GPT model.
The embeddings are stored in a production-grade vector database like Pinecone. Its role is to match the embeddings for the query and the products to find the product details and real-time context that are most semantically similar to the query. This database is an index of the current data in the AWS RDS instance.
Customers find that the conversational interfaces of shopping assistant chatbots are far more convenient. They help reduce buyer frustration, choice overload, search abandonment, and cart abandonment.
They can also increase conversions and convince customers to spend more by suggesting better choices in the vicinity of the customers' budgets.
In the sectors of banking, insurance, healthcare, and government, information for existing and potential customers can be complex. People approach such services with specific information needs in mind. To help them, companies publish content like frequently asked questions (FAQs) and knowledge base articles.
However, that content may not always provide direct answers or may use different phrasing than what people ask. Plus, many people won't read through the provided content to find answers. Moreover, many questions may involve complex banking processes that don't have easy answers and require details from multiple sources to figure out. For example, a customer who wants a suspicious credit card transaction looked at and removed is a problem that's not easy to figure out by themselves. Information desks where a customer can speak with a human assistant are useful, but they are not feasible in all locations or at all times. Unanswered questions may result in lost conversions and sales. To avoid that, companies can use question-answering (QA) pipelines in chatbots that are available 24/7 and trained to find the most relevant answers in the content.
While custom deep learning models for QA are available, they require a lot of training data to achieve high quality. In contrast, GPT is already pretrained on reams of real-world data, making it a much more capable repository of information for multiple domains. Minimal fine-tuning on your company-specific content is sufficient to get high quality answers from GPT.
In the example below, a GPT chatbot for a bank answers a potential customer's specific information need accurately and succinctly, saving the customer from wading through pages of content:
The pipeline for an information desk chatbot is the same as the one above for shopping chatbots but the data is different. We'll walk you through the high-level steps that go into readying your information desk chatbot.
First, we collect all the useful static content like your FAQs, knowledge base, and website pages that contain general information relevant to your customers. This information isn't specific to a user or their account, but something that's applicable to everyone, like in this example:
Potential questions and informational facts are extracted from such content using manual annotation or web scraping, and stored in a knowledge base database for answering queries.
Prompts are key to getting the most out of GPT. When we build GPT chatbots, we must frame the relevant details and questions in particular ways for GPT to interpret them correctly. We must provide a few examples of ideal prompts and answers (few-shot learning) to GPT so that it can dynamically figure out what's expected of it based on the patterns in the examples.
We do this by maintaining a database of gold-standard prompts and answers. When a customer query is received, we dynamically choose the most relevant examples from that database and prefix it to the customer's query before asking GPT. This ensures that GPT interprets the query correctly and returns the expected response.
Another important aspect in this phase is that instead of hardcoding details and links, we train GPT to output placeholder variables. These variables are replaced later with customer-specific information to provide personalized answers. For example, currencies and policy pages may be different in each country. So we ask GPT to generate placeholders for them instead of hardcoding a currency or page link.
In the example below, GPT outputs a placeholder for the link to a pricing page which will be replaced later in the pipeline:
In addition to few-shot examples and prompt optimization, another step that can potentially improve the quality of answers is fine-tuning the GPT model by supplying a dataset of questions and answers. Fine-tuning essentially creates a custom GPT model, stored on OpenAI's systems, that's available only to your company. It's a good approach if the nature of information, prompt syntax, and answer formats are very different and domain-specific compared to the standard text generated by GPT.
The essential idea here is that given a customer query, look up our database of extracted questions and answers, find the question that is most similar to the customer query, and return the associated answer for that extracted question.
To implement this idea of finding the most similar question, we convert all questions and queries to math forms called embeddings. They are essentially vectors that encode various linguistic and contextual information as numbers. Once converted to vectors, we can use math techniques like cosine similarity to find a question vector that is similar to a customer query vector. The answer associated with that question vector is then the most relevant answer to the customer's query as well.
For converting questions and queries to vectors, we use a model called Sentence-BERT from the sentence transformers library. It provides excellent results for such similarity tasks. This architecture is also fine-tunable with real similarity pairs which allows us to boost the accuracy for specific use cases.
In the previous step, there are likely to be thousands or even millions of questions and answers. So a system that can store millions of vectors and calculate similarities quickly is necessary. Such systems are called vector databases, and Pinecone and FAISS are some popular options.
In you're in an information-intensive industry, these chatbots offer multiple business benefits over traditional information desks:
Let's shift our focus now from customers to sales professionals. How can our GPT systems can help your sales and business development professionals improve their outcomes and productivity?
Instead of sending generic emails to your leads, your outreach will get better responses if you can personalize them for each lead. Beyond greeting your leads by name, GPT can personalize the email's communication style to suit the lead's personality. Sources of personalized information include their personal and company LinkedIn profiles, product and service offerings on their websites, their financial reports, and other information.
Conceptually, the GPT pipeline here is the same as the one above for information desk chatbots. However, the content here comes from the sources mentioned above and its output is a personalized email message generated by GPT like in the example below:
In the example above, the GPT pipeline examines a prospect's online store, finds specific similarities in products, and sends a personalized email about them. The example also serves as a learning template. When given a list of hundreds of prospects and their websites, the pipeline can spit out hundreds of similar emails in seconds while automatically customizing details like "living room collection" and "furniture store" with products and success stories suited to each business.
You can use similar GPT pipelines in every step of your sales process:
The pipeline consists of two stages: transcription and summarization.
In the transcription stage, a transcription service like OpenAI Whisper converts your call's audio to a high-quality, low-error text conversation with features like speaker name recognition, automatic speaker labeling, multilingual speech recognition, filler sound removal, and code switching support (i.e., transcribing even if speakers switch between different languages in the same sentence).
In the summarization stage, GPT takes the call transcript and summarizes it. It automatically identifies key points and keeps them in the summary. Other sentences, like pleasantries, are discarded. We explore some important aspects of summarization implementation in more depth below.
Sales calls can go on for hours and feature multiple speakers, resulting in transcripts with hundreds of thousands of words. That's a problem because for all their great communication skills, GPT-3 had a prompt limit of around 4,000 tokens, while the GPT-4's prompt can range from 8,000-32,000 tokens. In fact, these are not full word limits, but subword limits, which means the actual word limits are about half of these. How can we summarize long call transcripts under such limitations?
For that, we use custom chunking algorithms to break up the transcript into smaller pieces and process them individually using GPT. To not lose any essential context, chunking must be done carefully, using techniques like topic extraction on each section and prefacing the chunk with the extracted topic.
But key points and action items aren't the only details you can obtain. In the next section, you'll see how you can get deeper information about your calls.
In addition to summaries and action items, GPT pipelines enable you to dig out deeper information from your sales calls like:
For example, from this example business call transcript, GPT can extract the following information.
It identifies the speakers named in the transcript:
It identifies tasks and assignees in the transcript:
It can report the percentage spoken by each speaker:
It can classify the sentiment of each sentence. Salespeople can then focus on the parts that were perceived negatively:
Large language models like GPT-3, GPT-4, GPT-J, and LlaMA are paradigm shifts in language processing capabilities. The potential benefits for sales roles in productivity and metrics are massive. Learn how you can unlock their awesome powers for your specific sales strategies. Contact us today.