Implementing lifetime value (LTV) and LTV prediction using machine learning lets you build powerful predictive models that lead to significant long-term gains in revenue. A study from Bain & Company showed that just a 5% increase in retention rate could lead to more than a 25% increase in profits over the long term. Building LTV-based predictive models can be as simple as taking data you already have and plugging it into one of our popular algorithms built for any business model with customers.
In this article, we'll explain what LTV and LTV prediction are, how they’ll benefit your company, and how to implement them.
Lifetime value — also called LTV, customer lifetime value, CLV, or CLTV — is the total value that a customer brings to your business throughout their engagement with you. The most tangible and direct value a customer brings is the revenue they give your business when they purchase your products. However, they may also add intangible value by bringing you more customers through word-of-mouth or online referrals, investing in your business, or providing you with services that you need, for example.
Despite the name, it’s common for businesses to restrict LTV to a specific time period, rather than observing the entire customer lifetime, depending on data availability or business goals like short-term revenue forecasts.
The LTV of a customer, up to this point, can be calculated based on their history, like what products they purchased, for how much, and when. It's easy to calculate from historical sales data. But LTV prediction goes a step further — it tells you how much a customer is likely to spend at your business in the future. Predicting the future value of your customers helps shape your business strategies to give you a competitive edge, as we shall see next.
There are many approaches to calculating and predicting customer lifetime value.
Calculating LTV requires historical purchase amounts and dates. These are the most common historical LTV models:
If you don’t have historical data, you can still use the LTV prediction approach for new customers (described later in this article) to estimate LTV from demographic data while you gather sufficient purchase history.
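The historical calculation described above needs only purchase amounts and dates. As a minimal sketch, assuming a hypothetical list of `(customer_id, purchase_date, amount)` records, historical LTV is each customer’s total spend, optionally restricted to an observation window like the time-limited LTV mentioned earlier:

```python
from datetime import date
from typing import Optional


def historical_ltv(transactions: list[tuple[str, date, float]],
                   start: Optional[date] = None,
                   end: Optional[date] = None) -> dict[str, float]:
    """Sum each customer's purchase amounts, optionally restricted to [start, end]."""
    ltv: dict[str, float] = {}
    for customer_id, purchase_date, amount in transactions:
        # Skip purchases outside the observation window, if one is given.
        if start is not None and purchase_date < start:
            continue
        if end is not None and purchase_date > end:
            continue
        ltv[customer_id] = ltv.get(customer_id, 0.0) + amount
    return ltv


# Hypothetical transaction records: (customer_id, purchase_date, amount).
transactions = [
    ("alice", date(2023, 1, 5), 120.0),
    ("alice", date(2023, 6, 20), 80.0),
    ("bob", date(2022, 11, 2), 300.0),
    ("bob", date(2023, 3, 14), 45.0),
]

print(historical_ltv(transactions))  # {'alice': 200.0, 'bob': 345.0}
print(historical_ltv(transactions, start=date(2023, 1, 1), end=date(2023, 12, 31)))  # {'alice': 200.0, 'bob': 45.0}
```

Customers whose purchases all fall outside the window simply do not appear in the result.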
Predicting future LTV can involve two types of models: statistical (probabilistic) models and machine learning models.
Before machine learning models became common, LTV prediction was done with theoretical statistical distributions and assumptions. These probabilistic models are still widely used in data science and business.
Techniques like Pareto/NBD and Pareto/GGG model the purchase process, customer lifetime, and customer behavior using probability distributions such as the Poisson, exponential, and gamma distributions. However, they aren’t good at modeling the complex phenomena underlying user behavior and purchasing decisions.
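To make the Pareto/NBD assumptions concrete, here is a small simulation of its generative story, not a fitting procedure: while a customer is “alive,” purchases arrive as a Poisson process with rate λ; the unobserved lifetime is exponential with dropout rate μ; and λ and μ vary across customers according to gamma distributions. The function name and parameter values are our own placeholders (time is assumed to be in weeks), not fitted values.

```python
import random


def simulate_pareto_nbd(n_customers: int, T: float, r: float, alpha: float,
                        s: float, beta: float, seed: int = 0) -> list[int]:
    """Simulate purchase counts over an observation window of length T.

    r, alpha: shape and rate of the gamma distribution of purchase rates (lambda).
    s, beta:  shape and rate of the gamma distribution of dropout rates (mu).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        # The tiny floor guards against a (vanishingly unlikely) zero draw.
        lam = max(rng.gammavariate(r, 1.0 / alpha), 1e-12)
        mu = max(rng.gammavariate(s, 1.0 / beta), 1e-12)
        # Unobserved lifetime: exponential with this customer's dropout rate.
        active_until = min(rng.expovariate(mu), T)
        # Poisson purchase process: exponential gaps between purchases while active.
        n, t = 0, rng.expovariate(lam)
        while t <= active_until:
            n += 1
            t += rng.expovariate(lam)
        counts.append(n)
    return counts


counts = simulate_pareto_nbd(n_customers=1000, T=52.0, r=0.5, alpha=5.0, s=0.5, beta=10.0)
print("mean purchases per customer:", sum(counts) / len(counts))
print("share with no purchases:", sum(c == 0 for c in counts) / len(counts))
```

Fitting runs in the opposite direction: it estimates r, α, s, and β from each customer’s observed purchase history, which is what libraries such as Lifetimes do for you.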
If your e-commerce business is new to LTV prediction, we suggest starting with statistical models because of their simplicity. The only data you need is purchase amounts and dates. These models calculate the recency, frequency, and monetary value tables from that data. They can provide you with the baseline predictions you need for an initial set of marketing decisions. For all our e-commerce clients, we always use these models to make baseline predictions before using machine learning.
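The recency, frequency, and monetary value summary these models start from can be built from purchase amounts and dates alone. The sketch below follows the conventions the Lifetimes library uses, as we understand them: frequency counts repeat purchase days, recency is the time from first to last purchase, T is the customer’s age at the end of the observation window, and monetary value averages repeat purchases only. The input format and day-level periods are assumptions.

```python
from collections import defaultdict
from datetime import date


def rfm_summary(transactions: list[tuple[str, date, float]],
                observation_end: date) -> dict[str, dict[str, float]]:
    """Per-customer frequency, recency, T, and monetary value from (customer_id, date, amount) rows."""
    # Merge purchases made on the same day: one day is treated as one purchase period.
    daily = defaultdict(lambda: defaultdict(float))
    for customer_id, purchase_date, amount in transactions:
        if purchase_date <= observation_end:
            daily[customer_id][purchase_date] += amount

    table = {}
    for customer_id, by_day in daily.items():
        days = sorted(by_day)
        first, last = days[0], days[-1]
        repeat_amounts = [by_day[d] for d in days[1:]]  # the first purchase is excluded
        table[customer_id] = {
            "frequency": len(repeat_amounts),            # number of repeat purchase days
            "recency": (last - first).days,              # first to last purchase
            "T": (observation_end - first).days,         # customer age at end of window
            "monetary_value": (sum(repeat_amounts) / len(repeat_amounts)
                               if repeat_amounts else 0.0),
        }
    return table


transactions = [
    ("alice", date(2023, 1, 1), 50.0),
    ("alice", date(2023, 1, 1), 25.0),   # same day: merged into one period
    ("alice", date(2023, 2, 1), 40.0),
    ("alice", date(2023, 3, 1), 60.0),
    ("bob", date(2023, 2, 15), 200.0),
]
for customer, row in rfm_summary(transactions, observation_end=date(2023, 3, 31)).items():
    print(customer, row)
```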
Statistical models run fast, provide reasonably accurate results when used correctly, and are available through well-tested software like Lifetimes, a Python library. It’s even possible to run them in Excel. If these models show promise, you can upgrade to machine learning techniques.
Machine learning algorithms like random forests and neural networks have been used for LTV prediction. Their advantage over statistical models is their ability to make use of all your customer data to empirically model the complexities underlying LTV. For example, they can include factors like demographics, purchase behavior, and changes in user behavior to refine the prediction model.
One such experiment by an eyewear retailer compared a Pareto/GGG statistical model with a fully connected deep neural network for predicting LTV from their sales data. The deep neural network had five dense layers with rectified linear units (ReLU).
They found that the deep neural network scored 94.6% accuracy while Pareto/GGG scored 88.6%. Both accuracies are impressive, and they support what we suggested above: start with statistical models to get a baseline, test the waters with your business decisions, see the outcomes, and then move up to machine learning for more accurate predictions.
By now, you’ve probably realized that machine learning models like random forests and neural networks are quite popular in modern customer lifetime value calculation and prediction. A technique that doesn’t seem to belong here is representation learning. Yet Chamberlain et al. used exactly that to improve LTV predictions for an online fashion retailer. Let’s explore the novel approach they describe in Customer Lifetime Value Prediction Using Embeddings.
The sequence of products that each customer viewed can be very useful for LTV prediction. High-value customers tend to look at products of higher value, which are often products that are less popular because of the high price. Lower-value customers tend to look at low-priced products and congregate during sales periods. If these behaviors can be extracted as features, they can be used for LTV prediction.
The product sequences can be derived from the browsing logs of website and mobile user sessions. However, when you have 85,000 items in your product catalog and 12.5 million customers, there’s a combinatorial explosion of possible sequences that makes it intractable to model them as handcrafted features.
Instead, these high-dimensional sequences can be replaced by low-dimensional embedding vectors that concisely represent each customer’s product viewing behavior. If two customers are close to each other in the embedding space, they viewed similar products at around the same time. In this model, a high-value customer who viewed high-value products during a sale would be far from a low-value customer who viewed low-value products during the same sale.
Normally, an embedding is generated by a neural network with an input layer, an equally long output layer, and one hidden layer with a relatively small number of neurons. However, with 12.5 million customers, the softmax output layer has 12.5 million neurons that must all be computed for each training pair, which makes training prohibitively expensive.
To make training practical, the researchers used a skip-gram with negative sampling (SGNS) model. Instead of computing a full softmax, SGNS evaluates only a small set of customers at each training step: the true context customer plus a few randomly sampled “negative” customers. SGNS is easiest to understand through its use in word2vec for natural language processing. There, a sliding window over a word sequence (of, say, 5 words) is called the context. Given the center word as input, the skip-gram model learns to predict the other words in its context from their co-occurrences in the corpus.
Similarly, a customer context is defined as a subset of all the customers who viewed the same products at around the same time. Given one customer at the center of the context as input, the SGNS model learns to output the other customers in that context. The authors found experimentally that a context of 11 customers worked well.
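The context construction and negative sampling described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors’ code: we assume the customers who viewed a product are already ordered by view time (a hypothetical preprocessing step), use a window of 5 on each side so a center customer and its neighbors form an 11-customer context, draw negatives uniformly at random (word2vec samples them from a smoothed frequency distribution), and use one embedding table for both roles, which is a simplification.

```python
import math
import random


def skipgram_pairs(sequence: list[str], window: int = 5) -> list[tuple[str, str]]:
    """Pair each customer with up to `window` neighbors on each side of it."""
    pairs = []
    for i, center in enumerate(sequence):
        for j in range(max(0, i - window), min(len(sequence), i + window + 1)):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + e^-x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(center: list[float], context: list[float], negatives: list[list[float]]) -> float:
    """Reward similarity to the true context customer; penalize similarity to negatives."""
    loss = -log_sigmoid(dot(center, context))
    for neg in negatives:
        loss -= log_sigmoid(-dot(center, neg))
    return loss


rng = random.Random(0)
customers = [f"c{i}" for i in range(20)]
embeddings = {c: [rng.gauss(0.0, 0.1) for _ in range(8)] for c in customers}

# Hypothetical: customers who viewed one product, ordered by view time.
viewers = ["c3", "c7", "c1", "c12", "c5", "c9", "c0", "c14", "c2", "c8", "c11", "c6"]
pairs = skipgram_pairs(viewers, window=5)  # center + 5 on each side = 11-customer context
center, context = pairs[0]
negatives = rng.sample([c for c in customers if c not in (center, context)], 5)
print(len(pairs), sgns_loss(embeddings[center], embeddings[context],
                            [embeddings[n] for n in negatives]))
```

Training would minimize this loss over all pairs with gradient descent (not shown). Each pair needs only 1 + k dot products, here 6, instead of a softmax over every customer.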
The baseline LTV prediction model is a regression random forest. It’s trained on several handcrafted features extracted from customer demographics, purchase history, product returns, and product information. The training labels are true LTV values calculated from the previous year’s data. The features cover only the past 12 months because retail has strong seasonality effects that can be problematic if features are aggregated over longer periods. LTV itself is defined as sales minus returns over just the previous year, not over a true lifetime.
The second, enhanced regression random forest is identical to the baseline model except that it adds the customer behavior embeddings as additional features.
The third model is a separate random forest trained for churn classification on the same handcrafted features and customer behavior embeddings. It isn’t directly an LTV model, but a customer is considered churned if they made zero purchases over the previous year, which is essentially zero LTV under this project’s definition. Together, the LTV and churn predictions help identify both high-value and low-value customers to inform decisions about marketing spend.
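The label definitions above translate directly into code. A minimal sketch, assuming a hypothetical event schema with sales and returns: LTV is sales minus returns inside a one-year label window, and a customer is flagged as churned when they made zero purchases in that window. Events before the window would feed the handcrafted features instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    customer_id: str
    day: date
    amount: float
    kind: str  # "sale" or "return" (hypothetical schema)


def make_labels(events: list[Event], start: date, end: date) -> dict[str, dict]:
    """Per-customer LTV label (sales minus returns) and churn flag for the window [start, end)."""
    out: dict[str, dict] = {}
    for e in events:
        # Every known customer gets a row, even if all their events fall outside the window.
        row = out.setdefault(e.customer_id, {"ltv": 0.0, "purchases": 0})
        if not (start <= e.day < end):
            continue
        if e.kind == "sale":
            row["ltv"] += e.amount
            row["purchases"] += 1
        elif e.kind == "return":
            row["ltv"] -= e.amount
    for row in out.values():
        row["churned"] = row["purchases"] == 0  # churned = zero purchases in the window
    return out


events = [
    Event("alice", date(2022, 12, 1), 100.0, "sale"),  # falls in the earlier feature period
    Event("alice", date(2023, 2, 1), 150.0, "sale"),
    Event("alice", date(2023, 2, 10), 30.0, "return"),
    Event("bob", date(2022, 11, 5), 80.0, "sale"),
]
print(make_labels(events, start=date(2023, 1, 1), end=date(2024, 1, 1)))
```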
The baseline LTV model’s predictions tracked actual LTV reasonably well. The Spearman rank-order correlation coefficient between actual and predicted LTV was 0.46, indicating a moderate positive relationship between the two rankings.
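Spearman’s coefficient is the Pearson correlation of the two rankings, which is why it suits a task where the goal is to rank customers by value. A stdlib sketch with made-up numbers is below; in practice `scipy.stats.spearmanr` computes the same statistic. Tied values (such as several zero LTVs) receive the average of the ranks they span.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(actual: list[float], predicted: list[float]) -> float:
    return pearson(average_ranks(actual), average_ranks(predicted))


actual = [0.0, 0.0, 50.0, 120.0, 900.0]       # made-up actual LTVs
predicted = [10.0, 35.0, 40.0, 300.0, 250.0]  # made-up predictions
print(round(spearman(actual, predicted), 3))  # 0.872
```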
The churn classification model, effectively a binary classifier for zero LTV, showed a significant uplift in AUC when the behavior embeddings were added as a feature, with optimal embedding lengths ranging from 32 to 128. Because the enhanced LTV model is trained on the same features as the churn model, it’s reasonable to infer that the embeddings improve it over the baseline as well.
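AUC, the metric behind the churn model’s reported uplift, is the probability that a randomly chosen churner receives a higher churn score than a randomly chosen non-churner. The quadratic-time sketch below uses made-up scores standing in for a model without and with embedding features. For binary labels, the normalized Gini coefficient is the linear rescaling 2 * AUC - 1.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def normalized_gini(labels: list[int], scores: list[float]) -> float:
    return 2 * roc_auc(labels, scores) - 1


churned = [1, 1, 0, 0, 0]                       # made-up churn labels
baseline_scores = [0.6, 0.4, 0.5, 0.3, 0.2]     # hypothetical model without embeddings
embedding_scores = [0.8, 0.7, 0.4, 0.3, 0.1]    # hypothetical model with embeddings
print(round(roc_auc(churned, baseline_scores), 3))   # 0.833
print(roc_auc(churned, embedding_scores))            # 1.0
```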
Customer behavior embeddings are an innovative approach to including high-dimensional, highly dynamic data like browsing, clickstream, or eye-tracking data. By defining an embedding context that’s suited to your sales and seasonality patterns, you can effectively segment your customers in the embedding space and train explainable models on highly dynamic data, a normally tricky task. These machine learning models will be much more sensitive to customer behavior patterns that regular models, trained only on handcrafted features, may miss. They can help you:
All the LTV prediction models you have seen so far have a common problem — they work well only when historical purchase data is available for a customer, but their predictions for first purchases and new customers won’t be as accurate. At first glance, this seems like an impossible problem to solve — how do you predict LTV from just one data point? Wang, et al., proposed a better approach for this problem in their paper, A Deep Probabilistic Model for Customer Lifetime Value Prediction.
Most LTV prediction models are regression models that predict a continuous variable, and mean squared error (MSE) is the most commonly used loss function. This setup lets us build prediction models with architectures similar to those of other regression-based products, such as real estate price prediction.
But many customers turn out to be one-time purchasers who never return, a common phenomenon that every e-commerce business faces. The LTV labels for such customers are zero, but that zero denotes the absence of an LTV rather than a valid LTV. Unfortunately, MSE does not differentiate between zero as a status and zero as a value, which hurts prediction accuracy when there are many zero labels.
Another problem is that LTV distributions are highly skewed. Often, a small number of high-spending outlier customers account for most of the total customer spend. Because MSE squares each error, large errors on these outliers dominate the loss, so training is driven disproportionately by a handful of high-value customers.
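A toy calculation makes the outlier problem concrete. With made-up labels that are mostly zero or small plus one heavy spender, a single large miss contributes almost all of the squared error, so an MSE-trained model is pushed hardest to fix that one customer:

```python
def mse(labels: list[float], preds: list[float]) -> float:
    return sum((y - p) ** 2 for y, p in zip(labels, preds)) / len(labels)


def largest_error_share(labels: list[float], preds: list[float]) -> float:
    """Fraction of the total squared error contributed by the single worst prediction."""
    squared = [(y - p) ** 2 for y, p in zip(labels, preds)]
    return max(squared) / sum(squared)


labels = [0.0, 0.0, 0.0, 0.0, 20.0, 35.0, 50.0, 5000.0]  # mostly zero or small, one heavy spender
preds = [10.0, 10.0, 10.0, 10.0, 30.0, 30.0, 40.0, 1000.0]
print(mse(labels, preds))                  # 2000078.125
print(largest_error_share(labels, preds))  # about 0.99996
```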
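To address these problems with MSE in LTV prediction, the paper proposes a new loss function called zero-inflated lognormal (ZILN) loss. ZILN treats LTV prediction as two problems: classifying whether a customer will return at all (zero versus nonzero LTV) and predicting how much a returning customer will spend.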
ZILN loss addresses both by combining a classification loss term that separates zero from nonzero LTVs with a regression term that models nonzero LTV as a lognormal distribution, which is naturally suited to heavy-tailed phenomena.
The deep neural network used with ZILN loss is a fully-connected neural network with an input layer, two hidden layers, a special logits layer, and an output regression neuron.
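The deep neural network’s last layer is a special one with three units: one for the probability that the customer returns (nonzero LTV), and two for the location (μ) and scale (σ) parameters of the lognormal distribution of a returning customer’s spend.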
These are necessary because ZILN loss depends on these three variables.
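A minimal per-customer sketch of the ZILN loss, written from the description above rather than from the paper’s code: a binary cross-entropy term on whether the customer returned, plus, for returning customers only, the negative log-likelihood of their spend under a lognormal with location μ and scale σ. Passing the third unit through softplus and flooring σ at 1e-4 are our assumptions (Google’s open-source `lifetime_value` repository applies a similar transformation in TensorFlow, as far as we know), and a real model would average this loss over batches inside a deep learning framework. The point prediction is the expected value p * exp(μ + σ²/2), the lognormal mean scaled by the return probability.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ziln_params(logits: tuple[float, float, float]) -> tuple[float, float, float]:
    """Map the three output units to (return probability p, location mu, scale sigma)."""
    p_logit, mu, sigma_logit = logits
    p = math.exp(-softplus(-p_logit))          # stable sigmoid(p_logit)
    sigma = max(softplus(sigma_logit), 1e-4)   # assumed floor keeps sigma strictly positive
    return p, mu, sigma


def ziln_loss(label: float, logits: tuple[float, float, float]) -> float:
    """Zero-inflated lognormal loss for one customer's observed LTV `label`."""
    p_logit = logits[0]
    _, mu, sigma = ziln_params(logits)
    returned = label > 0
    # Classification term: binary cross-entropy on "did the customer return?"
    # (-log p = softplus(-logit); -log(1 - p) = softplus(logit)).
    loss = softplus(-p_logit) if returned else softplus(p_logit)
    if returned:
        # Regression term: negative lognormal log-density of the observed spend.
        log_y = math.log(label)
        log_pdf = (-log_y - math.log(sigma) - 0.5 * math.log(2 * math.pi)
                   - (log_y - mu) ** 2 / (2 * sigma ** 2))
        loss -= log_pdf
    return loss


def ziln_predict(logits: tuple[float, float, float]) -> float:
    """Expected LTV: P(return) times the lognormal mean exp(mu + sigma^2 / 2)."""
    p, mu, sigma = ziln_params(logits)
    return p * math.exp(mu + sigma ** 2 / 2)


print(ziln_loss(0.0, (-2.0, 3.0, 0.5)))  # non-returner; model leans toward "won't return"
print(ziln_loss(45.0, (1.0, 3.8, 0.5)))  # returner whose spend is near exp(3.8), about 44.7
print(ziln_predict((1.0, 3.8, 0.5)))
```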
Its middle layers are two standard hidden layers of 64 and 32 units. They learn a shared representation for two tasks: classifying whether a customer returns and predicting a returning customer’s spend.
The model was evaluated on the dataset from Kaggle’s Acquire Valued Shoppers Challenge. Its ability to classify customers as returning or not was measured using the Gini coefficient, on which ZILN loss achieved an 11.4% relative improvement over MSE loss.
Its ability to predict returning customers’ spending was judged using decile-level mean absolute percentage error (MAPE). ZILN loss produced a decile-level MAPE 68.9% lower than that of MSE loss.
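Decile-level MAPE, as we understand the paper’s usage, checks calibration at the group level: sort customers by predicted LTV, split them into ten bins, and compare each bin’s mean prediction with its mean actual value. A sketch, assuming it is computed on returning customers so that actual LTVs are positive; the data are made up:

```python
def decile_mape(actual: list[float], predicted: list[float], n_bins: int = 10) -> float:
    """Decile-level MAPE: bin customers by predicted LTV, then compare bin means."""
    if len(actual) != len(predicted) or len(actual) < n_bins:
        raise ValueError("need equal-length inputs with at least n_bins customers")
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    errors = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]  # near-equal-sized bins
        mean_actual = sum(actual[i] for i in idx) / len(idx)
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        if mean_actual == 0:
            raise ValueError("a bin has zero mean actual LTV; MAPE is undefined")
        errors.append(abs(mean_pred - mean_actual) / mean_actual)
    return sum(errors) / n_bins


actual = [float(v) for v in range(10, 210, 10)]  # 20 made-up returning-customer LTVs
predicted = [a * 1.1 for a in actual]            # a model that overpredicts everyone by 10%
print(round(decile_mape(actual, predicted), 4))  # 0.1
```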
Is your new customer a potential future high-value customer? Should you direct your marketing spend toward getting them to increase their purchase frequency or purchase value at your business? Are they a churn risk? A modern machine learning model like ZILN can help you answer these questions accurately. Instead of a scattershot marketing strategy where you’re operating blind, you can let your data guide you to the path with the best ROI.
The model explanations above will help you implement LTV prediction in your business. If you aren’t looking into customer lifetime value prediction, you are missing out on substantial business benefits. Professor Peter Fader, a pioneer of CLV, used it to forecast the revenues of Wayfair, a furniture e-commerce giant, for the coming decades. Here are some of the most useful benefits of LTV prediction.
Customer lifetime value can tell you who your most valuable customers were and currently are. They’re the ones bringing most of the revenue to your business. But people change: their interests, responsibilities, or income levels may shift. Your top spenders today may not be your top spenders in six months. If you have products in many categories, you’ll have different valuable customers in each category.
You can’t manually anticipate and plan for such a wide variety of future possibilities. So you may just ignore all complexity, simplify your calculations by aggregating historical purchases, pray that those customers keep buying at the same rate, and hope for the best.
But there’s a better technique for processing all those possibilities. LTV prediction combines historical data with observed changes in customer behavior, customer status, or session and engagement data to actually predict the future lifetime value of an individual customer or a cohort of customers. It’s like a time machine that can tell you how much your customers are likely to spend in the future over a certain time frame.
Businesses often segment customers based on demographics like age, gender, income levels, and so on to market and sell to them more efficiently. But when you segment this way, customer revenue stays in the background as an implicit goal while demographics are the visible factors that you hope will help you meet that goal.
If you can directly predict the future revenue of each customer instead of relying on hope, then revenue itself becomes a reliable basis for segmenting your customers. You can directly segment your customers into high, medium, and low revenue groups or decile-based groups. You can tailor your marketing and selling to these segments instead of relying on indirect factors like demographics.
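Revenue-based segmentation can be as simple as ranking customers by predicted LTV and cutting the ranking into equal-sized groups. The sketch assumes predictions already sit in a dict keyed by hypothetical customer IDs; equal-sized groups are only one choice, and fixed revenue thresholds work just as well.

```python
def assign_segments(predicted: dict[str, float], labels: list[str]) -> dict[str, str]:
    """Split customers into len(labels) near-equal groups by predicted LTV, lowest group first."""
    ranked = sorted(predicted, key=lambda cid: predicted[cid])  # lowest predicted LTV first
    n, k = len(ranked), len(labels)
    return {cid: labels[pos * k // n] for pos, cid in enumerate(ranked)}


predicted = {"ana": 40.0, "ben": 900.0, "cai": 5.0, "dee": 150.0, "eli": 75.0, "fay": 2200.0}
print(assign_segments(predicted, ["low", "medium", "high"]))
# {'cai': 'low', 'ana': 'low', 'eli': 'medium', 'dee': 'medium', 'ben': 'high', 'fay': 'high'}

# With a realistic number of customers, ten labels produce decile groups.
deciles = assign_segments(predicted, [f"D{i}" for i in range(1, 11)])
```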
Effective marketing involves many factors. Which marketing channel will be the most effective for a customer? How much revenue can you attribute to each marketing channel? What branding strategies will work best? How much money should you spend on each channel? What’s the cost per acquisition (CPA) and return on ad spend (ROAS)? How profitable is your mobile marketing over the long term?
These questions become easier to answer if you know what the future revenue from each customer will be. Without that, calculating future returns on your marketing investments is impossible. LTV prediction supports marketing optimization by revealing the likely future returns from an individual customer or a customer segment. The same machine learning models also let you build mobile marketing prediction models that show how your desktop and mobile ads differ in average revenue, customer churn, net profit, and total revenue generated.
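Once each acquired customer has a predicted LTV, CPA and a forward-looking ROAS per channel are simple ratios. In this sketch the channels, spend, and predictions are made up, and “revenue” means predicted future revenue, so the ROAS is a forecast rather than a measured result:

```python
from collections import defaultdict


def channel_metrics(acquisitions: list[tuple[str, float]],
                    spend: dict[str, float]) -> dict[str, dict[str, float]]:
    """Per-channel CPA and predicted ROAS from (channel, predicted_ltv) rows and channel spend."""
    customers: dict[str, int] = defaultdict(int)
    predicted_revenue: dict[str, float] = defaultdict(float)
    for channel, predicted_ltv in acquisitions:
        customers[channel] += 1
        predicted_revenue[channel] += predicted_ltv
    metrics = {}
    for channel, cost in spend.items():
        if cost <= 0 or customers[channel] == 0:
            continue  # skip channels with no spend or no acquisitions
        metrics[channel] = {
            "cpa": cost / customers[channel],                     # cost per acquisition
            "predicted_roas": predicted_revenue[channel] / cost,  # predicted revenue per unit of spend
        }
    return metrics


acquisitions = [("mobile", 120.0), ("mobile", 60.0), ("desktop", 300.0),
                ("mobile", 20.0), ("desktop", 90.0)]
spend = {"mobile": 100.0, "desktop": 150.0}
for channel, m in channel_metrics(acquisitions, spend).items():
    print(channel, m)
```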
These improvements to your marketing strategy will help you increase your sales and profits, and higher sales and profits tend to invite investment, merger, acquisition, or exit opportunities.
When you have an idea about potential future revenue from each customer, you can tune your loyalty programs, offers, and discounts to those amounts, which will help you optimize your sales while keeping your spending tight.
Lifetime value prediction can also increase your retention rates. Retaining existing customers is often less expensive than acquiring new users, and when you fine-tune your marketing offers, each customer feels special and sends more business your way.
Surveys say acquiring a new customer is anywhere from five to 25 times more expensive than retaining an existing one. Lifetime value predictions help you target only those potential customers whose lifetime value makes it worth the user acquisition cost. LTV predictions can reduce your acquisition cost for new users and optimize your return on investment (ROI).
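Targeting prospects by predicted LTV amounts to a filter against acquisition cost. A minimal sketch with made-up predictions; the `min_ratio` parameter is hypothetical, and the right LTV-to-cost ratio is a business decision rather than something this code can tell you:

```python
def worth_acquiring(predicted_ltv: dict[str, float], acquisition_cost: float,
                    min_ratio: float = 1.0) -> list[str]:
    """Prospects whose predicted LTV is at least min_ratio times the acquisition cost, best first."""
    threshold = min_ratio * acquisition_cost
    keep = [pid for pid, ltv in predicted_ltv.items() if ltv >= threshold]
    return sorted(keep, key=lambda pid: predicted_ltv[pid], reverse=True)


prospects = {"p1": 40.0, "p2": 310.0, "p3": 95.0, "p4": 150.0}  # made-up predicted LTVs
print(worth_acquiring(prospects, acquisition_cost=50.0))                 # ['p2', 'p4', 'p3']
print(worth_acquiring(prospects, acquisition_cost=50.0, min_ratio=3.0))  # ['p2', 'p4']
```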
When you know what your customers are likely to spend in the future and you know what your customers prefer to purchase, you essentially have a window into future sales. You can procure the most in-demand products and ensure their availability to your highest-value customers.
Predicting your customers’ lifetime value, and using it to assess how strong your mobile marketing campaigns are, helps you understand your future average revenue, including revenue from ads, and what changes you can make to improve metrics such as user acquisition and ad revenue generated.
We’re often surprised that LTV prediction isn’t used more in e-commerce or the wider business world. Most businesses already have the relevant data and industry insights to build powerful long-term value models that can predict actual outcomes.
Terms like prediction and predictive modeling may make people hesitant because they sound too complex. In reality, LTV prediction techniques can range from simple to complex. We at Width.ai have an experienced data science team that understands exactly the workflow that works best for you and your data. We can build custom applications to help you make use of these invaluable techniques that can grow your business exponentially. Contact us today.