How LTV Prediction Can Improve Sales & Marketing Strategies

Karthik Shiraly
November 10, 2021

Implementing lifetime value (LTV) and LTV prediction using machine learning allows you to build out powerful predictive data models that lead to serious long-term gains in revenue. A study from Bain & Company showed that just a 5% increase in retention rate could lead to over a 25% increase in profits in the long term. Building LTV-based predictive models could be as simple as taking data you already have and plugging it into one of our popular algorithms built for any business model with customers.

In this article, we'll explain what LTV and LTV prediction are, how they’ll benefit your company, and how to implement them.

What Is LTV?

Lifetime value — also called LTV, customer lifetime value, CLV, or CLTV — is the total value that a customer brings to your business throughout their engagement with you. The most tangible and direct value a customer brings is the revenue they give your business when they purchase your products. However, they may also add intangible value by bringing you more customers through word-of-mouth or online referrals, investing in your business, or providing you with services that you need, for example.

Despite the name, it’s common for businesses to restrict LTV to a specific time period, rather than observing the entire customer lifetime, depending on data availability or business goals like short-term revenue forecasts.

What Is LTV Prediction?

The LTV of a customer, up to this point, can be calculated based on their history, like what products they purchased, for how much, and when. It's easy to calculate from historical sales data. But LTV prediction goes a step further — it tells you how much a customer is likely to spend at your business in the future. Predicting the future value of your customers helps shape your business strategies to give you a competitive edge, as we shall see next.

How Is LTV Calculated or Predicted?

There are many approaches to calculating or predicting customer lifetime value.

Calculating LTV requires historical purchase amounts and dates. These are the most common historical LTV models:

  • Customer model: If historical purchase amounts and dates are available for each customer, then customer-level or user-level LTV is calculated.
  • Aggregate model: If data isn’t available for each customer, then an average revenue per customer is calculated by summing all revenue that was received in a timeframe and dividing that by the number of customers.
  • Cohort model: Another approach when data isn't available for each customer is to divide customers into multiple cohorts based on some criteria — like a time period or purchase category — and calculate average cohort-level revenue.    
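The three historical models above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the `purchases` rows, the customer IDs, and the function names are hypothetical, and the cohort is assumed to be the month of each customer's first purchase. All three functions read the same customer-level rows here only so that their outputs can be compared side by side.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount)
purchases = [
    ("c1", date(2021, 1, 5), 40.0),
    ("c1", date(2021, 3, 9), 60.0),
    ("c2", date(2021, 1, 20), 25.0),
    ("c3", date(2021, 2, 2), 100.0),
    ("c3", date(2021, 2, 28), 50.0),
]

def customer_ltv(rows):
    """Customer model: total historical revenue per customer."""
    totals = defaultdict(float)
    for customer_id, _, amount in rows:
        totals[customer_id] += amount
    return dict(totals)

def aggregate_ltv(rows):
    """Aggregate model: total revenue divided by the number of distinct customers."""
    customers = {customer_id for customer_id, _, _ in rows}
    return sum(amount for _, _, amount in rows) / len(customers)

def cohort_ltv(rows):
    """Cohort model: average revenue per customer, grouped by first-purchase month."""
    first_purchase = {}
    for customer_id, day, _ in sorted(rows, key=lambda row: row[1]):
        first_purchase.setdefault(customer_id, (day.year, day.month))
    revenue = defaultdict(float)
    members = defaultdict(set)
    for customer_id, _, amount in rows:
        cohort = first_purchase[customer_id]
        revenue[cohort] += amount
        members[cohort].add(customer_id)
    return {cohort: revenue[cohort] / len(members[cohort]) for cohort in revenue}

per_customer = customer_ltv(purchases)
per_average_customer = aggregate_ltv(purchases)
per_cohort = cohort_ltv(purchases)
```

With these sample rows, the customer model assigns 150.0 to c3, the aggregate model gives roughly 91.67 per customer, and the January 2021 cohort averages 62.5 per customer.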

If you don’t have historical data, you can still use the LTV prediction for new customers (described later in this article) to estimate LTV based on demographic data while gathering sufficient data. 

Predicting future LTV can involve two types of models:

  • Statistical model: These use probability distributions to model the phenomena underlying purchases, then use the models to predict future LTV.
  • Machine learning model: These learn the complexities of the phenomena underlying customer purchase behavior from historical data and use that information to predict future LTV.

Statistical Models for LTV Prediction

Before machine learning models became common, LTV prediction was modeled using theoretical statistical distributions and assumptions. These probabilistic models are still widely used in data science and business.

Techniques like Pareto/NBD and Pareto/GGG model the purchase process, lifetime, and customer behavior using probability distributions like Poisson, exponential, and gamma distributions. However, they aren't good at modeling the complex phenomena underlying user behavior and purchasing decisions.

Why Statistical Models Are Still Relevant

If your e-commerce business is new to LTV prediction, we suggest starting with statistical models because of their simplicity. The only data you need is purchase amounts and dates. These models calculate the recency, frequency, and monetary value tables from that data. They can provide you with the baseline predictions you need for an initial set of marketing decisions. For all our e-commerce clients, we always use these models to make baseline predictions before using machine learning.
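As a rough illustration of the recency, frequency, and monetary value table mentioned above, the sketch below builds one from raw purchase rows using only the standard library. It assumes one common convention for these models: frequency counts repeat purchase days, recency is the time between a customer's first and last purchase, `T` is the customer's age at the end of the observation window, and monetary value is the average spend on repeat purchase days. The function name, field names, and sample data are hypothetical.

```python
from datetime import date

def rfm_table(rows, observation_end):
    """Build a recency/frequency/monetary summary per customer.

    rows: iterable of (customer_id, purchase_date, amount).
    observation_end: last day of the observation window.
    """
    # Sum the spend per customer per day so several orders on one day count once.
    daily_spend = {}
    for customer_id, day, amount in rows:
        per_day = daily_spend.setdefault(customer_id, {})
        per_day[day] = per_day.get(day, 0.0) + amount

    table = {}
    for customer_id, per_day in daily_spend.items():
        days = sorted(per_day)
        first, last = days[0], days[-1]
        repeat_days = days[1:]  # every purchase day after the first one
        frequency = len(repeat_days)
        if frequency:
            monetary = sum(per_day[d] for d in repeat_days) / frequency
        else:
            monetary = 0.0
        table[customer_id] = {
            "frequency": frequency,
            "recency": (last - first).days,
            "T": (observation_end - first).days,
            "monetary_value": monetary,
        }
    return table

sample_rows = [
    ("c1", date(2021, 1, 5), 40.0),
    ("c1", date(2021, 3, 9), 60.0),
    ("c2", date(2021, 1, 20), 25.0),
]
rfm = rfm_table(sample_rows, observation_end=date(2021, 3, 31))
```

A one-time purchaser such as c2 ends up with a frequency and recency of zero, which is exactly the kind of customer these models treat as having an uncertain future.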

Statistical models run fast, provide reasonably accurate results when used correctly, and are available through well-tested software like Lifetimes (a Python library). It’s even possible to run them in Excel. If these models show promise, you can upgrade to machine learning techniques.

Neural Networks for LTV Prediction

Deep fully-connected neural network architecture for LTV prediction (Source: Pollak)

Machine learning algorithms like random forests and neural networks have been used for LTV prediction. Their advantage over statistical models is their ability to make use of all your customer data to empirically model the complexities underlying LTV. For example, they can include factors like demographics, purchase behavior, and changes in user behavior to refine the prediction model.

One such experiment by an eyewear retailer compared a Pareto/GGG statistical model with a fully-connected deep neural network to predict LTV from their sales data. The deep neural network has five dense layers with rectified linear units (ReLU).

They found that their deep neural network scored 94.6% accuracy while Pareto/GGG scored 88.6%. Both accuracies are pretty impressive and support what we suggested above — start with statistical models to get a baseline, test the waters with your business decisions, see the outcomes, and then move up to machine learning for more accurate predictions.

Using Customer Behavior Embeddings to Improve LTV Prediction

By now, you will have noticed that machine learning models like random forests and neural networks are quite popular in modern customer lifetime value calculation and prediction. A technique that doesn’t seem to belong here is representation learning. Yet, Chamberlain, et al., used exactly that to improve LTV predictions for an online fashion retail use case. Let’s explore the novel approach they describe in Customer Lifetime Value Prediction Using Embeddings.

The Intuition Behind Customer Behavior Embeddings

The sequence of products that each customer viewed can be very useful for LTV prediction. High-value customers tend to look at products of higher value, which are often products that are less popular because of the high price. Lower-value customers tend to look at low-priced products and congregate during sales periods. If these behaviors can be extracted as features, they can be used for LTV prediction.

The product sequences can be derived from the browsing logs of website and mobile user sessions. However, when you have 85,000 items in your product catalog and 12.5 million customers, there’s a combinatorial explosion of possible sequences that makes it intractable to model them as handcrafted features.

Instead, these high-dimensional sequences can be replaced by low-dimensional embedding vectors that represent each customer’s product viewing behavior concisely. If two customers are close to each other in the embedding space, they viewed similar products at around the same time. In this model, a high-value customer who viewed high-value products during a sale would be far from a low-value customer who viewed low-value products during the same sale.

Generating Customer Behavior Embeddings

Normally, an embedding is generated by a neural network with an input layer, an equally long output layer, and one hidden layer consisting of a relatively small number of neurons. However, 12.5 million customers mean 12.5 million softmax output neurons have to activate for each input pair, making training a prohibitively expensive exercise.

To make it more practical, these researchers used a skip-gram with negative sampling (SGNS) model. SGNS evaluates only a small set of customers at each training step. SGNS is easier to understand by considering how it works in word2vec in natural language processing. There, a sliding window over a word sequence (of, say, 5 words) is called the context. Given one word as input, the skip-gram learns to output all the other words in its context, as observed in the corpus.

Similarly, a customer context is defined as a subset of all the customers who viewed the same products at around the same time. Given one customer at the center of the context as input, the SGNS model learns to output the other customers in that context. The authors found experimentally that a context of 11 customers worked well.
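To make the idea of a customer context concrete, here is a small sketch of how training examples for an SGNS-style model could be generated from one context. It is a simplification under stated assumptions and not the authors' code: negatives are drawn uniformly at random from customers outside the context (real SGNS implementations usually sample from a frequency-based noise distribution), the context has 5 customers instead of 11 to keep the output short, and the function and variable names are hypothetical.

```python
import random

def sgns_training_examples(context, all_customers, num_negatives, rng):
    """Return (center, other, label) triples for one customer context.

    label 1: `other` shares the context with the center customer (positive pair).
    label 0: `other` was sampled from outside the context (negative sample).
    """
    center = context[len(context) // 2]
    in_context = set(context)
    outsiders = [c for c in all_customers if c not in in_context]
    examples = []
    for other in context:
        if other == center:
            continue
        examples.append((center, other, 1))
        # Only a handful of negatives are scored per positive pair,
        # instead of scoring every customer with a full softmax.
        for negative in rng.sample(outsiders, num_negatives):
            examples.append((center, negative, 0))
    return examples

all_customers = [f"cust{i}" for i in range(20)]
context = all_customers[3:8]  # customers who viewed the same products at around the same time
examples = sgns_training_examples(context, all_customers, num_negatives=2, rng=random.Random(0))
```

Each positive pair is accompanied by only two negatives here, so one context produces 12 training examples rather than one output per customer in the whole customer base.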

Combining Embeddings With Random Forests

Handcrafted features for LTV prediction random forest model (Source: Chamberlain, et al.)

The baseline LTV prediction model is a regression random forest. It’s trained on several handcrafted features extracted from customer demographics, purchase history, product returns, and product information. The training labels are the true LTV values calculated using LTV formulas on the previous year’s data. They use feature data from the past 12 months because retail has strong seasonality effects that can be problematic if features are aggregated over longer periods. They define LTV itself as sales minus returns over just the previous year, not really over a lifetime.

The enhanced second regression random forest is the same as the baseline model, but it adds the customer behavior embeddings as another feature.

The third model is a separate random forest trained for churn classification on the same set of handcrafted features and customer behavior embedding features. It’s not directly related to LTV prediction, but they consider a customer churned if they have zero purchases over the previous year — that is basically zero LTV by this project’s definition of LTV. So LTV and churn predictions together help them identify both high-value and low-value customers to make business decisions about marketing spend.
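The two label definitions used in this project are simple enough to write down directly. The snippet below is only a sketch of those definitions as described above (LTV as sales minus returns over the previous year, churn as zero purchases over the same period); the record layout and the names `customers`, `ltv_label`, and `churn_label` are hypothetical.

```python
# Hypothetical per-customer records covering the previous 12 months.
customers = {
    "c1": {"sales": [120.0, 80.0], "returns": [30.0]},
    "c2": {"sales": [], "returns": []},
    "c3": {"sales": [45.0], "returns": []},
}

def ltv_label(record):
    """LTV label: sales minus returns over the previous year."""
    return sum(record["sales"]) - sum(record["returns"])

def churn_label(record):
    """Churn label: True when the customer made zero purchases over the previous year."""
    return len(record["sales"]) == 0

labels = {
    customer_id: {"ltv": ltv_label(record), "churned": churn_label(record)}
    for customer_id, record in customers.items()
}
```

Under these definitions a churned customer always has an LTV label of zero, which is why the churn classifier can be read as a detector for zero-LTV customers.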


Actual LTV vs. predicted LTV density plot — AUC uplift over baseline (Source: Chamberlain, et al.)

The baseline LTV model showed a reasonable fit between actual and predicted LTV. The Spearman rank-order correlation coefficient between actual and predicted LTV was 0.46, indicating a moderate positive relationship between the ranked orders of actual and predicted LTV.
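For readers who want to compute the same kind of rank correlation on their own predictions, the sketch below implements Spearman's coefficient as the Pearson correlation of average ranks, using only the standard library. The sample values are made up for illustration and are unrelated to the paper's data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(actual, predicted):
    """Spearman rank-order correlation between actual and predicted values."""
    return pearson(average_ranks(actual), average_ranks(predicted))

# Made-up example: one pair of customers is ranked in the wrong order.
actual_ltv = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted_ltv = [12.0, 18.0, 35.0, 33.0, 60.0]
rho = spearman(actual_ltv, predicted_ltv)
```

Because only the ordering matters, a model can score well on this metric even when its predicted amounts are systematically too high or too low.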

The churn classification model, effectively a binary classifier for zero LTV, showed a significant uplift in AUC when behavior embeddings were added as a feature, with optimum embedding lengths ranging from 32 to 128. Since the enhanced LTV model and the churn model are trained on the same features, this suggests that the embeddings improve on the baseline models that lack them.

How Your E-commerce Business Can Use Customer Behavior Embeddings

Customer behavior embeddings are an innovative approach to including high-dimensional, highly dynamic data like browsing, clickstream, or eye-tracking data. By defining an embedding context that’s suited to your sales and seasonality patterns, you can effectively segment your customers in the embedding space and train explainable models on highly dynamic data, a normally tricky task. These machine learning models will be much more sensitive to customer behavior patterns that regular models, trained only on handcrafted features, may miss. They can help you:

  • Make your LTV prediction models more accurate, especially when confronted with changes in recent customer behavior
  • Check if sales have reduced across an entire customer segment in a period, which may be a consequence of a flaw in your product offering that is driving your high-value customers away

Customer Lifetime Value Prediction for New Customers

All the LTV prediction models you have seen so far have a common problem — they work well only when historical purchase data is available for a customer, but their predictions for first purchases and new customers won’t be as accurate. At first glance, this seems like an impossible problem to solve — how do you predict LTV from just one data point? Wang, et al., proposed a better approach for this problem in their paper, A Deep Probabilistic Model for Customer Lifetime Value Prediction.

The Problem With Other Models

Most customers are one-time purchasers with zero LTV (Source: Wang, et al.)

All LTV prediction models are regression models that predict a continuous variable. The mean squared error (MSE) is the loss most commonly used by these models. This setup allows us to create prediction models with architectures similar to those of other regression-based products, such as future real estate price prediction.

But lots of customers turn out to be one-time purchasers who never return. It’s a common phenomenon that all e-commerce businesses face. The LTV labels for such cases will be zero, but zero denotes the absence of an LTV rather than a valid LTV. Unfortunately, MSE does not differentiate between zero as a status and zero as a value, which affects prediction accuracy when there are too many zero labels.

Another problem is that LTV distributions are highly skewed. Often, a small number of high-value, high-spending outlier customers account for most of the total customer spend. MSE’s squaring exaggerates the impact of these outliers, so errors on a few high-value customers dominate the loss and bias the model.

Zero-Inflated Lognormal (ZILN) Loss

MSE distribution vs. lognormal distribution for heavy tail (Source: Wang, et al.)

To solve these problems of MSE in the LTV prediction problem, the paper proposes a new loss function called zero-inflated lognormal (ZILN). ZILN treats LTV prediction as two problems:

  • “Zero” LTV data points should be treated as a binary classification, indicating a tendency to not purchase, while any non-zero LTV indicates a tendency to purchase.
  • The heavy-tailed nature of the LTV distribution, caused by a small number of high-value customers, requires better modeling of the distribution itself.

ZILN loss addresses both by combining a classification loss term for zero LTVs and a regression term for LTV by modeling it as a lognormal distribution that is naturally suited to heavy-tailed phenomena.
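The combination described above can be written as a per-customer loss: a binary cross-entropy term for "will this customer return at all?" plus, for returning customers only, the negative log-likelihood of their LTV under a lognormal distribution. The sketch below follows that structure in plain Python. It is an illustration of the idea, not the paper's implementation, and the argument names `p`, `mu`, and `sigma` are this sketch's labels for the returning probability and the lognormal parameters.

```python
import math

def ziln_loss(ltv, p, mu, sigma):
    """Zero-inflated lognormal loss for a single customer.

    ltv:   observed label (0 for a customer who never returned).
    p:     predicted probability that the customer returns (0 < p < 1).
    mu:    predicted mean of log(LTV) for returning customers.
    sigma: predicted standard deviation of log(LTV), must be positive.
    """
    returned = 1.0 if ltv > 0 else 0.0
    # Classification term: did the customer return or not?
    cross_entropy = -(returned * math.log(p) + (1.0 - returned) * math.log(1.0 - p))
    if ltv <= 0:
        return cross_entropy
    # Regression term: lognormal negative log-likelihood, applied only to returning customers.
    lognormal_nll = (
        math.log(ltv * sigma * math.sqrt(2.0 * math.pi))
        + (math.log(ltv) - mu) ** 2 / (2.0 * sigma ** 2)
    )
    return cross_entropy + lognormal_nll

loss_one_time_buyer = ziln_loss(0.0, p=0.2, mu=3.0, sigma=1.0)
loss_good_fit = ziln_loss(math.exp(3.0), p=0.5, mu=3.0, sigma=1.0)
loss_bad_fit = ziln_loss(math.exp(3.0), p=0.5, mu=1.0, sigma=1.0)
```

A zero label contributes only through the classification term, so it never pulls the regression parameters toward zero, and the error on a large LTV is measured on the log scale rather than squared in raw currency units.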

Deep Neural Network With ZILN Loss

LTV prediction DNN with ZILN loss (Source: Wang, et al.)

The deep neural network used with ZILN loss is a fully-connected neural network with an input layer, two hidden layers, a special logits layer, and an output regression neuron.

The deep neural network’s last layer is a special one with three units:

  • One determines the probability of a returning purchase, p, and has a sigmoid activation function.
  • Another determines the mean, μ, of the LTV of returning customers and has identity activation.
  • The third determines the standard deviation, σ, of the LTV of returning customers and has softplus activation.

These are necessary because ZILN loss depends on these three variables.
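The three output units can be illustrated with a short sketch that applies the activations listed above to three raw values and then combines them into a single expected LTV. The combination step uses the standard mean of a lognormal distribution, exp(μ + σ²/2), scaled by the returning probability. The function names and the sample inputs are hypothetical, and in a real network the three raw values would come from the last hidden layer.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def softplus(z):
    """Numerically stable log(1 + exp(z)); always positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def ziln_head(raw_outputs):
    """Turn the three raw outputs into p, mu, sigma and an expected LTV."""
    raw_p, raw_mu, raw_sigma = raw_outputs
    p = sigmoid(raw_p)           # probability of a returning purchase
    mu = raw_mu                  # identity activation
    sigma = softplus(raw_sigma)  # keeps the standard deviation positive
    expected_ltv = p * math.exp(mu + sigma ** 2 / 2.0)
    return p, mu, sigma, expected_ltv

p, mu, sigma, expected_ltv = ziln_head((0.0, 3.0, 0.0))
```

The softplus activation is what guarantees a valid (positive) standard deviation no matter what raw value the network produces.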

Its middle layers are two standard hidden layers of 64 and 32 units. They learn a shared representation for two tasks: classification of returning customers and prediction of returning customers’ spend.


Classification Gini coefficients for ZILN loss and MSE loss (Source: Wang, et al.)

The machine learning model was run on Kaggle’s Acquire Valued Shoppers Challenge dataset. The model’s ability to classify customers as returning or not was measured using the Gini coefficient. ZILN loss outperformed MSE loss with 11.4% relative improvement.

LTV prediction MAPE errors for ZILN and MSE loss (Source: Wang, et al.)

The model’s ability to predict returning customer spending was judged using a decile-level mean absolute percentage error (MAPE). ZILN loss showed a much lower decile-level MAPE than MSE loss: 68.9% lower.
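One way to compute a decile-level MAPE is sketched below. It reflects one reading of the metric and may differ in detail from the paper's evaluation code: customers are sorted by predicted LTV, split into ten equal-sized bins, and the absolute percentage error is taken between the mean prediction and the mean actual value of each bin before averaging across bins. The function name and the sample data are hypothetical.

```python
def decile_mape(actual, predicted, num_bins=10):
    """Mean absolute percentage error computed on bin averages instead of individuals."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    errors = []
    for b in range(num_bins):
        indices = order[b * n // num_bins:(b + 1) * n // num_bins]
        if not indices:
            continue
        mean_actual = sum(actual[i] for i in indices) / len(indices)
        mean_predicted = sum(predicted[i] for i in indices) / len(indices)
        if mean_actual == 0:
            continue  # percentage error is undefined for a bin with zero actual spend
        errors.append(abs(mean_predicted - mean_actual) / mean_actual)
    if not errors:
        raise ValueError("no bin with non-zero actual LTV")
    return sum(errors) / len(errors)

# Made-up data: every prediction overshoots the actual value by 10%.
actual_spend = [float(i) for i in range(1, 21)]
predicted_spend = [1.1 * a for a in actual_spend]
score = decile_mape(actual_spend, predicted_spend)
```

Averaging within bins first smooths out the noise of individual customers, which matters for LTV because individual percentage errors are undefined for the many customers with zero spend.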

Area under precision-recall curve for returning customer prediction. 

Why Accurate LTV Forecasting for New Customers Is Essential

Is your new customer a potential future high-value customer? Should you direct your marketing spend towards getting them to increase their purchase frequency or purchase value at your business? Are they a churn risk? These are questions that a modern machine learning model like ZILN can help you answer accurately. Instead of a scattershot marketing strategy where you’re operating blind, you can let your data guide you to the optimal ROI path.

10 Ways LTV Prediction Benefits Your E-commerce Business

The model explanations above will help you implement LTV prediction in your business. If you aren’t looking into customer lifetime value prediction, you are missing out on a substantial number of potential business benefits. Professor Peter Fader, a pioneer of CLV, used it to accurately forecast the revenues of Wayfair, a furniture e-commerce giant, in the coming decades. These are just a few of the most useful benefits of LTV prediction.

1. Know the Future: Who Will Become Your Most Valuable Customers Next Year?

Customer lifetime value can tell you who your most valuable customers were and currently are. They’re the ones bringing most of the revenue to your business. But people change: their interests, responsibilities, or income levels may change. Your top spenders today may not be your top spenders in six months. If you have products in many categories, you’ll have different valuable customers in each category.

You can’t manually anticipate and plan for such a wide variety of future possibilities. So you may just ignore all complexity, simplify your calculations by aggregating historical purchases, pray that those customers keep buying at the same rate, and hope for the best.

But there’s a better technique for processing all those possibilities. LTV prediction combines historical data with observed changes in customer behavior, customer status, or session and engagement data to actually predict the future lifetime value of an individual customer or a cohort of customers. It’s like a time machine that can tell you how much your customers are likely to spend in the future over a certain time frame.

2. Improve Your Customer Segmentation Strategy

Businesses often segment customers based on demographics like age, gender, income levels, and so on to market and sell to them more efficiently. But when you segment this way, customer revenue stays in the background as an implicit goal while demographics are the visible factors that you hope will help you meet that goal.

If you can directly predict the future revenue of each customer instead of relying on hope, then revenue itself becomes a reliable basis for segmenting your customers. You can directly segment your customers into high, medium, and low revenue groups or decile-based groups. You can tailor your marketing and selling to these segments instead of relying on indirect factors like demographics.
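Segmenting by predicted revenue can be as simple as ranking customers by predicted LTV and cutting the ranking into equal-sized groups. The sketch below shows one way to do that; the function name, the customer IDs, and the predicted values are hypothetical, and passing ten labels instead of three would produce decile-based groups.

```python
def segment_by_predicted_ltv(predicted_ltv, labels=("low", "medium", "high")):
    """Assign each customer to an equal-sized segment based on predicted LTV rank."""
    ranked = sorted(predicted_ltv, key=predicted_ltv.get)  # lowest predicted LTV first
    n = len(ranked)
    segments = {}
    for position, customer_id in enumerate(ranked):
        segments[customer_id] = labels[position * len(labels) // n]
    return segments

# Hypothetical model output: predicted LTV per customer.
predictions = {"a": 10.0, "b": 500.0, "c": 60.0, "d": 45.0, "e": 900.0, "f": 5.0}
segments = segment_by_predicted_ltv(predictions)
```

Rank-based cuts keep the segments the same size even when the LTV distribution is heavily skewed, whereas fixed currency thresholds would leave the high-value group nearly empty.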

3. Optimize Your Marketing Strategies & Overall Ad Revenue

Effective marketing involves many factors. Which marketing channel will be the most effective for a customer? How much revenue can you attribute to each marketing channel? What branding strategies will work best? How much money should you spend on each channel? What’s the cost per acquisition (CPA) and return on ad spend (ROAS)? How profitable is your mobile marketing in the long term?

These questions become easier to answer if you know what the future revenue from that customer will be. Without that, calculating future returns on your marketing investments is impossible. LTV prediction helps with marketing optimization by helping you understand likely future returns from an individual customer or a customer segmentation group. These same machine learning models allow you to build powerful mobile marketing prediction models that help you understand the differences between your ads on desktop and mobile in terms of average revenue, customer churn, net profit, and total revenue generated.

4. Increase Your Sales and Profits

These improvements to your marketing strategy will help you increase your sales and profits, and higher sales and profits tend to invite investment, merger, acquisition, or exit opportunities. 

5. Design Better Loyalty Programs, Offers, and Discounts

When you have an idea about potential future revenue from each customer, you can tune your loyalty programs, offers, and discounts to those amounts, which will help you optimize your sales while keeping your spending tight.

6. Increase Your Customer Retention Rates

Lifetime value prediction can also increase your retention rates. Retaining existing customers is often less expensive than new user acquisition. And when you fine-tune your marketing offers, each customer feels special and sends more business your way.

7. Reduce Your Customer Acquisition Cost

Surveys say acquiring a new customer is anywhere from five to 25 times more expensive than retaining an existing one. Lifetime value predictions help you target only those potential customers whose lifetime value makes it worth the user acquisition cost. LTV predictions can reduce your acquisition cost for new users and optimize your return on investment (ROI).
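The targeting rule described above boils down to comparing predicted LTV against acquisition cost. The sketch below filters prospect groups by their LTV-to-cost ratio. The group names, the amounts, and the `min_ratio` threshold are all hypothetical; the right threshold depends on your margins and is a business decision rather than something the model provides.

```python
def prospects_worth_acquiring(predicted_ltv, acquisition_cost, min_ratio=1.0):
    """Return the LTV-to-cost ratio of every prospect group that clears min_ratio."""
    selected = {}
    for group, ltv in predicted_ltv.items():
        ratio = ltv / acquisition_cost[group]
        if ratio >= min_ratio:
            selected[group] = ratio
    return selected

# Hypothetical predicted LTV and acquisition cost per prospect group.
predicted_ltv = {"search_ads": 180.0, "social_ads": 40.0, "referrals": 90.0}
acquisition_cost = {"search_ads": 60.0, "social_ads": 50.0, "referrals": 15.0}
targets = prospects_worth_acquiring(predicted_ltv, acquisition_cost, min_ratio=2.0)
```

In this made-up example the social ads group is dropped because its predicted LTV does not even cover its acquisition cost.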

8. Streamline Your Procurement

When you know what your customers are likely to spend in the future and you know what your customers prefer to purchase, you essentially have a window into future sales. You can procure the most in-demand products and ensure their availability to your highest-value customers.

9. Know Your Future Cash Flow

Predicting your customers’ lifetime value effectively helps you understand your average revenue down the road and what changes you can make to improve metrics such as user acquisition and ad revenue generated. 

10. Assess Mobile Marketing Profitability in Terms of Average Revenue

Using predicted lifetime value to assess the strength of your mobile marketing campaigns helps you understand your average revenue from ads and what changes you can make to improve metrics such as user acquisition and ad revenue generated.

LTV Forecasting Is Your Secret Weapon for E-commerce & Retail Success

We’re often surprised that LTV prediction isn’t used more in e-commerce or the wider business world. Most businesses already have the relevant data and industry insights to build powerful long-term value models that can predict actual outcomes.

Terms like prediction and predictive modeling may make people hesitant because they sound too complex. In reality, LTV prediction techniques can range from simple to complex. We at Width.ai have an experienced data science team that understands exactly the workflow that works best for you and your data. We can build custom applications to help you make use of these invaluable techniques that can grow your business exponentially. Contact us today.



  • Chamberlain, Benjamin Paul, et al. “Customer Lifetime Value Prediction Using Embeddings.” arXiv:1703.02596 [cs.LG], 6 Jul. 2017, https://arxiv.org/abs/1703.02596
  • Pollak, Ziv. “Predicting Customer Lifetime Values -- ecommerce use case.” arXiv:2102.05771 [cs.LG], 10 Feb. 2021, https://arxiv.org/abs/2102.05771
  • Wang, Xiaojing, et al. “A Deep Probabilistic Model for Customer Lifetime Value Prediction.” arXiv:1912.07753 [stat.AP], 16 Dec. 2019, https://arxiv.org/abs/1912.07753