How you can optimize email marketing campaigns with machine-learning-based models that improve conversion and click-through rates.
Email marketing is still regarded as one of the best channels for keeping past customers and potential buyers in your business loop. You probably already know that it's one of the highest-ROI activities you can invest in, especially in SaaS and ecommerce. Yet seemingly trivial details, like the dates and times those emails are sent or the email design, are unlikely to figure into your business decisions. And your email content doesn't matter if your customer never opens it.
Paying attention to small details, like date and time, can lead to long-term revenue gains because they're based on customer psychology. According to Forbes, email marketing is still the most powerful tool to take your business to the next level, but they also remind us that “effective email marketing strategies are personal, targeted, and crafted with the customers’ objectives and objections in mind.” Intelligent personalized email marketing uses data science, machine learning, and automation to discover these insights.
In this article, we'll explain why your retail business should use intelligent email marketing tactics, and how smart email marketing services can increase your marketing click-through and conversion rates.
What Date & Time Leads to The Best Conversion Rate From Emails to Your Customers?
The daily and weekly routines of your customers have considerable influence over how and when they want to interact with your business. If you can find insights into those routines, you can fine-tune your marketing to create convenience for your customers, make them think of your business approvingly, and increase your sales.
That’s the motivation behind email marketing optimizations, like finding the best date and time to email product information and send special offers to your customers. An intelligent email marketing software uses state-of-the-art data science and machine learning to provide insights into your customers’ routines, and create a personalized schedule for emailing them. Let’s see how your business can benefit.
3 Compelling Reasons Your Business Should Use a Machine Learning Personalized Email Marketing Strategy
1. Create Long-Term Revenue Gains For Email Marketing Campaigns
In their 2019 email survey (PDF), the UK’s Data and Marketing Association reported an astounding 42x ROI from email marketing and said it’s increasing every year. By integrating our email campaign optimization technology into your email marketing system, you can see substantial gains in your email campaign revenue. The gains come from improved click-through and conversion rates. Even in our prototype, we’ve observed a 7.5% to 12.5% increase in conversion rates (we'll look at those results later in this article).
The long-term gains come from the increased lifetime value of your regular customers and the intangible benefits of matching your customers’ routine and psychology, making their lives easier by sending purchase reminders at the right time.
Our production deployment uses cloud services to cater to any scale — from hundreds to millions of transactions per day. As you scale up, you could see 7-12% increased revenue month-over-month when compared with your existing revenue growth without our intelligent emailing system. That figure is just from the machine learning emailing system alone, and doesn't count all the other revenue maximization initiatives we offer.
How We Implement Intelligent Personalized Email Marketing
Our email marketing optimization architecture is inspired by the approaches and models laid out in the research paper by Deligiannis, Argyriou, and Kourtesis (2020), cited at the end of this article. We focus on organizations selling B2C products such as consumer packaged goods, ecommerce products, and SaaS products to customers.
Based on the purchase history of your regular customers, our machine learning model first predicts the date of each customer's next purchase, i.e., the day they're most likely to purchase products from you again. Then, based on their most likely next purchase date, we derive the best dates and times to email purchase reminders, perhaps with attractive offers to improve conversion rates. The idea is that a personalized purchase reminder sent at that date and time has the best chance of converting into sales and increasing your customer's lifetime value.
Our Technology Stack
Our machine learning software is implemented using the Jupyter Notebook and the Python machine learning ecosystem of libraries like scikit-learn and XGBoost. This helps our data scientists create and refine models quickly, and go straight to production. In the production deployment section later in this article, we explain how we deploy this stack.
4 Aspects of Data Collection
Let’s understand the different pieces of data we need from you and your site to create an effective emailing schedule for your customers.
Purchase Transaction Data
You push your list of customer purchase data to our customer transaction application programming interface (API) endpoint. The rate and volume of data submission depend on your business requirements. If your sales see high volatility, you'll need to push the data in real-time to constantly update our model. If you can do with a more relaxed training period, you can push it in batches once a day or so.
These are all configurable aspects. Our API supports both models using scalable architectures described in the production deployment section later in this article.
We recommend that you scrub this data of personally identifiable information before pushing it to ensure the privacy of your customers and comply with regulations like the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR). We’ve also built models to do this for you.
On our end, we too ensure privacy by storing only the data we need, dropping the data we don't (like phone numbers), scrubbing personally identifiable information, and segregating the data over multiple databases to avoid collecting detailed profiles of your customers.
Email Marketing Campaign Open and Click-Through Data
You need to know your open rate (how many customers are opening your emails) and your click-through rate (how many of them are clicking on the product or offer links included in your marketing emails). Most email service providers, such as Mailchimp, Lemlist, or Klaviyo, track this information for you and allow you to export it. We also monitor it by appending a customer-specific, personalized email campaign ID as a URL parameter to each link in the email and by inserting tracking pixels. When the email opens in an email client, the tracking pixel requests a URL on your web server, and that request is recorded in your server logs. When your customer taps or clicks a link, the request reaches your site's web server and is likewise recorded in your server logs.
Our click-through data API endpoint enables us to receive your web server logs, search them for the campaign IDs, find their opened and click-through dates and times, and create a dataset from that information for our machine learning pipeline. Note that this can become a bit confusing if you use an email forward link to chain an email thread. Many of the providers track this as well.
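As a rough illustration of the log search described above, the sketch below pulls a campaign ID and click time out of one web server log line. It assumes logs in Common Log Format and a URL parameter named campaign_id; both are assumptions made for this example, and your server's log layout and parameter name may differ.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed log layout (Common Log Format), for example:
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /offer?campaign_id=abc123 HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(r'\[(?P<ts>[^\]]+)\] "(?:GET|POST) (?P<url>\S+)')
CAMPAIGN_PARAM = "campaign_id"  # hypothetical parameter name


def extract_click(log_line):
    """Return (campaign_id, clicked_at) for a log line, or None if it has no campaign ID."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    query = parse_qs(urlsplit(match.group("url")).query)
    campaign_ids = query.get(CAMPAIGN_PARAM)
    if not campaign_ids:
        return None
    clicked_at = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return campaign_ids[0], clicked_at
```

Running extract_click over every line of a log batch and dropping the None results gives the click-through rows (campaign ID plus timestamp) that feed the rest of the pipeline.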
Conversion Data From Email Campaigns
If your web server and logging infrastructure can retain the personalized campaign IDs throughout a customer's browsing session up until checkout, we can directly associate their purchases with the email marketing campaign. For the most part the popular email automation platforms track this for you.
But if not, we indirectly correlate them based on date and time with some configurable thresholds. For example, a purchase within 5 hours of clicking a link is attributed to the campaign as a successful conversion.
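The time-threshold attribution can be sketched as follows. This is a minimal illustration, not our production logic: the tuple layouts and the function name are assumptions, and the 5-hour window is simply the example threshold mentioned above.

```python
from datetime import datetime, timedelta


def attribute_conversions(clicks, purchases, window=timedelta(hours=5)):
    """Attribute each purchase to the customer's most recent click within `window`.

    clicks:    list of (customer_id, campaign_id, clicked_at)   (assumed layout)
    purchases: list of (customer_id, purchased_at)              (assumed layout)
    Returns a list of (customer_id, campaign_id, purchased_at).
    """
    attributed = []
    for customer_id, purchased_at in purchases:
        candidates = [
            (clicked_at, campaign_id)
            for cid, campaign_id, clicked_at in clicks
            if cid == customer_id
            and timedelta(0) <= purchased_at - clicked_at <= window
        ]
        if candidates:
            # The latest qualifying click wins.
            _, campaign_id = max(candidates)
            attributed.append((customer_id, campaign_id, purchased_at))
    return attributed
```

Purchases with no qualifying click are left unattributed, so they don't inflate a campaign's conversion count.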
Customer Contact and Settings Data
To email or message your customers, we either need their contact information to send them through our infrastructure or we will send them through your emailing infrastructure. We support both because some retailers may not be comfortable with the privacy issues of the former.
Our emailing infrastructure is described in the production deployment section later in this article. You can use our services if you don't have the infrastructure to support these campaigns. If you do have it, you just have to provide us with an API endpoint where our system tells you the date, time, contents, and customer ID to email. All popular email marketing automation platforms already store this information for you.
Other settings data include customer choices like the days and hours when they don’t want to be contacted.
Feature Engineering For Email Marketing Optimization
In the feature engineering phase, we process the raw data to derive additional features that are likely to improve the model's ability to learn patterns. The values in the raw transaction data and click-through data are turned into the primary features for the models.
List of Derived Features for Machine Learning
From the raw data, we derive a list of hand-crafted features (shown in the illustration above) to help the model make better predictions.
Most of these features are derived from the raw data using simple data transformations. But lifetime value segment and gender are complex and are determined using secondary machine learning models as we'll explain next. It’s not noted above but we’ve also experimented with using unsubscribe link clicks and landing page conversion statistics to boost prediction accuracy.
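To give a feel for what a "simple data transformation" looks like, here is a small sketch that derives a few purchase-gap features from one customer's purchase dates. The feature names are hypothetical and chosen only for illustration; they are not a reproduction of our actual feature list.

```python
from datetime import date
from statistics import mean, pstdev


def gap_features(purchase_dates, as_of):
    """Derive illustrative purchase-gap features for one customer.

    purchase_dates must contain at least one date; feature names are hypothetical.
    """
    ordered = sorted(purchase_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return {
        "purchase_count": len(ordered),
        "mean_gap_days": mean(gaps) if gaps else None,
        "gap_std_days": pstdev(gaps) if gaps else None,
        "days_since_last_purchase": (as_of - ordered[-1]).days,
    }
```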
Lifetime Value Segmentation Using K-Means Clustering
Lifetime values span a wide range of amounts, but grouping them under a small number of customer segments helps the prediction model. Intuitively, we can guess that these segments correlate to the income and wealth levels of customers. We use a simple k-means clustering algorithm for this.
Again, intuitively, we can guess that these segments are likely to be specific to a retailer, product category, city, or other external socio-economic factors. In production, we plan to use HDBSCAN clustering to determine the number of segments automatically. In this prototype, we evaluate the silhouette scores of the clusters to visually determine the number of clusters.
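The segmentation idea can be illustrated with a plain-Python sketch of k-means on one-dimensional lifetime values. In practice, scikit-learn's KMeans and silhouette_score would do this work; the version below only shows the mechanics, and its quantile-based initialization is an assumption chosen to keep the example deterministic.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster 1-D values (e.g. lifetime values) into k segments with Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic start: centroids at evenly spaced quantiles (an illustrative choice).
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        updated = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda i: abs(value - centroids[i])) for value in values]
    return centroids, labels
```

Repeating this for several values of k and comparing the silhouette scores is how the number of segments is chosen in the prototype.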
Classification Model for Gender
If gender is not directly available from your raw data, we derive it using a classification model. In our software, we use an internal NLP-based model to predict it.
Our Machine Learning Pipeline
The machine learning pipeline has two stages and a customer classification step, all of which we'll explain next.
Stage 1: Regression Model for Next-Purchase-Date Prediction
In this stage, we train a regression model capable of predicting each customer’s next purchase date. It's not our primary goal — ours is the best emailing date — but it's necessary to help us get there.
Number of Models
We can intuitively guess that the next purchase date for each customer correlates to the products and quantities they purchase. A customer who buys a carton or two of milk is likely to make their next purchase sooner than somebody who buys long-lasting house-cleaning products.
So, we recommend segmenting your customers by the product categories that appear most frequently in their purchases, then training a model for each group of related categories. You don’t need separate models for each customer — it's neither practical nor necessary. For our prototype, we selected a single product category and trained a model for the customers who purchased in that category frequently.
We chose extreme gradient boosting (XGBoost) as our machine learning model for this task. It's an ensemble learning approach that combines multiple weak learners and uses boosting to produce a strong learner. XGBoost is a feature-rich all-in-one model that helps us avoid the deficiencies of other machine learning models. Features like built-in regularized boosting help this model generalize to any kind of data by avoiding overfitting to the training data.
The input data split for training and testing is 80% and 20%. In machine learning, a strategy like cross-validation of the training data ensures that a model generalizes well against unexpected variations in the input data. That's sufficient for a simple model to predict accurately.
But XGBoost is a complex model whose ability to learn depends not just on the input data but largely on its hyperparameters. Hyperparameters are the knobs that control the behavior and execution steps of the model's algorithm. For example, the number of weak learners and the maximum tree depth are just two hyperparameters — among many more — that control how effectively it learns to predict.
So an effective training strategy for XGBoost has to plan for not just cross-validation but also for hyperparameter optimization.
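The 80/20 split and the cross-validation folds can be sketched in plain Python as below. This is only an illustration of the idea: the function names, the fixed seed, and the default of five folds (the fold count used in our grid search) are choices made for this example, and scikit-learn provides equivalent utilities.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (training, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(rows, k=5):
    """Yield (training, validation) pairs; each row is used for validation exactly once."""
    folds = [rows[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield training, validation
```

The test set is held back entirely; cross-validation only ever sees the training portion.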
Hyperparameter Optimization Using Grid Search
XGBoost has a large number of hyperparameters that control its behavior, and each hyperparameter can take a wide range of values — often real numbers. Every combination of hyperparameter values results in a unique XGBoost model whose ability to learn and predict will be distinctly different from every other model. We end up with a combinatorial explosion of practically infinite numbers of XGBoost models.
To make this problem tractable and home in on an optimum combination of hyperparameter values that gives the best predictions, we use a grid search strategy.
The grid search works like this:
It creates an XGBoost model for every hyperparameter combination.
It trains each model under 5-fold cross-validation, optimizing the model to minimize the mean absolute error (MAE) between the predicted and the actual number of days between purchases.
It evaluates the model's R2 score. A low R2 score means the model's predicted values are far from the observed values, resulting in more errors and larger errors. A high R2 score means they are close to the observed values.
The hyperparameter combination that results in the highest R2 score is selected as the best model.
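The steps above can be sketched in plain Python. This is a simplified illustration rather than our training code: fit_and_score is a hypothetical callback that would train a model with the given hyperparameters under cross-validation and return its R2 score. In a scikit-learn stack, GridSearchCV plays this role.

```python
from itertools import product


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination; assumes `actual` is not constant."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def expand_grid(param_grid):
    """Yield one dict per hyperparameter combination in the grid."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(param_grid, fit_and_score):
    """Return the combination with the highest score from fit_and_score(params)."""
    return max(expand_grid(param_grid), key=fit_and_score)
```

Even a small grid such as {"n_estimators": [100, 200], "max_depth": [3, 5, 7]} already produces six candidate models, which is why the grid has to be chosen with care.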
The regression model outputs the number of days to the customer's next purchase. It's the day on which the model thinks a transaction is most likely to occur. The real value it outputs is rounded to an integer — for example, 10.764 becomes 11.
Stage 2: Find the Best Emailing Date and Time
With the number of days to the next purchase in hand, we apply a few personalization and business rules to give us the best emailing date and time.
The regression model's prediction is based on the general patterns it sees in the data from hundreds of customers. In this stage, we fine-tune that prediction for every customer based on their profile and history. We follow a pluggable architecture because these rules are likely to be different for every retailer and product category.
Best Emailing Date
From the purchase history, we can find the distribution of purchases for each day of the week. The day with the most purchases is the day when the customer probably finds it most convenient to buy. To find a dominant day for emailing, every day of the week is given a weight based on the number of transactions on that day of the week. We select the day with the maximum score that is closest to the day predicted by the regression model.
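One way to read this rule is sketched below: weight each weekday by the customer's purchase count, then pick the date nearest the predicted purchase date that falls on the top-weighted weekday. This is one interpretation, written for illustration. The function name and the tie-breaking toward the earlier date are assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def best_email_date(purchase_dates, predicted_date):
    """Pick the date nearest predicted_date that falls on the customer's dominant weekday."""
    weights = Counter(d.weekday() for d in purchase_dates)
    top_weight = max(weights.values())
    dominant_weekdays = {weekday for weekday, count in weights.items() if count == top_weight}
    # The seven dates within three days of the prediction cover every weekday exactly once.
    candidates = [predicted_date + timedelta(days=offset) for offset in range(-3, 4)]
    matches = [c for c in candidates if c.weekday() in dominant_weekdays]
    # Nearest date wins; on a tie, prefer the earlier date (an assumption).
    return min(matches, key=lambda c: (abs((c - predicted_date).days), c))
```

For a customer who mostly buys on Saturdays and is predicted to buy next on a Thursday, this moves the reminder to the Saturday two days later.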
Best Emailing Time
The data processing phase generates click-through day of the week and time of the day features from web server logs. The features give us the distribution of the hour when your customers clicked the links in your marketing emails. We can also connect the personalized campaign IDs in those emails to that customer's next purchase.
There's enough data there to warrant a second regression model to accurately predict the best time of day. However, we kept it simple for this prototype by simply selecting the hour on each day of the week when the most click-throughs occurred.
Plus, we ensure that you don't disturb your customers at inconvenient hours of the day and account for each customer's do not contact preferences, regardless of what the click-through data tells us. For example, we can configure the system to delay emails scheduled after 9 PM until the next morning.
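The prototype's hour selection and the quiet-hours rule can be sketched as follows. The function names, the tie-breaking toward the earlier hour, and the 9 AM "next morning" send time are assumptions made for this example. Per-customer do-not-contact preferences would be applied as a further filter.

```python
from collections import Counter, defaultdict
from datetime import datetime, time, timedelta


def best_hour_by_weekday(click_times):
    """Map each weekday (0 = Monday) to the hour with the most click-throughs."""
    counts = defaultdict(Counter)
    for clicked_at in click_times:
        counts[clicked_at.weekday()][clicked_at.hour] += 1
    return {
        # Highest count wins; ties go to the earlier hour (an assumption).
        weekday: max(hours.items(), key=lambda item: (item[1], -item[0]))[0]
        for weekday, hours in counts.items()
    }


def apply_quiet_hours(send_at, quiet_start_hour=21, morning_hour=9):
    """Delay sends scheduled after quiet_start_hour until morning_hour the next day."""
    if send_at.time() > time(quiet_start_hour):
        next_day = send_at.date() + timedelta(days=1)
        return datetime.combine(next_day, time(morning_hour))
    return send_at
```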
Classification Criteria to Select Regular Customers
We focus on your regular repeat buyers to maximize their customer lifetime value and increase your revenue. Targeting all email subscribers with marketing emails won't yield the same level of returns on your mailing and machine learning investments. You even risk getting unsubscribe requests from irritated casual customers, earning a negative sender reputation, getting blacklisted as a spammer by an uninterested target audience, and hurting the overall deliverability of your marketing emails.
So, we need to determine a set of criteria that helps us classify your customers into regular, repeat customers or otherwise. In production where there are potentially thousands of customers, we plan to use a separate classification model for this part of the process.
But for this proof-of-concept with a limited number of customers, we reuse the same regression model to yield a set of criteria. This is a one-time task after the regression pipeline is set up, and it yields a fixed set of criteria for now.
We first segment the customer group based on the following factors:
How many transactions have they made in the last N months?
How many recent purchases have they made?
How consistent is the gap (or the average fluctuation) between two purchases?
The thresholds for these are specific to each retailer and the type of products they sell. In our pilot project, we created six customer segments based on the number of transactions in the last two months with a maximum gap of 15 to 20 days.
We train six different models, one for each of these segments. We calculate their mean absolute error (MAE) and R2 scores, and select the criteria that result in the lowest MAE and highest R2 score. Only customers who satisfy these criteria receive your marketing emails.
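Once the criteria are fixed, applying them to a customer is a simple check. The sketch below is illustrative only: the 60-day lookback stands in for "the last two months", the 20-day maximum gap comes from the range mentioned above, and the minimum of three transactions is a hypothetical threshold.

```python
from datetime import date, timedelta


def is_regular_customer(purchase_dates, as_of, lookback_days=60,
                        min_transactions=3, max_gap_days=20):
    """Return True if the customer's recent purchases meet the regular-buyer criteria."""
    window_start = as_of - timedelta(days=lookback_days)
    recent = sorted(d for d in purchase_dates if window_start <= d <= as_of)
    if len(recent) < min_transactions:
        return False
    gaps = [(later - earlier).days for earlier, later in zip(recent, recent[1:])]
    # default=0 covers the case of a single recent purchase.
    return max(gaps, default=0) <= max_gap_days
```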
Factors like your customers' purchasing behavior and your merchandise mix can change over time. An accurate model created six months ago may not be accurate now. Depending on the volatility of factors, even a month may be too long.
You need a strategy to keep refining your prediction models. There are two approaches:
Retraining: In this approach, all data is accumulated without deleting any of it. A new model is trained on that entire data set every few days or weeks. Because there’s a small gap between training sessions, a model may sometimes become less accurate during those gaps.
Online learning: In this approach, the model is constantly updating itself with every new piece of data it receives. This is suitable for highly volatile retail environments where product assortment and purchase behaviors have shown a high level of daily variability.
For our pilot project, we took the retraining approach of training a new model every month. Depending on your business needs, we can reduce this period or switch to online learning.
Contents of Marketing Emails
Your marketing emails are based on configurable email templates (with responsive HTML to look good on mobile devices) but will typically contain:
Links to products the customer has purchased in the past
Links to products you wish to recommend to them
Links to customized offers or discount codes
Call to action (CTA) links to purchase or check out new products
Exciting personalized subject lines (another detail that can be optimized using natural language processing)
Production Deployment and Data Engineering
So far, we explained the data science and machine learning ideas underlying our email marketing system. These fundamental ideas are valid in a production retail environment too. But production involves more data engineering to handle the scale and velocity of data we're likely to receive when our system is deployed.
The following sections address the data engineering and integration concerns around deploying our software in production.
Deployment Architecture and Approach
Our deployment architecture follows these principles:
For simplicity, we prefer to use the same machine learning stack from prototype to deployment.
Wherever possible, we prefer serverless managed platform services over self-managed deployments to infrastructure services. It's easier and cheaper, and our team can concentrate on the core data science aspects. For example, we prefer Amazon Kinesis and AWS Lambda over deploying Apache Kafka and Spark on EC2.
We recognize that a client may prefer a particular cloud service provider. Our prototype uses AWS extensively, but we've implemented pluggable interfaces to support integration with any cloud provider of your choice.
Transaction Data Ingestion
The ingestion of your purchase transaction data works as follows:
Your site pushes your purchase data to our customer data API endpoint in batches or in real-time.
The API is published through an Amazon API Gateway endpoint and requests are received by a Lambda function.
The Lambda function writes the data to an Amazon Kinesis data stream for temporary storage.
Another Lambda function reads the data from the Kinesis stream.
The Lambda function stores that data in S3. This is to accumulate the full transaction history for retraining our models every month.
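The first Lambda function in this flow might look roughly like the sketch below. It assumes an API Gateway proxy integration (where the request body arrives as a JSON string in event["body"]), a payload that is a JSON array of transactions each carrying a customer_id field, and a stream named purchase-transactions. The payload shape and the stream name are hypothetical. The Kinesis client is passed in so the handler can be exercised without AWS access.

```python
import json

STREAM_NAME = "purchase-transactions"  # hypothetical stream name


def make_handler(kinesis_client, stream_name=STREAM_NAME):
    """Build a Lambda handler that forwards posted transactions to a Kinesis stream."""

    def handler(event, context):
        transactions = json.loads(event["body"])  # assumed: a JSON array of objects
        for transaction in transactions:
            kinesis_client.put_record(
                StreamName=stream_name,
                Data=json.dumps(transaction).encode("utf-8"),
                # Partitioning by customer keeps one customer's records in order.
                PartitionKey=str(transaction["customer_id"]),
            )
        return {"statusCode": 202, "body": json.dumps({"accepted": len(transactions)})}

    return handler


# In AWS Lambda, the module-level handler would be created with a real client:
#   import boto3
#   handler = make_handler(boto3.client("kinesis"))
```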
Collection of Email Click-Through Data
The collection of email click-through data works like this:
The links in an email are appended with a URL parameter containing a customer-specific personalized campaign ID.
When your customer clicks or taps the link, your site's web server logs the requested URLs and URL parameters, including the campaign ID.
Your system pushes these logs in batches to our click-through data API endpoint.
The API is published through an Amazon API Gateway endpoint, and requests are received by a Lambda function.
The Lambda function stores the logs in S3.
If campaign IDs are retained throughout a customer's browsing session until checkout, we directly associate product purchase data with the campaign through the same web server logs you sent as part of click-through data collection. If not, we associate it during the data processing phase with time thresholds.
If you don't have emailing infrastructure, your system sends customer contact details to our customer contacts API endpoint. We also store this data in S3.
All the raw data we need is now collected in S3. When the model's first training or subsequent training is scheduled, our data preprocessing pipeline works like this:
We use Amazon SageMaker as the managed machine learning platform for our entire data processing and machine learning pipeline.
The local Jupyter Notebooks are adapted to work with SageMaker's API and scikit-learn support.
The data processing logic reads the customer transaction data and web server logs from S3.
The web server logs are processed for campaign IDs.
The logs give us the click-through dates and times, and if tracking data is available, they allow us to associate conversions with a campaign ID too.
Our pipeline associates these three pieces of data — customer transactions, click-through events, and conversion events — through customer IDs.
The feature engineering works like this:
The Jupyter notebooks running on SageMaker use simple data transformations to derive the features for our models.
The k-means secondary machine learning model is run at this stage to segment by lifetime value as a one-time task.
The gender-classification model based on customer names (or purchased products) is also run at this stage to infer the genders of customers.
Model training and retraining occur about once a month. They work as follows:
We initialize SageMaker's SKLearnProcessor adapter to run scikit-learn pipelines.
The XGBoost model is set up with grid search and cross-validation strategy.
The model is trained on SageMaker's infrastructure.
The one-time task of determining who's a regular repeat customer is done by reusing the regression model on six different customer segments.
We schedule model retraining every month using Amazon EventBridge and Lambda.
The latest regression model is stored on S3.
Prediction Model Inference
Model inference to predict next purchase dates for customers works as follows:
We schedule model inference to run once every day using EventBridge and Lambda.
The Lambda function executes our inference notebook on SageMaker.
The notebook loads the latest trained regression model from S3.
It reads the customer transaction data from S3.
It uses the criteria to select only customers who are regular repeat buyers over the last N months (N is configurable according to your business requirements).
It uses the regression model to predict the next purchase dates for these selected customers.
Fine-Tuning the Best Emailing Date and Time
The personalized fine-tuning of the best emailing date and time for these customers is run as part of the same pipeline in SageMaker. It produces an email schedule list containing customer ID, emailing date, emailing time, and email body with links to products and offers. This email scheduling data is stored in S3.
Email Scheduling and Sending
If you’re using an automated emailing tool such as Mailchimp, Klaviyo, or SendGrid, we can automatically send the emails via its API.
If you don't have it, our emailing workflow goes like this:
We schedule every item in the schedule as an event in Amazon EventBridge.
At the scheduled date and time, EventBridge triggers our emailing Lambda function.
The emailing function loads contact details data from S3 and finds the customer's email address corresponding to that customer ID.
When we compare the click-through and conversion metrics from this personalized approach against the existing method of sending follow-up emails on fixed days to every customer, we see the following outcomes.
Increased Click-Through Rates
We find a 12% to 17% increase in click-through rates at different time intervals (within 1 hour, 2 hours, 12 hours, and 24 hours) after the emails are received. These results are statistically significant, with confidence levels over 95% in every interval.
Increased Conversion and Purchase Rates
We find a 7.5% to 12.5% increase in conversion rates at different time intervals (within 1 hour, 2 hours, 12 hours, and 24 hours) after the mailed links are clicked. Except for the 12-hour interval, these results are statistically significant, with confidence levels over 95%.
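A common way to check whether a difference in rates like these is statistically significant is a two-proportion z-test. The sketch below shows that test in plain Python. It is offered as an illustration of the general technique, not as the exact procedure behind the figures above, and any counts you pass in are your own campaign data.

```python
from math import erf, sqrt


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the standard normal CDF.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value
```

A p-value below 0.05 corresponds to the 95% confidence level quoted above.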
We plan to explore the following improvements in the future:
Add more customer behavior features — like browsing and clickstream data — to the model. We'll use representation learning to reduce them from high-dimensional noisy data to short embedding vectors that encapsulate common patterns in customer behavior.
Use a deep neural network for learning deeper non-linear features and regression fit instead of using an XGBoost model with hand-crafted features.
Don’t Ignore the Easy Boosts In Email Marketing
Paying attention to the little hidden details in even something as simple as emailing can bring you surprising gains in sales and customer engagement. There is a best date and time to contact your customers, and if you can figure out what it is, the chances of your campaign succeeding are higher. These and many other insights are available for your SaaS, ecommerce, or retail organization, thanks to our expertise in data science and machine learning. Contact us today.
Deligiannis, A., Argyriou, C., and Kourtesis, D. (2020). Predicting the Optimal Date and Time to Send Personalized Marketing Messages to Repeat Buyers. International Journal of Advanced Computer Science and Applications, 11. https://doi.org/10.14569/ijacsa.2020.0110413