Let's take a look at what product matching is and five different ways you can use product matching software in your ecommerce business to generate positive ROI.
Product matching is the process of using machine learning and multiple data sources to match products based on similarity. In most cases, the comparison is between our own products and our competitors' products, but large retailers like Walmart use it to compare products already in their store against a new product a seller is trying to list. In the past, retailers used attribute information such as SKUs, titles, GTINs, and other data points to compare two products. As you can imagine, this is neither an efficient nor an accurate way to compare products at scale or against every competitor on the market.
These two jackets would be nearly impossible to compare with the attributes available even though they are the same product.
As we'll see throughout this article, product matching is an extensive topic in retail and ecommerce that covers many use cases that produce ROI.
Modern product matching uses many different features and machine learning algorithms to compare the similarity of products. The wide range of available similarity algorithms lets us build comparison tools around whatever level of product data is available. The components below are common in product matching today.
Using deep learning models and libraries such as spaCy and GPT-3, Width.ai builds a title similarity module that learns to recognize contextually similar titles even when the compared title strings are very different. Here are four titles for the exact same product:
Garmin nuvi 2699LMTHD GPS Device
nuvi 2699LMTHD Automobile Portable GPS Navigator
Garmin NUVI 2699LMTHD — GPS navigator — automotive 6.1 in
Garmin (nuvi) 2699LMT HD 6" GPS with Lifetime Maps and HD Traffic (010–01188–00)
The same product can have very different-looking titles, which makes matching difficult with exact string comparisons or even edit-distance measures like Damerau–Levenshtein.
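To see why edit distance struggles here, consider a minimal Python sketch of the restricted variant of Damerau–Levenshtein distance (optimal string alignment). It is an illustration, not Width.ai's title module; dividing by the longer title's length to get a 0 to 1 similarity score is an assumption.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ab" -> "ba", costs a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def edit_similarity(a: str, b: str) -> float:
    """Map the distance to a 0..1 similarity (1.0 means identical, ignoring case)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 1.0 - osa_distance(a, b) / longest


titles = [
    "Garmin nuvi 2699LMTHD GPS Device",
    "nuvi 2699LMTHD Automobile Portable GPS Navigator",
]
print(round(edit_similarity(*titles), 2))
```

Two titles a person instantly recognizes as the same GPS unit still score low, because the brand prefix and every extra descriptor cost several edits each.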
Price comparison is one of the features we can use when matching products in a larger algorithm. We mostly use two data analysis algorithms to help us map price similarity in our matching:
Image similarity is one of the most powerful and important deep learning techniques we can use to measure the similarity between two products. As we'll see in the use cases, image similarity can be used for many different tasks when matching ecommerce and retail products. Width.ai's image module can learn the similarity between products regardless of angle, image quality, design size, or background.
We've built this high-powered solution on the most up-to-date image recognition architectures and fine-tune it for each specific business use case. This fine-tuning lets us tailor the results to each ecommerce brand and reach model accuracy that smashes prebuilt solutions.
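Image similarity models of this kind typically map each product photo to an embedding vector and then compare vectors. The sketch below ranks catalog images by cosine similarity. It is an assumption about what such a comparison could look like, not Width.ai's model: the embeddings would come from a fine-tuned network and are simply passed in as arrays here.

```python
import numpy as np


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0  # an all-zero embedding carries no similarity signal
    return float(np.dot(u, v) / denom)


def top_k_matches(query, catalog, k=3):
    """Rank catalog embeddings (one per row) by cosine similarity to the query."""
    query = np.asarray(query, dtype=float)
    catalog = np.asarray(catalog, dtype=float)
    norms = np.linalg.norm(catalog, axis=1) * np.linalg.norm(query)
    # Avoid dividing by zero for all-zero rows; their score stays 0.
    scores = (catalog @ query) / np.where(norms == 0.0, 1.0, norms)
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Placeholder vectors standing in for real image embeddings.
print(top_k_matches([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]], k=2))
```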
Product attributes like brand, size, condition, model number, colors available, description, and more can still be used as effective data points to match our products. We can split our attributes into two different categories:
We use custom-built neural networks to learn the relationship between product similarity and the attributes of the two products being compared.
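Before attributes can reach a neural network, each pair of products has to be turned into numeric features. The sketch below shows one hypothetical way to do that. The attribute keys (brand, condition, color, model_number, screen_in) and the missing-value handling are illustrative assumptions, and the network that would consume these features is not shown.

```python
import re


def normalize_model_number(value: str) -> str:
    """Keep only lowercase letters and digits, so "2699LMT HD" equals "2699lmthd"."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def same_text(a, b) -> float:
    """1.0 if both values are present and equal ignoring case and padding, else 0.0."""
    if not a or not b:
        return 0.0
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def attribute_features(a: dict, b: dict) -> list:
    """Hypothetical pairwise feature vector for two products' attributes."""
    features = [same_text(a.get(key), b.get(key)) for key in ("brand", "condition", "color")]

    # Model numbers are compared only after stripping spaces and punctuation.
    ma, mb = a.get("model_number"), b.get("model_number")
    features.append(
        1.0 if ma and mb and normalize_model_number(ma) == normalize_model_number(mb) else 0.0
    )

    # Relative difference for a numeric attribute (0.0 means identical).
    sa, sb = a.get("screen_in"), b.get("screen_in")
    if sa is not None and sb is not None and max(sa, sb) > 0:
        features.append(abs(sa - sb) / max(sa, sb))
    else:
        features.append(1.0)  # assumption: treat a missing value as maximally different
    return features


garmin_a = {"brand": "Garmin", "model_number": "2699LMT HD", "screen_in": 6.0}
garmin_b = {"brand": "garmin ", "model_number": "2699LMTHD", "screen_in": 6.1}
print(attribute_features(garmin_a, garmin_b))
```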
UGC analysis lets us use product reviews as a module in our product matching solution. We build a custom GPT-3-based tool that learns the key talking points and keywords in the reviews left for a specific product. This information tells us more about how similarly two products are perceived than how they are presented for sale. That comes in handy when a product matching use case is more customer-focused than strictly presentation-focused.
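The GPT-3-based review tool can't be reproduced here. A much simpler stand-in still shows the idea of comparing how two products are perceived: extract the most frequent review keywords for each product and measure their overlap with Jaccard similarity. The stopword list and the top_n cutoff are hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "this", "was", "but", "are", "its"}


def review_keywords(reviews, top_n=10):
    """Most frequent non-stopword tokens (3+ characters) across a product's reviews."""
    counts = Counter()
    for text in reviews:
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            if len(token) > 2 and token not in STOPWORDS:
                counts[token] += 1
    return {word for word, _ in counts.most_common(top_n)}


def perception_similarity(reviews_a, reviews_b, top_n=10):
    """Jaccard overlap between the two products' review keyword sets."""
    ka = review_keywords(reviews_a, top_n)
    kb = review_keywords(reviews_b, top_n)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


print(perception_similarity(["Warm jacket, great zipper"], ["Very warm jacket"]))
```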
Large retailers and ecom brands spend thousands of hours and deploy entire teams to scour the internet for brands that pass off their designs, logos, or products as their own and sell them. We know a few brands by name that do this search manually every day and have made it a priority to automate it. These designs often aren't visually identical to ours and can have completely different titles and product information, which makes old-school Google search methods hard to use.
Our product matching system for identifying copyright strikes starts with an image similarity model and uses title similarity to reinforce the results. This multi-head approach lets us rely mostly on the image we find while using titles as an input to adjust the output similarity score. The main benefit of focusing on images when matching for copyright strikes is that we can find designs similar to ours even when the competitor has changed everything else about the product. Stolen designs are often changed only slightly in appearance but heavily in the information presented, to hide the copying. Here's a breakdown of the two components:
Width.ai has built a custom image similarity model for copyright strikes that focuses on understanding product design and graphics. The model can be pulled apart and customized for any specific use case or industry. Best of all, you don't need to add your product images to the training data each time, so you can quickly run new images through without retraining the entire model.
Our model has learned to identify what matters in a product image, not what the product actually is. This improves generalization and greatly outperforms out-of-the-box options.
GPT-3- or spaCy-based similarity models are our go-to models for learning the relationship between sentence structure and word placement. What we want from this component is a standardized way to decide "strike or not" when the image match is close. Adding it noticeably boosts our overall accuracy and removes edge cases that cause false positives.
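One way to read the multi-head approach is as a weighted late fusion, where image similarity dominates and title similarity nudges the score. The sketch assumes both similarities are already in the 0 to 1 range. The function name strike_score, the 0.8 image weight, and the 0.75 strike threshold are hypothetical values that would need tuning on labeled strike data.

```python
from dataclasses import dataclass


@dataclass
class StrikeDecision:
    score: float
    is_strike: bool


def strike_score(image_sim: float, title_sim: float,
                 image_weight: float = 0.8, threshold: float = 0.75) -> StrikeDecision:
    """Hypothetical late fusion: the image head dominates, the title head adjusts."""
    for name, value in (("image_sim", image_sim), ("title_sim", title_sim)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    score = image_weight * image_sim + (1.0 - image_weight) * title_sim
    return StrikeDecision(score=score, is_strike=score >= threshold)


# A near-identical design with a rewritten title still scores high.
print(strike_score(image_sim=0.9, title_sim=0.2))
```

Because the image term carries most of the weight, a copied design with a completely rewritten title can still cross the threshold. For a borderline image match, the title term decides which side of the threshold the pair lands on.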
Price intelligence allows us to understand how competitors are pricing similar or competing products to ours and track how they adjust the prices over time. These price insights are incredibly valuable as most customers compare prices across multiple competitors before making a decision. This tool allows you to automatically stay price competitive and easily boost revenue by 9.3%.
Too often today, price intelligence is done poorly, and the process ends up neither efficient nor effective.
Our system produces product matches based on all the components laid out in the initial section and focuses on understanding the relationship between a retailer's item attributes to form a group of matches. With this data aggregated, we provide powerful insights that let you optimize your own products' pricing in real time. Over time, this price optimization raises revenue by winning more of the customers who compare your price with competitors' prices.
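Once a group of matches exists for one of your products, the pricing insight can start from simple statistics over the matched competitor prices. This is a hypothetical sketch (the name price_position and its returned fields are illustrative), not Width.ai's price intelligence system.

```python
from statistics import median


def price_position(our_price: float, competitor_prices: list) -> dict:
    """Summarize where our price sits within a group of matched competitor prices."""
    if not competitor_prices:
        raise ValueError("need at least one matched competitor price")
    if any(p <= 0 for p in competitor_prices):
        raise ValueError("competitor prices must be positive")
    cheapest = min(competitor_prices)
    cheaper = sum(1 for p in competitor_prices if p < our_price)
    return {
        "cheapest": cheapest,
        "median": median(competitor_prices),
        # How far above (+) or below (-) the cheapest competitor we are, in percent.
        "gap_to_cheapest_pct": 100.0 * (our_price - cheapest) / cheapest,
        # Fraction of matched competitors currently undercutting us.
        "share_cheaper": cheaper / len(competitor_prices),
    }


print(price_position(110.0, [100.0, 120.0, 105.0]))
```

Tracking these fields for each match group over time would show how competitors adjust their prices.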
Text models such as GPT-3, BERT, and spaCy let us analyze titles, descriptions, product categories, and much more, so the matching system understands how a competitor's product is positioned.
Our custom image model uses popular architectures such as ResNet and Siamese networks, built with frameworks like Keras, to learn the regions of interest in a product image. These include designs, colors, product type, logos, etc., and they are the backbone of how we scale our competitor search to millions of products.
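Siamese networks are commonly trained with a contrastive loss. Embeddings for pairs of images of the same product are pulled together, and embeddings for different products are pushed at least a margin apart. The article does not say which loss Width.ai uses, so treat this NumPy version of the standard contrastive loss as an assumption. In practice the embeddings would come from a shared backbone such as a ResNet.

```python
import numpy as np


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Mean contrastive loss over a batch of embedding pairs.

    emb_a, emb_b: arrays of shape (batch, dim) from the two Siamese branches
    same: 1 for pairs of the same product, 0 for different products
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    same = np.asarray(same, dtype=float)
    d = np.linalg.norm(emb_a - emb_b, axis=1)  # Euclidean distance per pair
    positive = same * d ** 2  # same product: penalize any distance
    negative = (1.0 - same) * np.maximum(0.0, margin - d) ** 2  # different: penalize only inside margin
    return float(np.mean(positive + negative) / 2.0)


print(contrastive_loss([[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [3.0, 4.0]], [1, 0]))
```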
Use deep-learning-based matching algorithms to discover information gaps in your listings that cause you to lose potential customers to competitors. This system analyzes competing listings from matched stores and, through training data, learns which information in descriptions, titles, UPC codes, Google Analytics, and other identifiers leads to higher conversion rates for you.
Once we've gathered the competing products, we use multiple AI tools to analyze the different sections of a listing. We start with our GPT-3-based model to digest and make sense of the product listing information. The system not only extracts key talking points and keywords used across successful listings but also understands language norms and sentence structure, so it can compare competitors' descriptions with ours. Our GPT-3 solution is tailored to your business's use case, which produces better accuracy and satisfaction than out-of-the-box options.
Titles can be mined for important keywords and copywriting knowledge the same way, with GPT-3. Identifying gaps in your titles, where other websites have figured out what information to include to see more "product xyz sold" emails come through, is one of the most important conversion optimizations you can make. Best of all, we can eliminate the manual guessing game and base these decisions on raw market data.
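A bare-bones version of title gap analysis counts how many competitor titles use each keyword and flags common ones that are missing from your title. The sketch replaces the GPT-3 step with plain tokenization. The three-character minimum token length and the min_share cutoff are assumptions.

```python
import re
from collections import Counter


def tokenize(title: str) -> set:
    """Lowercase word tokens, skipping very short ones like "hd" or "6"."""
    return {t for t in re.findall(r"[a-z0-9]+", title.lower()) if len(t) > 2}


def title_keyword_gaps(our_title, competitor_titles, min_share=0.5):
    """Keywords used by at least min_share of competitor titles but absent from ours."""
    if not competitor_titles:
        return []
    ours = tokenize(our_title)
    doc_freq = Counter()
    for title in competitor_titles:
        doc_freq.update(tokenize(title))  # each title counts a word at most once
    n = len(competitor_titles)
    return sorted(w for w, c in doc_freq.items() if c / n >= min_share and w not in ours)
```

If you treat the first Garmin title from earlier as yours and the other three as competitors, this rule surfaces "navigator": most competing titles use that word, but the first title does not.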
One challenge in filling listing gaps is knowing how many colors, SKUs, categories, etc., we need to increase customer conversions. Neural-network-driven learning lets us track and identify which attributes are must-haves in a product market we are trying to dominate. Once our model learns the relationship between important attribute identifiers and the market leaders in the exact product space, we can optimize our own listing based on which gaps to fill.
We build a ton of recommendation system solutions for ecommerce and retail, and collecting valuable training data is always a hurdle we must account for. When we build on-site recommendation systems, we can use competitor products and their recommended products as training data for our own website.
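Attribute gaps can be approximated in a similar way: check which attributes most market-leading listings fill in, and compare option counts such as colors against the leaders' median. The listing dictionaries, the min_share default, and the median comparison are hypothetical simplifications of the neural network approach described above.

```python
from collections import Counter
from statistics import median


def attribute_gaps(our_listing: dict, leader_listings: list, min_share: float = 0.6) -> dict:
    """Attributes most market leaders fill in that we lack, plus list-valued
    attributes (such as colors) where we offer fewer options than the leaders' median."""
    if not leader_listings:
        return {"missing": [], "below_median": {}}
    n = len(leader_listings)
    # How many leader listings have a non-empty value for each attribute.
    coverage = Counter(
        key for listing in leader_listings for key, value in listing.items() if value
    )
    missing = sorted(
        key for key, count in coverage.items()
        if count / n >= min_share and not our_listing.get(key)
    )
    below_median = {}
    for key in coverage:
        leader_counts = [
            len(listing[key]) for listing in leader_listings if isinstance(listing.get(key), list)
        ]
        ours = our_listing.get(key)
        if leader_counts and isinstance(ours, list) and len(ours) < median(leader_counts):
            below_median[key] = (len(ours), median(leader_counts))
    return {"missing": missing, "below_median": below_median}
```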
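Competitor recommendation widgets only become training data after each competitor product is mapped to one of your own products through product matching. The sketch assumes a hypothetical match_map produced by that step and counts how often one of your products is recommended alongside another. Those counts could seed an item-to-item recommender.

```python
from collections import Counter


def recommendation_pairs(competitor_recs: dict, match_map: dict) -> Counter:
    """Turn scraped competitor recommendations into (our_item, our_recommended_item) counts.

    competitor_recs: competitor product id -> list of product ids that competitor recommends
    match_map: competitor product id -> our product id, produced by product matching
    """
    pairs = Counter()
    for source, recommended in competitor_recs.items():
        our_source = match_map.get(source)
        if our_source is None:
            continue  # competitor product has no match in our catalog
        for rec in recommended:
            our_rec = match_map.get(rec)
            if our_rec is not None and our_rec != our_source:
                pairs[(our_source, our_rec)] += 1
    return pairs


recs = {"c1": ["c2", "c3", "c9"], "c4": ["c2"]}
matches = {"c1": "our_a", "c2": "our_b", "c3": "our_c", "c4": "our_a"}
print(recommendation_pairs(recs, matches).most_common())
```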
Manual data extraction is slow and wastes human resources. It also leads to more data quality issues and mistakes when the data has to follow a standard format. Because product data comes from such a wide variety of sources and features, it requires far more accuracy and standardization than general market or customer data. Whatever the use case, this can't be done at scale without automated data quality processes. Any time we match products from various stores to ours, we need a process to extract, clean, parse, and format the data before passing it into our match system.
Our data quality module lets a retail store or online brand plug this custom piece into any product matching use case. It then cleans and standardizes the data with NLP patterns, attribute standardization, feature parsing, and other data science pipelines that automate data collection.
Width.ai builds custom AI software solutions that deliver clear-cut ROI to your business and give you a new competitive edge in your market. Ecom is slowly moving toward using AI to gain an edge, and we've built and used all the models needed to put huge increases in AOV, LTV, and revenue right in your lap. Let's talk today -> www.width.ai/contact
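A small slice of such a pipeline might normalize Unicode and whitespace in titles and parse features like screen size or part numbers. The regular expressions below are hypothetical examples tuned to the Garmin titles earlier in the article, not a general-purpose parser. For instance, the part-number pattern only fits the 010-01188-00 format.

```python
import re
import unicodedata

# Hypothetical patterns, tuned to the GPS titles above.
SIZE_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*(?:"|inch(?:es)?\b|in\b)', re.IGNORECASE)
PART_PATTERN = re.compile(r"\b\d{3}-\d{5}-\d{2}\b")


def clean_title(raw: str) -> str:
    """Normalize Unicode, unify dash characters, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"[\u2010-\u2015]", "-", text)  # Unicode hyphens and dashes -> "-"
    return re.sub(r"\s+", " ", text).strip()


def parse_screen_size(title: str):
    """Return a screen size in inches such as 6.1, or None if none is found."""
    match = SIZE_PATTERN.search(title)
    return float(match.group(1)) if match else None


def extract_part_number(title: str):
    """Return a part number like 010-01188-00, or None if none is found."""
    match = PART_PATTERN.search(title)
    return match.group(0) if match else None
```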