Ecommerce Product Taxonomy & Categorization With GPT-3

Matt Payne
August 21, 2023

Automatically generating relevant tags and product data points to categorize our ecommerce products is a powerful form of automation. It saves us time and manual effort, and it produces a taxonomy system that shortens the path from a customer reaching our store to finding what they are looking for.

The goal is to give customers the quickest and easiest path to what they are looking for, which reduces lost customers. This research report found that when customers search for a product they want, 47% give up after just one attempt and only 23% try three or more times.

By using GPT-3 to create our product tags and categories, we let the model decide which tags and categories match which products, based on slightly different criteria than older systems would use.

  1. Information extracted from product information (we’ve seen this before), but the model has the ability to choose “higher order” data points based on a learned algorithm.
  2. Tags/Categories based on information about other products, not just this product in a vacuum. 
  3. Automate the time consuming process of creating this information, and quickly turn it into a framework for a product taxonomy model.

We’ll break down how to build this system and show real examples of GPT-3 working on real product listings. 

What is Product Taxonomy?

Product taxonomy is simply the process of organizing our ecommerce products into categories and tags that give us a system to get customers to the exact product they are looking for quicker. This includes creating categories, tags, attributes and more to create a hierarchy for similar products. Improving the time to sale number is a huge deal for ecommerce brands and the customer experience for users who come to our store. 

What is Product Categorization?

A subprocess in the general idea of product taxonomy, categorization is the process of grouping these products with tags and attributes. We can think of this as the first step in the general taxonomy: categorization produces the tags and attributes, and the taxonomy is the system that takes advantage of them. GPT-3 will allow us to build an automated pipeline to do this effectively and use what we’ve generated in taxonomy. 

Let’s Start Building Our Ecommerce Product Categorization

Our focus starts with how we can use GPT-3 to generate product categories for us. We’ll use very simple product data that you already have - your product listing! We’ll take the product title, product description, and other information to create tags and categories. Let’s start with our first example product. 

We’re going to create our categories and tags from this women’s wool coat from Nordstrom. We’ve got a title, description, and details section.

We set up our product information in GPT-3 like this:

We’re going to start with zero shot learning. Zero shot learning means using GPT-3 with only the prompt information to generate output, with no prior examples to use as reference. For any given task, this is the rawest result GPT-3 will give you back, based on the model's understanding of the task. The output will change based on how you refine the different parameters that GPT-3 offers, which you can learn more about here. Without giving away too much information about our current processes, we asked GPT-3 to give us some categories and tags for the above listing, without having seen any examples of what we expect. 
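As a rough illustration of a zero shot setup, here is a minimal sketch that assembles a prompt from the listing fields. The field labels, the instruction wording, and the `build_zero_shot_prompt` helper are assumptions made for this example, not the prompt we actually use, and the listing text is placeholder data.

```python
def build_zero_shot_prompt(title: str, description: str, details: list[str]) -> str:
    """Assemble a zero shot prompt: one instruction plus one listing, no examples."""
    detail_lines = "\n".join(f"- {d}" for d in details)
    return (
        "Generate product categories and product tags for the listing below.\n\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
        f"Details:\n{detail_lines}\n\n"
        # The prompt ends on an open label so the model continues from here.
        "Product categories:"
    )


# Placeholder listing data, not the real Nordstrom text.
prompt = build_zero_shot_prompt(
    title="Women's Double Breasted Wool Blend Coat",
    description="Placeholder description of the coat.",
    details=["Placeholder detail one", "Placeholder detail two"],
)
print(prompt)
```

Ending the prompt on an open label is one common way to nudge a completion model toward a predictable output shape.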

This is pretty good! Through some prompt optimization and parameter tuning we were able to generate some pretty nice product categories and tags. A couple of things to note about the output that are very important when you’re looking to get into product taxonomy. 

  • Our tags are much more refined details about the product which is what we want. “Wool Blend” and “Double Breasted” are both features of the overarching category of “Outerwear” in clothing. 
  • Our product categories are much broader categories of clothing, and even include “Women’s Clothing” as the main category. 
  • It’s important to understand the different variations of the output the model leans towards. In production you’ll want a final format check to understand the generated result and how we can transform it to the expected format if needed.
  • We’ll look at this more down below but understand how hyperparameter adjustments affect your output. These will have to be adjusted not just based on achieving a higher accuracy but also based on taxonomy goals.
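The final format check mentioned in the list above could look something like the sketch below. It assumes the model returns one `Label: value, value` line per field; the field labels and the `parse_generation` helper are hypothetical and would need to match whatever format your own prompt produces.

```python
import re

EXPECTED_FIELDS = ("Product categories", "Product tags")  # assumed field labels


def parse_generation(text: str) -> dict[str, list[str]]:
    """Parse 'Label: a, b, c' lines into a dict and fail if a field is missing."""
    parsed: dict[str, list[str]] = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.*)$", line)
        if not match:
            continue
        label, values = match.group(1).strip(), match.group(2)
        # Normalize the separators the model may drift between (commas, semicolons, pipes).
        parsed[label] = [v.strip() for v in re.split(r"[,;|]", values) if v.strip()]
    missing = [f for f in EXPECTED_FIELDS if not parsed.get(f)]
    if missing:
        raise ValueError(f"generation is missing fields: {missing}")
    return parsed


sample = (
    "Product categories: Women's Clothing, Outerwear\n"
    "Product tags: Wool Blend; Double Breasted"
)
result = parse_generation(sample)
print(result)
```

Raising on a missing field makes it easy to retry the generation or route the listing to manual review instead of storing a half-parsed result.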

If you actually look at this product listing on Nordstrom, you’ll see these are the main categories that this product falls under, which validates our results. 

Through some prompt and hyperparameter tweaks we can get the model to give us more refined information about the product and use them in tags. This is still just zero shot learning! We haven’t even shown the model examples of what correct output would be.

Enhanced Categorization For Ecommerce Product Taxonomy

Now that we’ve seen what we can get to with zero shot learning, let's take a look at what we can do when we give GPT-3 some examples of what we want our output to be. You might wonder why we even want to do that if zero shot learning can reach results fairly easily. By showing the GPT-3 model some examples of output that we like we can adjust the model's willingness to use a deeper understanding of the prompt and start creating deeper meaning features. 

Remember, GPT-3 tries to follow the instructions you give it, and uses any prior examples to interpret what you mean. If our examples are very keyword heavy and just extract a lot of product describing features (like the “Product tags” example above), then the model will assume it should do that for all future text generations. What we want to do is show the model some examples that include inferred categories and tags, which require the model to not only extract information from the product listing but also create new categories for us. 

Optimize The Prompt

We’re going to add a few examples of product listings from the same site and generate categories and tags like we did before. 

Nordstrom Women's Coat

Adding other seasons and clothing types

New examples added to our prompt

We’ll use output similar to what we’ve seen, where we generate product categories and product tags. This time, however, we’re going to add whatever season best fits the product. The model will have to learn from the information given to choose the correct season. There will be no mention of the season in the product listing, so GPT-3 will have to learn how to reach that decision. 
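To make the few shot setup concrete, here is a minimal sketch of how labeled examples, including the inferred season, can be stacked in front of a new listing. The `Example` structure, the `###` separator, and the sample listings and labels are all made up for illustration; they are not our production prompt.

```python
from dataclasses import dataclass


@dataclass
class Example:
    listing: str
    categories: list[str]
    tags: list[str]
    season: str  # inferred by whoever labels the example, never stated in the listing


def build_few_shot_prompt(examples: list[Example], new_listing: str) -> str:
    """Prepend labeled examples so the model can infer the output pattern."""
    blocks = [
        f"Listing: {ex.listing}\n"
        f"Product categories: {', '.join(ex.categories)}\n"
        f"Product tags: {', '.join(ex.tags)}\n"
        f"Season: {ex.season}"
        for ex in examples
    ]
    # The last block is left open so the model completes it in the same format.
    blocks.append(f"Listing: {new_listing}\nProduct categories:")
    return "\n\n###\n\n".join(blocks)


examples = [
    Example("Double breasted wool blend coat (placeholder text).",
            ["Women's Clothing", "Outerwear"], ["Wool Blend", "Double Breasted"], "Winter"),
    Example("Linen blend shorts (placeholder text).",
            ["Women's Clothing", "Shorts"], ["Linen Blend"], "Summer"),
]
prompt = build_few_shot_prompt(examples, "Placeholder text for a new listing.")
print(prompt)
```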

Now that we’ve added our examples, let’s run a new product through:

Our results!

As we can see GPT-3 was able to learn how to deterministically create output based on the input text and a few previous examples. Not only did the model learn how to pick a season for the clothing item, but decide which keywords we care about using as tags and categories.

Best Practices For Using GPT-3 For Product Categorization

Understanding what fine tuning and prompt optimization do to our output tags and categories is critical for making enhancements and moving the model's generation logic towards what we are looking for. Here are some quick points about the different parameters and the prompt of GPT-3.

  • Understand how the temperature parameter affects the randomness of your outputs. Temperature essentially tells the model what level of “risk” it should take in terms of creativity and answer definition. Lower scores are more logical in their answers and follow a strict pattern, where higher values are more creative and harder to control.
  •  Test different GPT-3 engines. 
  • Optimize your instructions or generation goal to fit your use case, and build any optimization tools that increase completion accuracy.
  • Optimize the prompt examples for your specific use case
  • Your model's effectiveness will change as you add more variance to both the examples used and the listing used at runtime. The more “ground” you want to cover while holding the accuracy, the more variance you’ll need in your few shot examples. Variance can include: 
      - Length of product listing
      - Variance in “type” of products used as examples. If you use only examples of women's scarves and try an iPad listing the model will struggle.
      - Source of product data
      - Contextual similarity in examples to the runtime ecommerce product
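To build some intuition for the temperature point above, the sketch below shows the usual way temperature reshapes a probability distribution: scores are divided by the temperature before a softmax. The scores are made-up numbers for three candidate tokens, and this is a simplified picture rather than GPT-3's actual sampling code.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as always picking the top score")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.5)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the low temperature almost all of the probability lands on the top candidate, which is why low values give more repeatable tags; at the higher temperature the alternatives get a real chance of being picked.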

Ecommerce Product Taxonomy 

It’s no secret that potential customers being able to find exactly what they want to buy in the least amount of time and clicks is directly correlated to conversion rates. Research shows that only 23% of users try a search more than 2 times before giving up. The focus for product taxonomy has to be structuring our product pages and search results to give results that lead to the best conversion rates. 

Best Practices When Creating A Product Taxonomy Strategy

Accounting for the human element of website navigation and layout is a huge part of building a taxonomy plan that leads to increases in traffic and sales. Let’s take into account what we just learned about creating tags and categories, along with our product listing relative to these newly created attributes.

Two Main Types of Taxonomies & How Our Model Interacts

Taxonomy is normally split into two categories, hierarchical and faceted. Hierarchical taxonomy is the standard tree structure that you think of when you think of how products can be broken down into categories and subcategories. Faceted taxonomy is a structure where the product is broken down into attributes, or “facets”, of the product. This allows customers to find what they are looking for without knowing the specific name of the product, as long as they know the features that make it up. 
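One way to picture the difference is as two lookups over the same product record. In this sketch the `Product` structure, the category paths, and the facet names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    category_path: tuple[str, ...]  # hierarchical: root -> leaf
    facets: dict[str, str] = field(default_factory=dict)  # faceted: attribute -> value


def in_category(product: Product, prefix: tuple[str, ...]) -> bool:
    """Hierarchical lookup: the product sits somewhere under the given branch."""
    return product.category_path[: len(prefix)] == prefix


def matches_facets(product: Product, wanted: dict[str, str]) -> bool:
    """Faceted lookup: every requested attribute must match."""
    return all(product.facets.get(k) == v for k, v in wanted.items())


catalog = [
    Product("Wool Blend Coat", ("Women's Clothing", "Outerwear", "Coats"),
            {"material": "wool blend", "closure": "double breasted"}),
    Product("Swim Trunks", ("Men's Clothing", "Swimwear"),
            {"material": "polyester"}),
]

outerwear = [p.name for p in catalog if in_category(p, ("Women's Clothing", "Outerwear"))]
wool = [p.name for p in catalog if matches_facets(p, {"material": "wool blend"})]
print(outerwear, wool)
```

Categories map naturally onto the path, while generated tags map onto the facets.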


Our GPT-3 based model can produce all of these tags for both hierarchical and faceted taxonomies, and allows us to take it a step further with the deterministic outputs we looked at above. 

If we wanted to create the example product hierarchy above and produce tags for it, we can do so by simply adjusting the GPT-3 model we saw before. Our initial version actually produced gender categories, and we can use the same deterministic approach to generalize tags into a clothing category.

We can take these generated results and ask GPT-3 to pick a clothing group based on that information.

Faceted taxonomy is even easier to cover, considering that we are already producing tags that are attributes of the product. If you wanted the model to be more focused on just extracting every keyword attribute, you could turn the temperature of the model to 0 (which makes the model more argmax focused). 

Use GPT-3 To Group Our Products For Us

If we already have predefined categories that our products need to fall into, we can use GPT-3 in a different way to cluster similar products. Using the GPT-3 search API we can run a query across all of our products in a database and return them in order based on semantic similarity. There are two main options we have here:

  1. We can use an exact product as the query and rank all the other products based on similarity to the query one. With this route it is normally smarter to actually extract keywords and attributes from the query product first, and then compare those for semantic similarity to the other ones. The summarized version of our query will remove words and information that don’t help us find similar products and might actually push us away.
  2. We set our query to be a search query similar to what we would see. Normally we include a little bit more information to refine the results and create a semantic difference between close products. This allows us to understand how products line up similar to a product hierarchy. 

The search API is tuned in a similar fashion to the model we saw before but the results are very different. 

The documents (in our example these are our products) are returned with a score that says how similar each one is to the query. We’re not generating any text, so we don’t get any back like we did before. Most of the optimization and tuning is limited to the hyperparameters and the language we use in the query. Testing accuracy is a pretty straightforward process where we compare the top similarity result to what we expected it to be on a test set. 
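The ranking and the accuracy test can be sketched without calling any API. In the example below a word-count vector is a toy stand-in for a real semantic similarity score, so it only matches on shared words; the product strings and test queries are made up, and the helper names are our own.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for a semantic representation: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, documents: list[str]) -> list[tuple[str, float]]:
    """Return each document with its similarity to the query, best match first."""
    q = vectorize(query)
    scored = [(doc, cosine(q, vectorize(doc))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top1_accuracy(test_set: list[tuple[str, str]], documents: list[str]) -> float:
    """Share of test queries whose top-ranked document is the expected one."""
    hits = sum(rank(query, documents)[0][0] == expected for query, expected in test_set)
    return hits / len(test_set)


products = ["double breasted wool blend coat", "linen blend shorts", "mens swim trunks"]
tests = [("wool coat", "double breasted wool blend coat"), ("swim trunks", "mens swim trunks")]
accuracy = top1_accuracy(tests, products)
print(accuracy)
```

Swapping `cosine(q, vectorize(doc))` for a real similarity score leaves the test harness unchanged, which is the part worth keeping.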

If we want to further cluster all our products together based on the similarity results of either process we can use a clustering algorithm such as K-Means to group them together. Now we have an understanding of what products are similar based on:

  1. A single given product or key information extracted.
  2. A search query or search filter
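The clustering step could be sketched as a plain K-Means loop. Here each product is represented by its similarity scores against two reference queries; those numbers are invented for the example, and using the first k points as starting centroids is a simplification to keep the sketch deterministic.

```python
import math


def kmeans(points: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Plain K-Means; returns a cluster index for each point."""
    # Deterministic start for this sketch: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Made-up similarity scores: [score vs. "coat" query, score vs. "swimwear" query].
features = [[0.91, 0.10], [0.88, 0.15], [0.12, 0.86], [0.09, 0.93]]
labels = kmeans(features, k=2)
print(labels)
```

In practice a library implementation with better initialization is the safer choice; the loop above is only meant to show what the algorithm does with similarity scores.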

Understand How Related Your Product Description Is To A Given Search Result

The goal of product taxonomy is to increase sales and conversion rates. Your product listings need to make sense given what search they come up for. It’ll be hard to get a potential customer to click through to a product if the description showing on the search results is very different than what they had in mind. Related to this idea, depending on how your search engine is set up, you have to make sure that the tags assigned to a product make sense for the search results they lead to. Just because you apply a “swimwear” tag to men’s swim trunks doesn’t mean you want them to show up when someone searches “bikini swimwear” or “womens swimwear”.

Not only do we want to make sure the optimal results show up, but the correct filters as well.

This also falls into the idea of understanding your target user for a given product and the route they take to get there. You want to understand what type of taxonomy best fits your users and study how they interact with your site. See how they navigate around your site, how much time they spend on reading product descriptions vs attributes, and mine product data around their search bar behaviors.

Make Your Ecommerce Product Taxonomy Systems “Test Focused” With Clear Evaluation Metrics

Product taxonomy requires constant tweaks and optimizations as your potential customers change their behavior and as you change your site infrastructure. Making it easy to test the different pieces of your taxonomy system and create evaluation metrics that directly lead to outcome is the best way to stay on top of your taxonomy and quickly make adjustments. 

We’ve focused on a machine learning based approach to taxonomy so far, and the most successful ML products can easily be evaluated and optimized over time to hold a standard of accuracy. Building a test architecture for what we’ve looked at so far isn’t difficult, but does require you to put in the time and effort. With every part of your taxonomy system easily testable you’ll feel much more confident in the scalability and longevity of the powerful machine learning you’ve put in place.

Testing your system is more than just the machine learning models you’ve created to optimize and automate your process. You’ll need to test how changes to your taxonomy structure change conversion rates and interaction metrics among different groups of buyers. Here’s a quick example:

  1. Let’s say we change our categories and attributes hierarchy for women’s swimwear items. 
  2. We need to test these changes for different keyword searches and filters.
  3. These changes need to be looked at from how a target buyer using a specific search converts and how easy it is for them to find what they are looking for. If they search for “small womens 1 piece bikini” we need to test and examine how our results are laid out based on our new taxonomy changes. 
  4. Building a test suite to produce these results before these changes go live is always the best option. We don’t want to be just hoping our conversion rates increase with the new changes, it’s too risky if they don’t.
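A test suite like the one described in these steps can start very small. The sketch below uses a toy tag-based `search` function and a made-up two-product catalog; the check format (query, products that must appear, products that must not appear) is an assumption for illustration, and in practice `search_fn` would wrap your real search engine.

```python
def search(catalog: list[dict], query: str) -> list[dict]:
    """Toy search: a product matches when every query word is one of its tags."""
    words = set(query.lower().split())
    return [p for p in catalog if words <= p["tags"]]


def run_search_checks(catalog, checks, search_fn=search) -> list[str]:
    """Run each (query, must_include, must_exclude) check; return failure messages."""
    failures = []
    for query, must_include, must_exclude in checks:
        names = {p["name"] for p in search_fn(catalog, query)}
        if not must_include <= names:
            failures.append(f"{query!r}: missing {sorted(must_include - names)}")
        if names & must_exclude:
            failures.append(f"{query!r}: unexpected {sorted(names & must_exclude)}")
    return failures


# Made-up catalog reflecting a proposed taxonomy change.
catalog = [
    {"name": "One Piece Swimsuit", "tags": {"womens", "swimwear", "one-piece"}},
    {"name": "Swim Trunks", "tags": {"mens", "swimwear", "trunks"}},
]
checks = [
    ("womens swimwear", {"One Piece Swimsuit"}, {"Swim Trunks"}),
    ("swimwear", {"One Piece Swimsuit", "Swim Trunks"}, set()),
]
failures = run_search_checks(catalog, checks)
print(failures or "all checks passed")
```

Running the same checks against the old and the new taxonomy before a change goes live shows which searches would gain or lose products.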

Building a direct and heuristic keyword model is a great way to understand not only how similar two products are, but how similar they are contextually. This model produces heuristic keywords that are contextually similar to the text. The idea is to use those as test searches in our ecommerce store and see if we still reach products that make sense for what we want to buy.

When looking to test our GPT-3 model’s output there are a bunch of different tools we can use based on the route we take. Let’s say we have this given output and we want to compare it to an expected output that we consider correct.

Fuzzy Search

Fuzzy search allows us to compare two strings to each other for similarity and allow for some changes and typos. We can set max substitutions, deletions, insertions, and distance to tweak how similar we want the outputs to be. This is a great way to compare individual tags from our expected output to tags and categories.
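As a dependency-free sketch of the idea, the example below compares two tags with a Levenshtein edit distance. It collapses the separate substitution, deletion, and insertion limits into a single maximum distance, which is a simplification; the `fuzzy_equal` helper and its default threshold are our own assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_equal(produced: str, expected: str, max_distance: int = 2) -> bool:
    """Treat two tags as the same when they are within max_distance edits."""
    return edit_distance(produced.lower(), expected.lower()) <= max_distance


print(fuzzy_equal("Double Breasted", "double-breasted"))
print(fuzzy_equal("Wool Blend", "Outerwear"))
```

Short tags need a tight threshold, since a distance of 2 is a large share of a five-letter word.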

SpaCy Lemmatizer

This SpaCy algorithm is one we’ll reference throughout this section. We can transform our produced product categorization keyword/tag into the lemma form of the word to compare to our expected output. If a produced word differs from the expected answer only by its form, so that both share the same lemma, we probably want to mark it as correct. Using this on top of keyword checks and fuzzy searches is a great way to remove false negatives.
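The comparison itself is simple once a lemmatizer is available. To keep the sketch below runnable without extra installs, it uses a tiny lookup table as a stand-in lemmatizer; that table is not SpaCy. With SpaCy you would pass in a function that returns `token.lemma_` for each word instead.

```python
from typing import Callable

# Tiny lookup used only to keep this sketch dependency-free; it is not SpaCy.
TOY_LEMMAS = {"coats": "coat", "scarves": "scarf", "are": "be"}


def toy_lemmatize(word: str) -> str:
    return TOY_LEMMAS.get(word, word)


def same_by_lemma(produced: str, expected: str,
                  lemmatize: Callable[[str], str] = toy_lemmatize) -> bool:
    """Compare two tags word by word after reducing each word to its lemma."""
    def normalize(tag: str) -> list[str]:
        return [lemmatize(w) for w in tag.lower().split()]
    return normalize(produced) == normalize(expected)


print(same_by_lemma("Wool Coats", "wool coat"))
print(same_by_lemma("Wool Coats", "wool scarf"))
```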


Sense2vec

Sense2vec is our favorite tool to use for ecommerce product categorization testing, as well as something we use across other NLP domains. This algorithm allows us to query for contextually similar keywords based on an input keyword and a part of speech. Not only does this cover different “versions” of the same word, but also contextually similar words that we might consider to be the same. I recommend setting the baseline similarity score pretty high given that product names are already pretty close to begin with. In other domains such as direct and heuristic keyword extraction we normally set the similarity score much lower.


The most similar results are ranked from highest to lowest. My suggestion for ecommerce products is to use a keyword check on the results and only grab ones that contain a keyword from our original. For this example, we would use “wool” and “coat” and grab from here. The more adjectives our keyword contains the more refined we have to be in the results to make sure the results are actually similar to what we searched for. 
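The keyword check described above can be sketched as a small filter over the ranked results. The result list below is made up in sense2vec's `phrase|POS` key style with illustrative scores, and the `keep_related` helper and its 0.8 threshold are assumptions, not values we recommend for every catalog.

```python
def keep_related(results: list[tuple[str, float]], query: str,
                 min_score: float = 0.8) -> list[tuple[str, float]]:
    """Keep results above min_score that share at least one word with the query."""
    query_words = set(query.lower().split())
    kept = []
    for key, score in results:
        # "wool_coat|NOUN" -> "wool coat"
        phrase = key.split("|")[0].replace("_", " ").lower()
        if score >= min_score and query_words & set(phrase.split()):
            kept.append((phrase, score))
    return kept


# Made-up results; the scores are illustrative only.
results = [
    ("wool_coat|NOUN", 0.91),
    ("pea_coat|NOUN", 0.86),
    ("leather_jacket|NOUN", 0.84),
    ("wool_socks|NOUN", 0.72),
]
filtered = keep_related(results, "wool coat")
print(filtered)
```

Here "leather jacket" is dropped for sharing no keyword and "wool socks" for falling under the score threshold.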

If we refine our input keyword down too much there’s a chance sense2vec has never seen that keyword before. Our query isn’t even in the model. Best practices would say to spend time really understanding the average contextual similarity between these different product data points. 

We can also use s2v to compare two different product tags or categories. 

Keep Your Ecommerce Taxonomy Simple and Human

Keeping your structure simple and logical is the best way to ensure not only that your ecommerce taxonomy works, but that changes and optimizations are easy to make. When making decisions on sub-groups of categories, always go with broad and shallow over super narrow with few products in each. Let the tags and search engine algorithm work to put the right products in front of the right searches. When it comes to categories, try to keep those as broad as possible, with tags being what is used to rank products for various searches. This keeps everything organized, but allows for variance when users search your store. Avoid the “Other” category at all costs. Nobody shops in the other category, they just leave your site.

Understand The SEO & Search Engine Implications

Product taxonomy is mostly seen as an internal business management operation focused around product categorization and building hierarchies. However, best practices also include an understanding of how these changes and optimizations affect how our products show up in search engines like Google. On top of that, we need to be aware of what external search results map to our different pages and what that means for the user experience of those customers. 
