Leverage AI-Based Product Data Classification Software Instantly in 5 Steps

Matt Payne
August 15, 2022
pumice.ai high level architecture from data to PIM

Customers need to find the exact product they’re looking for in the fewest steps possible, whether they’re searching your store or searching Google. This is critical to the conversion rates of your ecommerce brand or marketplace. Customers have an ideal product in mind when they search and are rarely interested in hunting for it: 47% of users give up searching for a product after one attempt, and only 23% try 3 or more searches.

This is why it's critical that your products are categorized into the proper product categories in your taxonomy tree and product catalog. As changes to your tree happen (adding new categories, expanding new leaves) or as you add new products, updating how these products are categorized is a massive part of making sure buyers have a clear path to what they’re looking for.

Product data categorization is mostly done manually today. It is a hugely time-consuming task, and human error grows as you scale the size of your tree and the number of products you have. The process also becomes more expensive over time, because the people doing it need a strong understanding of your product data and of the best-fit category for each product, which calls for more skilled employees.

We’re going to look at how you can use AI to automatically classify products into their correct categories in 5 easy steps that allow you to keep your products and product taxonomy tree(s) in sync 24/7.

Why Product Data Classification?

Routinely classifying or reclassifying products into their best fit categories is a business process that produces a ton of benefits for ecommerce brands and marketplaces:

- Reduce Bounce Rate 

- Improve internal & external SEO

- Improve internal product data tracking and metrics

- Instantly Increase Conversions

How Organizations Classify Product Data

There are a few common instances where organizations manually work to classify product data into product categories in a taxonomy tree:

- A new product is created and needs to be classified into the correct categories. This requires manually reviewing the product information such as title, description, price, and attributes and deciding what category is the best fit based on the existing products and any data rules. 

- An existing product taxonomy tree has been updated to either have new categories, deeper categories (“Cat Supplies” now has a “Cat Litter Box Mats” leaf), deleted categories, or combined categories. In the case of deleted or combined categories, this usually means the affected products must manually be refit to new categories. In the case of new categories or deeper categories, this requires much more effort as a large percentage of the “upstream” or surrounding products must be refit.

- Existing products need to be refit to an existing product taxonomy. Organizations do this when they feel they’re not getting good conversion rates and bounce rates from buyers who arrive through search results.
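To make the taxonomy-update case concrete, here is a minimal Python sketch of working out which products need to be refit after a tree change. The category paths, product IDs, and the helper name `products_to_refit` are all hypothetical and only illustrate the idea; a real tree would come from your own catalog.

```python
from typing import Dict, List, Set

# Hypothetical data: each product ID maps to its current category path.
assignments: Dict[str, str] = {
    "sku-1": "Pet Supplies > Cat Supplies",
    "sku-2": "Pet Supplies > Cat Supplies",
    "sku-3": "Pet Supplies > Dog Supplies",
    "sku-4": "Pet Supplies > Bird Cages",
}

old_tree: Set[str] = {
    "Pet Supplies",
    "Pet Supplies > Cat Supplies",
    "Pet Supplies > Dog Supplies",
    "Pet Supplies > Bird Cages",
}

# The updated tree drops "Bird Cages" and adds a deeper leaf under "Cat Supplies".
new_tree: Set[str] = (old_tree - {"Pet Supplies > Bird Cages"}) | {
    "Pet Supplies > Cat Supplies > Cat Litter Box Mats"
}


def products_to_refit(
    assignments: Dict[str, str], old_tree: Set[str], new_tree: Set[str]
) -> List[str]:
    """Return IDs of products whose category was removed or gained a new child."""
    removed = old_tree - new_tree
    added = new_tree - old_tree
    # Parents of newly added categories: their products may belong in the new leaf.
    deepened = {path.rsplit(" > ", 1)[0] for path in added if " > " in path}
    affected = removed | deepened
    return sorted(pid for pid, cat in assignments.items() if cat in affected)


refit_ids = products_to_refit(assignments, old_tree, new_tree)
print(refit_ids)
```

Note that a single new leaf pulls in every product under its parent, which is why new or deeper categories tend to cost more manual effort than deletions.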

Problems With Manual Product Data Classification

Manually adjusting products and product categories to better fit each other comes with a number of downsides that grow as you scale the number of products and the size of your taxonomy tree.

Required Human Effort:

Most retail companies still use full teams devoted to manually sorting products into categories and groupings. As the cost of labor continues to increase, so does the cost of this business process. Another key issue for larger retailers is the level of business understanding an employee needs to classify product data correctly. These employees must have a strong understanding of product similarities, product category differentiation, and the high-level business goals affected by how products are categorized.

Costs that scale with your company:

The human effort required per product doesn’t shrink as your products and categories grow, making it very difficult to streamline this business process with manual labor. The speed at which humans can classify product data does not increase fast enough to offset the growth of a taxonomy tree or the number of products.


Manually processing product data can lead to a high error rate, especially for more difficult operations such as adding to a tree or removing leaves from it.

Can be difficult to conceptualize product comparisons with larger trees:

The relationship between your product categorization rules and the semantic similarities that place products in the same category can be difficult to understand. This often requires a deep understanding of general data categorization and of the internal methods used to evaluate these parameters.

Using AI To Automatically Classify Product Data

Google Product Taxonomy in Neo4j

Machine learning (ML) and artificial intelligence allow us to completely remove any manual labor required to classify product data. These ML algorithms learn the relationship between product data and categories for us and can automatically fit products to the correct spot in a taxonomy tree with 98% accuracy. 

The even larger benefit of automated product classification is the reduction in manual labor. On average, we’ve seen these pipelines require 17x less work to perform any of the tasks listed above, and they become even more efficient as the number of products and categories grows.

Here’s how you can go from raw product data to classified product data in 5 easy steps with Pumice.ai automated product classification.

Automatically Classify Product Data in 5 Easy Steps

product data classification with pumice.ai
Go from raw product data and taxonomy trees to categorized products in your PIM or database

Step 1. Gather Product Data

Gather the products, and their product data, that you want to categorize into your product categories or taxonomy tree. Two product data fields, title and description, are required, as they contain the bulk of the information needed by the artificial intelligence algorithms. Fields such as attributes, features, price, model, etc. can either be added as optional fields or appended to the product description.

These can often be exported from your product information management system or internal product database. 
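As a rough illustration of this step, the Python sketch below checks that each product has the two required fields and appends optional fields to the description before writing a CSV. The function names, the `name: value` formatting of appended fields, and the sample product are assumptions made for the example, not a format Pumice.ai prescribes.

```python
import csv
import io
from typing import Dict, Iterable, List

REQUIRED_FIELDS = ("title", "description")


def prepare_rows(
    products: Iterable[Dict[str, str]], extra_fields: List[str]
) -> List[Dict[str, str]]:
    """Validate required fields and fold optional fields into the description."""
    rows = []
    for product in products:
        missing = [f for f in REQUIRED_FIELDS if not product.get(f, "").strip()]
        if missing:
            raise ValueError(f"product is missing required fields: {missing}")
        # Append any optional fields that are present, e.g. "price: 19.99".
        extras = [f"{name}: {product[name]}" for name in extra_fields if product.get(name)]
        description = product["description"].strip()
        if extras:
            description = description + " " + "; ".join(extras)
        rows.append({"title": product["title"].strip(), "description": description})
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize prepared rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(REQUIRED_FIELDS))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = [{"title": "Cat Litter Mat", "description": "Traps litter.", "price": "19.99"}]
prepared = prepare_rows(sample, extra_fields=["price", "model"])
print(to_csv(prepared))
```

Failing fast on a missing title or description is a deliberate choice here: it is usually cheaper to fix the export than to classify products from incomplete data.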

Step 2. Upload CSV or Connect Via API

pumice.ai data upload

Pumice.ai allows you to add your products to the software via a CSV upload or a raw connection to the API. When uploading a CSV, you’ll be asked to map specific columns to the required data fields. A raw API connection lets you use the artificial intelligence algorithms without the dashboard, for example as part of an application or a cron job. If you have a custom fine-tuned model, its ID must be provided.
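For the raw API route, a request might be assembled along the lines of the Python sketch below. Everything specific in it is a placeholder: the URL, the `products` and `model_id` payload keys, and the bearer-token header are assumptions for illustration and are not taken from Pumice.ai documentation. The sketch only builds the request object and does not send it.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Placeholder endpoint: NOT a real Pumice.ai URL.
API_URL = "https://api.example.com/classify"


def build_request(
    products: List[Dict[str, str]], api_key: str, model_id: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a batch of products."""
    payload: Dict[str, object] = {"products": products}
    # Hypothetical field: include the fine-tuned model ID only when one exists.
    if model_id is not None:
        payload["model_id"] = model_id
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_request(
    [{"title": "Cat Litter Mat", "description": "Traps litter."}],
    api_key="YOUR_API_KEY",
    model_id="my-fine-tuned-model",
)
print(request.get_method(), request.full_url)
```

In an application or cron job, the request would then be sent with `urllib.request.urlopen(request)` and the response body parsed as JSON, once the real endpoint and field names are substituted.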

Step 3. Select Dynamic API or Generate Categories With Non-Dynamic API

pumice.ai endpoint options

There are two different ways to categorize your product data and both can be used for a variety of use cases. These APIs differ in terms of the inputs they require and the outputs they produce and have different ML algorithms under the hood. 

Dynamic API to fit products to categories

product taxonomy example with google product

Upload your product taxonomy or a list of categories to fit products to

The dynamic API works to take product data and categorize it into a given taxonomy tree or list of categories. These machine learning models can be used on any set of categories and do not require training specific to your categories. This allows you to quickly make changes to how you structure your taxonomy and run products through multiple different trees. The output you’ll get is each product in the CSV or raw API connection fit into a specific category.
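The key property described here is that the category list is an input rather than something the model is trained on. The toy Python sketch below shows that shape using simple word overlap as a stand-in for the machine learning model; it is not how Pumice.ai scores products, and the function names and sample data are invented for the example.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fit_to_category(product: Dict[str, str], categories: List[str]) -> str:
    """Pick the category whose words overlap most with the product text.

    Word overlap is a crude stand-in for a learned model; the point is that
    `categories` can be any list and no per-tree training is needed.
    """
    product_tokens = tokens(product["title"] + " " + product["description"])

    def score(category: str) -> float:
        category_tokens = tokens(category)
        if not category_tokens:
            return 0.0
        return len(product_tokens & category_tokens) / len(category_tokens)

    return max(categories, key=score)


product = {
    "title": "Waterproof Cat Litter Mat",
    "description": "Large mat that traps litter from the box.",
}
tree_a = [
    "Pet Supplies > Cat Supplies > Cat Litter Box Mats",
    "Pet Supplies > Dog Supplies > Dog Beds",
    "Home & Garden > Kitchen > Cookware",
]
tree_b = ["Animals > Dog", "Animals > Cat"]

print(fit_to_category(product, tree_a))
print(fit_to_category(product, tree_b))
```

Running the same product through `tree_a` and `tree_b` mirrors the ability to test multiple tree structures without retraining anything.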

Non-Dynamic API to generate categories

The non-dynamic API allows you to generate related categories that fit product data without providing your own tree. These categories are generated on a learned relationship that understands how to map product data to new categories that make the most sense without ever seeing your tree. This API is best used when you are looking to build a new tree and want to understand what artificial intelligence thinks you should name the category, or if you’re trying to compare your current category-to-product relationships to find a better fit that does not exist in your current tree. 

The output from this endpoint is a category keyword as well as some product tags that make sense for this product data.
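One way to use this output for the comparison use case is to flag products whose generated category keyword does not appear anywhere in their current category path. The Python sketch below assumes a result shape with `product_id`, `category`, and `tags` fields; those names, the substring check, and the sample data are illustrative assumptions rather than the actual response format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GeneratedResult:
    """Hypothetical shape of one non-dynamic result: a keyword plus tags."""
    product_id: str
    category: str
    tags: List[str]


def flag_mismatches(
    results: List[GeneratedResult], current_paths: Dict[str, str]
) -> List[str]:
    """Return IDs whose generated category is absent from the current path."""
    flagged = []
    for result in results:
        path = current_paths.get(result.product_id, "")
        # Case-insensitive substring check; a real comparison could be fuzzier.
        if result.category.lower() not in path.lower():
            flagged.append(result.product_id)
    return flagged


results = [
    GeneratedResult("sku-1", "Cat Litter Mats", ["waterproof", "cat"]),
    GeneratedResult("sku-2", "Dog Beds", ["orthopedic"]),
]
current_paths = {
    "sku-1": "Pet Supplies > Cat Supplies",
    "sku-2": "Pet Supplies > Dog Supplies > Dog Beds",
}
flagged = flag_mismatches(results, current_paths)
print(flagged)
```

A flagged product is only a candidate for review: it may point to a category that is missing from your tree, or simply to different wording for a category you already have.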

Bonus Product Similarity Endpoint

product data similarity endpoint in pumice

We’ve deployed a new AI model that allows you to compare the similarity between two products based on the same fields as before. This can be used alongside the key product data endpoints to further understand how closely related the products in a given category are to each other.
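To give a feel for what a product-to-product similarity score looks like, the Python sketch below computes cosine similarity over simple word counts from the title and description. This bag-of-words approach is a deliberately basic stand-in and is not the model behind the endpoint; the function names and sample products are made up for the example.

```python
import math
import re
from collections import Counter
from typing import Dict


def bag_of_words(product: Dict[str, str]) -> Counter:
    """Count lowercase alphanumeric words in the title and description."""
    text = f"{product['title']} {product['description']}".lower()
    return Counter(re.findall(r"[a-z0-9]+", text))


def similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Cosine similarity between two products' word-count vectors (0.0 to 1.0)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[word] * vb[word] for word in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


litter_mat = {"title": "Cat Litter Mat", "description": "Traps litter from the box"}
litter_box = {"title": "Covered Cat Litter Box", "description": "Box with odor filter"}
skillet = {"title": "Steel Skillet", "description": "Heavy pan for searing"}

print(round(similarity(litter_mat, litter_box), 3))
print(round(similarity(litter_mat, skillet), 3))
```

Scores like these can be averaged across the products in a category to spot categories whose members have little in common.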

Step 4. Automatically Fit Your Products To Categories or Generate New Ones

This is the easiest step of them all! Once you’ve uploaded your product data and taxonomy tree, it takes a single click to either generate categories & tags (non-dynamic) or fit the products to your tree. Single-run endpoints require even fewer steps to generate this same information for a single product record.

product data classification results in pumice
Product categories and product tags are generated based on provided product data. 

Data will come back in the same structure (CSV/JSON) as the input. Custom integrations, provided with our Enterprise package, allow this data to be returned in a new structure or automatically integrated into PIMs or ecommerce hosting sites.
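If you handle the results yourself, a common follow-up is joining the returned categories back onto your original export before loading it into a PIM. The Python sketch below assumes the results arrive as a list of categories in the same order as the input rows and that the new column is called `category`; both are assumptions made for illustration.

```python
import csv
import io
from typing import List


def attach_categories(input_csv: str, categories: List[str], column: str = "category") -> str:
    """Return the input CSV text with one extra column holding each row's category."""
    rows = list(csv.DictReader(io.StringIO(input_csv)))
    if len(rows) != len(categories):
        raise ValueError("expected exactly one category per product row")
    fieldnames = (list(rows[0].keys()) if rows else []) + [column]
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, category in zip(rows, categories):
        writer.writerow({**row, column: category})
    return output.getvalue()


original = "title,description\nCat Mat,Traps litter\n"
merged = attach_categories(original, ["Pet Supplies > Cat Supplies"])
print(merged)
```

Checking that the row and category counts match guards against silently shifting categories onto the wrong products if a row was dropped along the way.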

Step 5. Integrate New Data Into Product Information Management (PIM) System

architecture diagram to go from raw data to classified data in pumice

Once you’ve generated your new product data matches you can leverage the data by integrating it into various PIM and ecommerce systems manually or through a custom integration. These custom integrations allow you to increase the level of automation you have in this product data fitting business process and remove more manual steps. Our most popular custom integrations are:

- Shopify

- WooCommerce

- Sales Layer PIM

- Plytix PIM

- Google Sheets

- Raw internal database

We use a combination of existing APIs and machine learning models in our custom integrations to automate these business processes, going beyond simply connecting code.

Automate your product data classification today!

Try our baseline models on Pumice.ai today and see how AI can automate your data classification tasks in a matter of minutes!