Software developers working in e-commerce have several options for storing product data. Whether you choose to build a solution in-house, use a third-party solution, or take a hybrid approach, one of the first things you need to establish is your product information manager (PIM).
In this post, the first in a three-part series, I’ll show you the core components of a robust product information management system and walk you through the technical decisions you need to make if you decide to build your own. Along the way, I’ll point out what to look for if you choose a third-party vendor instead. By the end of this series, you’ll have the information you need to decide whether you really need to build your own PIM.
A PIM is responsible for storing, controlling, and managing the data you need to market and sell products from an e-commerce store. This covers a lot of ground, so I’ll break it down into three core responsibilities.
First, your PIM will store product data and assets. In this article, I’ll expand on the types of data this might include, but in short, this means everything you need to power your product catalog. Structured data stored in a SQL database, text descriptions and translations stored in a CMS, images stored on a CDN, and taxonomies to help group your products might all be managed by your PIM.
Second, your PIM should allow your team members to create and update data. Beyond storing your product information, it must provide an interface that lets your team create, import, and update product data. I’ll expand on this topic in Part 2 of this series.
Finally, your PIM is responsible for making product data available. Whether you sell through a web application, use a third-party marketplace (like Amazon or eBay), or support a native mobile shopping experience, your PIM provides the interface that ensures product data is available to buyers or resellers. I’ll go into more detail about this topic in Part 3.
From a software engineering perspective, one of the hardest parts about running a scalable e-commerce store is ensuring that product data is always correct and available to the right people. This is essentially the role of a product information management system. While your product data may originate in a variety of databases and CMSes, the PIM should bring it all together and act as your central source of truth.
As a developer, you can either build a PIM from scratch or pay for one provided by a third party like Fabric. In the remainder of this post, I’ll walk through some of the technical tradeoffs you face when choosing between these two options.
When I create specifications for a new software project, I typically start with the data model. Creating a working model often gives me some insight into the complexity involved, and it helps me identify edge cases that might trip developers up later in the development process.
A PIM can be an extremely complex piece of software, but let’s start by looking at a simple product data model.
This model includes product names, descriptions, categories, attributes stored as a table of related keys and values, and links to supporting images for each product. This data model doesn’t look too bad, and if this is as complex as your product database needs to be, you might be able to build this pretty quickly.
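To make the simple model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the helper `get_attributes` are assumptions chosen for illustration; they are not taken from any particular PIM.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SIMPLE_SCHEMA = """
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE products (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    category_id INTEGER REFERENCES categories(id)
);
-- Attributes are stored as related key/value rows, one row per attribute.
CREATE TABLE product_attributes (
    product_id INTEGER NOT NULL REFERENCES products(id),
    key        TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (product_id, key)
);
-- Only a link to each image is stored; the file itself lives elsewhere.
CREATE TABLE product_images (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(id),
    url        TEXT NOT NULL
);
"""


def create_simple_catalog() -> sqlite3.Connection:
    """Create an in-memory database containing the simple product model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SIMPLE_SCHEMA)
    return conn


def get_attributes(conn: sqlite3.Connection, product_id: int) -> dict:
    """Return one product's key/value attributes as a dictionary."""
    rows = conn.execute(
        "SELECT key, value FROM product_attributes WHERE product_id = ?",
        (product_id,),
    )
    return dict(rows.fetchall())
```

Storing attributes as key/value rows keeps the schema stable when new attributes appear, at the cost of losing per-attribute types and constraints.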
Unfortunately, this data model leaves a lot to be desired. What if you want to support multiple variants for each product? What if your products are really “product families” with child-parent relationships? What about tags or custom taxonomies? How about a flat table that acts as a search index?
A data model that supports some of these features might look like this:
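The sketch below is one hypothetical way to express such a model as a SQLite schema in Python. It adds product families, variants, tags, and a flat search table to the basic idea of a products table. Every table name, column name, and the `rebuild_search_table` helper is an assumption made for illustration.

```python
import sqlite3

# Hypothetical schema: names and relationships are illustrative assumptions.
RICH_SCHEMA = """
CREATE TABLE product_families (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Each product belongs to a parent family.
CREATE TABLE products (
    id        INTEGER PRIMARY KEY,
    family_id INTEGER NOT NULL REFERENCES product_families(id),
    name      TEXT NOT NULL
);
-- Each product can have many sellable variants (size, color, ...).
CREATE TABLE variants (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(id),
    sku        TEXT NOT NULL UNIQUE,
    label      TEXT NOT NULL
);
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Many-to-many join between products and tags.
CREATE TABLE product_tags (
    product_id INTEGER NOT NULL REFERENCES products(id),
    tag_id     INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (product_id, tag_id)
);
-- Flat, denormalized table with one row per variant, used for lookups.
CREATE TABLE product_search (
    variant_id  INTEGER PRIMARY KEY,
    search_text TEXT NOT NULL
);
"""


def create_rich_catalog() -> sqlite3.Connection:
    """Create an in-memory database containing the richer product model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(RICH_SCHEMA)
    return conn


def rebuild_search_table(conn: sqlite3.Connection) -> None:
    """Rebuild the flat search table from the normalized tables."""
    with conn:  # run the delete and insert in a single transaction
        conn.execute("DELETE FROM product_search")
        conn.execute(
            """
            INSERT INTO product_search (variant_id, search_text)
            SELECT v.id,
                   f.name || ' ' || p.name || ' ' || v.label || ' ' ||
                   COALESCE((SELECT group_concat(t.name, ' ')
                             FROM product_tags pt
                             JOIN tags t ON t.id = pt.tag_id
                             WHERE pt.product_id = p.id), '')
            FROM variants v
            JOIN products p ON p.id = v.product_id
            JOIN product_families f ON f.id = p.family_id
            """
        )
```

Even in this small sketch, the flat table has to be rebuilt whenever a family, product, variant, or tag changes.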
You can start to see the complexity inherent in building a robust product data model. Each of the relationships between database models adds development overhead and more edge cases to consider.
The alternative to building your own data model is to use a third-party PIM that supports these features and more. For example, Fabric’s Product Information Manager supports product families, variants, attributes, categories, and collections. This makes managing thousands of products relatively simple without requiring custom code.
In addition to structured product data stored in your database, you’ll also need to support rich media like images, videos, and downloadable PDFs. If you build your own PIM, you’ll need to figure out how to store these files and ensure they’re optimized for end-users.
Cloud file storage options like Amazon S3 and Azure Files have made storing static assets in the cloud much more straightforward. These platforms provide automatic backups, an upload API, and customizable access rules to ensure the integrity of your files.
That said, they still require you to write code to handle the uploads, check file types, resize images, and store a pointer to the image in the database. You also have to figure out what to do with old files (remove them or move them to long-term storage like Glacier) to avoid paying a growing hosting bill every month.
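As a rough illustration of the work involved, the sketch below checks an uploaded file's type from its leading bytes and builds the record you would store in the database as a pointer to the file. It deliberately leaves out the actual upload call to a storage provider. The function names, the key layout under `products/`, and the 5 MB size limit are all assumptions.

```python
import hashlib

# Leading "magic" bytes that identify a few common image formats.
MAGIC_BYTES = {
    b"\x89PNG\r\n\x1a\n": ("image/png", "png"),
    b"\xff\xd8\xff": ("image/jpeg", "jpg"),
    b"GIF87a": ("image/gif", "gif"),
    b"GIF89a": ("image/gif", "gif"),
}


def sniff_image_type(data: bytes):
    """Return (content_type, extension) based on the file's first bytes, or None."""
    for magic, info in MAGIC_BYTES.items():
        if data.startswith(magic):
            return info
    return None


def build_asset_record(product_id: int, data: bytes, max_bytes: int = 5_000_000) -> dict:
    """Validate an upload and build the pointer row to store in the database.

    The returned storage_key is a hypothetical object key; uploading the bytes
    to that key is left to whichever storage client you use.
    """
    if len(data) > max_bytes:
        raise ValueError("file too large")
    sniffed = sniff_image_type(data)
    if sniffed is None:
        raise ValueError("unsupported or unrecognized image type")
    content_type, extension = sniffed
    # Using a content hash in the key means identical files map to the same key.
    digest = hashlib.sha256(data).hexdigest()
    return {
        "product_id": product_id,
        "storage_key": f"products/{product_id}/{digest[:16]}.{extension}",
        "content_type": content_type,
        "size_bytes": len(data),
    }
```

Checking the file's contents rather than trusting its extension or the client-supplied content type is the safer default, since both of those are easy to fake.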
Speed matters a lot in e-commerce, and hosting all your images in a single region is going to hurt your PIM’s performance. Once you’ve figured out how to store media and attach it to your product database, you’ll need to figure out how to cache it so that users around the world get fast response times when accessing your media.
The most common way to speed up image downloads is to use a CDN (Content Delivery Network). Even with a CDN, large files may slow down your frontend, so a robust PIM should store multiple sizes of each image. Smaller images can be served as thumbnails and larger ones on product detail pages.
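A small sketch of how the set of sizes could be planned for one source image is shown below. The rendition names and pixel limits in `RENDITIONS` are assumptions, and the code only computes target dimensions; the actual resizing would be done by an image library of your choice.

```python
# Hypothetical rendition names mapped to the maximum length of the longest side.
RENDITIONS = {"thumbnail": 200, "listing": 600, "detail": 1600}


def fit_within(width: int, height: int, max_side: int) -> tuple:
    """Scale (width, height) down so the longest side is at most max_side.

    The aspect ratio is preserved and images are never scaled up.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return (width, height)
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return (new_width, new_height)


def plan_renditions(width: int, height: int) -> dict:
    """Return the target dimensions for each rendition of one source image."""
    return {name: fit_within(width, height, side) for name, side in RENDITIONS.items()}
```

For a 4000 by 3000 pixel source image, `plan_renditions(4000, 3000)` yields a 200 by 150 thumbnail, a 600 by 450 listing image, and a 1600 by 1200 detail image.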
You’d rarely need to build your own CDN, but integrating one still requires some setup and configuration. If you use Fabric’s PIM, which is powered by a CDN, you can let Fabric worry about image optimization and caching.
Again, speed matters in e-commerce, so you want your PIM to be fast. For typical product lookups where the object’s ID is in the URL, your relational database will work fine, but what if users don’t know exactly which product they’re looking for? What if you need to support fuzzy searches across thousands of products?
It turns out that full-text search is pretty complicated, especially in a relational database. If you happen to be using Postgres, you could join your product data into a single materialized view and then add a full-text search index to it. This doesn’t address misspellings or related words, but it’s a start.
Because full-text search in SQL databases is pretty limited, most developers will need to add a purpose-built search index. Elasticsearch and Algolia are both excellent options because they’re designed to add full-text, fuzzy search to your data. Unfortunately, once you add a search index to your PIM, you’ll have to keep it in sync with your product database. Having duplicate data always adds a new potential point of failure, so keep this in mind as your product catalog grows.
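To illustrate the sync problem, the toy sketch below keeps a tiny in-memory index alongside a dictionary that stands in for the product database. It uses fuzzy matching from Python's standard difflib module as a stand-in for a real search service; it is not how Elasticsearch or Algolia work internally. The class and function names are hypothetical.

```python
import difflib


class SearchIndex:
    """Toy in-memory search index with typo-tolerant term matching."""

    def __init__(self):
        self._docs = {}  # product_id -> list of lowercase tokens

    def upsert(self, product_id: int, text: str) -> None:
        self._docs[product_id] = text.lower().split()

    def delete(self, product_id: int) -> None:
        self._docs.pop(product_id, None)

    def search(self, query: str, cutoff: float = 0.8) -> list:
        """Return ids of products where every query term closely matches a token."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = []
        for product_id, tokens in self._docs.items():
            if all(
                difflib.get_close_matches(term, tokens, n=1, cutoff=cutoff)
                for term in terms
            ):
                hits.append(product_id)
        return sorted(hits)


def save_product(db: dict, index: SearchIndex, product_id: int, name: str, description: str) -> None:
    """Write to the primary store, then mirror the change into the index."""
    db[product_id] = {"name": name, "description": description}
    index.upsert(product_id, f"{name} {description}")


def delete_product(db: dict, index: SearchIndex, product_id: int) -> None:
    """Remove the product from both stores so the index does not go stale."""
    db.pop(product_id, None)
    index.delete(product_id)
```

Every write path has to go through functions like `save_product` and `delete_product`. If any code updates the database directly, the index silently drifts out of date, which is the failure mode described above.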
If you use Fabric as your headless e-commerce solution, you won’t have to worry about search speed. Fabric uses Algolia to power product search, which gives you a fast, secure, distributed search experience in your online store.
If you design your e-commerce store exclusively for an English-speaking audience, you’ll miss out on a considerable portion of the global market. Internationalization is the process of making sure your product descriptions, titles, specifications, and measurements work across cultural and language boundaries, and it makes building a PIM significantly more complicated.
There are a few ways to handle internationalization in a PIM. The simplest is to add a column to your database for each language you support. For example, to add Spanish to the product data model above, you might add a new column for the Spanish title and another for the Spanish description:
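A minimal sketch of the column-per-language approach, using Python's built-in sqlite3 module, could look like this. The column names `title_es` and `description_es` and the helper `get_product_text` are assumptions for illustration.

```python
import sqlite3

# Map each supported language to its (title, description) columns.
# Column names are looked up here, never taken from user input.
LANGUAGE_COLUMNS = {
    "en": ("title", "description"),
    "es": ("title_es", "description_es"),
}


def create_products_with_spanish() -> sqlite3.Connection:
    """Create a products table, then add Spanish columns to it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (
            id          INTEGER PRIMARY KEY,
            title       TEXT NOT NULL,
            description TEXT
        );
        -- One extra pair of columns for every language you support.
        ALTER TABLE products ADD COLUMN title_es TEXT;
        ALTER TABLE products ADD COLUMN description_es TEXT;
        """
    )
    return conn


def get_product_text(conn: sqlite3.Connection, product_id: int, lang: str):
    """Return (title, description) in lang, falling back to the default columns."""
    if lang not in LANGUAGE_COLUMNS:
        raise ValueError(f"unsupported language: {lang}")
    title_col, desc_col = LANGUAGE_COLUMNS[lang]
    return conn.execute(
        f"SELECT COALESCE({title_col}, title), COALESCE({desc_col}, description) "
        "FROM products WHERE id = ?",
        (product_id,),
    ).fetchone()
```

Note that both the schema and the `LANGUAGE_COLUMNS` mapping have to change each time a language is added.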
This won’t scale well, though. If you wanted to add support for ten more languages, you’d have to add 20 columns, and that’s just for the products table.
A more robust model for internationalization requires you to add translation and language tables. This adds a little overhead to your database queries, as you’ll have to join translations to your products depending on the language requested, but it allows your PIM to support any number of languages without schema changes. Here’s an example of this pattern on the simplified product table above:
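One way this pattern could be sketched with Python's built-in sqlite3 module is shown below. The table names, the `sku` column, and the English fallback behavior in `get_translated_product` are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema: translatable text moves out of the products table.
TRANSLATION_SCHEMA = """
CREATE TABLE products (
    id  INTEGER PRIMARY KEY,
    sku TEXT NOT NULL UNIQUE
);
CREATE TABLE languages (
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per product per language.
CREATE TABLE product_translations (
    product_id    INTEGER NOT NULL REFERENCES products(id),
    language_code TEXT NOT NULL REFERENCES languages(code),
    title         TEXT NOT NULL,
    description   TEXT,
    PRIMARY KEY (product_id, language_code)
);
"""


def create_translated_catalog() -> sqlite3.Connection:
    """Create an in-memory database using the translation-table pattern."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TRANSLATION_SCHEMA)
    return conn


def get_translated_product(conn: sqlite3.Connection, product_id: int, lang: str, fallback: str = "en"):
    """Return (sku, title, description) in lang, using fallback where missing."""
    return conn.execute(
        """
        SELECT p.sku,
               COALESCE(t.title, f.title),
               COALESCE(t.description, f.description)
        FROM products p
        LEFT JOIN product_translations t
               ON t.product_id = p.id AND t.language_code = ?
        LEFT JOIN product_translations f
               ON f.product_id = p.id AND f.language_code = ?
        WHERE p.id = ?
        """,
        (lang, fallback, product_id),
    ).fetchone()
```

Adding a language now means inserting rows rather than altering tables, and the two joins are the query overhead mentioned above.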
Finally, as any experienced software developer will tell you, no project is ever really finished. By the time you release the first iteration of your PIM, you’ll likely have a backlog of new features and fixes to start working on. Along the way, you’ll learn that some of your assumptions were wrong, and you’re going to have to refactor your data model to support your growing product catalog.
Don’t underestimate the maintenance costs of custom software. Among the most compelling reasons to choose a third-party vendor for your PIM are predictable costs, fewer bugs, and ongoing support.
In general, building a PIM from scratch offers you the most flexibility in your data model, but as you’ve seen, it’s going to be a lot of work. Choosing a third-party provider with a scalable data and file distribution system in place will save you a lot of development and maintenance time while offering a predictable cost structure.
If you’re looking for a robust product information manager for your e-commerce store, check out Fabric. In addition to a scalable PIM, Fabric offers a complete range of headless solutions for managing your e-commerce store.
In the next post in this series, we’ll talk about the interface your PIM must provide to your internal team. This will allow your coworkers to create, import, and update the product data you modeled here.