
Understanding Consumer Behavior: Market Basket Analysis

Published Date: Monday, Jun 29, 2020
Last Updated on: Monday, Jul 10, 2023

What’s in Your Basket?

You’re finally upgrading to a top-of-the-line smartphone. It has the features you desire, and you’re ready to pay for it. “Wait a minute,” the sales associate says. “You might want to consider an extended warranty.” “Good idea,” you agree. “And there are those new designer phone holsters you may also want to add.” “Good idea,” you again approve. “And, of course, there’s the new powerful car charger you must have.” “Ok,” you say. “I think that makes a lot of sense.”

These everyday shopping experiences give the marketer essential information. They help the marketer better recognize consumer behavior and needs. Knowing this, the marketer can provide appropriate products and meet customer expectations.

In the analytics space, we call these retail shopping occurrences market baskets. They consist of combinations of products and/or services that a consumer may purchase together. In the example above, the warranty, the holster and the charger go with the phone into the basket. In a restaurant, a cup of coffee might complement the pastry. And in an automobile service station, a tire rotation may be part of the same basket as an oil change.

Market basket analysis studies combinations of bought-together items to provide the right products to the right customer at the right time. The key is identifying the purchase patterns, determining those trends that are of value and presenting those products in an easily accessible fashion. These affinities can be used to enhance efficiencies and bottom-line profitability. Add-on recommendations, cross-selling, couponing and promotions provide further opportunities to increase revenue.

An early example of this is the legend of beer and diapers, widely reported some twenty years ago. In 1998, Forbes magazine summarized as follows: “A retail chain put all its checkout-counter data into a giant digital warehouse and set the disk drives spinning. Out popped a most unexpected correlation: sales of diapers and beer. Evidently, young fathers would make a late-night run to the store to pick up Pampers and get some Bud Light while they were there. Capitalizing on the discovery, the store placed the disparate items together. Sales zoomed.”

Thus, we witness yet another application of market basket analysis: placement of products within a retail store or, for that matter, a digital platform.

And the underlying thesis in the analysis is that if we can identify clusters of products that are bought together, we can leverage that information by offering other items that are associated with what was just purchased.

This analysis is often employed by e-commerce sites to offer purchase suggestions to consumers. For example, when a person buys toothpaste, the retailer may suggest other products such as a toothbrush, mouthwash, flossing supplies, etc. This is due to the prevalence with which other consumers bought these items in the same transaction as the toothpaste.
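The suggestion logic described above can be sketched as a simple co-purchase count. This is only a minimal illustration: the transactions, the function name `suggest` and the `top_n` parameter are all invented here, and real e-commerce recommenders are considerably more elaborate.

```python
from collections import Counter

# Hypothetical transactions, invented purely for illustration.
transactions = [
    {"toothpaste", "toothbrush", "mouthwash"},
    {"toothpaste", "toothbrush"},
    {"toothpaste", "floss"},
    {"shampoo", "conditioner"},
    {"toothpaste", "toothbrush", "floss"},
]

def suggest(item, transactions, top_n=3):
    """Return the items most often found in the same basket as `item`."""
    co_counts = Counter()
    for basket in transactions:
        if item in basket:
            # Count every other item that shares a basket with `item`.
            co_counts.update(basket - {item})
    return [other for other, _ in co_counts.most_common(top_n)]

suggestions = suggest("toothpaste", transactions)
print(suggestions)
```

In this made-up data the toothbrush shares a basket with toothpaste three times, floss twice and mouthwash once, so they are suggested in that order.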

Sounds logical, doesn’t it? Yet it’s not always straightforward. For large firms, the sheer number of combinations can be overwhelming, easily rising into the billions and creating an analytic nightmare. However, there are tools and approaches that make market basket analysis more manageable and useful.
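To get a feel for how quickly the combinations pile up, here is a rough count for an assumed catalog. The catalog size of 10,000 products is a hypothetical figure chosen for illustration, not one taken from any particular retailer.

```python
from math import comb

# Hypothetical catalog size, assumed for illustration only.
catalog_size = 10_000

# Number of distinct 2-item and 3-item combinations that could be examined.
pairs = comb(catalog_size, 2)
triples = comb(catalog_size, 3)

print(f"{pairs:,} possible pairs")
print(f"{triples:,} possible triples")
```

Even at this modest size there are roughly 50 million possible pairs and more than 166 billion possible triples, which is why analysts rarely examine every combination directly.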

Some researchers use association analysis to search through the various product transaction combinations. Central to this approach are two critical questions that must be raised in examining market baskets. These are:

  • What percentage of the time are products X and Y found in the same market basket?
  • Given that a consumer purchases product X, what is the probability that he will also purchase product Y?

Let’s try to clarify these issues, and then provide the nomenclature that is typically employed to refer to each of these concepts. Look at the table below, which represents eight transactions emerging from a bookseller’s data warehouse. The bookseller could be a brick-and-mortar retailer or an e-commerce one. The table lists the baskets of eight customers, with a maximum of five items (books) purchased by each of these individuals.

Transactions at a bookseller

Customer ID | First Book | Second Book | Third Book | Fourth Book | Fifth Book

We can now attempt to answer the questions we posed above.

  • The Mystery genre appears 25% of the time in the same basket as Drama. There is a total of eight transactions above. Customers 103 and 619 both placed these items in their respective baskets. These two customers represent ¼ of the eight studied individuals, for a 25% frequency.
  • Given that a customer purchases Mystery, there is a 2/3 chance that the customer will also obtain Drama. Three individuals bought Mystery: 103, 619 and 901. However, only two of them, 103 and 619, had that particular combination. These two, as a proportion of the three that bought Mystery, yield a probability of about 0.67.

Of course, the same approach can be used to address all the combinations that are of interest.

The first measure, the frequency with which the items occur together, is designated as the “support”. The second, the conditional probability, is what researchers refer to as the “confidence”.
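Both measures can be computed in a few lines. The sketch below is an illustration under stated assumptions: the bookseller’s actual table is not reproduced here, so the basket contents and the customer IDs 340, 455, 512 and 788 are invented. The only facts carried over from the text are that customers 103, 619 and 901 bought Mystery, that 103 and 619 also bought Drama, and that customer 277 bought Health, Drama and History. The function names are hypothetical as well.

```python
# Hypothetical baskets: contents are invented, constrained only to agree
# with the facts stated in the text (see the paragraph above).
baskets = {
    103: {"Mystery", "Drama"},
    277: {"Health", "Drama", "History"},
    340: {"Fiction", "History"},
    455: {"Technology", "History"},
    512: {"Fiction"},
    619: {"Mystery", "Drama", "Fiction"},
    788: {"Health", "Technology"},
    901: {"Mystery", "Health"},
}

def count_containing(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets.values() if itemset <= basket)

def support(itemset, baskets):
    """Share of all baskets that contain the whole itemset."""
    return count_containing(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = count_containing(antecedent | consequent, baskets)
    return both / count_containing(antecedent, baskets)

mystery_drama_support = support({"Mystery", "Drama"}, baskets)
mystery_to_drama_confidence = confidence({"Mystery"}, {"Drama"}, baskets)
print(mystery_drama_support, round(mystery_to_drama_confidence, 2))
```

With these baskets the support is 2/8 = 0.25 and the confidence is 2/3, matching the two bullet points above.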

Managers employ a variety of metrics to gauge the strength of any discovered relationships. For example, assume that, after examining our bookseller’s transactions, we notice that 6% of all the baskets contain Technology as a purchased genre. Marketers are most interested in presenting this category of book to their audience. What should they do? Whom shall they target? Upon further data analysis, they observe that 18% of those who have bought History also purchased Technology. Wow! That rate is three times the baseline, or 300% of it (18% / 6%). This ratio is referred to as “lift.”

The analysis also shows that among those that had Fiction in their baskets, only 4% purchased Technology. This represents a lift of about 67% (4% / 6%). Because that is below 100%, Fiction buyers are less likely than the average customer to buy Technology. Clearly, the retailer would consider using History as a trigger to motivate Technology purchases, rather than Fiction. Below is a visual summary.

[Figure: Technology purchases compared with Technology and History, and with Technology and Fiction]
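The lift arithmetic for the History and Fiction triggers can be written out as a small calculation. The percentages are the ones quoted in the text; the function name `lift` and the variable names are just illustrative choices.

```python
def lift(confidence, baseline_support):
    """Ratio of the conditional purchase rate to the overall purchase rate."""
    return confidence / baseline_support

technology_share = 0.06            # 6% of all baskets contain Technology
technology_given_history = 0.18    # 18% of History buyers also bought Technology
technology_given_fiction = 0.04    # 4% of Fiction buyers also bought Technology

history_lift = lift(technology_given_history, technology_share)
fiction_lift = lift(technology_given_fiction, technology_share)

print(f"History lift: {history_lift:.2f}")
print(f"Fiction lift: {fiction_lift:.2f}")
```

A lift above 1 means the trigger item raises the likelihood of the target purchase relative to the baseline, and a lift below 1 means it lowers it. Here History comes out at about 3 and Fiction at about 0.67.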

“So what?” you might ask. “You’re showing me two products. It doesn’t appear to be that complicated.” In practice, analysts evaluate multiple-product baskets. Referring to the transaction record above, the combination of Health and Drama followed by History appears in 12.5% of the baskets. Only customer 277 fulfills this rule (one basket out of eight is 12.5%).
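Counting multi-item combinations can be sketched with a brute-force pass over every basket. This is a plain enumeration, not an optimized association-rule algorithm, and it rests on assumptions: the basket contents below are invented because the original table is not reproduced here, and only the Health, Drama and History basket of customer 277 comes from the text. The function name `itemset_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration. Only the second one
# (customer 277: Health, Drama, History) is taken from the text.
baskets = [
    {"Mystery", "Drama"},
    {"Health", "Drama", "History"},
    {"Fiction", "History"},
    {"Technology", "History"},
    {"Fiction"},
    {"Mystery", "Drama", "Fiction"},
    {"Health", "Technology"},
    {"Mystery", "Health"},
]

def itemset_counts(baskets, size):
    """Count how many baskets contain each combination of `size` items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every combination one canonical tuple form.
        for combo in combinations(sorted(basket), size):
            counts[combo] += 1
    return counts

triples = itemset_counts(baskets, 3)
three_item_support = triples[("Drama", "Health", "History")] / len(baskets)
print(three_item_support)
```

One basket in eight holds all three genres, which gives the 12.5% figure. The same function with `size=2` would tally every pair, which is where the combinatorial growth mentioned earlier starts to matter.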

The fact is that market baskets may well differ by age, gender, occupation, etc. This doesn’t present any problem — all you need to do is incorporate it into the analysis. I have often found that this extra level of analysis provides a more insightful view of the customer journey. If we want to take this one step further, we can add yet another dimension: the time element.
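Incorporating a customer attribute can be as simple as computing the same measure separately for each segment. In the sketch below, the age groups, the baskets and the function name `confidence_by_segment` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical baskets tagged with an invented age-group attribute.
records = [
    ("18-34", {"Mystery", "Drama"}),
    ("18-34", {"Mystery", "Drama"}),
    ("18-34", {"Mystery"}),
    ("55+", {"Mystery"}),
    ("55+", {"Mystery", "History"}),
    ("55+", {"Mystery", "Drama"}),
    ("55+", {"Mystery"}),
    ("55+", {"Drama"}),
]

def confidence_by_segment(records, antecedent, consequent):
    """Confidence of antecedent -> consequent, computed per segment."""
    with_antecedent = defaultdict(int)
    with_both = defaultdict(int)
    for segment, basket in records:
        if antecedent in basket:
            with_antecedent[segment] += 1
            if consequent in basket:
                with_both[segment] += 1
    return {seg: with_both[seg] / n for seg, n in with_antecedent.items()}

by_age = confidence_by_segment(records, "Mystery", "Drama")
print(by_age)
```

In this made-up data, Mystery buyers aged 18-34 add Drama two times out of three, while those aged 55 and over do so one time in four, so the same rule would justify different offers for the two groups.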

Let’s return to our illustration. We briefly discussed the relationships of the various products in the baskets. Recall that we concluded that, for example, Mystery was linked with Drama. A reasonable issue to examine is which one arrived in the market basket first. Was it Mystery or was it Drama? The purchase sequence can influence the tactics employed by the marketer. If many or all customers buy Drama before Mystery, that may provide the retailer with guidance about selling the subsequent product. Various cross-selling strategies can be implemented.

To address the above sequence issue, we can incorporate a time factor into the data analysis. An ensuing conclusion may be the following: Customers who have bought a Mystery novel are 4 times as likely to buy an audiobook reader within 3 months.
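A time-aware rule of this kind can be checked by looking at purchase dates and order. The following is a minimal sketch under stated assumptions: the purchase log is invented, "3 months" is approximated as 90 days, and the function name `followed_within` is hypothetical.

```python
from datetime import date, timedelta

# Hypothetical purchase log: (customer_id, item, purchase_date). Invented data.
purchases = [
    (1, "Mystery", date(2023, 1, 5)),
    (1, "Audiobook reader", date(2023, 2, 20)),
    (2, "Mystery", date(2023, 1, 10)),
    (2, "Audiobook reader", date(2023, 8, 1)),   # too late: outside the window
    (3, "Mystery", date(2023, 3, 1)),            # never buys the reader
    (4, "Audiobook reader", date(2023, 1, 15)),  # wrong order: reader came first
    (4, "Mystery", date(2023, 2, 1)),
]

def followed_within(purchases, first_item, next_item, window):
    """Share of first_item buyers who buy next_item afterwards, within `window`."""
    # Earliest purchase date of first_item for each customer.
    first_dates = {}
    for customer, item, day in purchases:
        if item == first_item:
            first_dates[customer] = min(day, first_dates.get(customer, day))
    # Customers whose next_item purchase falls after that date, inside the window.
    followers = set()
    for customer, item, day in purchases:
        start = first_dates.get(customer)
        if item == next_item and start is not None and start < day <= start + window:
            followers.add(customer)
    return len(followers) / len(first_dates) if first_dates else 0.0

rate = followed_within(purchases, "Mystery", "Audiobook reader", timedelta(days=90))
print(rate)
```

Only customer 1 buys the reader after a Mystery novel and within the window, so the rate is one in four. Comparing such a rate with the rate among customers who did not buy Mystery is what would produce a statement like "4 times as likely."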

Sequence rules, however, are not always that clear. Sequence analysis provides a means to identify such rules, no matter how obscure they may be in your data.

Market-basket type analysis can be extended to a host of different settings. Think of a physician confronted with a patient with multiple health issues. These health issues usually precede a more serious condition. Thus, the basket defined by the initial health problems may often be combined with the more serious event, enabling the physician to proactively treat the patient. Focusing in on which of these events occur together may assist in discerning patterns and correlations between individuals and a host of diverse actions.

So, the beer and diapers association from earlier tells us something about that consumer. A young adult purchasing the latest Xbox console, combined with a new subscription to Sports Illustrated, informs us a bit about his tendencies. A senior citizen having just paid for a health magazine, coupled with an overseas vacation, helps describe him, as well. Want to know what your preferences are? Just check what’s in your basket.

This article was also published on Machine Learning Times, where Sam Koslowsky is a regular contributor.