Databricks recommendations for online retail

In short

The outcome we're after.

An online retailer already owns the signal that drives good recommendations. Every product viewed, every cart built, every order placed. Most of it sits in Shopify doing nothing for the next shopper. A recommendation and segmentation engine on Databricks turns that history into the personalised merchandising the store needs, trained on the retailer's own data rather than rented from a black-box plug-in. The store owns the model, sees why it suggests what it suggests, and tunes it for margin and the long tail rather than just pushing best-sellers.

Primary technology: Databricks

The recommendations an online retailer can’t tune

An online retailer already sits on the data that drives good recommendations: every product a shopper views, every cart they build and abandon, every order they place. The store collects it on every visit, and in most cases it goes straight into Shopify and stops there, doing nothing for the next person who lands on the same product page.

The usual fix is a recommendation app from the store’s plug-in marketplace. It installs in an afternoon and puts a “you might also like” strip on the page, which feels like progress. Then the limits show. The store cannot see why the app suggests what it suggests, cannot push higher-margin lines or clear ageing stock, and cannot stop the app doing the one thing a lazy recommender always does, which is recommend the best-sellers people were going to buy anyway. The shopper data feeding the app also leaves the store’s control, which matters under Australia’s Privacy Act 1988 and the rising expectation that personalisation is transparent.

The deeper problem is ownership. A black-box app rents the retailer a result without the model, the features or the ability to tune any of it. For a store whose catalogue, margins and customers are its own, that is the wrong trade. The signal is theirs. The model that learns from it should be theirs too.

Why Databricks, and owning your own model

The aim is a recommendation and segmentation engine the retailer owns and tunes, trained on its own browse and purchase data rather than a generic plug-in. We headline these builds on Databricks for three practical reasons that a one-click app cannot match.

First, the lakehouse holds the raw signal and the engineered features in one place, so the same browse and purchase history feeds recommendations today and forecasting or segmentation tomorrow. Second, feature engineering and model training sit where the data lives, so the store builds features that matter to it, such as margin, brand affinity and category mix, rather than whatever a vendor decided to expose. Third, model serving runs from the same platform, so a trained model returns suggestions to the storefront under the store’s own control.
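As a rough illustration of the second point, the sketch below derives brand affinity, category mix and an average margin rate for one shopper from their order lines. It is plain Python rather than lakehouse code, and the field names (`brand`, `category`, `price`, `cost`, `qty`) are assumptions for the example, not a real Shopify or Databricks schema.

```python
from collections import Counter


def shopper_features(order_lines):
    """Build simple per-shopper features from a list of order lines.

    Each line is a dict with hypothetical keys:
    brand, category, price (unit price), cost (unit cost), qty.
    """
    units = sum(line["qty"] for line in order_lines)
    if units == 0:
        return {"brand_affinity": {}, "category_mix": {}, "avg_margin_rate": 0.0}

    brand_units = Counter()
    category_units = Counter()
    revenue = 0.0
    margin = 0.0
    for line in order_lines:
        brand_units[line["brand"]] += line["qty"]
        category_units[line["category"]] += line["qty"]
        revenue += line["price"] * line["qty"]
        margin += (line["price"] - line["cost"]) * line["qty"]

    return {
        # Share of units bought per brand and per category.
        "brand_affinity": {b: n / units for b, n in brand_units.items()},
        "category_mix": {c: n / units for c, n in category_units.items()},
        # Margin as a fraction of revenue across everything the shopper bought.
        "avg_margin_rate": margin / revenue if revenue else 0.0,
    }
```

In a real build the same aggregation would more likely run as a query over the modelled orders table. The point is only that these features are the store's to define.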

The supporting stack feeds that core. Shopify is the source of truth for the catalogue, the orders and the on-site behaviour. We model that data through Snowflake into a clean layer of customers, products and events, then land it in the lakehouse for feature engineering and training on Databricks. Recommendations and segments flow back to the storefront, so the shopper sees personalised merchandising and the retention team sees who sits in which segment.

We separate the engine from the storefront on purpose. The store keeps shipping on Shopify while the model improves behind it, and the same modelled data can drive email, merchandising and planning, not just one strip on a product page.

Building it, and where it got hard

The model maths is rarely the hard part in retail recommendations. The trap we see is a recommender that scores well in testing and changes nothing at the till. Two related problems cause it, and the first build hit both.

The first was cold-start and popularity bias together. A new product has no purchase history, so a model trained only on past orders never learns to recommend it, and a brand-new visitor has no behaviour to personalise from. Left alone, the model retreats to the safe bet and pushes the same best-sellers to everyone. That looks busy and adds almost nothing, because those are the lines people would have found anyway, while the long tail that could lift basket size never gets surfaced. The fix was content-based features so a new product is matched on category, brand, price band and description rather than history, segment-level popularity as a sensible fallback for a new shopper, and deliberate exploration so fresh and long-tail items stay in rotation long enough to earn their own data.
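The content-based part of that fix can be sketched as follows. This is a minimal illustration, not the production model. It assumes each product is a dict with hypothetical `sku`, `category`, `brand`, `price_band` and `description` fields, and it uses Jaccard overlap of those fields as a stand-in for whatever similarity measure a real build would choose.

```python
def item_tokens(item):
    """Turn a product's content fields into a set of comparable tokens."""
    tokens = {
        f"category:{item['category']}",
        f"brand:{item['brand']}",
        f"price_band:{item['price_band']}",
    }
    tokens.update(f"word:{w}" for w in item["description"].lower().split())
    return tokens


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(new_item, catalogue, top_n=3):
    """Rank existing products by content similarity to a product with no history."""
    new_tokens = item_tokens(new_item)
    scored = [
        (jaccard(new_tokens, item_tokens(item)), item["sku"])
        for item in catalogue
        if item["sku"] != new_item["sku"]
    ]
    # Highest similarity first; ties broken by SKU so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sku for score, sku in scored[:top_n] if score > 0]
```

A new product can then borrow the audience of its nearest neighbours until it has earned purchase history of its own.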

The second was measurement, and it was the more uncomfortable one. Early offline metrics looked strong. Click-through on the recommendation strip rose. Revenue did not move. The model had learnt to predict what shoppers already intended, not to influence what they bought. We stopped trusting offline click metrics and moved to a holdout. A slice of traffic saw the personalised recommendations and a comparable slice did not, and we measured the difference in basket size, conversion on recommended items and revenue per session. That changed how we tuned the model, because now it had to beat a control rather than a benchmark. Segmentation helped here too. Grouping shoppers by behaviour meant recommendations could fit the segment rather than treat every visitor the same, which is where the measurable lift came from.
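The holdout comparison comes down to simple arithmetic once sessions are split into the two slices. The sketch below assumes each session is a dict with hypothetical `revenue` and `items` keys and reports relative lift in revenue per session and average basket size. Conversion on recommended items needs item-level attribution and is left out, as is any significance testing, which a real analysis would need before acting on the numbers.

```python
def summarise(sessions):
    """Summarise one traffic slice.

    Each session is a dict with hypothetical keys:
    revenue (order value, 0 if no order) and items (units bought).
    """
    orders = [s for s in sessions if s["revenue"] > 0]
    total_revenue = sum(s["revenue"] for s in sessions)
    return {
        "revenue_per_session": total_revenue / len(sessions) if sessions else 0.0,
        "avg_basket_size": sum(s["items"] for s in orders) / len(orders) if orders else 0.0,
    }


def relative_lift(treated, control):
    """Lift of treated over control as a fraction; None if control is zero."""
    if control == 0:
        return None
    return (treated - control) / control


def compare(treated_sessions, control_sessions):
    """Relative lift per metric for the personalised slice against the holdout."""
    t = summarise(treated_sessions)
    c = summarise(control_sessions)
    return {metric: relative_lift(t[metric], c[metric]) for metric in t}
```

A lift of 0.08 on `avg_basket_size` would read as baskets 8 per cent larger in the personalised slice than in the control.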

Throughout, shopper data stayed governed. Personal data is handled under the Privacy Act 1988, recommendations are explainable because the features are the store’s own, and the store can tell shoppers plainly how personalisation works rather than pointing at a vendor’s black box.

What changed

In a representative build the personalised “you might also like” placements lifted average basket size by a high single-digit percentage, measured against a held-out control rather than against offline clicks. Adding exploration and content-based features pulled products from outside the top sellers into view, so a wider slice of the catalogue earned attention and sales instead of the same dozen lines. Segmenting shoppers by behaviour let the store tune recommendations and merchandising per group, rather than serving every visitor the same homepage.

These figures are illustrative. They describe the pattern we see rather than a published result for a named retailer. The shape is the point. The store’s own browse and purchase data, which was sitting idle in Shopify, starts working for the next shopper, and the retailer owns the model that makes it happen, sees why it recommends what it does, and tunes it for margin and the long tail rather than renting a result it cannot see into.

Where this fits

A recommendation and segmentation engine is one application of our Artificial Intelligence service, built on Databricks, for retail and ecommerce. It is a contained, high-return first build, because the data already exists in your store and the value comes from modelling it properly and measuring real uplift. It is a sensible step up from a generic recommendation app, and the same data layer extends to forecasting, email and merchandising later. If your store data is sitting idle and a plug-in has stopped paying off, the place to start is to map your Shopify browse and purchase data and decide which recommendations are worth owning.

Illustrative figures, not a published result

Representative outcomes

Larger baskets

Personalised "you might also like" placements lifted average basket size by a high single-digit percentage in a representative build, measured against a held-out control rather than offline clicks.

Long-tail discovery

Adding exploration and content-based features surfaced products outside the top sellers, so a wider slice of the catalogue earned views and sales instead of the same dozen lines.

Segments that fit

Grouping shoppers by behaviour let the store tune recommendations and merchandising per segment, rather than showing every visitor the same homepage.

Where this fits

This solution applies our Artificial Intelligence service, built primarily on Databricks, for the Retail & Ecommerce sector.

Supporting stack: Snowflake, Shopify.

Go deeper: Artificial Intelligence for Retail & Ecommerce.

Frequently asked.

What are the use cases of machine learning in retail?

The high-value ones cluster around the customer and the catalogue: product recommendations and "you might also like" placements, customer segmentation for targeted merchandising and email, demand forecasting and inventory planning, dynamic pricing, fraud and returns detection, and search ranking. A recommendation and segmentation engine is usually the best first build, because the data already exists in your store and the uplift shows up directly in basket size and conversion.

How can AI be used in ecommerce?

Mostly to personalise and to predict. Personalisation means recommendations, tailored merchandising and segment-specific email rather than one homepage for everyone. Prediction means forecasting demand, flagging at-risk customers and ranking search results. Generative AI adds product copy, image tagging and shopper-facing assistants. The pattern that pays off first is recommendations trained on your own browse and purchase data, because you own the signal and the uplift is measurable.

Why build on Databricks instead of a Shopify recommendation app?

A plug-in app is quick to install and fine to start with, but you rent a black box. You cannot tune it for margin, you cannot see why it recommends what it does, and your shopper data leaves your control. A Databricks build trains on your own browse and purchase history, so you own the model and the features, can favour higher-margin or long-tail lines, and can extend the same data layer to forecasting and segmentation later. The trade-off is that it is an engineering project, not a one-click install, so it suits stores past the point where a generic app has stopped paying off.

How do you handle new products and new shoppers with no history?

This is the cold-start problem, and it is the main reason a naive recommender disappoints. For a new product with no purchase history, we use content-based features such as category, brand, price band and description so it can still be matched to the right shopper. For a new visitor, we fall back to popularity within their segment and what is in the current session, then personalise as soon as behaviour gives us signal. Exploration keeps fresh and long-tail items in rotation so they get a chance to earn data.
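One way to picture the fallback order and the exploration step is the sketch below. The function name, the `explore_rate` parameter and the idea of reserving the last slot for exploration are assumptions made for illustration. A real build might explore with a bandit or a re-ranking step instead.

```python
import random


def recommend(personalised, session_based, segment_popular, explore_pool,
              k=4, explore_rate=0.25, rng=None):
    """Fill k recommendation slots for one visitor.

    Inputs are lists of SKUs, best first. A brand-new visitor has an empty
    personalised list, so the slots fill from the current session and then
    from popularity within the visitor's segment.
    """
    rng = rng or random.Random()

    ranked = []
    for source in (personalised, session_based, segment_popular):
        for sku in source:
            if sku not in ranked:
                ranked.append(sku)
    slots = ranked[:k]

    # Exploration: occasionally give the last slot to a fresh or long-tail
    # item so it has a chance to earn its own data.
    candidates = [sku for sku in explore_pool if sku not in slots]
    if slots and candidates and rng.random() < explore_rate:
        slots[-1] = rng.choice(candidates)
    return slots
```

With `explore_rate=0.0` the output is purely the fallback chain. Raising it trades a little short-term relevance for data on items the model has not yet seen sell.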

How do you measure whether recommendations actually work?

By revenue uplift against a held-out control, not by clicks. Offline metrics and click-through can look strong while doing nothing for the bottom line, often because the model just learnt to push best-sellers people would have bought anyway. We run A/B or holdout tests so a slice of traffic sees the recommendations and a comparable slice does not, then measure the difference in basket size, conversion on recommended items and revenue per session. Shopper data is handled under the Privacy Act 1988, with transparency about how personalisation works.
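For the split itself, a common approach is to hash a visitor identifier so the same visitor always lands in the same slice. The sketch below is one such scheme under stated assumptions: the `salt` value, the 10 per cent default and the function name are illustrative, not a description of a specific build.

```python
import hashlib


def in_holdout(visitor_id, holdout_pct=10, salt="recs-holdout-v1"):
    """Return True if this visitor belongs to the holdout (no personalised recs).

    Hashing the salted ID gives a stable, roughly uniform bucket from 0 to 99,
    so assignment needs no stored state and does not change between visits.
    """
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < holdout_pct
```

Changing the salt reshuffles the assignment, which is useful when starting a new test so that earlier exposure does not carry over.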

Recommendations you own

Turn your store's own data into bigger baskets

We will map your Shopify browse and purchase data and show you the recommendation and segmentation engine Databricks can build on it, measured by real uplift.
