The marketing team had data. They had plenty of it. Store sales, online sales, catalog orders, promotion history, demographics of customers, product line. What they lacked was one place where all that fell together and became a clear picture.
One report was used to track channel revenue. Another one monitored promotion performance. Profitability of product category was calculated on request by the data team and typically required a day or two. And the query “what was the promotion that actually generated the highest amount of revenue last quarter” entailed combining four tables in a manner that most marketing users could not do it themselves.
We constructed a Marketing Intelligence Dashboard on Databricks, as part of the answer to all of these questions, live, filtered, and accessible to any non-technical marketing user, using a free retail dataset named TPC-DS, which is present in every Databricks workspace.
The Problem We Were Solving
There were four categories of questions that the marketing team would require answers to on a quarterly basis.
- Channel performance — what are the two or three of our sales channels (store, web, catalog) contributing the most revenue and profit and what is the trend of each over time.
- Product performance — which types, brands and individual products have the most margin and sell the most as well as the highest revenue.
- Promotion effectiveness — effectiveness of run-off promotions based on revenue and cost of discount promotion.
- Customer behaviour — what are the differences between preferred customers and regular customers in terms of sizes of their baskets, category preference and revenue per customer.
None of these entailed exotic data. It was all in their Databricks workspace in its TPC-DS sample dataset.
What We Used
- Dataset — TPC-DS SF1. Free retail decision support benchmark data constructed into each Databricks workspace under
samples.tpcds_sf1. - Storage — Unity Catalog. Data is stored in
dbw_pixel_demo.marketing_domain. - Gold Layer —
dbw_pixel_demo.marketing_gold. All marketing KPIs are calculated using 5 SQL views. - Dashboard — Databricks AI/BI. Drag-and-drop builder of dashboard within Databricks One.
- AI Layer — Genie. Natural language interface attached to the Gold views.
The Dataset in Detail
Seven tables loaded into dbw_pixel_demo.marketing_domain from samples.tpcds_sf1:
| Table | Rows | What it contains |
|---|---|---|
| store_sales | 2,879,789 | In-store transactions — item, customer, promotion, quantity, price, discount, net profit |
| web_sales | 718,931 | Web channel transactions — same structure as store_sales |
| catalog_sales | loaded separately | Mail-order catalog transactions |
| item | 18,000 | Product catalog — category, class, brand, product name, retail price |
| promotion | 300 | Promotion definitions — name, channel flags (email, TV, radio, press), discount type |
| customer | loaded separately | Customer demographics — preferred flag, birth country, salutation |
| date_dim | loaded separately | Date dimension — year, quarter, month, day of week for joining all sales tables |
TPC-DS has been designed to serve the workloads of decision support and retail analytics. Unlike TPC-H, it models a multi-channel retail company with a decent sales channel division (store, web, catalog), a four-level product hierarchy (category > class > brand > item), promotion information with actual channel allocation, and customer segmentation with preferred customer indicators.
The store_sales table is particularly profitable for marketing analysis. It contains ss_quantity, ss_sales_price, ss_ext_discount_amount, and ss_net_profit columns — meaning revenue, discount cost and profit are all calculated in advance in the raw data. The promotion table has flags indicating the use of each marketing channel (p_channel_email, p_channel_tv, p_channel_radio, p_channel_press), providing the kind of attribution information that marketing departments really need.
How the Architecture Works
Layer 1 — Raw Tables (marketing_domain)
Seven tables were loaded out of TPC-DS. These are not directly queried on the marketing dashboard — the joins are complicated, the column names are short, and the raw tables are not end-user friendly.
Layer 2 — Gold Views (marketing_gold)
All the heavy lifting is done by five SQL views:
- marketing_kpi_summary — the main page. Joining store_sales to date_dim. Computes total store revenue, total store profit, profit margin %, total transactions, unique customers, average basket value, total discounts — per quarter.
- channel_revenue_summary — the channel comparison view. Uses three CTEs (store, web, catalog), each computing revenue, profit, discounts and transaction count from their respective sales tables joined to date_dim.
- item_performance — the product performance view. Joins store_sales with item and date dimensions. Computes revenue, profit, profit margin %, units sold, average selling price, and discount amount per product, per category, brand, and quarter.
- promotion_effectiveness — the campaign performance view. Joins store_sales to promotion, item and date dimensions. Computes revenue generated, units sold, transactions, and discount cost per promotion. Saves channel flags to show what media channels were used.
- customer_segment_performance — the customer behaviour view. Joins store_sales with customer, item, and date_dim. Segments by customer preferred flag, country of birth and item category. Computes unique customers, total revenue, revenue per customer, and average basket.
Layer 3 — Dashboard and Genie
Gold views are read directly on the dashboard. Genie reads from the same Gold views. Same meanings, same definitions, everywhere.
The Dashboard Layout
- Row 1 — Filters. Year, Quarter, Channel, Product Category. Each chart reacts to all the filters at the same time.
- Row 2 — Six KPI tiles. Total Revenue, Total Profit, Profit Margin %, Total Transactions, Unique Customers, Average Basket Value.
- Row 3 — Channel revenues and profit (grouped bar) + Channel revenue share (donut).
- Row 4 — Channel revenue trend, quarterly (line) + Top 10 revenue products (horizontal bar).
- Row 5 — Revenue and profit per product category (bar) + Best promotions by revenue (horizontal bar).
- Row 6 — Revenue by category and customer type (grouped bar) + Revenue and discount by category (grouped bar).
The whole dashboard is live. All charts are updated by filters at the same time. No export needed.
How Genie Works for Marketing
We linked Genie with all five Gold views, labeled them as Trusted Assets, and gave instructions that guide Genie on how to make sense of marketing lingo.
Key routing instructions:
- “Revenue” refers to gross revenue from channel_revenue_summary.
- “Best promotion” means sort by promo_revenue descending from promotion_effectiveness.
- “Margin” refers to profit_margin_percentage from item_performance.
- “Basket size” refers to avg_basket_size from customer_segment_performance.
- Channel questions always go to channel_revenue_summary.
- Product/category questions always go to item_performance.
- Promotion/campaign questions always go to promotion_effectiveness.
- Customer/segment questions always go to customer_segment_performance.
- Headline KPI questions go to marketing_kpi_summary.
The outcome is precise, well-routed answers to the queries that are really posed by marketing users:
“What channel brought in the highest amount of revenue during the last quarter?”
Returns Store, Web and Catalog ranked by gross revenue, profit and number of transactions.
“What was the best promotion in terms of revenues?”
Shows promotions giving highest revenue with the cost of the discount displayed so users can assess ROI.
“How do preferred customers differ from regular customers?”
Revenue per customer, basket size and total revenue for both groups.
“What product category is the most profitable?”
All categories sorted by average profit margin percentage.
Four Things We Learned
1. TPC-DS is drastically underutilised as demo and prototyping data
Most Databricks users know TPC-H from SQL benchmarks. TPC-DS is also built into every workspace and is much more suited to retail and marketing analytics. If you are building a marketing, retail, or e-commerce analytics proof of concept — TPC-DS is a great place to start. It will save you days.
2. The channel comparison chart changes how marketing teams think about investment
The marketing team had thought that the web channel was their most lucrative. After viewing the channel comparison view that displayed profit margin % and gross revenue across all three channels, they discovered that the catalog channel had a significantly higher margin than web despite lower revenue. That insight came from a chart that took five minutes to build.
3. The promotion table is more powerful than it seems
The TPC-DS promotion table is small (300 rows), but it has channel attribution flags that most marketing teams crave. By joining it to sales transactions via store_sales.ss_promo_sk you get revenue per promotion by which media channel the promotion used. It is a simple yet actual multi-touch attribution model built on free sample data.
4. No other team poses any more specific questions than marketing users
Finance users ask broad questions. Sales users pose relative queries. Marketing users pose surgical questions like “what was the most units in the Home category promoted in Q3 last year?” The Genie guidelines for Marketing had to be a lot more detailed and routing precision had to be much higher.
How to Build This Yourself
- Load the tables. In SQL Editor, execute
CREATE TABLE AS SELECTfromsamples.tpcds_sf1intodbw_pixel_demo.marketing_domainfor all seven tables. - Create the Gold schema.
CREATE SCHEMA IF NOT EXISTS dbw_pixel_demo.marketing_gold. - Execute the five Gold views. marketing_kpi_summary, channel_revenue_summary, item_performance, promotion_effectiveness, customer_segment_performance.
- Build the dashboard. Databricks One → New Dashboard → add all Gold views as datasets → create widgets.
- Configure Genie. Add all five views as Trusted Assets → paste business instructions → add ten sample questions → test before publication.
- Publish. Auto-refresh hourly, add share URL to the marketing team.
Overall setup time: about two to three hours, primarily writing and testing five Gold views.
What Changed for the Marketing Team
Before this dashboard, a question like “what promotion should be run again next quarter?” had to be handled by a data analyst joining promotion, store_sales, and item tables, aggregating findings, and returning a spreadsheet. That took a day. Sometimes two.
After the dashboard, a promotion effectiveness chart appears, and after the marketing manager sorts by revenue, they get the answer in ten seconds. If they are interested in why a certain promotion worked, they query Genie and receive a breakdown by product line and discount price.
The shift was not just speed. It was independence. Marketing managers did not wait to receive data but made their own choices on their own time.
Final Thought
TPC-DS is free data that represents a practical retail business with three sale channels, product hierarchy, customer segmentation and promotion attribution. It is found in all Databricks workspaces. The marketing Gold views we created over it are standard SQL with no exotic functions. The dashboard is built with clicks and configuration.
This is worth building if your marketing team is still waiting for reports on promotion performance, channel revenue, or product category.
The question is never “do we have the data?” The question is always “have we made the data available?”