Marketing teams sit on enormous amounts of data — store sales, online transactions, catalog orders, promotion history, customer demographics, and product lines. The challenge has never been a lack of data. It has always been the absence of a single, coherent place where all of that data comes together and tells a clear story.
Before this project, the team was operating with fragmented reports. Channel revenue was tracked in one place. Promotion performance in another. Product category profitability required a manual request to the data team, often taking one to two days. And answering something as fundamental as “which promotion drove the highest revenue last quarter” meant joining four tables — a task that most marketing users simply could not do on their own.
We set out to change that. The result was a Marketing Intelligence Dashboard built on Databricks that is live, filterable, and fully accessible to non-technical marketing users.
The Problem We Were Solving
The marketing team had four categories of recurring questions that never had fast answers:
- Channel performance — which of our sales channels is contributing the most to revenue and profit, and what does the trend look like over time by quarter?
- Product performance — what kinds of products, brands, and specific products are generating the highest margin, selling the most units, and generating the most revenue?
- Promotion effectiveness — which promotions actually worked, measured by revenue generated relative to the cost of the discount offered?
- Customer behaviour — how do preferred customers compare to regular customers in terms of basket size, category preference, and revenue contribution?
None of these required exotic data or complex modelling. All of it was already sitting in their Databricks workspace. It simply needed to be structured and surfaced in the right way.
What We Used
- Dataset — TPC-DS SF1. A retail decision support benchmark dataset that comes pre-loaded in every Databricks workspace under
samples.tpcds_sf1. Purpose-built for retail and marketing analytics with multiple sales channels, a four-level product hierarchy, promotion data with channel attribution, and customer segmentation. - Storage — Unity Catalog. Data is stored in
dbw_pixel_demo.marketing_domain. - Gold Layer —
dbw_pixel_demo.marketing_gold. All marketing KPIs are calculated across five SQL views. - Dashboard — Databricks One. Native drag-and-drop dashboard builder.
- AI Layer — Genie. Natural language interface connected directly to all five Gold views.
The Dataset in Detail
Seven tables were loaded into dbw_pixel_demo.marketing_domain from samples.tpcds_sf1:
| Table | Rows | What It Contains |
|---|---|---|
| store_sales | 2,879,789 | In-store transactions — item, customer, promotion, quantity, price, discount, net profit |
| web_sales | 718,931 | Web channel transactions — same structure as store_sales |
| catalog_sales | Loaded separately | Mail-order catalog transactions |
| item | 18,000 | Product catalog — category, class, brand, product name, retail price |
| promotion | 300 | Promotion definitions — name, channel flags (email, TV, radio, press), discount type |
| customer | Loaded separately | Customer demographics — preferred flag, birth country, salutation |
| date_dim | Loaded separately | Date dimension — year, quarter, month, day of week |
What makes TPC-DS particularly valuable for marketing analysis is the depth of its transactional data. The store_sales table includes pre-calculated revenue, discount cost, and profit in every row — meaning profitability does not require a separate cost table. The promotion table, while compact at 300 rows, contains channel attribution flags (email, TV, radio, press) for every promotion, providing the kind of attribution information that marketing departments struggle to build.
How the Architecture Works
Layer 1 — Raw Tables (marketing_domain)
Seven tables were loaded from TPC-DS. These are not queried directly on the dashboard. The joins are complex, the column names are abbreviated, and the raw tables are not suited for end-user consumption. This layer is purely the foundation.
Layer 2 — Gold Views (marketing_gold)
Five SQL views handle all of the analytical heavy lifting:
- marketing_kpi_summary — the headline view. Joins store_sales to date_dim and computes total revenue, total profit, profit margin percentage, transaction count, unique customers, average basket value, and total discounts — all segmented by quarter.
- channel_revenue_summary — the channel comparison view. Uses three CTEs (store, web, catalog), each computing revenue, profit, discounts, and transaction count from their respective sales tables joined to date_dim. The three CTEs are unified so any chart can compare all three channels in a single query.
- item_performance — the product performance view. Joins store_sales to item and date_dim. Calculates revenue, profit, profit margin percentage, units sold, average selling price, and discount amount per product, sliced by category, brand, and quarter.
- promotion_effectiveness — the campaign performance view. Joins store_sales to promotion, item, and date_dim. Computes revenue generated, units sold, transactions, and discount cost per promotion. Preserves channel flags so users can see which media channels were active for any given campaign.
- customer_segment_performance — the customer behaviour view. Joins store_sales to customer, item, and date_dim. Segments by customer preferred flag, birth country, and item category. Computes unique customers, total revenue, revenue per customer, and average basket value.
Layer 3 — Dashboard and Genie
The dashboard reads directly from the Gold views. Genie reads from the same Gold views. The same definitions, the same calculations, everywhere — no divergence between what the charts show and what the AI answers.
What the Dashboard Looks Like
- Row 1 — Filters. Year, Quarter, Channel, and Product Category. All charts react to all four filters at once.
- Row 2 — Six KPI tiles. Total Revenue, Total Profit, Profit Margin Percentage, Total Transactions, Unique Customers, Average Basket Value.
- Row 3 — Channel revenue and profit (grouped bar chart) comparing Store, Web, and Catalog side by side. Channel revenue share (donut chart) showing percentage split and dominance.
- Row 4 — Channel revenue trend over four quarters (line chart) and top 10 revenue-generating products (horizontal bar chart) including margin.
- Row 5 — Revenue and profit by product category (bar chart) and best promotions ranked by revenue (horizontal bar chart).
- Row 6 — Revenue by category and customer type, comparing preferred versus regular customers (grouped bar). Revenue and discount by category showing where promotions are disproportionate to revenue return (grouped bar).
The dashboard is live, and all charts react to filters at once. There is no need to export data, refresh data manually, or wait on the data team.
How We Configured Genie for Marketing
We connected Genie to all five Gold views and designated them as Trusted Assets. We then wrote detailed routing instructions so that Genie understands marketing language and directs every question to the right view.
The routing instructions we provided:
- “Revenue” refers to gross_revenue from channel_revenue_summary.
- “Best promotion” means sort by promo_revenue descending from promotion_effectiveness.
- “Margin” refers to profit_margin_pct from item_performance.
- “Basket size” refers to avg_basket_size from customer_segment_performance.
- Channel questions route to channel_revenue_summary.
- Product and category questions route to item_performance.
- Promotion and campaign questions route to promotion_effectiveness.
- Customer and segment questions route to customer_segment_performance.
- Headline KPI questions route to marketing_kpi_summary.
“What channel brought in the highest revenue last quarter?”
Returns Store, Web, and Catalog, sorted by gross revenue, profit, and transaction count.
“What was the best-performing promotion?”
Shows promotions sorted by revenue with discount cost included, so they can see return on investment for each campaign.
“How do preferred customers differ from regular customers?”
Returns revenue per customer, basket size, and total revenue for both groups.
“Which product category is the most profitable?”
Returns all categories sorted by average profit margin percentage.
Four Things We Learned
1. The Channel Comparison View Changes How Marketing Teams Think About Investment
Prior to this dashboard, the marketing team believed the web channel was their most profitable because of its lower overhead. After viewing the channel comparison view — which displayed profit margin percentage and gross revenue across all three channels simultaneously — they discovered that the catalog channel carried a statistically significant higher margin than web, despite generating lower total revenue. That insight came from a chart that took approximately five minutes to build.
2. The Promotion Table Is More Powerful Than It Appears
The TPC-DS promotion table contains only 300 rows, but it holds channel attribution flags for every promotion. By joining it to store_sales using ss_promo_sk, we produced revenue-per-promotion data segmented by which media channels the promotion ran across. This is a straightforward multi-touch attribution model built entirely from structured SQL joins — exactly the kind of analysis that marketing teams regularly struggle to produce.
3. Marketing Users Ask More Specific Questions Than Any Other Business Function
Finance users tend to ask broad questions. Sales users ask relative questions. Marketing users ask highly specific questions like “which promotion generated the most units sold in the Home category in Q3 last year?” The Genie routing instructions for this space had to be considerably more detailed than other departments. Precision in the instructions translated directly into precision in the answers.
4. Gold Views Are the Key to a Consistent Experience
Having the dashboard and Genie both read from the same Gold views was one of the most important architectural decisions we made. This meant that the number in the chart and the number from the natural language query were the same. There was never any doubt, never any disagreement, and never any need to reconcile. Trust in the whole solution comes from having consistent data.
How to Build This Yourself
- Load the tables. In the SQL Editor, run
CREATE TABLE AS SELECT * FROM samples.tpcds_sf1for all seven tables into your marketing_domain schema. - Create the Gold schema.
CREATE SCHEMA IF NOT EXISTS dbw_pixel_demo.marketing_gold. - Build the five Gold views. marketing_kpi_summary, channel_revenue_summary, item_performance, promotion_effectiveness, customer_segment_performance.
- Build the dashboard. Create a new Databricks AI/BI dashboard and add all the Gold views as datasets. Build the widgets as described above.
- Configure Genie. Add all five Gold views as Trusted Assets and paste in the business routing instructions. Add sample questions and test thoroughly before publishing.
- Publish. Set the auto-refresh to hourly and distribute the URL to the marketing team.
This process will take around two to three hours to complete, mostly in building the five Gold views.
What Changed for the Marketing Team
Before this dashboard existed, answering “which promotion should we run again next quarter?” required a data analyst to join four tables, aggregate the results, and return a spreadsheet. That process took one day. Sometimes two.
After the dashboard was live, a marketing manager could sort the promotion effectiveness chart by revenue and have that answer in under ten seconds. If they wanted to understand why a particular promotion performed the way it did, they asked Genie and received a breakdown by product line and discount level instantly.
The shift was not just about speed, though speed mattered. It was about independence. Marketing managers stopped waiting for data and started making decisions on their own timeline, using information they trusted, in a format they could act on.
Final Thought
The architecture we built here — structured domain tables, purpose-built Gold views, a live filterable dashboard, and a natural language AI layer connected to consistent definitions — is a pattern that works well beyond this specific use case. It applies anywhere a team has rich data that is currently inaccessible to the people who need it most.
The technical stack is straightforward. The SQL is standard. The configuration is within reach of any data team working in Databricks. What makes it powerful is not the complexity of the tooling — it is the intentionality of the design: understanding exactly what questions the end users are asking, building the data layer to answer those questions precisely, and surfacing the answers in a format that requires no technical knowledge to use.
We have done this for marketing. We are ready to do it for you.