The problem that Operations teams are concerned with is not the same problem as Finance or Sales or Marketing. Their information is not about revenue or customers. It is about systems — how smoothly things are moving, where demand is focusing, and whether the equipment keeping everything going is in good health.
Our operations team had two totally different things under their watch. On one side: taxi trip data — volume, fare efficiency, zonal and hourly demand. On the other: an IoT sensor network — battery, CO2, temperature warnings on hundreds of machines. Both were tracked using different tools. Neither was connected. And questions that involved cross-looking the two — such as, “is there a correlation between evening rush demand and device temperature alerts in some areas?” — were simply not being asked, there being no single place in which to ask.
We built an Operations Intelligence Dashboard on Databricks that unites the two data sources, displays live operational KPIs, and allows operations users to pose questions in plain English using Genie.
The Problem We Were Solving
The operations team needed insight into two domains simultaneously.
On the taxi operations side:
- What are the most active hours in terms of trips and revenue?
- What are the busiest pickup zones?
- What is the difference in weekday vs weekend demand?
- What are the highest revenue-per-mile routes?
- What are the trends in monthly trip volume?
On the IoT side:
- Which devices have reached critical battery levels?
- Which devices indicate high CO2 levels?
- Are any devices running hot?
- How many healthy vs warning vs critical devices?
These are both operations questions. The operations manager should not need to open two separate tools to answer them.
What We Used
- Dataset 1 — NYC Taxi Trips. Built into every Databricks workspace at
samples.nyctaxi.trips. 21,932 rows of actual NYC taxi trip records. - Dataset 2 — IoT Device Data. Available at
dbfs:/databricks-datasets/iot/iot_devices.json. 198,164 rows of simulated IoT sensor readings. - Storage — Unity Catalog. Both datasets in
dbw_pixel_demo.operations_domain. - Gold Layer —
dbw_pixel_demo.gold.operations. 7 SQL views — 4 taxi, 2 IoT, 1 combined summary. - Dashboard — Databricks AI/BI drag-and-drop dashboard.
- AI Layer — Genie. All 7 Gold views linked to natural language interface.
The Datasets in Detail
NYC Taxi Trips
21,932 rows. This table has just six columns — this is significant since our Gold views are composed with only these confirmed columns:
| Column | Type | What it contains |
|---|---|---|
| tpep_pickup_datetime | timestamp | When the trip started |
| tpep_dropoff_datetime | timestamp | When the trip ended |
| trip_distance | double | Distance travelled in miles |
| fare_amount | double | Base fare charged to the passenger |
| pickup_zip | int | ZIP code of the pickup location |
| dropoff_zip | int | ZIP code of the dropoff location |
Six columns sounds limited. However, time, distance, fare, and zone combined give you everything you need for operations analysis. Hour of day, day of week, and month from timestamps. Trip duration from the difference between pickup and dropoff. Revenue per mile from fare/distance. Zone-level demand from ZIP codes. Four different dimensions of analysis out of six columns.
IoT Device Data
198,164 rows. Simulated IoT sensor readings including:
| Column | What it contains |
|---|---|
| device_id | Unique device identifier |
| device_name | Human-readable device name |
| ip | Device IP address |
| battery_level | Current battery percentage (0-100) |
| c02_level | CO2 reading in parts per million |
| humidity | Humidity percentage |
| temp | Temperature in degrees Celsius |
How the Architecture Works
Layer 1 — Raw Tables
Two tables: nyc_taxi_trips and iot_devices. Not queried directly by the dashboard.
Layer 2 — Gold Views (7 SQL Views)
Taxi sub-domain — 4 views:
- trip_efficiency_summary — monthly and hourly trip stats. Total trips, average fare, total revenue, revenue per mile, average fare per trip.
- hourly_demand_pattern — demand by hour, day of week, and demand period (Morning rush, Afternoon, Evening rush, Late night, etc.). Includes week_type flag (Weekday vs Weekend).
- zone_performance — geographic efficiency. Groups by pickup/dropoff ZIP pair. Total trips, average fare, revenue per mile per route.
- trip_duration_analysis — time efficiency. Average, max, and min trip duration by month, hour, and pickup area.
IoT sub-domain — 2 views:
- iot_device_health — per-device health. Average/min/max battery, CO2, humidity, temperature. Low battery warnings (<20%), high CO2 (>1000 ppm), high temp (>35°C). Device health classified as Healthy (≥60% battery), Warning (30-59%), or Critical (<30%).
- iot_trend_summary — status distribution. Each reading categorised into battery tier, CO2 status, and temperature status.
Combined summary:
- operations_kpi_summary — UNION ALL of headline numbers from both sub-domains. Taxi row: total trips, total revenue, average fare, revenue per mile, unique zones. IoT row: total devices, healthy/warning/critical counts.
Layer 3 — Dashboard and Genie
Dashboard reads from Gold views. Genie picks the same Gold views. Numbers are identical on both surfaces.
The Dashboard Layout
- Row 1 — Taxi KPI tiles. Total Trips, Total Fare Revenue, Average Fare, Average Distance, Revenue per Mile, Pickup Zones.
- Row 2 — IoT KPI tiles. Total Devices, Healthy, Warning, Critical Devices. The Critical Devices tile draws immediate attention.
- Row 3 — Filters. Month, Week Type, Demand Period, Device Health Status.
- Row 4 — Hourly demand weekday vs weekend (bar) + Demand period revenue (horizontal bar).
- Row 5 — Monthly trip volume and revenue trend (line) + Average fare per hour of day (line).
- Row 6 — Top 15 pickup zones (horizontal bar) + IoT device health distribution (donut).
- Row 7 — Battery health by device (horizontal bar) + CO2 vs temperature status distribution (grouped bar).
The dashboard is live. Clicking the “Critical” filter on Device Health instantly reveals all critical devices across all IoT charts. Clicking “Evening rush” in the Demand Period filter filters all taxi charts to that period.
How Genie Works for Operations
We linked Genie to all seven Gold views, identified them as Trusted Assets, and provided routing instructions for two different sub-domain vocabularies:
Key routing instructions:
- Trip and fare questions → trip_efficiency_summary or hourly_demand_pattern.
- Zone and pickup questions → zone_performance.
- Duration and time questions → trip_duration_analysis.
- Device, battery, and health questions → iot_device_health.
- CO2, temperature, humidity questions → iot_trend_summary.
- Headline KPI questions → operations_kpi_summary.
“What time of day is the busiest?”
Top five highest-volume trip hours with average fare and total revenue.
“Which devices have low battery levels?”
All critical devices ranked by battery level, with CO2 and temperature warnings shown for the complete health picture.
“Which is the most revenue-generating zone?”
Top ten pickup-dropoff zone pairs ranked by revenue per mile, trip count, and average fare.
“How does trip volume change by day of the week?”
All seven days listed by trip volume, average fare, and total revenue.
Four Things We Learned
1. Six columns are enough when you use timestamps well
We initially thought the NYC taxi dataset was too small. It turned out to be plenty. The two timestamp columns alone give hour of day, day of week, month, and trip duration. Combined with fare and distance, you get demand analysis, efficiency analysis, zone analysis, and trend analysis. Do not disqualify a dataset with few columns — ask what you can derive from them.
2. Genie is more precise when alert thresholds are in the SQL views, not the dashboard
In the initial version, device health status was calculated in dashboard chart settings. Genie could not use it — it just read raw battery numbers. After moving the threshold logic into the view itself (CASE WHEN AVG(battery_level) >= 60 THEN 'Healthy'...), Genie could answer “which devices are critical?” accurately. Business logic belongs in the Gold layer, not the presentation layer.
3. Combining two unrelated datasets into one dashboard is more practical than it seems
We initially planned two separate dashboards. We are glad we did not split them. A combined view means an operations manager sees the entire picture at a single glance. Total Devices sits next to Total Trips. Healthy Devices follows Revenue per Mile. The single dashboard conveys that operations health is one thing, not two.
4. The Critical Devices KPI tile creates urgency that a chart never does
A donut chart showing 15% critical status looks informational. A counter tile reading “47 Critical Devices” painted red looks urgent. Same number, different widget, entirely different emotional reaction. For operations dashboards where numbers indicate physical equipment, counter tiles are better than charts for action-oriented KPIs.
How to Build This Yourself
- Load the data. NYC Taxi:
CREATE TABLE AS SELECT * FROM samples.nyctaxi.trips. IoT: Load fromdbfs:/databricks-datasets/iot/iot_devices.jsonusing Spark. - Match exact columns, then write views.
DESCRIBE TABLEfirst — column names in this table differ from other NYC Taxi versions online. - Create the Gold schema.
CREATE SCHEMA IF NOT EXISTS dbw_pixel_demo.gold.operations. - Run all seven Gold views. trip_efficiency_summary, hourly_demand_pattern, zone_performance, trip_duration_analysis, iot_device_health, iot_trend_summary, operations_kpi_summary.
- Build the dashboard. Add every Gold view as a Dataset in Databricks One → create KPI tiles before charts.
- Configure Genie. Add all seven views as Trusted Assets → add business instructions with distinct taxi/IoT routing → add sample questions → test before publishing.
- Publish. Auto-refresh hourly → share URL with the operations team.
Setup time: approximately two to three hours, mainly writing and testing seven Gold views.
What Changed for the Operations Team
Before this dashboard, the operations team checked taxi performance in one tool and device health in another. There was no conventional way of viewing the two jointly. If someone wanted to know whether evening rush demand correlated with higher device temperatures, that question was simply not asked — combining the two unrelated datasets was impossible.
After the dashboard, Genie questions are answered within seconds. Both sub-domains sit adjacent to each other. Filters affect both. The combined operations_kpi_summary view presents headline numbers from both in the same row of counter tiles.
Most importantly, device health ceased to be reactive. Previously, you would see a device fail and then investigate. Now, the Critical Devices counter tile shifts focus to struggling devices before they fail — and Genie can inform an operator which devices need attention and why.
Final Thought
The Operations dashboard is built using two free datasets — NYC Taxi from the Databricks samples catalog and IoT device data from DBFS — with less than 220,000 rows combined. It is smaller than the data in the Finance or Marketing dashboards.
But data size is not the point. The point is visibility. Before this dashboard, two mental models and two tools were needed to assess operations health across two sub-domains. After the dashboard, it needs one URL and one Genie space.
The optimal operations tool is not the one with the most data. It is the one that makes the most important numbers impossible to miss.