Their data is not about revenue or customers — it is about systems. How efficiently things are moving. Where demand is concentrating. Whether the equipment keeping everything running is in good health. Our operations team was responsible for two distinct domains simultaneously. On one side: taxi trip data covering volume, fare efficiency, zone-level demand, and hourly patterns. On the other: an IoT sensor network monitoring battery levels, CO2 readings, and temperature alerts across hundreds of devices.
Each was tracked using a separate tool. Neither was connected to the other. And questions that required looking across both — such as whether evening rush demand correlated with elevated device temperature alerts in specific areas — were simply not being asked, because there was no single place to ask them.
We built the Operations Intelligence Dashboard on Databricks to change that. It unifies both data domains, surfaces live operational KPIs, and enables operations users to ask questions across either sub-domain in plain English using Genie.
The Problem We Were Solving
The operations team had to be able to see into two domains at once.
On the taxi operations side:
- What hours of the day produce the highest volume and revenue?
- What areas are most active for pickups?
- How does demand compare between weekdays and weekends?
- What routes produce the highest revenue-per-mile?
- What are the monthly trends in terms of volume?
On the IoT side:
- Which devices have reached critical battery levels?
- Which are showing elevated CO2 readings?
- Are any running at high temperatures?
- What is the ratio of healthy to warning to critical devices across the fleet?
These are not separate questions about separate things. They are both operations questions. An operations manager should not need to open two different tools to answer them.
What We Used
- Dataset 1 — NYC Taxi Trips. Sourced from
samples.nyctaxi.trips, loaded intodbw_pixel_demo.operations_domain.nyc_taxi_trips. 21,932 actual data points with six confirmed columns: tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance, fare_amount, pickup_zip, and dropoff_zip. The dataset does not include passenger count, tip amount, payment type, or total amount columns. - Dataset 2 — IoT Device Data. Obtained from
dbfs:/databricks-datasets/iot/iot_devices.jsonand stored indbw_pixel_demo.operations_domain.iot_devices. 198,164 rows of simulated IoT sensor readings. - Storage — Unity Catalog. Both datasets reside in
dbw_pixel_demo.operations_domain. - Gold Layer —
operations_gold. Seven SQL views — four for the taxi domain, two for the IoT domain, and one unified summary view. - Dashboard — Databricks AI/BI. Built using the native drag-and-drop interface.
- AI Layer — Genie. All seven Gold views connected as Trusted Assets.
The Datasets in Detail
NYC Taxi Trips — Column Structure
Six columns. That is the complete schema. Understanding what can be derived from them is what made this dataset analytically powerful:
| Column | Type | What It Contains |
|---|---|---|
| tpep_pickup_datetime | timestamp | When the trip started |
| tpep_dropoff_datetime | timestamp | When the trip ended |
| trip_distance | double | Distance travelled in miles |
| fare_amount | double | Base fare charged to the passenger |
| pickup_zip | int | ZIP code of the pickup location |
| dropoff_zip | int | ZIP code of the dropoff location |
From six columns, four analytical dimensions emerge. The two timestamp columns yield hour of day, day of week, month, and trip duration in minutes. Fare amount divided by trip distance produces revenue per mile. Pickup and dropoff ZIP codes enable zone-level demand analysis and route-level efficiency comparison. The analytical depth comes not from how many columns exist, but from how deliberately those columns are used.
IoT Device Data — Key Columns
| Column | What It Contains |
|---|---|
| device_id | Unique device identifier |
| device_name | Human-readable device name |
| battery_level | Current battery percentage (0–100) |
| c02_level | CO2 reading in parts per million |
| humidity | Humidity percentage |
| temp | Temperature in degrees Celsius |
How the Architecture Works
Layer 1 — Raw Tables (operations_domain)
Two tables: nyc_taxi_trips and iot_devices. Neither is queried directly from the dashboard. The raw taxi table contains no business-friendly column names or derived metrics. The raw IoT table contains 198,000 unaggregated sensor readings. Both require transformation before they can support operational decision-making.
Layer 2 — Gold Views (operations_gold)
Taxi sub-domain — 4 views:
- trip_efficiency_summary — monthly and hourly trip statistics. Groups trips by month and hour of pickup. Calculates total trips, average fare, total fare revenue, min and max fare, revenue per mile (SUM(fare_amount) / SUM(trip_distance)), and average fare per trip.
- hourly_demand_pattern — demand analysis. Grouped by hour of day, day of week, and demand period (Morning Rush 6-9am, Late Morning 10am-12pm, Afternoon 1-4pm, Evening Rush 5-8pm, Late Night 9-11pm, Early Morning). Includes week type flag (Weekday or Weekend).
- zone_performance — geographic efficiency analysis. Groups trips by pickup and dropoff ZIP code pair. Computes total trips, average fare, total revenue, revenue per mile, and average trips per route.
- trip_duration_analysis — time efficiency view. Groups trips by month, hour, and pickup area. Calculates mean, maximum, and minimum trip duration in minutes.
IoT sub-domain — 2 views:
- iot_device_health — device health view. Aggregates all readings by device. Calculates average, minimum, and maximum battery levels, CO2 levels, humidity, and temperature. Flags low battery (<20%), high CO2 (>1,000), and high temperature (>35°C). Classifies device health as Healthy (≥60% battery), Warning (30-59%), or Critical (<30%).
- iot_trend_summary — status distribution view. Classifies each reading into battery level status (High, Medium, Low), CO2 level status (Normal, Elevated, Critical), and temperature status (Normal, Warm, Hot).
Combined summary:
- operations_kpi_summary — UNION ALL view combining headline numbers from both sub-domains. The taxi row provides total trips, total revenue, average fare, revenue per mile, unique pickup zones, unique dropoff zones, and average trip duration. The IoT row provides total devices, healthy devices, warning devices, and critical devices.
Layer 3 — Dashboard and Genie
The dashboard reads from the Gold views. Genie reads from the same Gold views. Both surfaces show identical numbers, calculated identically.
What the Dashboard Looks Like
- Row 1 — Taxi KPI tiles. Total Trips, Total Fare Revenue, Average Fare, Average Distance (miles), Revenue per Mile, and Pickup Zones.
- Row 2 — IoT KPI tiles. Total Devices, Healthy Devices, Warning Devices, and Critical Devices. The Critical Devices tile is displayed in red and draws immediate attention.
- Row 3 — Filters. Month, Week Type (Weekday / Weekend), Demand Period, and Device Health Status. All charts and KPI tiles respond to filter changes simultaneously.
- Row 4 — Hourly trip demand by weekday versus weekend (bar chart) showing when demand peaks. Demand period revenue (horizontal bar) ranking Morning Rush, Evening Rush, and Late Night by total revenue.
- Row 5 — Monthly trip volume and revenue trend (line chart). Average fare by hour of day (line chart) showing when fares are highest and lowest.
- Row 6 — Top 15 pickup zones (horizontal bar chart). IoT device health distribution (donut chart).
- Row 7 — Battery health by device name (horizontal bar chart), ranking each device by battery level from lowest to highest. CO2 and temperature status distribution (grouped bar chart).
The dashboard is live. Selecting Critical in the Device Health filter instantly filters all IoT charts to critical devices only. Selecting Evening Rush in the Demand Period filter narrows all taxi charts to that time window. Both sub-domains respond to their respective filters simultaneously.
How We Configured Genie for Operations
All seven Gold views are connected to Genie and designated as Trusted Assets. Because the two sub-domains use entirely different terminology, the routing instructions were especially important.
The routing logic we specified:
- Trip and fare questions always route to trip_efficiency_summary or hourly_demand_pattern.
- Zone and pickup questions always route to zone_performance.
- Duration and time questions always route to trip_duration_analysis.
- Device battery and health questions always route to iot_device_health.
- CO2, temperature, and humidity questions always route to iot_trend_summary.
- Headline KPI questions always route to operations_kpi_summary.
We also defined key business terms precisely:
- “Busiest hour” means the hour with the highest total trips ranked from hourly_demand_pattern.
- “Most efficient zone” means revenue per mile sorted descending from zone_performance.
- “Critical devices” means device_health = Critical from iot_device_health.
- “Late night” means the Late Night demand period from hourly_demand_pattern.
“What time of day is busiest?”
Returns the top five highest-volume trip hours with average fare and total revenue.
“Which devices have low battery?”
Returns all critical devices ranked by battery level ascending, with CO2 and temperature warnings displayed alongside for a complete device health picture.
“Which zone generates the most revenue?”
Returns the top ten pickup-dropoff zone pairs by revenue per mile, trip count, and average fare.
“How does trip volume vary by day of week?”
Returns all seven days ranked by trip volume, average fare, and total revenue.
Four Things We Learned
1. Six Well-Used Columns Can Power a Full Operational Dashboard
When we first examined the NYC taxi dataset and found only six columns — no passenger count, no tip, no payment type — we questioned whether the dataset would be adequate. It was. The two timestamp columns alone yield four analytical dimensions: hour of day, day of week, month, and trip duration. Combined with fare and distance, those six columns support demand analysis, efficiency analysis, zone analysis, and trend analysis. The lesson is not to evaluate a dataset by column count, but by what can be derived from the columns it has.
2. Alert Thresholds Belong in the Gold Layer, Not the Presentation Layer
In the first version, device health status was calculated in the dashboard chart settings rather than in the Gold view itself. This meant Genie could only read raw battery numbers — it could not answer “which devices are critical?” accurately because the concept of “critical” did not exist in the data layer. After moving the threshold logic into the iot_device_health view using CASE WHEN statements, Genie answered health classification questions immediately and correctly. Business logic belongs in the Gold layer.
3. Combining Two Unrelated Domains in One Dashboard Is More Valuable Than It First Appears
We initially considered building two separate dashboards — one for taxi operations and one for IoT. We are glad we did not. Having both sub-domains on a single dashboard means an operations manager sees the complete picture in one view. Total Trips sits next to Total Devices. Revenue per Mile sits next to Healthy Devices. The proximity communicates that operations health is not two separate things — it is one.
4. A Counter Tile Creates Urgency That a Chart Cannot
A donut chart showing 15% of devices in critical status looks informational. A red counter tile displaying “47 Critical Devices” feels urgent. It is the same number, a different widget, and a completely different emotional response. On an operations dashboard where some numbers represent physical equipment that can fail, counter tiles are more effective than charts for KPIs that require immediate action.
How to Build This Yourself
- Load the data. NYC Taxi:
CREATE TABLE IF NOT EXISTS catalog_name.operations_domain.nyc_taxi_trips AS SELECT * FROM samples.nyctaxi.trips. IoT: read the JSON file fromdbfs:/databricks-datasets/iot/iot_devices.jsonusingspark.read.format('json')and save it usingsaveAsTable. - Check the column schema. Run
DESCRIBE TABLE nyc_taxi_tripsand verify the column names. The columns differ from other NYC Taxi data available online. - Create the Gold schema.
CREATE SCHEMA IF NOT EXISTS dbw_pixel_demo.operations_gold. - Build all seven Gold views. trip_efficiency_summary, hourly_demand_pattern, zone_performance, trip_duration_analysis, iot_device_health, iot_trend_summary, operations_kpi_summary.
- Build the dashboard. In Databricks AI/BI, create a new dashboard, add all seven Gold views as datasets, and build KPI counter tiles before charts.
- Configure Genie. Add all seven views as Trusted Assets, paste the routing instructions with distinct paths for taxi and IoT questions, add at least ten sample questions, and test before publishing.
- Publish and share. Set auto-refresh to hourly and share the URL with the operations team.
Total setup time is approximately two to three hours, the majority of which is writing and testing the seven Gold views.
What Changed for the Operations Team
Before this dashboard, the operations team used one tool to monitor taxi performance and a separate tool to monitor device health. There was no established way to view both together. Questions that required looking across both domains — such as whether evening rush patterns correlated with temperature alerts in specific areas — were not being asked, because there was no single place to ask them.
After the dashboard was live, those questions are answered in seconds within a single Genie conversation. Both sub-domains are displayed side by side. Filters affect both simultaneously. The operations_kpi_summary view presents headline numbers from both in the same row of counter tiles.
The most significant operational change was that device health monitoring shifted from reactive to proactive. Previously, the team would investigate a device after it failed. Now, the Critical Devices counter tile surfaces struggling devices before failure occurs — and Genie can tell an operator exactly which devices require attention and why.
Final Thought
The Operations Intelligence Dashboard was built by combining two well-structured datasets across different domains, seven purpose-built Gold views, a live dashboard designed around operational urgency, and a Genie configuration that routes across two entirely different vocabularies with precision.
What this project demonstrated most clearly is that the value of an operations dashboard is not determined by data volume. It is determined by visibility. Before this dashboard, two separate mental models and two separate tools were required to understand operational health. After it, one URL and one Genie space are sufficient.
The best operations tool is not the one with the most data. It is the one that makes the most important signals impossible to miss. We have done this for operations. We are ready to do it for you.