Analyzing Ride-Hailing Driver Metrics with SQL Window Functions
Data Schema
ride_requests table:
user_id: Passenger identifierlocation: City namerequest_ts: Timestamp of ride requestrequest_end_ts: Timestamp when request period endsbooking_id: Unique order identifier (null until accepted)
ride_orders table:
booking_id: Unique order identifieruser_id: Passenger identifieroperator_id: Driver identifieracceptance_ts: Timestamp when driver acceptspickup_ts: Timestamp when passenger boardscompletion_ts: Timestamp when ride finishes or is canceleddistance: Total distance traveledcharge: Total fare amountrating: Passenger rating (1-5)
Process Flow
- A request creates a
ride_requestsrecord with a nullbooking_id. - Upon driver acceptance, a
ride_ordersrecord is created. Left-side fields are populated, right-side fields are null.booking_idandacceptance_tsupdate theride_requestsrecord. - If unfulfilled (timeout or user cnaceled), only
request_end_tsis recorded. - Pre-pickup cancellations set
completion_tsto the cancel time, leaving other end fields null. - Passenger boarding populates
pickup_ts. - Ride completion populates
completion_ts,distance, andcharge. Theratingremains null until the passenger submits a review.
Problem Statement
Determine the average number of accepted bookings and the average total earnings (ignoring platform commissions, based on completed order fares) to drivers in Beijing who handled at least 3 booikngs during the 7-day National Day holiday in 2021 (October 1 to October 7). Round results to 3 decimal places. Output columns: city, avg_order_num, avg_income.
Query Design
To isolate the target drivers, a Common Table Expression (CTE) identifies operators with 3 or more accepted bookings in Beijing during the specified timeframe. The main query then joins this CTE with the original tables to compute the aggregates. Calculating the average metrics involves dividing the total bookings or total charges by the count of distinct qualifying operators. Grouping by location is required for the aggregate functions, even if the location is filtered to a single city.
Solution
WITH qualified_operators AS (
SELECT
o.operator_id
FROM
ride_requests r
JOIN ride_orders o ON r.booking_id = o.booking_id
WHERE
r.location = 'Beijing'
AND DATE(o.acceptance_ts) BETWEEN '2021-10-01' AND '2021-10-07'
GROUP BY
o.operator_id
HAVING
COUNT(o.booking_id) >= 3
)
SELECT
r.location AS city,
ROUND(COUNT(o.booking_id) * 1.0 / COUNT(DISTINCT o.operator_id), 3) AS avg_order_num,
ROUND(SUM(o.charge) * 1.0 / COUNT(DISTINCT o.operator_id), 3) AS avg_income
FROM
ride_requests r
JOIN ride_orders o ON r.booking_id = o.booking_id
JOIN qualified_operators q ON o.operator_id = q.operator_id
WHERE
r.location = 'Beijing'
AND DATE(o.acceptance_ts) BETWEEN '2021-10-01' AND '2021-10-07'
GROUP BY
r.location;