Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Analyzing Ride-Hailing Driver Metrics with SQL Window Functions

Tech 1

Data Schema

ride_requests table:

  • user_id: Passenger identifier
  • location: City name
  • request_ts: Timestamp of ride request
  • request_end_ts: Timestamp when request period ends
  • booking_id: Unique order identifier (null until accepted)

ride_orders table:

  • booking_id: Unique order identifier
  • user_id: Passenger identifier
  • operator_id: Driver identifier
  • acceptance_ts: Timestamp when driver accepts
  • pickup_ts: Timestamp when passenger boards
  • completion_ts: Timestamp when ride finishes or is canceled
  • distance: Total distance traveled
  • charge: Total fare amount
  • rating: Passenger rating (1-5)

Process Flow

  1. A request creates a ride_requests record with a null booking_id.
  2. Upon driver acceptance, a ride_orders record is created. Left-side fields are populated, right-side fields are null. booking_id and acceptance_ts update the ride_requests record.
  3. If unfulfilled (timeout or user cnaceled), only request_end_ts is recorded.
  4. Pre-pickup cancellations set completion_ts to the cancel time, leaving other end fields null.
  5. Passenger boarding populates pickup_ts.
  6. Ride completion populates completion_ts, distance, and charge. The rating remains null until the passenger submits a review.

Problem Statement

Determine the average number of accepted bookings and the average total earnings (ignoring platform commissions, based on completed order fares) to drivers in Beijing who handled at least 3 booikngs during the 7-day National Day holiday in 2021 (October 1 to October 7). Round results to 3 decimal places. Output columns: city, avg_order_num, avg_income.

Query Design

To isolate the target drivers, a Common Table Expression (CTE) identifies operators with 3 or more accepted bookings in Beijing during the specified timeframe. The main query then joins this CTE with the original tables to compute the aggregates. Calculating the average metrics involves dividing the total bookings or total charges by the count of distinct qualifying operators. Grouping by location is required for the aggregate functions, even if the location is filtered to a single city.

Solution

WITH qualified_operators AS (
    SELECT
        o.operator_id
    FROM
        ride_requests r
    JOIN ride_orders o ON r.booking_id = o.booking_id
    WHERE
        r.location = 'Beijing'
        AND DATE(o.acceptance_ts) BETWEEN '2021-10-01' AND '2021-10-07'
    GROUP BY
        o.operator_id
    HAVING
        COUNT(o.booking_id) >= 3
)
SELECT
    r.location AS city,
    ROUND(COUNT(o.booking_id) * 1.0 / COUNT(DISTINCT o.operator_id), 3) AS avg_order_num,
    ROUND(SUM(o.charge) * 1.0 / COUNT(DISTINCT o.operator_id), 3) AS avg_income
FROM
    ride_requests r
JOIN ride_orders o ON r.booking_id = o.booking_id
JOIN qualified_operators q ON o.operator_id = q.operator_id
WHERE
    r.location = 'Beijing'
    AND DATE(o.acceptance_ts) BETWEEN '2021-10-01' AND '2021-10-07'
GROUP BY
    r.location;

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

SBUS Signal Analysis and Communication Implementation Using STM32 with Fus Remote Controller

Overview In a recent project, I utilized the SBUS protocol with the Fus remote controller to control a vehicle's basic operations, including movement, lights, and mode switching. This article is aimed...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.