Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Calculating Video Completion Rate Using SQL: A Practical Approach

Tech 2

The objective is to compute the completion rate for each video that had play activity in 2021, rounded to three decimal places, and order the results in descending order. The completion rate is defined as the proportion of plays where the viewing duration was greater than or equal to the video's length. The column for the rate should be named avg_comp_play_rate.

The data resides in two tables:

Interaction Log (user_video_log)

  • user_id: User identifier
  • video_id: Video identifier
  • start_ts: Timestamp when playback began (second precision)
  • end_ts: Timestamp when playback ended (second precision)
  • is_follow: Boolean for follow acsion
  • is_like: Boolean for like action
  • is_share: Boolean for share action
  • comment_id: Identifier for a comment

Video Metadata (video_info)

  • video_id: Video identifier
  • creator_id: Creator identifier
  • category_tag: Content category
  • video_duration: Length of the video in seconds
  • publish_time: Publication timestamp

Query Construction Strategy

A structured approach begins by outlining the core SQL skeleton based on the problem's clauses.

  1. Identify the Target and Filters

    • For videos in 2021: A filter on start_ts using WHERE YEAR(start_ts) = 2021.
    • For each video: A grouping clause GROUP BY video_id.
    • Output specification: The SELECT list must include video_id and the calculated completion rate.
    • Ordering: Results must be sorted ORDER BY the completion rate in DESC order.
  2. Form the Initial Skeleton

    SELECT video_id, [completion_rate] AS avg_comp_play_rate
    FROM [table_source]
    WHERE YEAR(start_ts) = 2021
    GROUP BY video_id
    ORDER BY avg_comp_play_rate DESC
    
  3. Resolve the Placeholders

    • [table_source]: Data is required from both tables. An INNER JOIN on the common video_id is necessary.
    • [completion_rate]: This is the core calculation: Number of Complete Plays / Total Plays. A complete play occurs when (end_ts - start_ts) >= video_duration. The time difference can be calculated using TIMESTAMPDIFF(SECOND, start_ts, end_ts).
      • Method A (Explicit Counts):
        ROUND(
            SUM(CASE WHEN TIMESTAMPDIFF(SECOND, start_ts, end_ts) >= video_duration THEN 1 ELSE 0 END) / COUNT(*),
            3
        )
        
        Alternative, COUNT(CASE WHEN ... THEN 1 END) can be used for the numerator, as COUNT ignores NULL.
      • Method B (Using AVG): Since the condition yields 1 for complete and 0 for incomplete, the average directly gives the ratio.
        ROUND(
            AVG(CASE WHEN TIMESTAMPDIFF(SECOND, start_ts, end_ts) >= video_duration THEN 1.0 ELSE 0 END),
            3
        )
        
  4. Assemble the Final Query Combining the components yields the executable SQL statement.

SELECT 
    vlog.video_id,
    ROUND(
        SUM(CASE WHEN TIMESTAMPDIFF(SECOND, start_ts, end_ts) >= info.video_duration THEN 1 ELSE 0 END) / COUNT(*),
        3
    ) AS avg_comp_play_rate
FROM user_video_log AS vlog
INNER JOIN video_info AS info ON vlog.video_id = info.video_id
WHERE YEAR(start_ts) = 2021
GROUP BY vlog.video_id
ORDER BY avg_comp_play_rate DESC;

Key Technical Concepts

1. Conditional Aggregation Using CASE expressions inside aggregate functions like SUM or AVG is a standard technique for counting rows that meet specific criteria.

2. Handling NULLs in Aggregates

  • SUM(column) and AVG(column) ignore NULL values within the specified column.
  • COUNT(column) counts non-NULL values in that column.
  • COUNT(*) counts all rows in the group, irrespective of NULLs.

3. Date and Time Functions

  • YEAR(date): Extracts the year component from a date or datetime.
  • TIMESTAMPDIFF(unit, datetime1, datetime2): Calculates the difefrence (datetime2 - datetime1) in the specified unit (e.g., SECOND).

4. String and Date Extraction While YEAR() is ideal for extracting the year, alternative string functions exist:

  • LEFT(string, n): Returns the first n characters.
  • SUBSTRING(string, start, length): Extracts a substring from position start. Example: WHERE LEFT(start_ts, 4) = '2021' achieves the same filter.

5. Rounding Numeric Output The ROUND(value, decimal_places) function is used to format the final result to the required precision.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.