Using Array Functions for Rank Operations in ClickHouse
Implementing Rank Functions in ClickHouse with Array Functions
ClickHouse enables implementing rank functions commonly found in SQL databases using specialized array functions. The main functions discussed are:
arrayEnumeratearrayEnumerateDensearrayEnumerateUniq
These funcitons return indices corresponding to the rank of each element within an array:
SELECT
arrayEnumerate([10, 20, 30, 10, 40]) AS row_number,
arrayEnumerateDense([10, 20, 30, 10, 40]) AS dense_rank,
arrayEnumerateUniq([10, 20, 30, 10, 40]) AS uniq_rank;
Result:
┌─row_number──┬─dense_rank──┬─uniq_rank───┐
│ [1,2,3,4,5] │ [1,2,3,1,4] │ [1,1,1,2,1] │
└─────────────┴─────────────┴─────────────┘
Example: Window Functions Equivalence
This example demonstrates how to simulate SQL window functions ROW_NUMBER(), DENSE_RANK(), and a unique rank operation using ClickHouse.
- Create and Populate Test Dataset
CREATE TABLE test_data ENGINE=Memory AS
WITH (
SELECT ['A','A','A','A','B','B','B','B','B','A','59','90','80','80','65','75','78','88','99','70'] AS dict
SELECT dict[number%10+1] AS id, dict[number+11] AS val
FROM system.numbers LIMIT 10)
SELECT * FROM test_data;
Generated data:
┌─id─┬─val─┐
│ A │ 59 │
│ A │ 90 │
│ A │ 80 │
│ A │ 80 │
│ B │ 65 │
│ B │ 75 │
│ B │ 78 │
│ B │ 88 │
│ B │ 99 │
│ A │ 70 │
└────┴─────┘
- Using Array Functions for Ranking
First, group the val column by id into arrays:
SELECT
id,
groupArray(val) AS grouped_val,
arrayEnumerate(grouped_val) AS row_number,
arrayEnumerateDense(grouped_val) AS dense_rank,
arrayEnumerateUniq(grouped_val) AS uniq_rank
FROM
(SELECT * FROM test_data ORDER BY val ASC)
GROUP BY id;
This proudces grouped rankings for each id. To flatten the data:
SELECT
id,
val,
row_number,
dense_rank,
uniq_rank
FROM
(
SELECT
id,
groupArray(val) AS grouped_val,
arrayEnumerate(grouped_val) AS row_number,
arrayEnumerateDense(grouped_val) AS dense_rank,
arrayEnumerateUniq(grouped_val) AS uniq_rank
FROM
(SELECT * FROM test_data ORDER BY val ASC)
GROUP BY id
)
ARRAY JOIN grouped_val AS val, row_number, dense_rank, uniq_rank
ORDER BY id ASC, row_number ASC, dense_rank ASC;
Processed result:
┌─id─┬─val─┬─row_number─┬─dense_rank─┬─uniq_rank─┐
│ A │ 59 │ 1 │ 1 │ 1 │
│ A │ 70 │ 2 │ 2 │ 1 │
│ A │ 80 │ 3 │ 3 │ 1 │
│ A │ 80 │ 4 │ 3 │ 2 │
│ A │ 90 │ 5 │ 4 │ 1 │
│ B │ 65 │ 1 │ 1 │ 1 │
│ B │ 75 │ 2 │ 2 │ 1 │
│ B │ 78 │ 3 │ 3 │ 1 │
│ B │ 88 │ 4 │ 4 │ 1 │
│ B │ 99 │ 5 │ 5 │ 1 │
└────┴─────┴────────────┴────────────┴───────────┘