Choosing a ClickHouse Table Engine with Practical Patterns and Examples
Selecting a ClickHouse table engine shapes how data is laid out on disk, how it’s indexed and merged, what kinds of queries are efficient, and how ingestion behaves under concurrency. Engines fall into several families with orthogonal add‑ons for replication and sharding.
Engine families at a glance
- Log family: minimal storage formats aimed at tiny datasets and fast, append-only writes. Good for transient or staging data.
- MergeTree family: the default for analytical workloads; supports partitioning, ordering, sparse primary-key indexes, TTL, background merges, and secondary data-skipping indexes.
- Integration engines: access or ingest from external systems (e.g., Kafka, MySQL, HDFS, JDBC/ODBC).
- Special-purpose engines: Memory, File, Null, Buffer, etc., optimized for specific scenarios.
- Orthogonal layers: Replicated* engines add replication; Distributed provides cluster-wide parallel reads/writes and sharding. These compose with many base engines.
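As a sketch of how these layers compose (the cluster name, shard/replica macros, and ZooKeeper path below are illustrative placeholders, not tables used later in this article):
```sql
-- Replicated base table on each node; {shard} and {replica} macros
-- come from the server configuration
CREATE TABLE events_local ON CLUSTER my_cluster (
    user_id UInt32,
    event_date Date
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
ORDER BY (user_id, event_date);

-- Distributed layer that fans queries out across shards
CREATE TABLE events_all ON CLUSTER my_cluster AS events_local
ENGINE = Distributed(my_cluster, currentDatabase(), events_local, rand());
```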
Quick selection rules
- Default to MergeTree unless you have a niche requirement.
- Use TinyLog/StripeLog/Log for small, ephemeral tables with simple scans.
- Use ReplacingMergeTree for eventual deduplication (by sorting key) during merges.
- Use CollapsingMergeTree or VersionedCollapsingMergeTree when modeling upserts/deletes via sign/version semantics and you can rewrite aggregates.
- Use SummingMergeTree or AggregatingMergeTree for persisted rollups to accelerate heavy aggregations.
- Use Memory for fast temporary tables without durability.
- Use Buffer to batch writes into a target MergeTree table.
- Use Integration engines to read/ingest from external systems without staging.
Log family
Common properties
- Append-only files on disk.
- No DELETE/UPDATE.
- No secondary indexes.
- Inserts are not atomic across columns and can block SELECTs.
Variants
- TinyLog: simplest format; each column in its own file; no concurrent reads. Best for tiny scratch data.
- StripeLog: stores columns in larger stripes to reduce small files; supports concurrent reads.
- Log: column-per-file like TinyLog but with concurrent reads.
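A minimal sketch of a Log-family staging table (table and column names are illustrative):
```sql
CREATE TABLE staging_rows (
    id UInt32,
    payload String
) ENGINE = Log;

INSERT INTO staging_rows VALUES (1, 'a'), (2, 'b');
SELECT count(*) FROM staging_rows; -- 2
```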
Integration engines
- Kafka: consume topics directly into ClickHouse (often paired with materialized views).
- MySQL: query MySQL tables through ClickHouse as external data.
- JDBC/ODBC: query external DBs via connectors.
- HDFS: query files on HDFS by format.
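A typical Kafka pipeline pairs the engine table with a materialized view that moves consumed rows into a MergeTree table (the broker address, topic, and consumer group below are placeholders):
```sql
-- Engine table that consumes the topic; reading from it advances offsets
CREATE TABLE clicks_queue (
    user_id UInt32,
    url String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'clicks',
         kafka_group_name = 'clickhouse-clicks',
         kafka_format = 'JSONEachRow';

-- Durable destination
CREATE TABLE clicks (
    user_id UInt32,
    url String
) ENGINE = MergeTree
ORDER BY user_id;

-- Materialized view continuously drains the queue into the MergeTree table
CREATE MATERIALIZED VIEW clicks_mv TO clicks AS
SELECT user_id, url FROM clicks_queue;
```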
Special-purpose engines
- Memory: data in RAM only; dropped on restart. Excellent for small, hot lookup tables or intermediate results.
- Buffer: in-memory buffer that asynchronously flushes to a destination table.
- File: treat a local file as a table in a given format.
- Null: writes are discarded; reads return empty sets.
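A Buffer sketch: writes accumulate in RAM and flush to the destination once any threshold (seconds, rows, or bytes) is crossed. The table names and threshold values below are illustrative:
```sql
-- Destination table (illustrative)
CREATE TABLE hits (
    id UInt64
) ENGINE = MergeTree
ORDER BY id;

CREATE TABLE hits_buffer AS hits
ENGINE = Buffer(currentDatabase(), hits,
                16,                    -- buffer layers
                10, 100,               -- min/max seconds before flush
                10000, 1000000,        -- min/max rows
                10000000, 100000000);  -- min/max bytes

-- Writes go to the buffer; reads on hits_buffer see buffered and flushed rows
INSERT INTO hits_buffer VALUES (1);
```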
MergeTree family in practice
MergeTree
- Columnar storage ordered by a sort key.
- Partitioning reduces query and merge scope.
- Primary key provides a sparse index for data skipping; it does not enforce uniqueness.
- Background merges compact parts and apply engine-specific logic.
Example: primary key is for lookups, not uniqueness
-- Base table
CREATE TABLE events_mt (
user_id UInt32,
event_date Date,
note Nullable(String)
) ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (user_id, event_date)
PRIMARY KEY (user_id, event_date)
TTL event_date + INTERVAL 30 DAY
SETTINGS index_granularity = 8192;
-- Insert duplicate primary-key rows
INSERT INTO events_mt VALUES (1001, '2020-01-01', NULL);
INSERT INTO events_mt VALUES (1001, '2020-01-01', NULL);
INSERT INTO events_mt VALUES (1002, '2020-01-02', NULL);
INSERT INTO events_mt VALUES (1002, '2020-01-02', NULL);
INSERT INTO events_mt VALUES (1003, '2020-01-03', NULL);
-- Duplicates are visible
SELECT count(*) FROM events_mt;
-- ┌─count()─┐
-- │       5 │
-- └─────────┘
SELECT * FROM events_mt ORDER BY user_id, event_date;
-- 1001 2020-01-01 NULL
-- 1001 2020-01-01 NULL
-- 1002 2020-01-02 NULL
-- 1002 2020-01-02 NULL
-- 1003 2020-01-03 NULL
-- Even after a final merge, duplicates remain
OPTIMIZE TABLE events_mt FINAL;
SELECT count(*) FROM events_mt;
-- 5
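Data-skipping (secondary) indexes can be added to an existing MergeTree table; the index name and parameters below are illustrative. Note the index is built for new parts on insert, and for existing parts only after a MATERIALIZE:
```sql
ALTER TABLE events_mt ADD INDEX note_bf note TYPE bloom_filter GRANULARITY 4;
ALTER TABLE events_mt MATERIALIZE INDEX note_bf;
```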
ReplacingMergeTree
- Deduplicates rows that share the same sorting key (ORDER BY) during background merges.
- Without a version column, the last inserted row among duplicates is kept.
- Limitations:
- Deduplication happens eventually (merge time); queries may see duplicates.
- In distributed deployments, duplicates on different shards are not removed until co-located.
- FINAL merges are expensive and not real-time.
Example: eventual deduplication
CREATE TABLE events_rep (
user_id UInt32,
event_date Date,
note Nullable(String)
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (user_id, event_date)
PRIMARY KEY (user_id, event_date)
TTL event_date + INTERVAL 30 DAY;
INSERT INTO events_rep VALUES
(1001, '2020-01-01', NULL),
(1001, '2020-01-01', NULL),
(1002, '2020-01-02', NULL),
(1002, '2020-01-02', NULL),
(1003, '2020-01-03', NULL);
SELECT count(*) FROM events_rep; -- 5
OPTIMIZE TABLE events_rep FINAL;
SELECT count(*) FROM events_rep; -- 3
SELECT * FROM events_rep ORDER BY user_id, event_date;
-- 1001 2020-01-01 NULL
-- 1002 2020-01-02 NULL
-- 1003 2020-01-03 NULL
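When merges have not yet run, a query-time alternative is the FINAL modifier, which folds duplicates on the fly at extra query cost:
```sql
-- Deduplicated view of the data without forcing a merge
SELECT count(*) FROM events_rep FINAL; -- 3
```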
CollapsingMergeTree
- Adds a Sign column. During merges, rows with the same primary key and opposite Sign cancel each other.
- Model updates/deletes by inserting a cancel row (Sign = -1) that mirrors a prior state and, for updates, a new status row (Sign = 1).
- Because merges are asynchronous and distributed, aggregates must be rewritten to use Sign for correctness:
- count() → sum(Sign)
- sum(x) → sum(x * Sign)
- Useful for event-sourced streams where you can tolerate rewrite of aggregations and eventual folding.
Example: update modeled via sign rows
CREATE TABLE account_usage (
account_id UInt64,
views UInt32,
seconds UInt32,
Sign Int8
) ENGINE = CollapsingMergeTree(Sign)
ORDER BY account_id;
-- Initial snapshot
INSERT INTO account_usage VALUES (42, 5, 120, 1);
-- Cancel the old snapshot and write a new one
INSERT INTO account_usage VALUES (42, 5, 120, -1), (42, 6, 180, 1);
-- Before merge: both rows exist
SELECT * FROM account_usage ORDER BY account_id, Sign;
-- 42 5 120 -1
-- 42 6 180 1
-- 42 5 120 1
-- Correct aggregation with sign-aware expressions
SELECT
account_id,
sum(views * Sign) AS views,
sum(seconds * Sign) AS seconds
FROM account_usage
GROUP BY account_id
HAVING sum(Sign) > 0;
-- 42 6 180
OPTIMIZE TABLE account_usage FINAL;
SELECT * FROM account_usage;
-- 42 6 180 1
Out-of-order inserts can break folding in CollapsingMergeTree
CREATE TABLE account_usage_oo (
account_id UInt64,
views UInt32,
seconds UInt32,
Sign Int8
) ENGINE = CollapsingMergeTree(Sign)
ORDER BY account_id;
-- Cancel arrives before the original status
INSERT INTO account_usage_oo VALUES (99, 1, 10, -1);
INSERT INTO account_usage_oo VALUES (99, 1, 10, 1);
OPTIMIZE TABLE account_usage_oo FINAL;
SELECT * FROM account_usage_oo ORDER BY account_id, Sign;
-- 99 1 10 -1
-- 99 1 10 1
VersionedCollapsingMergeTree
- Adds a Version column so that matching status/cancel pairs are folded even if they arrive out of order.
- Rows with equal primary key and Version but opposite Sign are eliminated during merges.
- Aggregates require the same sign-aware rewrites as CollapsingMergeTree.
Example: out-of-order updates with versioning
CREATE TABLE account_usage_v (
account_id UInt64,
views UInt32,
seconds UInt32,
Sign Int8,
Version UInt32
) ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY account_id;
-- Cancel (v=1) arrives first
INSERT INTO account_usage_v VALUES (7, 5, 100, -1, 1);
-- Status (v=1) and then new status (v=2)
INSERT INTO account_usage_v VALUES (7, 5, 100, 1, 1), (7, 6, 150, 1, 2);
-- All rows visible pre-merge
SELECT account_id, views, seconds, Sign, Version FROM account_usage_v ORDER BY Version;
-- (7,5,100,-1,1)
-- (7,5,100, 1,1)
-- (7,6,150, 1,2)
-- Sign-aware aggregation
SELECT
account_id,
sum(views * Sign) AS views,
sum(seconds * Sign) AS seconds
FROM account_usage_v
GROUP BY account_id
HAVING sum(Sign) > 0;
-- 7 6 150
OPTIMIZE TABLE account_usage_v FINAL;
SELECT * FROM account_usage_v;
-- 7 6 150 1 2
SummingMergeTree
- Pre-aggregates numeric, non-key columns across rows with the same sorting key during background merges.
- Non-aggregatable columns (e.g., String) are taken arbitrarily from one row.
- Because merges are asynchronous, queries should still use GROUP BY for correctness.
- Often paired with a detailed MergeTree table for raw data and a SummingMergeTree table for rollups.
Example: persisted partial sums
CREATE TABLE sums (
k UInt32,
value UInt32
) ENGINE = SummingMergeTree
ORDER BY k;
INSERT INTO sums VALUES (1,1), (1,2), (2,1);
-- Before merge
SELECT * FROM sums ORDER BY k, value;
-- 1 1
-- 1 2
-- 2 1
-- Query still uses GROUP BY
SELECT k, sum(value) FROM sums GROUP BY k ORDER BY k;
-- 1 3
-- 2 1
OPTIMIZE TABLE sums FINAL;
SELECT * FROM sums ORDER BY k;
-- 1 3
-- 2 1
AggregatingMergeTree
- Stores aggregation states (not plain values) for arbitrary aggregate functions.
- Use the -State combinator in INSERT/MV pipelines to produce states, and the -Merge combinator in SELECTs to finalize them.
- Best with materialized views (for automatic rollups) or with AggregateFunction-typed columns.
Example 1: materialized view producing states
-- Raw facts
CREATE TABLE page_views_raw (
user_id UInt64,
site_id UInt16,
day Date
) ENGINE = MergeTree
ORDER BY (site_id, day);
-- Rollup view storing states
CREATE MATERIALIZED VIEW page_views_rollup
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (site_id, day)
AS
SELECT
site_id,
day,
countState() AS views_state,
uniqState(user_id) AS users_state
FROM page_views_raw
GROUP BY site_id, day;
-- Ingest some facts
INSERT INTO page_views_raw VALUES
(10, 1, '2020-01-01'),
(11, 1, '2020-01-01'),
(10, 1, '2020-01-02');
-- Finalize states at query time
SELECT
day,
countMerge(views_state) AS views,
uniqMerge(users_state) AS users
FROM page_views_rollup
GROUP BY day
ORDER BY day;
-- 2020-01-01 2 2
-- 2020-01-02 1 1
-- Using sum/uniq directly will fail because columns store aggregation states
Example 2: AggregateFunction-typed column
-- Detailed data
CREATE TABLE raw_sessions (
site_id UInt16,
day Date,
user_id UInt64
) ENGINE = MergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (site_id, day);
INSERT INTO raw_sessions VALUES
(1, '2020-01-01', 10),
(1, '2020-01-01', 11),
(1, '2020-01-02', 10);
-- Table storing a uniq aggregation state per (site_id, day)
CREATE TABLE daily_users (
site_id UInt16,
day Date,
users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (site_id, day);
-- Populate from details using <agg>State
INSERT INTO daily_users
SELECT site_id, day, uniqState(user_id)
FROM raw_sessions
GROUP BY site_id, day;
-- Direct scalar inserts are invalid for AggregatingMergeTree
-- INSERT INTO daily_users VALUES (1, '2020-01-01', 10); -- error
-- Query with <agg>Merge to finalize
SELECT site_id, day, uniqMerge(users) AS unique_users
FROM daily_users
GROUP BY site_id, day
ORDER BY day;
-- (1, 2020-01-01, 2)
-- (1, 2020-01-02, 1)