Choosing a ClickHouse Table Engine with Practical Patterns and Examples
Selecting a ClickHouse table engine shapes how data is laid out on disk, how it’s indexed and merged, what kinds of queries are efficient, and how ingestion behaves under concurrency. Engines fall into several families with orthogonal add‑ons for replication and sharding.
Engine families at a glance
- Log family: minimal storage formats aimed at tiny datasets and fast, append-only writes. Good for transient or staging data.
- MergeTree family: the default for analytical workloads; supports partitioning, ordering, sparse primary-key indexes, TTL, background merges, and secondary data-skipping indexes.
- Integration engines: access or ingest from external systems (e.g., Kafka, MySQL, HDFS, JDBC/ODBC).
- Special-purpose engines: Memory, File, Null, Buffer, etc., optimized for specific scenarios.
- Orthogonal layers: Replicated* engines add replication; Distributed provides cluster-wide parallel reads/writes and sharding. These compose with many base engines.
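As a sketch of how these layers compose (the cluster name, shard/replica macros, and ZooKeeper path below are illustrative placeholders, not tables used later in this article):
```sql
-- Replicated base table on each node; {shard} and {replica} macros
-- come from the server configuration
CREATE TABLE events_local ON CLUSTER my_cluster (
    user_id UInt32,
    event_date Date
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
ORDER BY (user_id, event_date);

-- Distributed layer that fans queries out across shards
CREATE TABLE events_all ON CLUSTER my_cluster AS events_local
ENGINE = Distributed(my_cluster, currentDatabase(), events_local, rand());
```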
Quick selection rules
- Default to MergeTree unless you have a niche requirement.
- Use TinyLog/StripeLog/Log for small, ephemeral tables with simple scans.
- Use ReplacingMergeTree for eventual deduplication (by sorting key) during merges.
- Use CollapsingMergeTree or VersionedCollapsingMergeTree when modeling upserts/deletes via sign/version semantics and you can rewrite aggregates.
- Use SummingMergeTree or AggregatingMergeTree for persisted rollups to accelerate heavy aggregations.
- Use Memory for fast temporary tables without durability.
- Use Buffer to batch writes into a target MergeTree table.
- Use Integration engines to read/ingest from external systems without staging.
Log family
Common properties
- Append-only files on disk.
- No DELETE/UPDATE.
- No secondary indexes.
- Inserts are not atomic across columns and can block SELECTs.
Variants
- TinyLog: simplest format; each column in its own file; no concurrent reads. Best for tiny scratch data.
- StripeLog: stores columns in larger stripes to reduce small files; supports concurrent reads.
- Log: column-per-file like TinyLog but with concurrent reads.
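A minimal sketch of a Log-family staging table (table and column names are illustrative):
```sql
CREATE TABLE staging_rows (
    id UInt32,
    payload String
) ENGINE = Log;

INSERT INTO staging_rows VALUES (1, 'a'), (2, 'b');
SELECT count(*) FROM staging_rows; -- 2
```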
Integration engines
- Kafka: consume topics directly into ClickHouse (often paired with materialized views).
- MySQL: query MySQL tables through ClickHouse as external data.
- JDBC/ODBC: query external DBs via connectors.
- HDFS: query files on HDFS by format.
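A typical Kafka pipeline pairs the engine table with a materialized view that moves consumed rows into a MergeTree table (the broker address, topic, and consumer group below are placeholders):
```sql
-- Engine table that consumes the topic; reading from it advances offsets
CREATE TABLE clicks_queue (
    user_id UInt32,
    url String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'clicks',
         kafka_group_name = 'clickhouse-clicks',
         kafka_format = 'JSONEachRow';

-- Durable destination
CREATE TABLE clicks (
    user_id UInt32,
    url String
) ENGINE = MergeTree
ORDER BY user_id;

-- Materialized view continuously drains the queue into the MergeTree table
CREATE MATERIALIZED VIEW clicks_mv TO clicks AS
SELECT user_id, url FROM clicks_queue;
```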
Special-purpose engines
- Memory: data in RAM only; dropped on restart. Excellent for small, hot lookup tables or intermediate results.
- Buffer: in-memory buffer that asynchronously flushes to a destination table.
- File: treat a local file as a table in a given format.
- Null: writes are discarded; reads return empty sets.
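A Buffer sketch: writes accumulate in RAM and flush to the destination once any threshold (seconds, rows, or bytes) is crossed. The table names and threshold values below are illustrative:
```sql
-- Destination table (illustrative)
CREATE TABLE hits (
    id UInt64
) ENGINE = MergeTree
ORDER BY id;

CREATE TABLE hits_buffer AS hits
ENGINE = Buffer(currentDatabase(), hits,
                16,                    -- buffer layers
                10, 100,               -- min/max seconds before flush
                10000, 1000000,        -- min/max rows
                10000000, 100000000);  -- min/max bytes

-- Writes go to the buffer; reads on hits_buffer see buffered and flushed rows
INSERT INTO hits_buffer VALUES (1);
```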
MergeTree family in practice
MergeTree
- Columnar storage ordered by a sort key.
- Partitioning reduces query and merge scope.
- Primary key provides a sparse index for data skipping; it does not enforce uniqueness.
- Background merges compact parts and apply engine-specific logic.
Example: primary key is for lookups, not uniqueness
-- Base table
CREATE TABLE events_mt (
user_id UInt32,
event_date Date,
note Nullable(String)
) ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (user_id, event_date)
PRIMARY KEY (user_id, event_date)
TTL event_date + INTERVAL 30 DAY
SETTINGS index_granularity = 8192;
-- Insert duplicate primary-key rows
INSERT INTO events_mt VALUES (1001, '2020-01-01', NULL);
INSERT INTO events_mt VALUES (1001, '2020-01-01', NULL);
INSERT INTO events_mt VALUES (1002, '2020-01-02', NULL);
INSERT INTO events_mt VALUES (1002, '2020-01-02', NULL);
INSERT INTO events_mt VALUES (1003, '2020-01-03', NULL);
-- Duplicates are visible
SELECT count(*) FROM events_mt;
-- ┌─count()─┐
-- │       5 │
-- └─────────┘
SELECT * FROM events_mt ORDER BY user_id, event_date;
-- 1001 2020-01-01 NULL
-- 1001 2020-01-01 NULL
-- 1002 2020-01-02 NULL
-- 1002 2020-01-02 NULL
-- 1003 2020-01-03 NULL
-- Even after a final merge, duplicates remain
OPTIMIZE TABLE events_mt FINAL;
SELECT count(*) FROM events_mt;
-- 5
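Data-skipping (secondary) indexes can be added to an existing MergeTree table; the index name and parameters below are illustrative. Note the index is built for new parts on insert, and for existing parts only after a MATERIALIZE:
```sql
ALTER TABLE events_mt ADD INDEX note_bf note TYPE bloom_filter GRANULARITY 4;
ALTER TABLE events_mt MATERIALIZE INDEX note_bf;
```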
ReplacingMergeTree
- Deduplicates rows that share the same sorting key (ORDER BY) during background merges.
- Without a version column, the last inserted row among duplicates is kept.
- Limitations:
- Deduplication happens eventually (merge time); queries may see duplicates.
- In distributed deployments, duplicates on different shards are not removed until co-located.
- FINAL merges are expensive and not real-time.
Example: eventual deduplication
CREATE TABLE events_rep (
user_id UInt32,
event_date Date,
note Nullable(String)
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (user_id, event_date)
PRIMARY KEY (user_id, event_date)
TTL event_date + INTERVAL 30 DAY;
INSERT INTO events_rep VALUES
(1001, '2020-01-01', NULL),
(1001, '2020-01-01', NULL),
(1002, '2020-01-02', NULL),
(1002, '2020-01-02', NULL),
(1003, '2020-01-03', NULL);
SELECT count(*) FROM events_rep; -- 5
OPTIMIZE TABLE events_rep FINAL;
SELECT count(*) FROM events_rep; -- 3
SELECT * FROM events_rep ORDER BY user_id, event_date;
-- 1001 2020-01-01 NULL
-- 1002 2020-01-02 NULL
-- 1003 2020-01-03 NULL
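When merges have not yet run, a query-time alternative is the FINAL modifier, which folds duplicates on the fly at extra query cost:
```sql
-- Deduplicated view of the data without forcing a merge
SELECT count(*) FROM events_rep FINAL; -- 3
```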
CollapsingMergeTree
- Adds a Sign column. During merges, rows with the same primary key and opposite Sign cancel each other.
- Model updates/deletes by inserting a cancel row (Sign = -1) that mirrors a prior state and, for updates, a new status row (Sign = 1).
- Because merges are asynchronous and distributed, aggregates must be rewritten to use Sign for correctness:
- count() → sum(Sign)
- sum(x) → sum(x * Sign)
- Useful for event-sourced streams where you can tolerate rewrite of aggregations and eventual folding.
Example: update modeled via sign rows
CREATE TABLE account_usage (
account_id UInt64,
views UInt32,
seconds UInt32,
Sign Int8
) ENGINE = CollapsingMergeTree(Sign)
ORDER BY account_id;
-- Initial snapshot
INSERT INTO account_usage VALUES (42, 5, 120, 1);
-- Cancel the old snapshot and write a new one
INSERT INTO account_usage VALUES (42, 5, 120, -1), (42, 6, 180, 1);
-- Before merge: both rows exist
SELECT * FROM account_usage ORDER BY account_id, Sign;
-- 42 5 120 -1
-- 42 6 180 1
-- 42 5 120 1
-- Correct aggregation with sign-aware expressions
SELECT
account_id,
sum(views * Sign) AS views,
sum(seconds * Sign) AS seconds
FROM account_usage
GROUP BY account_id
HAVING sum(Sign) > 0;
-- 42 6 180
OPTIMIZE TABLE account_usage FINAL;
SELECT * FROM account_usage;
-- 42 6 180 1
Out-of-order inserts can break folding in CollapsingMergeTree
CREATE TABLE account_usage_oo (
account_id UInt64,
views UInt32,
seconds UInt32,
Sign Int8
) ENGINE = CollapsingMergeTree(Sign)
ORDER BY account_id;
-- Cancel arrives before the original status
INSERT INTO account_usage_oo VALUES (99, 1, 10, -1);
INSERT INTO account_usage_oo VALUES (99, 1, 10, 1);
OPTIMIZE TABLE account_usage_oo FINAL;
SELECT * FROM account_usage_oo ORDER BY account_id, Sign;
-- 99 1 10 -1
-- 99 1 10 1
VersionedCollapsingMergeTree
- Adds a Version column so that matching status/cancel pairs are folded even if they arrive out of order.
- Rows with equal primary key and Version but opposite Sign are eliminated during merges.
- Aggregates require the same sign-aware rewrites as CollapsingMergeTree.
Example: out-of-order updates with versioning
CREATE TABLE account_usage_v (
account_id UInt64,
views UInt32,
seconds UInt32,
Sign Int8,
Version UInt32
) ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY account_id;
-- Cancel (v=1) arrives first
INSERT INTO account_usage_v VALUES (7, 5, 100, -1, 1);
-- Status (v=1) and then new status (v=2)
INSERT INTO account_usage_v VALUES (7, 5, 100, 1, 1), (7, 6, 150, 1, 2);
-- All rows visible pre-merge
SELECT account_id, views, seconds, Sign, Version FROM account_usage_v ORDER BY Version;
-- (7,5,100,-1,1)
-- (7,5,100, 1,1)
-- (7,6,150, 1,2)
-- Sign-aware aggregation
SELECT
account_id,
sum(views * Sign) AS views,
sum(seconds * Sign) AS seconds
FROM account_usage_v
GROUP BY account_id
HAVING sum(Sign) > 0;
-- 7 6 150
OPTIMIZE TABLE account_usage_v FINAL;
SELECT * FROM account_usage_v;
-- 7 6 150 1 2
SummingMergeTree
- Pre-aggregates numeric, non-key columns across rows with the same sorting key during background merges.
- Non-aggregatable columns (e.g., String) are taken arbitrarily from one row.
- Because merges are asynchronous, queries should still use GROUP BY for correctness.
- Often paired with a detailed MergeTree table for raw data and a SummingMergeTree table for rollups.
Example: persisted partial sums
CREATE TABLE sums (
k UInt32,
value UInt32
) ENGINE = SummingMergeTree
ORDER BY k;
INSERT INTO sums VALUES (1,1), (1,2), (2,1);
-- Before merge
SELECT * FROM sums ORDER BY k, value;
-- 1 1
-- 1 2
-- 2 1
-- Query still uses GROUP BY
SELECT k, sum(value) FROM sums GROUP BY k ORDER BY k;
-- 1 3
-- 2 1
OPTIMIZE TABLE sums FINAL;
SELECT * FROM sums ORDER BY k;
-- 1 3
-- 2 1
AggregatingMergeTree
- Stores aggregation states (not plain values) for arbitrary aggregate functions.
- Use the -State combinator in INSERT/MV pipelines to produce states, and the -Merge combinator in SELECTs to finalize them.
- Best with materialized views (for automatic rollups) or with AggregateFunction-typed columns.
Example 1: materialized view producing states
-- Raw facts
CREATE TABLE page_views_raw (
user_id UInt64,
site_id UInt16,
day Date
) ENGINE = MergeTree
ORDER BY (site_id, day);
-- Rollup view storing states
CREATE MATERIALIZED VIEW page_views_rollup
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (site_id, day)
AS
SELECT
site_id,
day,
countState() AS views_state,
uniqState(user_id) AS users_state
FROM page_views_raw
GROUP BY site_id, day;
-- Ingest some facts
INSERT INTO page_views_raw VALUES
(10, 1, '2020-01-01'),
(11, 1, '2020-01-01'),
(10, 1, '2020-01-02');
-- Finalize states at query time
SELECT
day,
countMerge(views_state) AS views,
uniqMerge(users_state) AS users
FROM page_views_rollup
GROUP BY day
ORDER BY day;
-- 2020-01-01 2 2
-- 2020-01-02 1 1
-- Using sum/uniq directly will fail because columns store aggregation states
Example 2: AggregateFunction-typed column
-- Detailed data
CREATE TABLE raw_sessions (
site_id UInt16,
day Date,
user_id UInt64
) ENGINE = MergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (site_id, day);
INSERT INTO raw_sessions VALUES
(1, '2020-01-01', 10),
(1, '2020-01-01', 11),
(1, '2020-01-02', 10);
-- Table storing a uniq aggregation state per (site_id, day)
CREATE TABLE daily_users (
site_id UInt16,
day Date,
users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (site_id, day);
-- Populate from details using <agg>State
INSERT INTO daily_users
SELECT site_id, day, uniqState(user_id)
FROM raw_sessions
GROUP BY site_id, day;
-- Direct scalar inserts are invalid for AggregatingMergeTree
-- INSERT INTO daily_users VALUES (1, '2020-01-01', 10); -- error
-- Query with <agg>Merge to finalize
SELECT site_id, day, uniqMerge(users) AS unique_users
FROM daily_users
GROUP BY site_id, day
ORDER BY day;
-- (1, 2020-01-01, 2)
-- (1, 2020-01-02, 1)