ClickHouse Shards and Replication Setup
Replication in ClickHouse keeps two or more copies of the same table data on different servers, primarily for redundancy and fault tolerance. Sharding, on the other hand, splits a table's data horizontally across servers, spreading query and insert load over multiple nodes. ClickHouse uses ZooKeeper (version 3.4.5 or newer) to coordinate replication; recent versions can use the built-in ClickHouse Keeper instead.
Configuring ZooKeeper for ClickHouse
Adding the ZooKeeper Configuration
The ZooKeeper configuration can either be embedded directly in the main configuration file (config.xml) or placed in an external file.

Direct configuration: add the ZooKeeper node definitions to the main configuration file:

<zookeeper>
    <node>
        <host>server1</host>
        <port>2181</port>
    </node>
    <node>
        <host>server2</host>
        <port>2181</port>
    </node>
</zookeeper>

External configuration: create a separate configuration file, e.g. /etc/clickhouse-server/config.d/zookeeper.xml, with the following format:

<?xml version="1.0"?>
<clickhouse>
    <zookeeper>
        <node>
            <host>server1</host>
            <port>2181</port>
        </node>
    </zookeeper>
</clickhouse>

Files placed under /etc/clickhouse-server/config.d/ are merged into the main configuration automatically. Alternatively, a substitutions file can be referenced explicitly from config.xml, in which case its elements are pulled in via the incl attribute:

<include_from>/etc/clickhouse-server/config.d/zookeeper.xml</include_from>
Setting Up Replicas
Using ReplicatedMergeTree Engine
Create the replicated table on every node with the ReplicatedMergeTree engine, using distributed DDL so the statement runs across the whole cluster:
CREATE TABLE cluster_node.replicated_table ON CLUSTER cluster_name (
id Int32,
name String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/replicated_table', '{replica}') ORDER BY id;
The {shard} and {replica} placeholders are macros: ClickHouse expands them per server from the <macros> section of that server's configuration, so the same DDL can run unchanged on every node.
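For example, each server defines its own macro values in its configuration file (the specific values below, shard 01 and replica replica1, are illustrative and would differ per node):

```xml
<macros>
    <shard>01</shard>
    <replica>replica1</replica>
</macros>
```

With these macros, the ZooKeeper path above expands to /clickhouse/tables/01/replicated_table and the replica name to replica1 on this node.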
Sharding Configuration
To distribute data across servers, define the shards in the remote_servers section of the configuration:
<remote_servers>
<cluster>
<shard>
<replica>
<host>server1</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>server2</host>
<port>9000</port>
</replica>
</shard>
</cluster>
</remote_servers>
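Once the cluster is defined, its layout can be verified from any node by querying the system.clusters table (the name cluster below matches the tag used in the configuration above):

```sql
SELECT cluster, shard_num, shard_weight, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'cluster';
```

Each row corresponds to one replica of one shard, which makes misconfigured hosts or ports easy to spot.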
Shard weights control distribution proportions: with the default weight of 1 on the other shard, a shard with weight 2 receives roughly twice as many rows (about 2/3 of the total in a two-shard cluster):
<shard>
<weight>2</weight>
<replica>
<host>server1</host>
<port>9000</port>
</replica>
</shard>
Creating and Managing Distributed Tables
Distributed tables act as intermediaries that route inserts (and fan out reads) to the underlying local replicated or sharded tables:
CREATE TABLE distributed_table AS base_table
ENGINE = Distributed(cluster_name, database_name, base_table, id);

The last argument, id, is the sharding key that determines which shard each row is sent to.
Insert data and observe distribution across nodes:
INSERT INTO distributed_table VALUES (1, 'sample data');
Rows are routed to shards according to the sharding key and the configured shard weights; querying the local tables on each node shows where each row landed.
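One way to see the distribution without logging in to each server is to query the Distributed table itself and include hostName(), a built-in function that returns the name of the server that processed each row:

```sql
SELECT hostName() AS server, *
FROM distributed_table
ORDER BY id;
```

Rows belonging to different shards will report different server names.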
Combining Replication with Sharding
Combining sharding with replication gives both horizontal scalability and redundancy. Each shard lists its replicas, and the internal_replication setting controls which component performs the replication:
<remote_servers>
<cluster_advanced>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>server1</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>server2</host>
<port>9000</port>
</replica>
<replica>
<host>server3</host>
<port>9000</port>
</replica>
</shard>
</cluster_advanced>
</remote_servers>
With internal_replication set to true, a Distributed table writes each insert block to only one replica per shard and ReplicatedMergeTree propagates it to the others; with false, the Distributed table writes to every replica itself, which is only safe for non-replicated tables. Since the tables here use ReplicatedMergeTree, internal_replication should be true on every shard. Create the table using the same macros as before:
CREATE TABLE replicated_sharded_table ON CLUSTER cluster_advanced (
id Int32,
name String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/replicated_sharded_table', '{replica}') ORDER BY id;
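On top of this, a Distributed table routes writes across the shards while replication happens within each shard. A minimal sketch, assuming the table lives in the default database and using a random sharding key (the table name replicated_sharded_table_dist is illustrative):

```sql
CREATE TABLE replicated_sharded_table_dist ON CLUSTER cluster_advanced
AS replicated_sharded_table
ENGINE = Distributed(cluster_advanced, default, replicated_sharded_table, rand());
```

rand() spreads rows evenly across shards; a column-based key such as id keeps related rows on the same shard instead.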
The Distributed table then spreads writes across shards while each shard's replicas keep identical copies of their portion of the data, yielding an architecture that is both scalable and fault-tolerant under read and write load.