Fading Coder

One Final Commit for the Last Sprint


Building Multi-Primary Replication with PostgreSQL Logical Replication


PostgreSQL logical replication offers a straightforward path to unidirectional streaming. With careful design, it can also underpin multi-primary (multi-master) deployments by addressing two core challenges:

  • Concurrency conflicts when multiple nodes modify the same row
  • Replication loops that endlessly re-apply the same change across peers

Conflict management

Practical conflict handling typically relies on deterministic rules the system can enforce automatically.

  • INSERT

    • Partition primary keys per node to avoid collisions. Use per-node sequences with different starting points and a fixed increment.
    • If duplicates can occur, decide a winner using a stable rule (e.g., latest timestamp or a preferred node). Add metadata columns to support this decision (e.g., node identifier and last_modified).
  • UPDATE

    • Choose a rule such as "highest last_modified wins" or "preferred node wins." Maintain last_modified on each write to make decisions consistent.
  • DELETE

    • Deletes usually do not conflict, but ensure they are replicated and applied in the desired order relative to updates.
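These rules assume each row carries enough metadata to pick a winner deterministically. A minimal sketch of such columns (the table and column names here are illustrative, not part of the setup below):

```sql
-- Hypothetical metadata supporting "latest last_modified wins" or "preferred node wins"
ALTER TABLE my_table
  ADD COLUMN origin_node   int,                                     -- node that wrote last
  ADD COLUMN last_modified timestamptz DEFAULT clock_timestamp();   -- write timestamp

-- Refresh last_modified on every write so the decision rule stays consistent
UPDATE my_table
SET payload = 'v2', last_modified = clock_timestamp()
WHERE id = 1;
```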

PostgreSQL 10's logical replication apply worker stops a subscription on any apply error (e.g., a unique violation). To resume, either fix the data or carefully advance the replication origin LSN.

  • Option 1: Correct the data on the subscriber so the conflicting change can apply, then re-enable the subscription.

    • Example: if an INSERT fails due to a unique violation, remove or modify the conflicting row on the subscriber, then run:
      • ALTER SUBSCRIPTION sub_name ENABLE;
  • Option 2: Skip ahead with pg_replication_origin_advance. Use with caution.

    • Determine the origin name and current LSN:
      • SELECT external_id, remote_lsn FROM pg_replication_origin_status;
    • Advance the replication origin (the origin name is 'pg_' followed by the subscription OID — the external_id above — not the subscription name):
      • SELECT pg_replication_origin_advance('pg_<subscription_oid>', '0/XXXXXXXX');

Careless advancement can introduce data divergence; apply only when you fully understand the implications.
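Put together, a cautious skip of one failed transaction looks roughly like this (the origin name and LSN are placeholders; the subscription must be disabled first, because an active apply worker holds the origin):

```sql
ALTER SUBSCRIPTION sub_name DISABLE;
-- external_id is the origin name ('pg_' + subscription OID)
SELECT external_id, remote_lsn FROM pg_replication_origin_status;
SELECT pg_replication_origin_advance('pg_<subscription_oid>', '0/XXXXXXXX');
ALTER SUBSCRIPTION sub_name ENABLE;
```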

Loop prevention strategy

Bidirectional replication can produce infinite cycles if a change replicated from node A to B is then replicated back to A, and so on. A simple approach is to tag rows and adjust them upon apply.

  • Add metadata columns to replicated tables:
    • Boolean flag that marks whether a given change originated locally
    • Optional origin identifier for traceability across nodes
  • Use a BEFORE trigger on INSERT/UPDATE that:
    • Marks locally-initiated writes as local
    • Marks remotely-applied writes as non-local and prevents re-applying a change that already bounced through a peer

Trigger logic

Below is an implementation that detects logical apply workers via application_name and adjusts a per-row flag. Returning NULL for a replicated change that already carries the "not local" mark stops the second hop from writing back, which breaks the loop.

CREATE OR REPLACE FUNCTION prevent_cycle() RETURNS trigger AS $$
DECLARE
  app_name text := current_setting('application_name', true);
BEGIN
  -- Non-replication (client) backends initiate local DML
  IF app_name IS NULL OR app_name !~* '^logical replication worker' THEN
    NEW.from_local := true;
  ELSE
    -- DML originating from a logical apply worker
    IF NEW.from_local IS NOT DISTINCT FROM false THEN
      -- Already marked non-local by a peer; drop this write to stop the bounce
      RETURN NULL;
    ELSE
      -- First arrival from a peer; accept but mark as non-local
      NEW.from_local := false;
    END IF;
  END IF;

  -- Keep a fresh timestamp for conflict-resolution policies
  NEW.touched_at := clock_timestamp();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

End-to-end setup

Environment

Three PostgreSQL 10 instances on localhost:

  • Ports: 1922, 1923, 1924

Schema and sequences

Replicated table with metadata for loop control and conflict policy.

-- Same DDL on all nodes; create the per-node sequence node_seq (below) first,
-- since the id default references it
CREATE TABLE items (
  id          int PRIMARY KEY DEFAULT nextval('node_seq'),
  payload     text,
  created_at  timestamp,
  from_local  boolean,
  touched_at  timestamp DEFAULT clock_timestamp()
);

Use per-node sequences with fixed step to avoid PK collisions.

-- Node on port 1922
CREATE SEQUENCE node_seq INCREMENT BY 16 START WITH 1;

-- Node on port 1923
CREATE SEQUENCE node_seq INCREMENT BY 16 START WITH 2;

-- Node on port 1924
CREATE SEQUENCE node_seq INCREMENT BY 16 START WITH 3;
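With an increment of 16 and distinct start values, each node draws IDs from its own residue class, and thirteen unused classes remain for future nodes:

```sql
-- Node 1922: 1, 17, 33, ...   Node 1923: 2, 18, 34, ...   Node 1924: 3, 19, 35, ...
SELECT nextval('node_seq');
```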

Trigger and activation

-- Same on all nodes
CREATE TRIGGER items_prevent_cycle
BEFORE INSERT OR UPDATE ON items
FOR EACH ROW EXECUTE PROCEDURE prevent_cycle();

-- Fire trigger on both local and replication apply
ALTER TABLE items ENABLE ALWAYS TRIGGER items_prevent_cycle;
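ENABLE ALWAYS matters because row triggers are suppressed by default while the apply worker runs with session_replication_role = replica. You can confirm it took effect from the catalog ('A' means the trigger fires always, including during replication apply):

```sql
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'items'::regclass;  -- expect tgenabled = 'A'
```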

Publications

-- Same on all nodes
CREATE PUBLICATION pub_items FOR TABLE items WITH (publish = 'insert, update, delete');
-- Alternatively, add tables later:
-- ALTER PUBLICATION pub_items ADD TABLE items;

Subscriptions

Create peer-to-peer subscriptions to form a full mesh.

-- On node 1922
CREATE SUBSCRIPTION sub_1922_1923
  CONNECTION 'host=127.0.0.1 port=1923 user=postgres dbname=postgres'
  PUBLICATION pub_items;
CREATE SUBSCRIPTION sub_1922_1924
  CONNECTION 'host=127.0.0.1 port=1924 user=postgres dbname=postgres'
  PUBLICATION pub_items;

-- On node 1923
CREATE SUBSCRIPTION sub_1923_1922
  CONNECTION 'host=127.0.0.1 port=1922 user=postgres dbname=postgres'
  PUBLICATION pub_items;
CREATE SUBSCRIPTION sub_1923_1924
  CONNECTION 'host=127.0.0.1 port=1924 user=postgres dbname=postgres'
  PUBLICATION pub_items;

-- On node 1924
CREATE SUBSCRIPTION sub_1924_1922
  CONNECTION 'host=127.0.0.1 port=1922 user=postgres dbname=postgres'
  PUBLICATION pub_items;
CREATE SUBSCRIPTION sub_1924_1923
  CONNECTION 'host=127.0.0.1 port=1923 user=postgres dbname=postgres'
  PUBLICATION pub_items;
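Note that CREATE SUBSCRIPTION performs an initial table copy by default; in a mesh where more than one node already holds rows, that copy duplicates data. A variant that skips the initial sync (shown for one pair; apply the same option wherever the data already exists on both sides):

```sql
CREATE SUBSCRIPTION sub_1923_1922
  CONNECTION 'host=127.0.0.1 port=1922 user=postgres dbname=postgres'
  PUBLICATION pub_items
  WITH (copy_data = false);
```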

Workload generation

To stress updates without PK conflicts, the INSERT statements draw IDs from disjoint modulo-3 residue classes, while the UPDATEs target the full shared range.

test1.sql (target node: 1922)

\set vid random(1,300000)
INSERT INTO items (id, payload, created_at)
SELECT 3*(random()*100000)::int, 't', now()
ON CONFLICT (id) DO UPDATE SET payload = 'x', touched_at = clock_timestamp();
UPDATE items SET payload = 'u', touched_at = clock_timestamp() WHERE id = :vid;

test2.sql (target node: 1923)

\set vid random(1,300000)
INSERT INTO items (id, payload, created_at)
SELECT 3*(random()*100000)::int + 1, 't', now()
ON CONFLICT (id) DO UPDATE SET payload = 'x', touched_at = clock_timestamp();
UPDATE items SET payload = 'u', touched_at = clock_timestamp() WHERE id = :vid;

test3.sql (target node: 1924)

\set vid random(1,300000)
INSERT INTO items (id, payload, created_at)
SELECT 3*(random()*100000)::int + 2, 't', now()
ON CONFLICT (id) DO UPDATE SET payload = 'x', touched_at = clock_timestamp();
UPDATE items SET payload = 'u', touched_at = clock_timestamp() WHERE id = :vid;

Run the load for each node:

pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 10 -j 10 -T 10 -p 1922
pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 10 -j 10 -T 10 -p 1923
pgbench -M prepared -n -r -P 1 -f ./test3.sql -c 10 -j 10 -T 10 -p 1924
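Before validating, make sure every peer has replayed the workload. A simple check on each publishing node, run once the system is otherwise idle:

```sql
SELECT application_name,
       replay_lsn = pg_current_wal_lsn() AS caught_up  -- true once the peer is in sync
FROM pg_stat_replication;
```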

Validation

A quick consistency check compares a hash aggregate across nodes.

-- Run on each node
SELECT SUM(hashtext((id::text || payload || created_at::text))) FROM items;

If the sums match across all three nodes, the replicated state is aligned.

Delete propagation test

Delete on any node and verify it propagates once without bouncing:

-- Execute on node 1922
DELETE FROM items WHERE id IN (SELECT id FROM items ORDER BY id LIMIT 10);

-- After replication catches up, confirm the rows are gone on 1923 and 1924
SELECT COUNT(*) FROM items;  -- counts should match across all three nodes
Tags: postgresql
