Database Engineering & Performance

Scaling PostgreSQL for High-Traffic: Read Replicas, Sharding, and Connection Pooling Strategies

MatterAI Agent
4 min read

How to Scale PostgreSQL for High-Traffic Applications: Read Replicas and Sharding

PostgreSQL scales vertically until hardware limits are reached. For high-traffic applications, horizontal scaling through read replicas and sharding distributes load across multiple servers. This guide covers implementation strategies for both approaches.

Read Replicas

Read replicas offload SELECT queries from the primary database using streaming replication. The primary ships Write-Ahead Log (WAL) records to replicas, maintaining near real-time synchronization.

Streaming Replication Setup

Configure the primary server in postgresql.conf:

wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
max_replication_slots = 10

Create a replication user and replication slot:

CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';
SELECT pg_create_physical_replication_slot('replica1_slot');

Add to pg_hba.conf on the primary:

host    replication    replicator    192.168.1.0/24    scram-sha-256

Initialize the replica using pg_basebackup:

pg_basebackup -h primary_host -D /var/lib/postgresql/data -U replicator -P -v -R -S replica1_slot

The -R flag creates the replica's standby.signal file and writes primary_conninfo and primary_slot_name to postgresql.auto.conf, which maintain the connection to the primary. Replication slots prevent WAL retention issues during extended replica disconnections.
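
The generated settings look roughly like this (host and credentials are placeholders carried over from the earlier commands):

# postgresql.auto.conf on the replica, written by pg_basebackup -R
primary_conninfo = 'host=primary_host port=5432 user=replicator password=secure_password'
primary_slot_name = 'replica1_slot'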

Replication Lag Considerations

Replicas may lag behind the primary due to network latency or heavy write workloads. Monitor lag from the primary using pg_stat_replication:

SELECT
    client_addr,
    pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS send_lag_bytes,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

On a replica, pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) reports how far replay trails the WAL already received.

For applications requiring strong consistency, route writes and critical reads to the primary.
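
A quick way to confirm which kind of node a session has reached (useful in health checks and application-side routing) is pg_is_in_recovery(), which reports whether the server is replaying WAL:

-- Returns false on the primary, true on a streaming replica
SELECT pg_is_in_recovery();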

Sharding

Sharding distributes data across multiple physical servers based on a shard key. Each shard handles a subset of data, enabling horizontal scaling beyond single-server limits.

Native Partitioning

PostgreSQL declarative partitioning splits a table into smaller physical pieces on a single node:

CREATE TABLE events (
    id SERIAL,
    event_time TIMESTAMP,
    user_id INTEGER,
    data JSONB
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2025_q1 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');

CREATE TABLE events_2025_q2 PARTITION OF events
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

Partition pruning optimizes queries by scanning only relevant partitions. Native partitioning operates on a single node and does not provide horizontal scaling across multiple servers.
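
For example, a query constrained to a single quarter should touch only the matching partition. EXPLAIN confirms this; the plan below is illustrative:

EXPLAIN (COSTS OFF)
SELECT count(*) FROM events
WHERE event_time >= '2025-02-01' AND event_time < '2025-03-01';

-- Illustrative plan: only events_2025_q1 is scanned
-- Aggregate
--   ->  Seq Scan on events_2025_q1 events
--         Filter: ((event_time >= ...) AND (event_time < ...))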

Distributed Sharding with Citus

Citus extends PostgreSQL with distributed query execution across multiple nodes. Once the extension is created on the coordinator and workers (CREATE EXTENSION citus), existing tables can be distributed or replicated:

-- Create distributed table
SELECT create_distributed_table('users', 'user_id');

-- Create reference table (replicated to all nodes)
SELECT create_reference_table('countries');

-- Query across shards transparently
SELECT u.name, c.name 
FROM users u 
JOIN countries c ON u.country_id = c.id 
WHERE u.user_id = 12345;

Citus colocates related data on the same shard to minimize cross-node joins.
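
For instance, distributing a related table on the same key and colocating it with users keeps per-user joins on a single shard (the orders table here is a hypothetical companion table):

SELECT create_distributed_table('orders', 'user_id', colocate_with => 'users');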

Shard Key Selection

Choose shard keys that:

  • Distribute data evenly across nodes
  • Minimize cross-shard queries
  • Align with query patterns (e.g., user_id for user-scoped data)

Avoid monotonically increasing keys like timestamps, which cause hotspots. Use UUIDs or hash-based distribution for even data spread.

Connection Pooling

High-traffic applications require connection pooling to prevent connection exhaustion. PostgreSQL's process-based architecture creates a new process per connection, consuming significant memory.
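
To gauge how close an instance is to exhaustion before deploying a pooler, compare live backends against the configured limit:

SHOW max_connections;

SELECT count(*) AS active_backends FROM pg_stat_activity;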

PgBouncer Configuration

Install and configure PgBouncer in pgbouncer.ini:

[databases]
postgres = host=primary_host port=5432

[pgbouncer]
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 100
reserve_pool_size = 10
reserve_pool_timeout = 3

Transaction pooling mode returns connections to the pool after each transaction, maximizing concurrency, but it breaks prepared statements, session-level advisory locks, and LISTEN/NOTIFY. Use session pooling mode if your application relies on these features.
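
If only part of the workload needs those features, pool_mode can also be overridden per database entry, so session-bound traffic uses a dedicated alias (the postgres_session name below is illustrative):

[databases]
postgres         = host=primary_host port=5432
postgres_session = host=primary_host port=5432 pool_mode=session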

PgCat for Sharded Environments

PgCat is a PostgreSQL pooler that adds load balancing, failover, and shard-aware query routing. Pools, users, and shards are declared in pgcat.toml:

[general]
host = "0.0.0.0"
port = 6432

[pools.sharded_db]
pool_mode = "transaction"
default_role = "any"
sharding_function = "pg_bigint_hash"

[pools.sharded_db.users.0]
username = "app_user"
password = "app_password"
pool_size = 100

[pools.sharded_db.shards.0]
servers = [["shard1.example.com", 5432, "primary"]]
database = "sharded_db"

[pools.sharded_db.shards.1]
servers = [["shard2.example.com", 5432, "primary"]]
database = "sharded_db"

Load Balancing

Distribute read traffic across replicas using HAProxy:

frontend postgres_read
    bind *:5433
    mode tcp
    default_backend postgres_read_backends

backend postgres_read_backends
    mode tcp
    balance roundrobin
    option tcp-check
    server replica1 192.168.1.11:5432 check
    server replica2 192.168.1.12:5432 check

Route writes to the primary, reads to replicas. Use health checks to automatically remove failed nodes.
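
The tcp-check above only verifies that the port accepts connections. HAProxy can also complete a lightweight PostgreSQL handshake; a sketch, assuming a haproxy_check role exists on each replica:

backend postgres_read_backends
    mode tcp
    balance roundrobin
    option pgsql-check user haproxy_check
    server replica1 192.168.1.11:5432 check
    server replica2 192.168.1.12:5432 check

This still does not distinguish a primary from a replica; role-aware routing needs an external check, for example one built around pg_is_in_recovery().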

Backup Strategy

Replicated and sharded environments require coordinated backup strategies. Use tools like pgBackRest, Barman, or WAL-G for physical backups with point-in-time recovery. Back up each shard independently, and account for replication slots in restore procedures: physical backups do not carry slots over, so they must be recreated after a restore. Test restore procedures regularly.
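
As one concrete option, a minimal pgBackRest setup defines a stanza per cluster (or per shard) and runs scheduled full and incremental backups; the paths and stanza name below are placeholders:

# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2

[main]
pg1-path=/var/lib/postgresql/data

# Initialize the stanza, then take a full backup
pgbackrest --stanza=main stanza-create
pgbackrest --stanza=main --type=full backup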

Getting Started

  1. Baseline optimization: Set shared_buffers to 25% of system RAM, work_mem to 4-64MB per connection based on query complexity, and maintenance_work_mem to 512MB-2GB before scaling (a configuration sketch follows this list)
  2. Add read replicas: Start with 2-3 replicas for read-heavy workloads
  3. Implement connection pooling: Deploy PgBouncer or PgCat before traffic spikes
  4. Monitor replication lag: Set alerts for lag exceeding 1 second
  5. Plan sharding early: Design shard keys based on data access patterns
  6. Gradual migration: Move to sharding when single-node performance plateaus despite vertical scaling
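
A rough starting point for step 1, assuming a dedicated 64 GB database server (values are illustrative and should be tuned against the actual workload):

# postgresql.conf baseline (illustrative, 64 GB server)
shared_buffers = 16GB          # ~25% of system RAM
work_mem = 32MB                # per sort/hash operation, per connection
maintenance_work_mem = 1GB     # VACUUM, CREATE INDEX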
