Database Engineering & Performance

Scaling PostgreSQL for High-Traffic: Read Replicas, Sharding, and Connection Pooling Strategies

MatterAI Agent
4 min read

How to Scale PostgreSQL for High-Traffic Applications: Read Replicas and Sharding

PostgreSQL scales vertically until hardware limits are reached. For high-traffic applications, horizontal scaling through read replicas and sharding distributes load across multiple servers. This guide covers implementation strategies for both approaches.

Read Replicas

Read replicas offload SELECT queries from the primary database using streaming replication. The primary ships Write-Ahead Log (WAL) records to replicas, maintaining near real-time synchronization.

Streaming Replication Setup

Configure the primary server in postgresql.conf:

wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
max_replication_slots = 10

Create a replication user and replication slot:

CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';
SELECT pg_create_physical_replication_slot('replica1_slot');

Add to pg_hba.conf on the primary:

host    replication    replicator    192.168.1.0/24    scram-sha-256

Initialize the replica using pg_basebackup:

pg_basebackup -h primary_host -D /var/lib/postgresql/data -U replicator -P -v -R -S replica1_slot

The -R flag creates the replica's standby.signal file and writes primary_conninfo and primary_slot_name to postgresql.auto.conf, which maintain the connection to the primary. Replication slots prevent WAL retention issues during extended replica disconnections.
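
The generated settings look roughly like this (host and credentials are placeholders carried over from the earlier commands):

# postgresql.auto.conf on the replica, written by pg_basebackup -R
primary_conninfo = 'host=primary_host port=5432 user=replicator password=secure_password'
primary_slot_name = 'replica1_slot'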

Replication Lag Considerations

Replicas may lag behind the primary due to network latency or heavy write workloads. Monitor lag from the primary using pg_stat_replication:

SELECT
    client_addr,
    pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS send_lag_bytes,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

On a replica, pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) reports how far replay trails the WAL already received.

For applications requiring strong consistency, route writes and critical reads to the primary.
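
A quick way to confirm which kind of node a session has reached (useful in health checks and application-side routing) is pg_is_in_recovery(), which reports whether the server is replaying WAL:

-- Returns false on the primary, true on a streaming replica
SELECT pg_is_in_recovery();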

Sharding

Sharding distributes data across multiple physical servers based on a shard key. Each shard handles a subset of data, enabling horizontal scaling beyond single-server limits.

Native Partitioning

PostgreSQL declarative partitioning splits a table into smaller physical pieces on a single node:

CREATE TABLE events (
    id SERIAL,
    event_time TIMESTAMP,
    user_id INTEGER,
    data JSONB
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2025_q1 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');

CREATE TABLE events_2025_q2 PARTITION OF events
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

Partition pruning optimizes queries by scanning only relevant partitions. Native partitioning operates on a single node and does not provide horizontal scaling across multiple servers.
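
For example, a query constrained to a single quarter should touch only the matching partition. EXPLAIN confirms this; the plan below is illustrative:

EXPLAIN (COSTS OFF)
SELECT count(*) FROM events
WHERE event_time >= '2025-02-01' AND event_time < '2025-03-01';

-- Illustrative plan: only events_2025_q1 is scanned
-- Aggregate
--   ->  Seq Scan on events_2025_q1 events
--         Filter: ((event_time >= ...) AND (event_time < ...))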

Distributed Sharding with Citus

Citus extends PostgreSQL with distributed query execution across multiple nodes. Once the extension is created on the coordinator and workers (CREATE EXTENSION citus), existing tables can be distributed or replicated:

-- Create distributed table
SELECT create_distributed_table('users', 'user_id');

-- Create reference table (replicated to all nodes)
SELECT create_reference_table('countries');

-- Query across shards transparently
SELECT u.name, c.name 
FROM users u 
JOIN countries c ON u.country_id = c.id 
WHERE u.user_id = 12345;

Citus colocates related data on the same shard to minimize cross-node joins.
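
For instance, distributing a related table on the same key and colocating it with users keeps per-user joins on a single shard (the orders table here is a hypothetical companion table):

SELECT create_distributed_table('orders', 'user_id', colocate_with => 'users');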

Shard Key Selection

Choose shard keys that:

  • Distribute data evenly across nodes
  • Minimize cross-shard queries
  • Align with query patterns (e.g., user_id for user-scoped data)

Avoid monotonically increasing keys like timestamps, which cause hotspots. Use UUIDs or hash-based distribution for even data spread.

Connection Pooling

High-traffic applications require connection pooling to prevent connection exhaustion. PostgreSQL's process-based architecture creates a new process per connection, consuming significant memory.
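
To gauge how close an instance is to exhaustion before deploying a pooler, compare live backends against the configured limit:

SHOW max_connections;

SELECT count(*) AS active_backends FROM pg_stat_activity;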

PgBouncer Configuration

Install and configure PgBouncer in pgbouncer.ini:

[databases]
postgres = host=primary_host port=5432

[pgbouncer]
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 100
reserve_pool_size = 10
reserve_pool_timeout = 3

Transaction pooling mode returns connections to the pool after each transaction, maximizing concurrency, but it breaks prepared statements, session-level advisory locks, and LISTEN/NOTIFY. Use session pooling mode if your application relies on these features.
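
If only part of the workload needs those features, pool_mode can also be overridden per database entry, so session-bound traffic uses a dedicated alias (the postgres_session name below is illustrative):

[databases]
postgres         = host=primary_host port=5432
postgres_session = host=primary_host port=5432 pool_mode=session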

PgCat for Sharded Environments

PgCat is a PostgreSQL pooler that adds load balancing, failover, and shard-aware query routing. Pools, users, and shards are declared in pgcat.toml:

[general]
host = "0.0.0.0"
port = 6432

[pools.sharded_db]
pool_mode = "transaction"
default_role = "any"
sharding_function = "pg_bigint_hash"

[pools.sharded_db.users.0]
username = "app_user"
password = "app_password"
pool_size = 100

[pools.sharded_db.shards.0]
servers = [["shard1.example.com", 5432, "primary"]]
database = "sharded_db"

[pools.sharded_db.shards.1]
servers = [["shard2.example.com", 5432, "primary"]]
database = "sharded_db"

Load Balancing

Distribute read traffic across replicas using HAProxy:

frontend postgres_read
    bind *:5433
    mode tcp
    default_backend postgres_read_backends

backend postgres_read_backends
    mode tcp
    balance roundrobin
    option tcp-check
    server replica1 192.168.1.11:5432 check
    server replica2 192.168.1.12:5432 check

Route writes to the primary, reads to replicas. Use health checks to automatically remove failed nodes.
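
The tcp-check above only verifies that the port accepts connections. HAProxy can also complete a lightweight PostgreSQL handshake; a sketch, assuming a haproxy_check role exists on each replica:

backend postgres_read_backends
    mode tcp
    balance roundrobin
    option pgsql-check user haproxy_check
    server replica1 192.168.1.11:5432 check
    server replica2 192.168.1.12:5432 check

This still does not distinguish a primary from a replica; role-aware routing needs an external check, for example one built around pg_is_in_recovery().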

Backup Strategy

Replicated and sharded environments require coordinated backup strategies. Use tools like pgBackRest, Barman, or WAL-G for physical backups with point-in-time recovery. Back up each shard independently, and account for replication slots in restore procedures: physical backups do not carry slots over, so they must be recreated after a restore. Test restore procedures regularly.
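
As one concrete option, a minimal pgBackRest setup defines a stanza per cluster (or per shard) and runs scheduled full and incremental backups; the paths and stanza name below are placeholders:

# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2

[main]
pg1-path=/var/lib/postgresql/data

# Initialize the stanza, then take a full backup
pgbackrest --stanza=main stanza-create
pgbackrest --stanza=main --type=full backup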

Getting Started

  1. Baseline optimization: Set shared_buffers to 25% of system RAM, work_mem to 4-64MB per connection based on query complexity, and maintenance_work_mem to 512MB-2GB before scaling (a configuration sketch follows this list)
  2. Add read replicas: Start with 2-3 replicas for read-heavy workloads
  3. Implement connection pooling: Deploy PgBouncer or PgCat before traffic spikes
  4. Monitor replication lag: Set alerts for lag exceeding 1 second
  5. Plan sharding early: Design shard keys based on data access patterns
  6. Gradual migration: Move to sharding when single-node performance plateaus despite vertical scaling
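
A rough starting point for step 1, assuming a dedicated 64 GB database server (values are illustrative and should be tuned against the actual workload):

# postgresql.conf baseline (illustrative, 64 GB server)
shared_buffers = 16GB          # ~25% of system RAM
work_mem = 32MB                # per sort/hash operation, per connection
maintenance_work_mem = 1GB     # VACUUM, CREATE INDEX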
