Scaling PostgreSQL for High-Traffic: Read Replicas, Sharding, and Connection Pooling Strategies
PostgreSQL scales vertically until hardware limits are reached. For high-traffic applications, horizontal scaling through read replicas and sharding distributes load across multiple servers. This guide covers implementation strategies for both approaches.
Read Replicas
Read replicas offload SELECT queries from the primary database using streaming replication. The primary ships Write-Ahead Log (WAL) records to replicas, maintaining near real-time synchronization.
Streaming Replication Setup
Configure the primary server in postgresql.conf:
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
max_replication_slots = 10
Create a replication user and replication slot:
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';
SELECT pg_create_physical_replication_slot('replica1_slot');
Add to pg_hba.conf on the primary:
host replication replicator 192.168.1.0/24 scram-sha-256
Initialize the replica using pg_basebackup:
pg_basebackup -h primary_host -D /var/lib/postgresql/data -U replicator -P -v -R -S replica1_slot
The replica's standby.signal file and primary_conninfo in postgresql.conf maintain the connection. Replication slots prevent WAL retention issues during extended replica disconnections.
Replication Lag Considerations
Replicas may lag behind the primary due to network latency or heavy write workloads. Monitor lag from the primary via pg_stat_replication (the pg_current_wal_lsn() function cannot run on a replica during recovery):
SELECT client_addr,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS sent_lag_bytes,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
On a replica itself, compare receive and replay positions instead:
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS replay_lag_bytes;
For applications requiring strong consistency, route writes and critical reads to the primary.
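That read/write split can live in a thin application-side helper. A minimal sketch of the routing decision; the DSN strings and the SELECT-only heuristic are illustrative assumptions, not part of any driver API:

```python
import itertools

class ReadWriteRouter:
    """Pick a connection string per statement: writes (and reads that need
    strong consistency) go to the primary; plain reads rotate across replicas."""

    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        self._replicas = itertools.cycle(replica_dsns)

    def dsn_for(self, sql, strong_consistency=False):
        # Heuristic: only statements starting with SELECT are replica-safe;
        # anything else, or anything flagged consistency-critical, hits the primary.
        if strong_consistency or not sql.lstrip().lower().startswith("select"):
            return self.primary_dsn
        return next(self._replicas)

router = ReadWriteRouter(
    "host=primary_host dbname=app",
    ["host=replica1 dbname=app", "host=replica2 dbname=app"],
)
```

Feed router.dsn_for(query) to your driver's connect call; a read that must reflect the latest write can pass strong_consistency=True to bypass the replicas.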
Sharding
Sharding distributes data across multiple physical servers based on a shard key. Each shard handles a subset of data, enabling horizontal scaling beyond single-server limits.
Native Partitioning
PostgreSQL declarative partitioning splits a table into smaller physical pieces on a single node:
CREATE TABLE events (
id SERIAL,
event_time TIMESTAMP,
user_id INTEGER,
data JSONB
) PARTITION BY RANGE (event_time);
CREATE TABLE events_2025_q1 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE events_2025_q2 PARTITION OF events
FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
Partition pruning optimizes queries by scanning only relevant partitions. Native partitioning operates on a single node and does not provide horizontal scaling across multiple servers.
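Pruning is easy to confirm with EXPLAIN. Assuming the events table above, a query bounded to Q1 should list only the Q1 partition in its plan:

```sql
EXPLAIN (COSTS OFF)
SELECT count(*) FROM events
WHERE event_time >= '2025-02-01' AND event_time < '2025-03-01';
-- The plan scans events_2025_q1 only; events_2025_q2 is pruned.
```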
Distributed Sharding with Citus
Citus extends PostgreSQL with distributed query execution across multiple nodes:
-- Create distributed table
SELECT create_distributed_table('users', 'user_id');
-- Create reference table (replicated to all nodes)
SELECT create_reference_table('countries');
-- Query across shards transparently
SELECT u.name AS user_name, c.name AS country_name
FROM users u
JOIN countries c ON u.country_id = c.id
WHERE u.user_id = 12345;
Citus colocates related data on the same shard to minimize cross-node joins.
Shard Key Selection
Choose shard keys that:
- Distribute data evenly across nodes
- Minimize cross-shard queries
- Align with query patterns (e.g., user_id for user-scoped data)
Avoid monotonically increasing keys like timestamps, which cause hotspots. Use UUIDs or hash-based distribution for even data spread.
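Hash-based distribution can be sketched in a few lines. The sha256 choice and shard count here are illustrative assumptions (Citus applies its own hash partitioning internally):

```python
import hashlib

def shard_for(user_id: int, num_shards: int) -> int:
    """Map a user_id to a shard deterministically and uniformly.

    A cryptographic hash keeps the mapping stable across processes,
    unlike Python's built-in hash(), which is randomized per interpreter run.
    """
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With this scheme, sequential user_ids spread evenly across shards instead of piling onto the newest one, which is exactly the hotspot that monotonic keys cause.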
Connection Pooling
High-traffic applications require connection pooling to prevent connection exhaustion. PostgreSQL's process-based architecture creates a new process per connection, consuming significant memory.
PgBouncer Configuration
Install and configure PgBouncer in pgbouncer.ini:
[databases]
postgres = host=primary_host port=5432
[pgbouncer]
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 100
reserve_pool_size = 10
reserve_pool_timeout = 3
Transaction pooling returns a server connection to the pool after each transaction, maximizing concurrency, but it breaks session-level features: prepared statements, advisory locks, and LISTEN/NOTIFY. Use session pooling if your application relies on these (recent PgBouncer releases, 1.21 and later, can also track protocol-level prepared statements in transaction mode via max_prepared_statements).
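The default_pool_size of 100 above is not arbitrary; a rough sizing heuristic is Little's law: server connections needed ≈ transaction arrival rate × mean transaction time. A sketch with illustrative numbers:

```python
import math

def server_connections_needed(target_tps: float, avg_txn_ms: float) -> int:
    """Estimate concurrent server connections for a transaction-pooled setup.

    Little's law: concurrency = arrival rate x service time.
    Round up and keep at least one connection.
    """
    return max(1, math.ceil(target_tps * avg_txn_ms / 1000.0))

# 2,000 transactions/sec at 50 ms each keeps ~100 server connections busy,
# which is how a pool of 100 can serve 10,000 idle-most-of-the-time clients.
```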
PgCat for Sharded Environments
PgCat provides automatic sharding and load balancing:
[pgcat]
pool_mode = transaction
max_pool_size = 100
[[databases]]
name = "sharded_db"
users = ["app_user"]
[[databases.shards]]
name = "shard_1"
hosts = ["shard1.example.com:5432"]
[[databases.shards]]
name = "shard_2"
hosts = ["shard2.example.com:5432"]
Load Balancing
Distribute read traffic across replicas using HAProxy:
frontend postgres_read
    bind *:5433
    mode tcp
    default_backend postgres_read_backends

backend postgres_read_backends
    mode tcp
    balance roundrobin
    option tcp-check
    server replica1 192.168.1.11:5432 check
    server replica2 192.168.1.12:5432 check
Route writes to the primary, reads to replicas. Use health checks to automatically remove failed nodes.
Backup Strategy
Replicated and sharded environments require coordinated backup strategies. Use tools like pgBackRest, Barman, or WAL-G for physical backups with point-in-time recovery. Back up each shard independently and ensure replication slots are included in backup procedures. Test restore procedures regularly.
Getting Started
- Baseline optimization: Set shared_buffers to 25% of system RAM, work_mem to 4-64MB per connection based on query complexity, and maintenance_work_mem to 512MB-2GB before scaling
- Add read replicas: Start with 2-3 replicas for read-heavy workloads
- Implement connection pooling: Deploy PgBouncer or PgCat before traffic spikes
- Monitor replication lag: Set alerts for lag exceeding 1 second
- Plan sharding early: Design shard keys based on data access patterns
- Gradual migration: Move to sharding when single-node performance plateaus despite vertical scaling
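As a concrete starting point, the baseline settings above translate to a postgresql.conf fragment like this, assuming a dedicated 64 GB server (tune the values to your own hardware and workload):

```ini
# postgresql.conf -- baseline sketch for a dedicated 64 GB server
shared_buffers = 16GB            # ~25% of system RAM
work_mem = 32MB                  # per sort/hash operation, per connection
maintenance_work_mem = 1GB       # VACUUM, CREATE INDEX, etc.
```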