How to Build High-Performance APIs: Caching Strategies, Connection Pooling, and Query Optimization
High-performance APIs require systematic optimization across three layers: application caching, database connection management, and query execution efficiency. This guide covers proven patterns and implementations for each.
Caching Strategies
Caching reduces database load and latency by storing frequently accessed data in fast memory. The cache-aside pattern is the most common approach: check cache first, query database on miss, then populate cache.
Cache Key Design
Use structured keys to prevent collisions and enable efficient invalidation:
// Namespacing pattern
const cacheKey = `app:env:user:${id}`;
// Set-based bulk invalidation (preferred over wildcards)
const userSetKey = `app:env:user:ids`;
await redis.sadd(userSetKey, id);
Avoid key collisions by including type and version identifiers. Use consistent encoding (URL-safe base64 for complex values).
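A minimal key builder following this convention might look like the sketch below. The helper names and the `app:prod` prefix are illustrative; the point is that URL-safe base64 keeps arbitrary values from colliding with the `:` separator.

```javascript
// Namespaced key builder; the id is encoded as URL-safe base64 so
// arbitrary values cannot collide with the ':' separator.
function cacheKeyFor(namespace, type, id) {
  const encoded = Buffer.from(String(id)).toString('base64url');
  return `${namespace}:${type}:${encoded}`;
}

// Recover the original id from a key built by cacheKeyFor
function idFromCacheKey(key) {
  const encoded = key.split(':').pop();
  return Buffer.from(encoded, 'base64url').toString();
}
```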
Cache-Aside Pattern with Distributed Stampede Protection
async function getUser(id) {
const cacheKey = `user:${id}`;
const lockKey = `user:${id}:lock`;
const lockTTL = 10; // seconds
// Check cache
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
// Distributed lock to prevent stampede across instances
const lockAcquired = await redis.set(lockKey, '1', 'NX', 'EX', lockTTL);
if (lockAcquired) {
try {
// Double-check cache after acquiring lock
const doubleCheck = await redis.get(cacheKey);
if (doubleCheck) return JSON.parse(doubleCheck);
const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
await redis.setex(cacheKey, 3600, JSON.stringify(user));
return user;
} finally {
await redis.del(lockKey);
}
} else {
// Wait for lock holder to populate cache
await new Promise(resolve => setTimeout(resolve, 100));
return getUser(id); // Retry
}
}
Refresh-After-Write Caching
Refresh-after-write updates the cache immediately after a database write, ensuring eventual consistency with minimal added write latency.
async function updateUser(id, data) {
await db.query('UPDATE users SET ? WHERE id = ?', [data, id]);
try {
const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
await redis.setex(`user:${id}`, 3600, JSON.stringify(user));
return user;
} catch (err) {
console.error('Cache update failed:', err);
throw err;
}
}
Cache Invalidation Strategies
TTL-based: Automatic expiration after fixed time.
Event-based: Invalidate on data changes via pub/sub.
// Publisher (on write)
redis.publish('user:invalidation', JSON.stringify({ userId: id }));
// Subscriber (on startup)
redis.subscribe('user:invalidation');
redis.on('message', async (channel, message) => {
try {
const { userId } = JSON.parse(message);
await redis.del(`user:${userId}`);
} catch (err) {
console.error('Invalidation failed:', err);
}
});
Set-based bulk invalidation (avoids blocking wildcard operations):
async function invalidateAllUsers() {
const userSetKey = `app:env:user:ids`;
const userIds = await redis.smembers(userSetKey);
// Delete each tracked key (pipeline these DELs for large sets)
for (const id of userIds) {
await redis.del(`user:${id}`);
}
await redis.del(userSetKey);
}
// Alternative: SCAN-based pattern deletion (non-blocking)
async function deleteByPattern(pattern) {
let cursor = '0';
do {
const result = await redis.scan(cursor, 'MATCH', pattern, 'COUNT', 100);
cursor = result[0];
const keys = result[1];
if (keys.length > 0) {
await redis.del(...keys);
}
} while (cursor !== '0');
}
Version-based: Include version keys in cache entries.
async function getUserWithVersion(id) {
const versionKey = `user:${id}:version`;
const version = await redis.get(versionKey) || '0';
const cacheKey = `user:${id}:v${version}`;
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
await redis.setex(cacheKey, 3600, JSON.stringify(user));
return user;
}
async function invalidateUser(id) {
const versionKey = `user:${id}:version`;
await redis.incr(versionKey);
}
Cache Serialization Considerations
Handle serialization edge cases:
function safeSerialize(obj) {
return JSON.stringify(obj, (key, value) => {
if (value instanceof Date) return { __type: 'Date', value: value.toISOString() };
if (typeof value === 'bigint') return { __type: 'BigInt', value: value.toString() };
return value;
});
}
function safeDeserialize(str) {
return JSON.parse(str, (key, value) => {
if (value?.__type === 'Date') return new Date(value.value);
if (value?.__type === 'BigInt') return BigInt(value.value);
return value;
});
}
Avoid circular references in cached objects. Use libraries like flatted for complex structures.
Cache Consistency Guarantees
Cache and database updates are not atomic. Acceptable timing windows:
- Read-after-write: Cache may serve stale data for TTL duration
- Write-through: Stronger consistency but higher write latency
- Event-based invalidation: Near real-time consistency with pub/sub latency
For critical data, use database transactions with cache invalidation in the same operation scope.
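A sketch of that pattern, assuming mysql2/ioredis-style clients as in the examples above (the function name and injected parameters are illustrative): the cache key is deleted only after the transaction commits, so readers re-fetch fresh data.

```javascript
// Commit the database write first, then invalidate the cache key so
// readers re-fetch fresh data. `pool` and `cache` are assumed to expose
// the mysql2/ioredis-style APIs used elsewhere in this guide.
async function updateUserWithInvalidation(pool, cache, id, data) {
  const conn = await pool.getConnection();
  try {
    await conn.beginTransaction();
    await conn.query('UPDATE users SET ? WHERE id = ?', [data, id]);
    await conn.commit();
  } catch (err) {
    await conn.rollback();
    throw err;
  } finally {
    conn.release();
  }
  // Invalidate only after a successful commit; if the DEL fails, stale
  // data persists no longer than the entry's TTL.
  await cache.del(`user:${id}`);
}
```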
Cache Eviction Policies
Redis supports multiple eviction strategies via maxmemory-policy:
- LRU (Least Recently Used): allkeys-lru or volatile-lru evicts the least recently accessed keys
- LFU (Least Frequently Used): allkeys-lfu or volatile-lfu evicts the least frequently accessed keys (Redis 4.0+)
- TTL-based: volatile-ttl evicts keys with the shortest TTL first
- FIFO (First In First Out): not natively supported; implement with sorted sets if needed
- Random: allkeys-random is simple but less predictable
Choose based on access patterns: LRU for general use, LFU for hot data retention, TTL for time-sensitive data.
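For example, a deployment optimizing for hot-data retention might set the following in redis.conf (the 2gb limit is an illustrative value, not a recommendation):

```
maxmemory 2gb
maxmemory-policy allkeys-lfu
```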
Cache Partitioning and Sharding
For large-scale deployments, partition cache data across multiple Redis instances:
function getShardKey(key, shardCount) {
const hash = crypto.createHash('md5').update(key).digest('hex');
return parseInt(hash.substring(0, 8), 16) % shardCount;
}
const redisShards = [
redis.createClient({ host: 'redis-0.example.com' }),
redis.createClient({ host: 'redis-1.example.com' }),
redis.createClient({ host: 'redis-2.example.com' })
];
async function getFromCache(key) {
const shardIndex = getShardKey(key, redisShards.length);
return redisShards[shardIndex].get(key);
}
Benefits: horizontal scaling, reduced memory pressure per instance, improved throughput. Trade-offs: cross-shard operations require coordination, rebalancing complexity.
Cache Compression
Compress large objects to reduce memory usage and network bandwidth:
const zlib = require('zlib');
async function setCompressed(key, value, ttl) {
const serialized = JSON.stringify(value);
const compressed = zlib.gzipSync(serialized);
await redis.setex(key, ttl, compressed);
}
async function getCompressed(key) {
const compressed = await redis.getBuffer(key);
if (!compressed) return null;
try {
const decompressed = zlib.gunzipSync(compressed);
return JSON.parse(decompressed.toString());
} catch (err) {
console.error('Decompression failed:', err);
return null;
}
}
Use compression for objects >1KB. Monitor compression ratio; disable if ratio <1.5x.
Cache Backup and Persistence
Redis persistence strategies:
- RDB (Snapshot): Point-in-time snapshots, faster restart, potential data loss between snapshots
- AOF (Append-Only File): Every write logged, better durability, larger file size
- Hybrid: RDB + AOF for balance
Configuration:
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
For critical data, enable Redis replication with automatic failover (Redis Sentinel or Redis Cluster).
Cache Warming
Pre-populate cache during application startup. For distributed systems, coordinate warming across instances:
async function warmUserCache() {
try {
const activeUsers = await db.query('SELECT * FROM users WHERE active = 1 LIMIT 1000');
const pipeline = redis.pipeline();
activeUsers.forEach(user => {
pipeline.setex(`user:${user.id}`, 3600, JSON.stringify(user));
});
await pipeline.exec();
} catch (err) {
console.error('Cache warming failed:', err);
throw err;
}
}
// Distributed coordination with TTL refresh
async function distributedWarmup() {
const lockKey = 'warmup:lock';
const lockValue = `${process.pid}:${Date.now()}`;
const lockTTL = 300;
const refreshInterval = 60;
const acquired = await redis.set(lockKey, lockValue, 'NX', 'EX', lockTTL);
if (acquired) {
let refreshTimer;
try {
// Refresh lock periodically to prevent expiration
refreshTimer = setInterval(async () => {
const current = await redis.get(lockKey);
if (current === lockValue) {
await redis.expire(lockKey, lockTTL);
}
}, refreshInterval * 1000);
await warmUserCache();
} finally {
clearInterval(refreshTimer);
const current = await redis.get(lockKey);
if (current === lockValue) {
await redis.del(lockKey);
}
}
}
}
Cache Size Planning
Calculate memory allocation:
total_memory = avg_object_size * expected_keys * replication_factor
Use maxmemory and maxmemory-policy (allkeys-lru or volatile-lru) in Redis configuration. Monitor memory usage and set alerts at 80% capacity.
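As a quick sanity check, the formula can be computed directly. The inputs below and the ~25% padding for Redis per-key metadata are illustrative assumptions, not recommendations.

```javascript
// Direct computation of the sizing formula above, padded ~25% for
// per-key metadata overhead.
function estimateCacheMemoryBytes(avgObjectBytes, expectedKeys, replicationFactor, overhead = 1.25) {
  return Math.ceil(avgObjectBytes * expectedKeys * replicationFactor * overhead);
}
```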
Multi-Level Caching
L1 cache (in-memory) + L2 cache (Redis) provides optimal hit rates. L1 has nanosecond access but limited capacity; L2 offers distributed storage with microsecond latency.
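A minimal two-level cache can be sketched as an in-process Map with TTL (L1) in front of a shared store exposing get/setex (L2, e.g. an ioredis client). The class name, default sizes, and TTLs below are illustrative.

```javascript
// L1: in-process Map with TTL and simple insertion-order eviction.
// L2: any client with ioredis-style get/setex.
class TwoLevelCache {
  constructor(l2, { l1TtlMs = 5000, l1Max = 10000 } = {}) {
    this.l1 = new Map();
    this.l2 = l2;
    this.l1TtlMs = l1TtlMs;
    this.l1Max = l1Max;
  }

  async get(key) {
    const entry = this.l1.get(key);
    if (entry && entry.expires > Date.now()) return entry.value;
    this.l1.delete(key); // expired or absent
    const raw = await this.l2.get(key);
    if (raw == null) return null;
    const value = JSON.parse(raw);
    this._setL1(key, value);
    return value;
  }

  async set(key, value, ttlSeconds) {
    await this.l2.setex(key, ttlSeconds, JSON.stringify(value));
    this._setL1(key, value);
  }

  _setL1(key, value) {
    if (this.l1.size >= this.l1Max) {
      // Evict the oldest insertion (Map preserves insertion order)
      this.l1.delete(this.l1.keys().next().value);
    }
    this.l1.set(key, { value, expires: Date.now() + this.l1TtlMs });
  }
}
```

Note the trade-off: each process's L1 can serve data up to l1TtlMs stale relative to L2, so keep the L1 TTL short.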
Connection Pooling
Connection pools maintain reusable database connections, eliminating the overhead of establishing new connections for each request.
Pool Configuration
const pool = mysql.createPool({
host: 'localhost',
user: 'root',
connectionLimit: 100, // Max concurrent connections
queueLimit: 0, // 0 = unlimited queued requests; >0 caps the queue and rejects beyond it
waitForConnections: true,
idleTimeout: 60000, // Close idle connections after 60s
minIdle: 10, // Maintain minimum idle connections
maxLifetime: 1800000, // Recycle connections after 30 minutes
enableKeepAlive: true
});
// Use pool.query() for automatic connection release
const result = await pool.query('SELECT * FROM users WHERE id = ?', [id]);
// For transactions, manually manage connection lifecycle
const conn = await pool.getConnection();
try {
await conn.beginTransaction();
await conn.query('UPDATE accounts SET balance = balance - ? WHERE id = ?', [amount, fromId]);
await conn.query('UPDATE accounts SET balance = balance + ? WHERE id = ?', [amount, toId]);
await conn.commit();
} catch (err) {
await conn.rollback();
throw err;
} finally {
conn.release(); // Always release connection
}
Key Parameters
- connectionLimit: Maximum connections (typically 2-4x CPU cores)
- minIdle: Minimum idle connections maintained (reduces connection establishment latency)
- connectionTimeout: Max wait for available connection
- idleTimeout: Duration before idle connections are closed
- maxLifetime: Maximum connection lifespan before recycling (prevents stale connections)
- queueLimit: 0 allows unlimited queued requests; >0 caps the queue and rejects requests beyond the cap (fail-fast)
Pool Sizing Formula
connectionLimit = (core_count * 2) + effective_spindle_count
For cloud databases, use 2-4 connections per CPU core as a starting point. The HikariCP formula with spindle count applies to traditional disk-based databases, not modern cloud instances.
Pool Warm-Up
Initialize connections on application startup with error handling:
async function warmUpPool(pool) {
const promises = [];
for (let i = 0; i < pool.config.connectionLimit; i++) {
promises.push(pool.query('SELECT 1').catch(err => {
console.error(`Warmup connection ${i} failed:`, err);
}));
}
try {
await Promise.all(promises);
console.log('Pool warmup completed');
} catch (err) {
console.error('Pool warmup had errors:', err);
// Continue startup even if some warmup queries fail
}
}
Connection Leak Monitoring
Track connection lifecycle to detect leaks:
const activeConnections = new Map();
pool.on('acquire', connection => {
const stack = new Error().stack;
activeConnections.set(connection.threadId, { acquired: Date.now(), stack });
});
pool.on('release', connection => {
activeConnections.delete(connection.threadId);
});
// Alert on long-held connections
setInterval(() => {
const threshold = 30000; // 30 seconds
for (const [id, info] of activeConnections) {
const held = Date.now() - info.acquired;
if (held > threshold) {
console.warn(`Connection ${id} held for ${held}ms`, info.stack);
}
}
}, 10000);
Read Replica Routing with Health Checks
const primaryPool = mysql.createPool({ host: 'primary-db.example.com', ... });
const replicaPools = [
{ pool: mysql.createPool({ host: 'replica-1.example.com', ... }), healthy: true },
{ pool: mysql.createPool({ host: 'replica-2.example.com', ... }), healthy: true }
];
let replicaIndex = 0;
// Health check interval
setInterval(async () => {
for (const replica of replicaPools) {
try {
await replica.pool.query('SELECT 1');
replica.healthy = true;
} catch (err) {
replica.healthy = false;
console.error(`Replica unhealthy: ${replica.pool.config.host}`);
}
}
}, 30000);
function getPool(isWrite = false) {
if (isWrite) return primaryPool;
const healthyReplicas = replicaPools.filter(r => r.healthy);
if (healthyReplicas.length === 0) {
console.warn('All replicas unhealthy, falling back to primary');
return primaryPool;
}
// True round-robin replica selection
const index = replicaIndex % healthyReplicas.length;
replicaIndex++;
return healthyReplicas[index].pool;
}
// Usage
await getPool(true).query('INSERT INTO users ...'); // Writes to primary
await getPool(false).query('SELECT * FROM users'); // Reads from healthy replica
Read/Write Splitting Strategies
Sticky Sessions: Route all requests from a user session to the same replica to ensure consistency during the session. Useful for read-your-writes consistency.
Eventual Consistency: Accept that replicas may lag behind primary by milliseconds to seconds. Use for analytics, reporting, and non-critical reads.
Hybrid Approach: Use primary for critical reads immediately after writes, replicas for stale-tolerant reads.
async function getUserAfterWrite(userId, justWritten = false) {
// Use primary if data was just written to ensure consistency
const pool = justWritten ? primaryPool : getPool(false);
return pool.query('SELECT * FROM users WHERE id = ?', [userId]);
}
Query Optimization
Efficient queries reduce CPU, memory, and I/O overhead. Optimization focuses on index usage, execution plans, and query patterns.
Prepared Statements
Prepared statements improve performance and security by parsing query plans once and reusing them:
// mysql2 prepares per-connection, so acquire a connection first
const conn = await pool.getConnection();
const getUserStmt = await conn.prepare('SELECT * FROM users WHERE id = ?');
// Execute multiple times efficiently
for (const id of userIds) {
  const [user] = await getUserStmt.execute([id]);
}
// Close the statement and release the connection when done
await getUserStmt.close();
conn.release();
Benefits: reduced parsing overhead, protection against SQL injection, better plan caching. Use for frequently executed queries with parameter variations.
Indexing Strategies
-- Composite index for WHERE + ORDER BY
CREATE INDEX idx_user_status_created ON users(status, created_at);
-- Covering index (includes all queried columns)
CREATE INDEX idx_orders_covering ON orders(user_id, status, total);
Execution Plan Analysis
PostgreSQL and MySQL 8.0.18+:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123 AND status = 'pending'
ORDER BY created_at DESC LIMIT 10;
MySQL versions prior to 8.0.18:
EXPLAIN
SELECT * FROM orders
WHERE user_id = 123 AND status = 'pending'
ORDER BY created_at DESC LIMIT 10;
Look for Seq Scan versus Index Scan on large tables (a sequential scan is fine for small ones), and compare actual vs estimated row counts to spot stale statistics.
Database-Specific Optimizations
PostgreSQL:
- Run VACUUM ANALYZE regularly to reclaim space and update statistics
- Configure work_mem for sort/hash operations
- Use EXPLAIN (ANALYZE, BUFFERS) for detailed I/O analysis
- Enable pg_stat_statements for query performance tracking
MySQL:
- Tune innodb_buffer_pool_size to 70-80% of available RAM
- Configure innodb_log_file_size for write-heavy workloads
- Use innodb_flush_log_at_trx_commit = 2 for performance (with a durability trade-off)
- Enable slow_query_log to identify problematic queries
N+1 Query Prevention
Use JOINs or IN clauses instead of loops:
// Bad: N+1 queries
for (const user of users) {
const orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [user.id]);
}
// Good: Single query with IN clause
const userIds = users.map(u => u.id);
const orders = await db.query('SELECT * FROM orders WHERE user_id IN (?)', [userIds]);
// Map orders back to users
const ordersByUser = new Map();
orders.forEach(order => {
if (!ordersByUser.has(order.user_id)) {
ordersByUser.set(order.user_id, []);
}
ordersByUser.get(order.user_id).push(order);
});
// Attach orders to users
users.forEach(user => {
user.orders = ordersByUser.get(user.id) || [];
});
Query Patterns
- Offset Pagination: Use OFFSET + LIMIT for small datasets
- Keyset Pagination: More efficient for large datasets, avoids offset scanning. Requires an indexed, unique cursor column.
-- Keyset pagination example (requires indexed, unique cursor column)
SELECT * FROM orders
WHERE id > ? AND user_id = ?
ORDER BY id ASC
LIMIT 50;
- Batch operations: Group inserts/updates into single transactions
- Query result caching: Cache query results for read-heavy, infrequently changing data
async function getCachedQuery(cacheKey, queryFn, ttl = 300) {
try {
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
const result = await queryFn();
await redis.setex(cacheKey, ttl, JSON.stringify(result));
return result;
} catch (err) {
console.error('Cached query failed:', err);
throw err;
}
}
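The batch-operations pattern can be sketched as one multi-row INSERT inside a single transaction instead of N round-trips. This uses the mysql2 bulk-values placeholder (VALUES ?); the function name and column list are illustrative.

```javascript
// One multi-row INSERT in a single transaction; `pool` is assumed to be
// the mysql2 pool from the earlier examples.
async function insertOrdersBatch(pool, orders) {
  const rows = orders.map(o => [o.userId, o.status, o.total]);
  const conn = await pool.getConnection();
  try {
    await conn.beginTransaction();
    // mysql2 expands VALUES ? into one row per nested array
    await conn.query('INSERT INTO orders (user_id, status, total) VALUES ?', [rows]);
    await conn.commit();
  } catch (err) {
    await conn.rollback();
    throw err;
  } finally {
    conn.release();
  }
}
```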
Rate Limiting
Protect API performance and prevent abuse with rate limiting:
const rateLimit = new Map();
async function checkRateLimit(clientId, limit, windowMs) {
const now = Date.now();
const windowStart = now - windowMs;
let clientData = rateLimit.get(clientId);
if (!clientData) {
clientData = { requests: [], blockedUntil: 0 };
rateLimit.set(clientId, clientData);
}
// Check if currently blocked
if (clientData.blockedUntil > now) {
throw new Error('Rate limit exceeded');
}
// Clean old requests
clientData.requests = clientData.requests.filter(t => t > windowStart);
// Check limit
if (clientData.requests.length >= limit) {
clientData.blockedUntil = now + windowMs;
throw new Error('Rate limit exceeded');
}
clientData.requests.push(now);
}
// Usage with Redis for distributed rate limiting
async function checkRateLimitRedis(clientId, limit, windowMs) {
const key = `ratelimit:${clientId}`;
const now = Date.now();
const windowStart = now - windowMs;
const pipeline = redis.pipeline();
pipeline.zremrangebyscore(key, 0, windowStart);
pipeline.zcard(key);
pipeline.zadd(key, now, `${now}-${Math.random()}`);
pipeline.pexpire(key, windowMs);
const results = await pipeline.exec();
const count = results[1][1];
if (count >= limit) {
throw new Error('Rate limit exceeded');
}
}
Strategies: token bucket, leaky bucket, fixed window, sliding window. Choose based on traffic patterns and consistency requirements.
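Of these, the token bucket is easy to sketch in-process: the bucket holds up to capacity tokens, refills at a steady rate, and each request spends one token, allowing short bursts while enforcing a long-run average. The class name and parameters are illustrative.

```javascript
// Token bucket: `capacity` tokens, refilled at `refillRate` tokens/second.
// tryRemove returns true if the request is allowed.
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity;
    this.last = Date.now();
  }

  tryRemove(now = Date.now()) {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```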
Monitoring Metrics
Track these metrics for production observability:
Collection Implementation:
const promClient = require('prom-client');
const cacheHitRate = new promClient.Counter({
name: 'cache_hits_total',
help: 'Total cache hits'
});
const cacheMissRate = new promClient.Counter({
name: 'cache_misses_total',
help: 'Total cache misses'
});
const poolActiveConnections = new promClient.Gauge({
name: 'pool_active_connections',
help: 'Active database connections'
});
const poolIdleConnections = new promClient.Gauge({
name: 'pool_idle_connections',
help: 'Idle database connections'
});
// Instrument cache operations
async function getCached(key) {
const cached = await redis.get(key);
if (cached) {
cacheHitRate.inc();
return JSON.parse(cached);
}
cacheMissRate.inc();
return null;
}
// Instrument pool via mysql2 internal fields (no public counters are
// exposed; these fields may change between driver versions)
setInterval(() => {
  try {
    const all = pool.pool._allConnections.length;
    const free = pool.pool._freeConnections.length;
    poolActiveConnections.set(all - free);
    poolIdleConnections.set(free);
} catch (err) {
console.error('Pool metrics collection failed:', err);
}
}, 5000);
Key Metrics:
- Cache: Hit rate (>80% target), miss rate, eviction count, memory usage, error rate
- Connection Pool: Active connections, idle connections, waiting requests, wait time, connection errors
- Database: Query latency (p50, p95, p99), slow query count, lock wait time, replica lag
Getting Started
- Implement Redis caching with cache-aside pattern and distributed stampede protection for read-heavy endpoints
- Configure connection pool with 2-4 connections per CPU core using connectionLimit; include minIdle and maxLifetime
- Add indexes on all WHERE, JOIN, and ORDER BY columns
- Run EXPLAIN ANALYZE (PostgreSQL or MySQL 8.0.18+) or plain EXPLAIN (older MySQL) on slow queries (>100ms)
- Monitor cache hit rate (target >80%) and pool utilization (target <70%)
- Set up alerts for connection timeouts, cache evictions, and Redis failures
- Implement cache warming for frequently accessed data during cold starts
- Add health checks for read replicas with fallback to primary
- Use prepared statements for frequently executed queries
- Configure database-specific settings (buffer pool, VACUUM, slow query log)
- Implement rate limiting to protect API performance under load