PostgreSQL Performance Optimization: From Slow to Fast
Poor database performance is one of the most common issues we encounter. This guide covers practical optimization techniques that deliver measurable results.
Understanding the Performance Problem
Before optimizing, you need to measure:
Key Metrics to Track
- Query execution time: Target <100ms for user-facing queries
- Connection pool usage: Should stay below 80% capacity
- Database CPU/Memory: Consistent high usage indicates problems
- Disk I/O: High IOPS can bottleneck performance
- Cache hit ratio: Should be >99% for most workloads
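To check the cache hit ratio, you can query PostgreSQL's own statistics views; here is a minimal check against pg_stat_database for the current database:
-- Buffer cache hit ratio for the current database
SELECT
round(blks_hit * 100.0 / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();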
Enable Query Logging
-- Log slow queries (>500ms)
ALTER SYSTEM SET log_min_duration_statement = 500;
-- Log the duration of every completed statement (verbose; use carefully in production)
ALTER SYSTEM SET log_duration = on;
-- Apply changes
SELECT pg_reload_conf();
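To confirm the new values are active after the reload, a quick check against pg_settings works:
-- Verify the logging settings took effect
SELECT name, setting
FROM pg_settings
WHERE name IN ('log_min_duration_statement', 'log_duration');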
Finding Slow Queries
Using pg_stat_statements
-- Enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find top 10 slowest queries (column names below are for PostgreSQL 13+;
-- older versions use total_time/mean_time/max_time)
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
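After deploying a fix, it can help to reset the counters so the next measurement reflects only the new behavior:
-- Clear accumulated statistics to measure the effect of a change
SELECT pg_stat_statements_reset();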
Real-World Example
Before optimization:
SELECT users.*, COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE users.created_at > '2024-01-01'
GROUP BY users.id
ORDER BY order_count DESC;
-- Execution time: 8,500ms
-- Rows scanned: 1,200,000
After optimization (see techniques below): 850ms (10x faster)
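Where does that time go? Running the query under EXPLAIN (ANALYZE, BUFFERS) is the quickest way to find out; sequential scans over large tables and row estimates far from the actual counts are the usual culprits:
-- Inspect the plan of the slow query above
EXPLAIN (ANALYZE, BUFFERS)
SELECT users.*, COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE users.created_at > '2024-01-01'
GROUP BY users.id
ORDER BY order_count DESC;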
Optimization Technique #1: Indexing
Index Strategy
Not all indexes are created equal. Here's our systematic approach:
-- Check which indexes are actually being used
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
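The inverse check is just as useful: tables that are sequentially scanned far more often than they are index-scanned are candidates for new indexes (a rough heuristic, not a guarantee):
-- Find tables with many sequential scans (potentially missing indexes)
SELECT
schemaname,
relname,
seq_scan,
idx_scan,
seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > COALESCE(idx_scan, 0)
ORDER BY seq_tup_read DESC
LIMIT 10;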
Common Indexing Patterns
1. Composite Indexes (Order Matters):
-- Bad: two separate single-column indexes
CREATE INDEX idx_user_created ON users(created_at);
CREATE INDEX idx_user_status ON users(status);
-- Good: one composite index (equality column first, range column second)
CREATE INDEX idx_user_status_created ON users(status, created_at);
-- Query that benefits:
SELECT * FROM users
WHERE status = 'active'
AND created_at > '2024-01-01';
2. Partial Indexes (For Common Filters):
-- Only index active users (90% of data is inactive)
CREATE INDEX idx_active_users ON users(created_at)
WHERE status = 'active';
-- Index size: 100MB instead of 1GB
3. Expression Indexes (For Computed Values):
-- For case-insensitive email lookups
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Now this query uses the index:
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
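One operational note when adding any of these indexes on a busy table: a plain CREATE INDEX blocks writes for the duration of the build. CREATE INDEX CONCURRENTLY avoids that lock at the cost of a slower build (and it cannot run inside a transaction block):
-- Same composite index as above, built without blocking writes
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_user_status_created ON users(status, created_at);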
Optimization Technique #2: Query Rewriting
Problem: N+1 Queries
Bad (100 queries):
// Fetches users (1 query); the pg driver returns a result object, rows live on .rows
const { rows: users } = await db.query('SELECT * FROM users LIMIT 100');
// Makes 100 additional queries, one per user
for (const user of users) {
  const res = await db.query(
    'SELECT * FROM orders WHERE user_id = $1',
    [user.id]
  );
  user.orders = res.rows;
}
Good (1 query):
const result = await db.query(`
  SELECT
    users.*,
    COALESCE(
      json_agg(orders.*) FILTER (WHERE orders.id IS NOT NULL),
      '[]'
    ) as orders
  FROM users
  LEFT JOIN orders ON orders.user_id = users.id
  GROUP BY users.id
  LIMIT 100
`);
Problem: SELECT *
Bad:
SELECT * FROM users WHERE id = 123;
-- Returns 50 columns, including large JSONB fields
Good:
SELECT id, email, first_name, last_name FROM users WHERE id = 123;
-- Returns only needed columns
Impact: 10KB → 1KB response size
Problem: Inefficient JOINs
Bad:
SELECT users.*, orders.*
FROM users
JOIN orders ON users.id = orders.user_id
WHERE users.email = '[email protected]';
-- Without an index on orders.user_id, the join scans the entire orders table
Good:
-- Add an index on the join column first
CREATE INDEX idx_orders_user_id ON orders(user_id);
SELECT users.*, orders.*
FROM users
JOIN orders ON users.id = orders.user_id
WHERE users.email = '[email protected]';
-- The same query now uses an index scan on orders.user_id
Optimization Technique #3: Connection Pooling
The Problem
Creating new database connections is expensive:
- Connection time: 50-100ms each
- Memory overhead: 10MB per connection
- Max connections: Usually 100-500
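Before adding a pooler, check how close you actually are to the limit; pg_stat_activity shows current usage:
-- Current connections vs. the configured maximum
SELECT count(*) AS current_connections,
current_setting('max_connections') AS max_connections
FROM pg_stat_activity;
-- Breakdown by state (many idle connections usually means pooling will help)
SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;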
The Solution: pgBouncer
# pgbouncer.ini
[databases]
myapp = host=localhost dbname=production
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
Configuration in Application
// Without pooling (Bad)
const { Client } = require('pg');
const client = new Client({ connectionString: process.env.DATABASE_URL });
// With pooling (Good)
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
Optimization Technique #4: Caching
Query Result Caching
const redis = require('redis');
const client = redis.createClient();
client.connect(); // node-redis v4+ requires an explicit connect before issuing commands
async function getUser(userId) {
  const cacheKey = `user:${userId}`;
  // Check cache first
  const cached = await client.get(cacheKey);
  if (cached) return JSON.parse(cached);
  // Query database (pg returns a result object; the row is on .rows)
  const result = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  const user = result.rows[0];
  // Cache for 5 minutes
  await client.setEx(cacheKey, 300, JSON.stringify(user));
  return user;
}
Materialized Views
For complex aggregations that don't need real-time data:
-- Create materialized view
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
users.id,
users.email,
COUNT(DISTINCT orders.id) as total_orders,
SUM(orders.amount) as total_spent,
MAX(orders.created_at) as last_order_date
FROM users
LEFT JOIN orders ON orders.user_id = users.id
GROUP BY users.id, users.email;
-- Create a UNIQUE index on the materialized view (required for CONCURRENTLY refresh)
CREATE UNIQUE INDEX idx_user_stats_id ON user_statistics(id);
-- Refresh periodically (run via cron)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;
Performance: Complex query from 5s to 10ms
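If the pg_cron extension is available (an assumption; a system cron job calling psql works just as well), the refresh can be scheduled inside the database itself:
-- Assumes pg_cron is installed; job name and interval are illustrative
SELECT cron.schedule(
'refresh-user-statistics',
'*/15 * * * *',
'REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics'
);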
Optimization Technique #5: Partitioning
For large tables (>10M rows), partitioning can dramatically improve performance:
-- Create partitioned table
CREATE TABLE orders (
id BIGSERIAL,
user_id INTEGER,
created_at TIMESTAMP,
amount DECIMAL
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- Queries automatically use correct partition
SELECT * FROM orders
WHERE created_at BETWEEN '2024-02-01' AND '2024-02-28';
-- Only scans orders_2024_q1 partition
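Two small additions worth making: a default partition so rows outside the defined ranges are not rejected, and an EXPLAIN check to confirm partition pruning is actually happening:
-- Catch-all partition for rows outside the defined ranges
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
-- Verify pruning: the plan should list only orders_2024_q1
EXPLAIN SELECT * FROM orders
WHERE created_at BETWEEN '2024-02-01' AND '2024-02-28';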
Optimization Technique #6: VACUUM and ANALYZE
PostgreSQL requires regular maintenance:
-- Manual vacuum (reclaim space)
VACUUM ANALYZE users;
-- Check table bloat
SELECT
schemaname,
relname,
pg_size_pretty(pg_total_relation_size(relid)) as size,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
Autovacuum should already be on by default; tune its per-table thresholds for frequently updated tables:
ALTER TABLE users SET (autovacuum_vacuum_scale_factor = 0.1);
ALTER TABLE users SET (autovacuum_analyze_scale_factor = 0.05);
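To confirm autovacuum is actually keeping up, pg_stat_user_tables records when each table was last vacuumed and analyzed:
-- When were tables last vacuumed/analyzed (manually or by autovacuum)?
SELECT
relname,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;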
Real-World Case Study
Client: E-commerce platform with 5M users
Problems:
- Homepage load time: 8 seconds
- Checkout timeout errors: 15% of transactions
- Database CPU: 95% constantly
Solutions Applied:
- Added 12 strategic indexes (composite + partial)
- Implemented Redis caching for product catalog
- Partitioned orders table by month
- Set up pgBouncer with transaction pooling
- Created materialized views for analytics dashboard
Results:
- Homepage load time: 1.2 seconds (85% improvement)
- Checkout errors: <1% (93% reduction)
- Database CPU: 35% average
- Cost savings: $2,400/month (smaller RDS instance)
Monitoring Setup
Use pg_stat_statements + Grafana dashboard:
-- Query for Prometheus/Grafana
SELECT
query,
calls,
mean_exec_time,
max_exec_time,
rows
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;
Quick Wins Checklist
✅ Enable pg_stat_statements
✅ Log queries >500ms
✅ Add indexes for WHERE/JOIN columns
✅ Use connection pooling
✅ Cache frequently accessed data
✅ Avoid SELECT *
✅ Fix N+1 queries
✅ Monitor cache hit ratio
✅ Set up autovacuum
✅ Review EXPLAIN ANALYZE for slow queries
Conclusion
Database optimization is iterative:
- Measure current performance
- Identify bottlenecks with pg_stat_statements
- Optimize queries and indexes
- Monitor results
- Repeat
Most performance issues can be resolved with proper indexing and query optimization before needing to scale hardware.
Need help optimizing your database? Schedule a performance audit or download our PostgreSQL optimization checklist.