When a single DB instance isn't enough — horizontal sharding strategies, choosing a shard key, cross-shard queries, and the rebalancing problem nobody warns you about.
Database sharding is one of those solutions that creates as many problems as it solves. Here's an honest breakdown of when it's necessary, which strategy to pick, and the operational costs that hit you 6 months after you shard.
Range-based: shard by value ranges (user IDs 1-1M on shard1, 1M-2M on shard2). Easy to understand but creates hot spots — new users always go to the last shard. Hash-based: shard by hash(shard_key) % N. Distributes evenly but makes range queries cross all shards. Directory-based: a lookup service maps each key to a shard. Most flexible but the lookup service becomes a bottleneck.
The shard key decision is permanent — it's extremely hard to re-shard. Think carefully: pick something with high cardinality, uniform distribution, and that groups data your queries will naturally access together.
Hash-based sharding with N shards means hash % N. When you add a shard and N becomes N+1, almost every record needs to move. Consistent hashing solves this — only 1/N of records need to move when adding a node. This is why Redis Cluster, Cassandra, and DynamoDB all use consistent hashing.
Queries that span shards are the biggest pain point. A 'find all orders for user X' query is fine if you shard by user_id. But 'find all orders placed in the last 24 hours' touches every shard — this is called a scatter-gather query and it's expensive.
1-- Range partitioning by date — much simpler than sharding
2-- Try this BEFORE considering multi-server sharding
3CREATE TABLE orders (
4 id BIGSERIAL,
5 user_id BIGINT NOT NULL,
6 created_at TIMESTAMPTZ NOT NULL,
7 total NUMERIC(10,2)
8) PARTITION BY RANGE (created_at);
9
10CREATE TABLE orders_2024 PARTITION OF orders
11 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
12
13CREATE TABLE orders_2025 PARTITION OF orders
14 FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
15
16-- PostgreSQL automatically routes queries to the right partition
17-- Partition pruning makes date-range queries 10-100× fasterMore in System Design