System Design · June 2, 2025 · 9 min read

System Design: URL Shortener Like bit.ly

Designing a URL shortener end-to-end — Base62 encoding, hash collisions, database choice, caching layer, and how to handle 100M+ URLs at scale.

Tags: System Design, Redis, PostgreSQL, Scalability, Backend

URL shorteners are a classic system design question because they touch almost every important concept: encoding schemes, database trade-offs, caching, and horizontal scaling. Let me walk through a design that can handle 100M+ URLs.

Core Algorithm: Base62 Encoding

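The short code is a Base62 encoding of a unique numeric ID. Seven characters give 62^7 ≈ 3.5 trillion distinct codes, which is more than enough when the ID comes from a DB sequence. The ID can also come from a distributed generator such as Snowflake; those IDs use up to 63 bits, so their Base62 codes run to as many as 11 characters.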

Base62Encoder.java
java
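public class Base62Encoder {

    private static final String CHARS =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    /** Encodes a non-negative ID, left-padded with '0' to at least 7 characters. */
    public static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative: " + id);
        }
        StringBuilder sb = new StringBuilder();
        // Digits come out least significant first
        while (id > 0) {
            sb.append(CHARS.charAt((int) (id % 62)));
            id /= 62;
        }
        // Pad to 7 chars, then reverse so the padding ends up on the left
        while (sb.length() < 7) sb.append('0');
        return sb.reverse().toString();
    }

    /** Decodes a short code back to its ID; rejects characters outside the alphabet. */
    public static long decode(String shortCode) {
        long id = 0;
        for (char c : shortCode.toCharArray()) {
            int digit = CHARS.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid Base62 character: " + c);
            }
            id = id * 62 + digit;
        }
        return id;
    }
}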
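
To sanity-check the capacity numbers, here is a minimal standalone sketch that computes how many Base62 characters a given maximum ID needs and round-trips a sample ID. `Base62Demo` and its method names are illustrative only; it re-implements the encoding without padding so it can run on its own.

```java
final class Base62Demo {
    private static final String CHARS =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Number of Base62 digits needed to represent maxId (at least 1).
    static int codeLength(long maxId) {
        int digits = 1;
        while (maxId >= 62) {
            maxId /= 62;
            digits++;
        }
        return digits;
    }

    // Unpadded Base62 encoding of a non-negative ID.
    static String encode(long id) {
        if (id == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(CHARS.charAt((int) (id % 62)));
            id /= 62;
        }
        return sb.reverse().toString();
    }

    // Inverse of encode; assumes the input only contains alphabet characters.
    static long decode(String code) {
        long id = 0;
        for (char c : code.toCharArray()) {
            id = id * 62 + CHARS.indexOf(c);
        }
        return id;
    }

    public static void main(String[] args) {
        long sevenCharLimit = 3_521_614_606_207L;           // 62^7 - 1
        System.out.println(codeLength(sevenCharLimit));     // 7
        System.out.println(codeLength(Long.MAX_VALUE));     // 11
        System.out.println(encode(125));                    // 21
        System.out.println(decode(encode(123_456_789L)));   // 123456789
    }
}
```

The second line of output is the reason a full 63-bit ID does not fit in 7 characters: it needs 11.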

Database and Caching Strategy

  • Write path: Generate ID → encode to Base62 → INSERT into PostgreSQL → return short URL
  • Read path: Parse short code → Redis cache lookup → if miss, query PostgreSQL → cache result with 24h TTL
  • Expect a cache hit rate of 80% or more, because a small set of popular short URLs receives most of the clicks
  • Use read replicas for the DB, since reads are roughly 100× more frequent than writes

Don't use an MD5/SHA hash of the long URL as the short code. Once truncated to 7 characters, hashes will collide, and every insert then needs a lookup against the stored long URL plus a retry path to detect and resolve the clash. Auto-increment ID + Base62 is simpler and collision-free.
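
To put a rough number on the collision risk, the sketch below applies the standard birthday approximation. It assumes the hash is truncated to a 7-character Base62 code and behaves like a uniform random draw from the 62^7 code space; `CollisionEstimate` and its methods are names made up for this illustration.

```java
final class CollisionEstimate {
    // Expected number of colliding pairs when n items are placed
    // uniformly at random into `space` slots: n(n-1) / (2 * space).
    static double expectedCollidingPairs(double n, double space) {
        return n * (n - 1) / (2.0 * space);
    }

    // Birthday approximation for the chance of at least one collision.
    static double probabilityOfAnyCollision(double n, double space) {
        return 1.0 - Math.exp(-expectedCollidingPairs(n, space));
    }

    public static void main(String[] args) {
        double space = Math.pow(62, 7); // 7-character Base62 code space

        // At 1M URLs a collision is already plausible (roughly a 13% chance).
        System.out.println(probabilityOfAnyCollision(1e6, space));

        // At 100M URLs you should expect on the order of 1,400 colliding pairs.
        System.out.println(expectedCollidingPairs(1e8, space));
    }
}
```

So with hashed codes the collision-handling path is not an edge case at 100M URLs; it runs regularly.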

Scaling to 100M URLs

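At scale the bottleneck is the ID generator, not the DB. Use a Snowflake-style ID: 41 bits for timestamp, 10 bits for machine ID, 12 bits for sequence (63 bits in total, which leaves the sign bit of a 64-bit long unused). Each machine can issue 4096 IDs per millisecond, and the machines need no coordination beyond each one holding a unique machine ID.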
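
Here is a minimal sketch of a generator with that 41/10/12 bit layout. It is not the `SnowflakeIdGenerator` used elsewhere in this post, just an illustration: the class name, the custom epoch (2025-01-01 UTC), and the choice to throw when the clock moves backwards are all assumptions.

```java
final class SnowflakeSketch {
    // Assumption: custom epoch of 2025-01-01T00:00:00Z; any fixed past instant works.
    private static final long EPOCH_MS = 1_735_689_600_000L;
    private static final int MACHINE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1; // 1023
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;  // 4095

    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long machineId) {
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machineId must be in [0, 1023]");
        }
        this.machineId = machineId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Simplest possible policy; a production generator might wait instead.
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 IDs for this millisecond are used: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        // Layout: [41 bits timestamp][10 bits machine ID][12 bits sequence]
        return ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
            | (machineId << SEQUENCE_BITS)
            | sequence;
    }

    // Extracts the machine ID back out of a generated ID.
    static long machineIdOf(long id) {
        return (id >> SEQUENCE_BITS) & MAX_MACHINE_ID;
    }
}
```

Because the timestamp occupies the high bits, IDs from one machine are strictly increasing, which keeps B-tree inserts in PostgreSQL mostly append-only.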

Spring Boot Implementation

UrlShortenerService.java
java
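import java.time.Duration;
import java.time.LocalDateTime;

import lombok.RequiredArgsConstructor;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class UrlShortenerService {

    // Cache entries expire after 24h, matching the read path described above
    private static final Duration CACHE_TTL = Duration.ofDays(1);

    private final UrlRepository urlRepo;
    private final RedisTemplate<String, String> redis;
    private final SnowflakeIdGenerator idGen;

    public String shorten(String longUrl) {
        // Check if already shortened. Two concurrent requests can both miss here,
        // so a unique constraint on the long URL is needed to rule out duplicates.
        String existing = urlRepo.findShortCodeByLongUrl(longUrl);
        if (existing != null) return buildShortUrl(existing);

        long id = idGen.nextId();
        String shortCode = Base62Encoder.encode(id);

        urlRepo.save(new UrlMapping(shortCode, longUrl, LocalDateTime.now()));
        // Warm the cache so the first click does not hit the DB
        redis.opsForValue().set(cacheKey(shortCode), longUrl, CACHE_TTL);

        return buildShortUrl(shortCode);
    }

    public String resolve(String shortCode) {
        // Try cache first
        String cached = redis.opsForValue().get(cacheKey(shortCode));
        if (cached != null) return cached;

        // Cache miss: fall back to PostgreSQL, then repopulate the cache
        String longUrl = urlRepo.findLongUrlByShortCode(shortCode)
            .orElseThrow(() -> new ResourceNotFoundException("Short URL", shortCode));

        redis.opsForValue().set(cacheKey(shortCode), longUrl, CACHE_TTL);
        return longUrl;
    }

    private static String cacheKey(String shortCode) {
        return "url:" + shortCode;
    }

    // Assumption: placeholder domain; substitute the service's real base URL
    private String buildShortUrl(String shortCode) {
        return "https://short.example/" + shortCode;
    }
}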

Redirect Controller

RedirectController.java
java
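import java.net.URI;

import jakarta.servlet.http.HttpServletRequest; // javax.servlet.http on Spring Boot 2.x
import lombok.RequiredArgsConstructor;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequiredArgsConstructor // generates the constructor that injects the final fields
public class RedirectController {

    private final UrlShortenerService shortener;
    private final ClickAnalyticsService analytics;

    @GetMapping("/{shortCode}")
    public ResponseEntity<Void> redirect(@PathVariable String shortCode,
                                          HttpServletRequest req) {
        String longUrl = shortener.resolve(shortCode);

        // Fire-and-forget analytics: record() must be asynchronous (e.g. @Async or a queue)
        // so that it does not delay the redirect
        analytics.record(shortCode, req.getRemoteAddr(), req.getHeader("User-Agent"));

        // 302 so that every click reaches the server and can be counted
        return ResponseEntity.status(HttpStatus.FOUND)
            .location(URI.create(longUrl))
            .build();
    }
}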

Load Estimation

  • Write QPS: 100M new URLs per year / (365 days × 86400 s) ≈ 3 writes/sec, which a single DB handles trivially.
  • Read QPS: 100:1 read/write ratio = ~300 reads/sec at launch, scales to 30K/sec at 100× growth.
  • Storage: 100M URLs × 500 bytes avg = 50GB — fits on a single PostgreSQL instance for years.
  • Cache: top 20% of URLs get 80% of traffic — caching 20M entries in Redis (≈10GB) covers most reads.
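
The estimates above are easy to re-run with different inputs. The sketch below reproduces them under the same assumptions (100M new URLs per year, a 100:1 read/write ratio, 500 bytes per entry, the top 20% of URLs cached); `LoadEstimate` and its method names are made up for this illustration.

```java
final class LoadEstimate {
    static final double SECONDS_PER_YEAR = 365.0 * 86_400;

    // Average write rate if newUrlsPerYear are created evenly over a year.
    static double writesPerSecond(long newUrlsPerYear) {
        return newUrlsPerYear / SECONDS_PER_YEAR;
    }

    static double readsPerSecond(double writesPerSecond, int readWriteRatio) {
        return writesPerSecond * readWriteRatio;
    }

    // Size in decimal gigabytes (10^9 bytes).
    static double gigabytes(long entries, int bytesPerEntry) {
        return entries * (double) bytesPerEntry / 1e9;
    }

    public static void main(String[] args) {
        long urls = 100_000_000L;
        double writes = writesPerSecond(urls);
        double reads = readsPerSecond(writes, 100);

        System.out.printf("writes/sec: %.1f%n", writes);
        System.out.printf("reads/sec:  %.0f%n", reads);
        System.out.printf("storage GB: %.0f%n", gigabytes(urls, 500));
        System.out.printf("cache GB:   %.0f%n", gigabytes(urls / 5, 500));
    }
}
```

These are averages; peak traffic is usually several times higher, so size the cache and replicas with headroom.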

Use 301 (Moved Permanently) for SEO: browsers cache it and may not hit your server again for that link. Use 302 (Found) if you want to track every click in analytics, since browsers don't cache it by default.
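
One way to make that trade-off explicit in code is a small policy helper, sketched below. `RedirectPolicy` is a hypothetical name, and sending `Cache-Control: no-store` with the 302 is my own assumption for making the no-caching intent explicit; it is not something the status code requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RedirectPolicy {
    // 302 when every click must reach the server, 301 when caching is acceptable.
    static int statusFor(boolean trackEveryClick) {
        return trackEveryClick ? 302 : 301;
    }

    // Response headers for the redirect, in the order they are added.
    static Map<String, String> headersFor(String longUrl, boolean trackEveryClick) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Location", longUrl);
        if (trackEveryClick) {
            // Assumption: explicitly forbid caching so repeat clicks are counted
            headers.put("Cache-Control", "no-store");
        }
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(statusFor(true));  // 302
        System.out.println(statusFor(false)); // 301
        System.out.println(headersFor("https://example.com/page", true));
    }
}
```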
