System DesignApril 3, 202510 min read

System Design: Distributed Message Queue from Scratch

Designing a Kafka-like message queue — partitioning, consumer groups, offset management, at-least-once vs exactly-once delivery, and where each guarantee breaks down.

System DesignKafkaDistributed SystemsMessage QueueBackend

Understanding how Kafka works internally makes you a dramatically better user of it. Let me walk through building a simplified distributed message queue — the design decisions reveal exactly why Kafka makes the trade-offs it does.

Core Concepts

  • Topic: a named stream of messages, split into partitions for parallelism
  • Partition: an append-only, ordered log stored on disk. The unit of parallelism.
  • Consumer Group: set of consumers that jointly consume a topic — each partition assigned to exactly one consumer in the group
  • Offset: position of a message within a partition — consumers commit offsets to track progress

Producer in Spring Boot

OrderEventProducer.java
java
1@Component
2@RequiredArgsConstructor
3public class OrderEventProducer {
4
5    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;
6
7    public void publish(OrderEvent event) {
8        kafkaTemplate.send("orders", event.getOrderId(), event)
9            .whenComplete((result, ex) -> {
10                if (ex != null) {
11                    log.error("Failed to publish order event: {}", event.getOrderId(), ex);
12                    // push to retry queue or dead letter topic
13                } else {
14                    log.info("Order event sent to partition {} offset {}",
15                        result.getRecordMetadata().partition(),
16                        result.getRecordMetadata().offset());
17                }
18            });
19    }
20}

Consumer with Manual Offset Commit

OrderEventConsumer.java
java
1@KafkaListener(topics = "orders", groupId = "order-processor",
2               containerFactory = "manualAckFactory")
3public void consume(ConsumerRecord<String, OrderEvent> record,
4                    Acknowledgment ack) {
5    try {
6        orderService.process(record.value());
7        ack.acknowledge(); // commit only after successful processing
8    } catch (RetryableException e) {
9        // don't ack — message will be redelivered
10        log.warn("Retryable error, skipping ack for offset {}", record.offset());
11    } catch (Exception e) {
12        log.error("Fatal error, sending to DLT", e);
13        deadLetterTemplate.send("orders.DLT", record.value());
14        ack.acknowledge(); // ack to move past the bad message
15    }
16}

At-least-once delivery means your consumers must be idempotent. Use a processed_events table with a UNIQUE constraint on event_id — duplicate processing becomes a no-op instead of a bug.

More in System Design