Notification System Design: More Than Just a Queue

I've been asked to design a notification system in more interviews than I can count. The answer not worth giving is a single service that receives a call, fires a request to a delivery provider, and exits. It works until it doesn't, and it isn't interview-worthy.

The more I've built, the more layers I've found. My answer today looks nothing like my first one.

The foundation

Queue-based. A caller pushes a message to a queue; a separate consumer listens, picks it up, and routes it to the right channel: email, SMS, push, or in-app. Low coupling, separation of concerns, easy scaling. This is my go-to approach. For a while, I thought it was complete as is.
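
Here is a minimal sketch of that shape, assuming an in-process queue.Queue stands in for a real broker. The Notification type, the handler table, and run_demo are hypothetical names I made up for illustration; real handlers would call a delivery provider instead of appending to a list.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Notification:
    channel: str    # "email", "sms", "push", or "in_app"
    recipient: str
    body: str


def make_handlers(log):
    """Build one handler per channel. Each just records what it would send."""
    def handler_for(channel):
        return lambda n: log.append((channel, n.recipient, n.body))
    return {c: handler_for(c) for c in ("email", "sms", "push", "in_app")}


def consume(q, handlers, unroutable):
    """Pull messages until a None sentinel arrives, routing each by channel."""
    while True:
        n = q.get()
        if n is None:
            break
        handler = handlers.get(n.channel)
        if handler is None:
            unroutable.append(n)   # unknown channel: set aside, keep consuming
        else:
            handler(n)


def run_demo():
    log, unroutable = [], []
    q = queue.Queue()
    worker = threading.Thread(target=consume, args=(q, make_handlers(log), unroutable))
    worker.start()
    # The caller only knows about the queue, not about any channel.
    q.put(Notification("email", "a@example.com", "Welcome"))
    q.put(Notification("sms", "+15550100", "Your code is 1234"))
    q.put(Notification("fax", "n/a", "Hello"))
    q.put(None)
    worker.join()
    return log, unroutable
```

The caller and the consumer share nothing but the queue, which is the decoupling I'm after. Scaling out would mean more consumer threads or processes reading from the same queue.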

When the happy path ends

I used to stop at the queue design. Then I started asking: what happens when a delivery fails?

Retries first. Delivery providers can fail momentarily, so we retry with exponential backoff rather than hammering the endpoint. Messages that exhaust their retries go to a dead-letter queue (DLQ). We inspect, diagnose, fix the root cause, then requeue or dismiss. Having a DLQ but never inspecting it isn't a real strategy.
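
A rough sketch of retry-then-dead-letter, under a few assumptions of mine: TransientError is a hypothetical marker for failures worth retrying, the DLQ is a plain list, and the delays (0.5s base, 30s cap) are arbitrary. I also added random jitter, which isn't required for backoff but helps keep many consumers from retrying in lockstep.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Delay after failed attempt number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def deliver_with_retry(message, send, dead_letters, max_attempts=4,
                       sleep=time.sleep, jitter=random.random):
    """Try send(message) up to max_attempts times, then dead-letter it.

    Returns True on success, False if the message went to dead_letters.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except TransientError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Wait between 50% and 100% of the exponential delay.
                sleep(backoff_delay(attempt) * (0.5 + 0.5 * jitter()))
    # Retries exhausted: keep enough context to diagnose and requeue later.
    dead_letters.append({
        "message": message,
        "error": str(last_error),
        "attempts": max_attempts,
    })
    return False
```

Passing sleep and jitter as parameters is only there so the function can be tested without waiting. Errors that aren't TransientError propagate immediately, on the assumption that retrying a malformed message won't help.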

Then idempotency. If a message is retried or redelivered after the send has already gone through, the notification goes out twice. One solution is to record what's already been processed in an idempotency-key table, then check it before we act.
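
As a sketch of the check-before-act idea, here is one way it could look with an in-memory SQLite table. The table name, the key format, and handle_once are all hypothetical. I'm assuming each message carries a stable key chosen by the producer, and I let the primary-key constraint do the "have we seen this?" check so two consumers can't both claim the same key.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the idempotency-key table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (idempotency_key TEXT PRIMARY KEY)"
    )
    return conn


def handle_once(conn, key, send):
    """Claim the key, then send. Returns False if the key was already claimed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed (idempotency_key) VALUES (?)", (key,)
            )
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: skip the send
    try:
        send()
    except Exception:
        # Release the claim so a later retry can attempt the send again.
        with conn:
            conn.execute(
                "DELETE FROM processed WHERE idempotency_key = ?", (key,)
            )
        raise
    return True
```

Claiming before sending means a crash between the two steps drops the notification instead of duplicating it. Recording after the send flips that trade-off. Which side to err on depends on the notification, so treat this ordering as one option and not the answer.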

Sometimes you've been implementing a pattern for a while without knowing its name. Know the names. Not because it should matter, but because interviewers respond differently when you say "idempotent consumer" versus "I'd store processed IDs somewhere". Same idea, different signal.

Always assume worse than you've imagined

The failure mode game took me a while to get comfortable with. I thought I'd covered loose ends with the concepts above. Good systems interviewers keep pushing:

them: retry limit exceeded
you: we push to dead-letter queue, debug, re-process
them: DLQ fills, now what?

them: queue service is down
you: it's fine, messages are persisted
them: what if the messages aren't persisted, or are lost?

At some point, for a notification system — not payments, not orders — you can accept losing a message. Sometimes, knowing where that line is and stating it clearly is the answer.

The question that threw me

Someone once asked: what if we don't use queues at all?

I pushed back. Queues exist for good reasons: they decouple, buffer, and give you recovery. Why would we remove them? The conversation went in circles until I eventually learned they were describing their actual system: a legacy synchronous service-to-service call.

I should've read the room earlier, and they could've been more straightforward with the question. "What if services are already calling each other directly" is a different question than "should we use queues?" The first is a limitation, and invites solutions — circuit breakers, timeouts, fallback behavior. The second is a debate. I got into circuit breakers properly after that conversation, which was worth it regardless.
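
For the direct-call case, a circuit breaker can be sketched in a few lines. This is my own minimal version, not any particular library's API: the class name, the thresholds, and the fallback hook are assumptions, it isn't thread-safe, and it expects the wrapped function to enforce its own timeout.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of calling the struggling service.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit open; call skipped")
            # Reset timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Usage would look something like breaker.call(send_notification, payload, fallback=save_for_later), where both function names are placeholders. The fallback is where "accept the loss" or "store and retry later" gets decided.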

The broader lesson: if someone is steering the conversation toward an idea that seems off, there might be a reason. I've learned to ask for context before defending the ideal architecture.

On the interview itself

One thing I learned to ask for early is context, any context really. Are we building real-time delivery or scheduled? Cron jobs change the design. Are there any design limitations I should consider? If you're making any assumptions, share them upfront.

It took me a while to realize that even technical interviews aren't just quizzes about technical knowledge. Interviewers have their own systems, their own past decisions and preferences. When you're making a design call, say why. Otherwise, they might overlook your perspective and misread your answer.

If you're guessing, don't let it show. I've tried giving educated guesses many times. I've discovered that some interviewers appreciate thinking out loud while others treat it as an invitation to go deeper into territory you've flagged as uncertain. Read which room you're in first.

I used to just answer whatever came at me. Now I lead where I have something to say. You can drive the conversation as much as they can.

I've come to discover that interviews are like chess, really. You're navigating the other side's perspective while trying to knock their pieces down.
