Notification System Architecture: Email, Push, and In App

Veld Systems | 8 min read

Notifications seem simple until you actually build them. Send an email when something happens. Show a badge in the app. Fire a push notification to the phone. Three lines in a product spec that turn into one of the most architecturally complex subsystems in any application.

We have built notification systems for SaaS platforms, marketplaces, and consumer apps. The pattern is always the same: v1 is a few hardcoded email sends, v2 adds push and in app, v3 is a rewrite because v2 became unmaintainable. This post covers how to skip the rewrite by designing the architecture correctly from the start.

Why Notifications Get Complicated

A notification is not a single thing. It is a triggering event, a routing decision, a user preference check, a template render, and a delivery attempt across potentially multiple channels. When you hardcode "send email when X happens" throughout your codebase, you end up with notification logic scattered across dozens of files, no consistent way to manage user preferences, and no visibility into what was sent, when, or whether it was delivered.

The complexity multiplies with scale. A product with 20 notification types across 3 channels (email, push, in app) has 60 potential delivery paths. Each path needs its own template, its own delivery mechanism, and its own failure handling. Add user preferences (Customer A wants email for everything but push only for urgent items) and you have a combinatorial explosion that ad hoc code cannot manage.

The Core Architecture

Every well designed notification system follows the same pattern: Event, Router, Channel Adapters, and Delivery.

Events are the triggers. A new comment, a payment failure, a teammate invitation. These are facts about what happened in your system. Events should be emitted from your business logic as structured data, containing the event type, the relevant entity IDs, the actor who caused it, and any context needed for rendering.

The Router receives events and decides what to do with them. It determines which users should be notified, checks their preferences to see which channels are enabled, and creates delivery jobs for each active channel. The router is the brain of the system and the only place where notification logic should live.

Channel Adapters handle the specifics of each delivery channel. The email adapter renders an HTML template and sends via your email provider. The push adapter formats the payload and sends via APNs or FCM. The in app adapter writes a record to the notifications table. Each adapter knows only about its own channel and has no knowledge of the others.

Delivery is the actual send, with retry logic, failure handling, and delivery confirmation. Each channel has different reliability characteristics. Email is fire and forget with bounce tracking. Push can fail silently if the device token is stale. In app is a database write that is highly reliable but means nothing if the user never opens the app.
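
A minimal sketch of how these pieces could fit together. All class and function names here are hypothetical, the adapters only record what they would send, and the router calls adapters directly where a real system would enqueue a delivery job per channel.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Event:
    event_type: str
    actor_id: str
    recipients: list[str]
    data: dict = field(default_factory=dict)


class ChannelAdapter(Protocol):
    name: str

    def deliver(self, user_id: str, event: Event) -> None: ...


class InAppAdapter:
    """Stand-in for the in app adapter: keeps rows in memory instead of a table."""
    name = "in_app"

    def __init__(self) -> None:
        self.rows: list[tuple[str, str]] = []

    def deliver(self, user_id: str, event: Event) -> None:
        self.rows.append((user_id, event.event_type))


class EmailAdapter:
    """Stand-in for the email adapter: collects messages instead of sending."""
    name = "email"

    def __init__(self) -> None:
        self.outbox: list[tuple[str, str]] = []

    def deliver(self, user_id: str, event: Event) -> None:
        self.outbox.append((user_id, event.event_type))


class Router:
    """The only place that decides who is notified and on which channels."""

    def __init__(self, adapters: list[ChannelAdapter],
                 is_enabled: Callable[[str, str, Event], bool]) -> None:
        self.adapters = {adapter.name: adapter for adapter in adapters}
        self.is_enabled = is_enabled  # preference check: (user, channel, event)

    def route(self, event: Event) -> int:
        jobs = 0
        for user_id in event.recipients:
            if user_id == event.actor_id:
                continue  # the actor does not get notified about their own action
            for name, adapter in self.adapters.items():
                if self.is_enabled(user_id, name, event):
                    adapter.deliver(user_id, event)
                    jobs += 1
        return jobs
```

Because each adapter exposes the same small interface, adding a channel means adding one adapter and leaving the router untouched.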

Designing the Event System

Events are the foundation. Get them right and everything downstream is cleaner. We use a simple structure for every notification event:

```
{
  "event_type": "comment.created",
  "actor_id": "user_abc",
  "entity_type": "comment",
  "entity_id": "comment_xyz",
  "recipients": ["user_def", "user_ghi"],
  "data": {
    "comment_body": "Looks great, ship it.",
    "post_title": "Q4 Roadmap",
    "post_id": "post_123"
  }
}
```

The key design decisions here are separating the actor from the recipients (the person who caused the event should almost never receive the notification) and including all rendering data in the event payload so downstream processors do not need to query the database. This makes the system faster and more resilient, since a notification can be rendered and delivered even if the originating service is temporarily unavailable.

Events should be published to a queue or event bus, not processed synchronously. If a user action notifies 5 users across 3 channels, that is 15 delivery jobs. You do not want that blocking the user's request. Use a job queue like BullMQ, a message broker like RabbitMQ, or even a simple database backed queue to process notification events asynchronously.
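
To illustrate the split between the request path and background processing, here is a small sketch that uses Python's in-process `queue` module as a stand-in for a real job queue or broker. The function names and the fixed channel list are assumptions for the example.

```python
import queue

CHANNELS = ["email", "push", "in_app"]

event_queue: queue.Queue = queue.Queue()
job_queue: queue.Queue = queue.Queue()


def publish(event: dict) -> None:
    """Called from the request path: enqueue the event and return immediately."""
    event_queue.put(event)


def process_pending_events() -> int:
    """Called by a background worker: expand each event into delivery jobs."""
    created = 0
    while True:
        try:
            event = event_queue.get_nowait()
        except queue.Empty:
            return created
        for user_id in event["recipients"]:
            for channel in CHANNELS:
                job_queue.put({
                    "user_id": user_id,
                    "channel": channel,
                    "event_type": event["event_type"],
                })
                created += 1
```

The request handler only pays for one enqueue; the expansion into one job per recipient per channel happens later in the worker.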

User Preferences

Notification preferences are deceptively tricky. The simple version is a per user, per channel toggle: "Email me about new comments: yes/no." The reality is that users want granular control without being overwhelmed by options.

The model we recommend is a three tier preference system. First, global channel preferences: a user can disable an entire channel (e.g., "no push notifications at all"). Second, category preferences: notifications grouped into categories (e.g., "Comments," "Billing," "Team activity") with per channel toggles for each category. Third, mandatory notifications: some notifications like security alerts, billing failures, and legal notices cannot be disabled and bypass all preferences.

Store preferences as a structured document per user. When the router processes an event, it maps the event type to a category, checks the user's preferences for that category and channel, and only creates a delivery job if the preference allows it.
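
The three tier check could look roughly like the sketch below. The document shape, the category mapping, and the set of mandatory event types are all illustrative assumptions, as is the choice to treat a missing preference as opted in.

```python
# Hypothetical event types that bypass all preferences.
MANDATORY = {"security.alert", "billing.payment_failed"}

# Hypothetical mapping from event type to preference category.
CATEGORY_BY_EVENT = {
    "comment.created": "comments",
    "invoice.paid": "billing",
}


def channel_allowed(prefs: dict, event_type: str, channel: str) -> bool:
    """Return True if a delivery job should be created for this channel."""
    # Tier 3: mandatory notifications ignore every toggle.
    if event_type in MANDATORY:
        return True
    # Tier 1: a globally disabled channel blocks everything else.
    if not prefs.get("channels", {}).get(channel, True):
        return False
    # Tier 2: per category, per channel toggle (default: enabled).
    category = CATEGORY_BY_EVENT.get(event_type)
    return prefs.get("categories", {}).get(category, {}).get(channel, True)


prefs = {
    "channels": {"push": False},
    "categories": {"comments": {"email": False}},
}
```

With these preferences, a new comment reaches the user in app only, while a security alert still goes out on every channel, including push.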

Template Management

Notification templates are another area that starts simple and grows unruly. Each notification type needs a template for each channel, and each channel has different constraints. Emails can be rich HTML with images and buttons. Push notifications are limited to a title and a short body. In app notifications need a structured format that your frontend can render consistently.

We use a template registry where each notification type declares its templates for each channel. Templates are stored as version controlled files (not in the database) so they can be reviewed in pull requests. Variables are injected from the event payload using a simple template syntax.
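
As a rough illustration, a registry can be a mapping from (notification type, channel) to named template fields. This sketch uses `string.Template` from the standard library as the "simple template syntax" and keeps templates inline for brevity; in practice they would be loaded from the version controlled files. The registry layout and field names are assumptions.

```python
from string import Template

# (event type, channel) -> field name -> template
TEMPLATES = {
    ("comment.created", "email"): {
        "subject": Template("New comment on $post_title"),
        "body": Template("Someone commented on $post_title: $comment_body"),
    },
    ("comment.created", "push"): {
        "title": Template("New comment on $post_title"),
        "body": Template("$comment_body"),
    },
}


def render(event_type: str, channel: str, data: dict) -> dict:
    """Render every field for one notification type on one channel.

    Template.substitute raises KeyError when the event payload is missing
    a variable, so an incomplete payload fails loudly before delivery.
    """
    fields = TEMPLATES[(event_type, channel)]
    return {name: template.substitute(data) for name, template in fields.items()}
```

Rendering the push template with the `data` object from the example event yields a title of "New comment on Q4 Roadmap" and the comment text as the body.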

For email specifically, use a base layout that handles your brand header, footer, and responsive design, then let each notification type fill in the content block. This ensures visual consistency and means you only need to update one file when your brand changes. We talked about similar structuring principles in our API design guide; the concept of consistent patterns applies equally to notification templates.

Delivery Guarantees and Failure Handling

Each channel has different failure modes, and your architecture needs to handle all of them.

Email failures include hard bounces (permanent, remove the address), soft bounces (temporary, retry later), spam complaints (stop sending immediately), and rate limits from your provider. Use webhook callbacks from your email provider (SendGrid, SES, Postmark) to track delivery status and update your records.

Push notification failures include expired device tokens (remove them), rate limiting by Apple or Google, and payload size limits. Always validate device tokens periodically and remove stale ones. A common mistake is accumulating thousands of dead tokens and wasting API calls trying to deliver to devices that no longer exist.

In app notifications rarely fail since they are database writes, but they have their own challenge: read state management. You need to track which notifications a user has seen, support "mark all as read," and efficiently query for unread counts. Index on (user_id, read_at) and use a cursor based pagination approach rather than offset pagination for the notification feed.

For all channels, implement retry with exponential backoff or a similar escalating schedule, so the wait grows after each failure. A failed email send might retry after 30 seconds, then 1 minute, then 5 minutes, up to a maximum number of attempts. Use a dead letter queue for notifications that fail all retries so you can investigate patterns without losing data.
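
A minimal sketch of that retry loop, using the 30 second, 1 minute, 5 minute schedule as the delays. The function name, the list used as a dead letter queue, and the injectable `sleep` are assumptions for the example; a real worker would usually reschedule the job on the queue rather than sleep in place, and would skip retries for permanent failures such as hard bounces.

```python
import time
from typing import Callable, Sequence

RETRY_DELAYS = (30, 60, 300)  # seconds to wait before each retry


def deliver_with_retry(
    send: Callable[[dict], None],
    job: dict,
    dead_letters: list,
    delays: Sequence[int] = RETRY_DELAYS,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try once, then retry after each delay; dead-letter the job if all fail."""
    last_error = None
    for wait in (0, *delays):
        if wait:
            sleep(wait)
        try:
            send(job)
            return True
        except Exception as exc:  # sketch only: treats every error as retryable
            last_error = exc
    # Keep the job and the final error so failure patterns can be investigated.
    dead_letters.append({"job": job, "error": repr(last_error)})
    return False
```

Passing `sleep` as a parameter keeps the function easy to test without actually waiting.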

Scaling Considerations

Notification systems generate high write volumes. A single user action can create dozens of notification records and delivery jobs. At scale, this becomes a significant database and queue load.

Batch delivery is essential for email. If a user receives 10 comment notifications in 5 minutes, send a single digest email rather than 10 separate emails. Implement a short delay (e.g., 2 minutes) before sending email notifications, and during that window, batch any additional notifications of the same type into a single email. Your users will thank you, and your email provider will not throttle you.
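
One way to sketch the batching window: hold email notifications in a pending list, group them by user and type, and release a group once its oldest item has waited out the delay. The field names, the 120 second default, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def build_digests(pending: list[dict], now: float,
                  window: float = 120.0) -> tuple[list[dict], list[dict]]:
    """Split pending email notifications into ready digests and items to keep.

    Items are grouped by (user_id, event_type). A group becomes a digest once
    its oldest item is at least `window` seconds old.
    """
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for item in pending:
        groups[(item["user_id"], item["event_type"])].append(item)

    digests, still_pending = [], []
    for (user_id, event_type), items in groups.items():
        oldest = min(item["created_at"] for item in items)
        if now - oldest >= window:
            digests.append({
                "user_id": user_id,
                "event_type": event_type,
                "count": len(items),
                "items": items,
            })
        else:
            still_pending.extend(items)
    return digests, still_pending
```

A scheduler would call this periodically, send one email per digest, and carry the remaining items into the next run.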

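Fan out optimization matters when one event notifies many users. A message posted in a channel with 500 members means 500 notification records and potentially 1,500 delivery jobs (3 channels). Use bulk insert operations for in app notifications and send push notifications in batches rather than opening one connection per message. APNs multiplexes many requests over a single HTTP/2 connection, and FCM offers topic messaging for large audiences; check each provider's current API before relying on a batch endpoint.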
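
The 500 member case can be sketched with one bulk insert for the in app rows and a chunking helper for the push jobs. SQLite stands in for the real database here, and the batch size of 100 is an arbitrary assumption, not a provider limit.

```python
from __future__ import annotations

import sqlite3
from itertools import islice
from typing import Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def fan_out_in_app(db: sqlite3.Connection, member_ids: list[str], body: str) -> None:
    """Write one notification row per member with a single bulk statement."""
    db.executemany(
        "INSERT INTO notifications (user_id, body) VALUES (?, ?)",
        [(member_id, body) for member_id in member_ids],
    )


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE notifications (id INTEGER PRIMARY KEY, user_id TEXT, body TEXT)"
)
members = [f"user_{i}" for i in range(500)]
fan_out_in_app(db, members, "New message in #general")

# Each batch would become one push delivery job handled by a single worker.
push_batches = list(chunked(members, 100))
```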

Database considerations include partitioning the notifications table by user ID or time range, setting up automated cleanup of old read notifications (anything older than 90 days is rarely accessed), and using materialized unread counts rather than computing them on every page load.
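
A materialized unread count can be as simple as a counter row per user that is adjusted whenever a notification is written or read. The sketch below assumes a hypothetical `unread_counts` table and uses SQLite only so the example runs on its own.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE unread_counts ("
        "  user_id TEXT PRIMARY KEY,"
        "  unread  INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def bump_unread(db: sqlite3.Connection, user_id: str, delta: int = 1) -> None:
    """Adjust the counter when a notification is created (+1) or read (-1)."""
    db.execute(
        "INSERT OR IGNORE INTO unread_counts (user_id, unread) VALUES (?, 0)",
        (user_id,),
    )
    db.execute(
        "UPDATE unread_counts SET unread = MAX(unread + ?, 0) WHERE user_id = ?",
        (delta, user_id),
    )


def reset_unread(db: sqlite3.Connection, user_id: str) -> None:
    """Used by "mark all as read"."""
    db.execute("UPDATE unread_counts SET unread = 0 WHERE user_id = ?", (user_id,))


def get_unread(db: sqlite3.Connection, user_id: str) -> int:
    """Single primary key lookup instead of counting notification rows."""
    row = db.execute(
        "SELECT unread FROM unread_counts WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0
```

The counter should be updated in the same transaction as the notification write so the two cannot drift apart.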

Real Time In App Notifications

In app notifications are most valuable when they arrive in real time. If a user has your app open and someone comments on their post, that notification should appear immediately, not on the next page refresh.

Use WebSockets or Server Sent Events to push notifications to connected clients. When the in app channel adapter writes a notification to the database, it also publishes to a real time channel scoped to the recipient user. The client subscribes to their channel on page load and renders new notifications as they arrive. We covered the infrastructure patterns for this in our real time architecture guide.
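
The per user channel idea can be sketched without any network code. In this example an `asyncio.Queue` stands in for each connected client; a real deployment would forward each queued item over a WebSocket or Server Sent Events connection, typically with a shared pub/sub layer between server instances. The class and method names are hypothetical.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict


class UserChannels:
    """In-process registry of subscribers, scoped by recipient user."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, user_id: str) -> asyncio.Queue:
        """Called when a client connects; returns that client's inbox."""
        inbox: asyncio.Queue = asyncio.Queue()
        self._subscribers[user_id].add(inbox)
        return inbox

    def unsubscribe(self, user_id: str, inbox: asyncio.Queue) -> None:
        self._subscribers[user_id].discard(inbox)

    def publish(self, user_id: str, notification: dict) -> int:
        """Called by the in app adapter after the database write succeeds."""
        inboxes = self._subscribers.get(user_id, set())
        for inbox in inboxes:
            inbox.put_nowait(notification)
        return len(inboxes)


async def demo() -> list[dict]:
    hub = UserChannels()
    inbox = hub.subscribe("user_def")
    hub.publish("user_def", {"title": "New comment on Q4 Roadmap"})
    hub.publish("user_ghi", {"title": "Not delivered to user_def"})
    return [await inbox.get()]
```

Running `asyncio.run(demo())` returns only the notification addressed to `user_def`, since each inbox is scoped to its own user.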

Observability

A notification system without observability is a notification system you cannot trust. Track these metrics at minimum: delivery rate per channel (what percentage of sends succeed), latency from event to delivery per channel, preference coverage (what percentage of users have customized their preferences), and engagement (open rates for email, click rates for push, read rates for in app).

Set up alerts for delivery rate drops. If your email delivery rate falls below 95%, something is wrong: a provider issue, a spike in bounces, or a template rendering error. Catch it early before customers start complaining about missing notifications. Our monitoring and observability guide covers the broader patterns for production system health.
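
The alert condition itself is a small calculation. This sketch assumes per channel counters of sent and delivered notifications over some window, and uses the 95% figure as the default threshold; the stats shape and function name are illustrative.

```python
def channels_to_alert(stats: dict[str, dict[str, int]],
                      threshold: float = 0.95) -> list[str]:
    """Return channels whose delivery rate fell below the threshold.

    Channels with no sends in the window are skipped to avoid dividing by zero.
    """
    failing = []
    for channel, counts in stats.items():
        sent = counts["sent"]
        if sent and counts["delivered"] / sent < threshold:
            failing.append(channel)
    return sorted(failing)


window_stats = {
    "email": {"sent": 1000, "delivered": 940},   # 94% delivered
    "push": {"sent": 500, "delivered": 490},     # 98% delivered
    "in_app": {"sent": 0, "delivered": 0},       # nothing sent in this window
}
```

For these numbers only the email channel is flagged.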

Build or Buy

There are third party notification services like Knock, Novu, and OneSignal that handle much of this complexity. For early stage products, these can be a reasonable choice. They handle channel routing, preference management, and delivery, and they let you focus on your core product.

The trade off is cost at scale and flexibility. Notification services charge per notification, and when you are sending millions per month, the cost adds up. You also lose fine grained control over batching logic, template rendering, and delivery timing. For products where notifications are a core part of the user experience, such as systems that depend on timely alerts, building in house gives you the control you need.

Our recommendation: use a third party service to validate your notification UX, then build in house when notification volume exceeds 500,000 per month or when you need custom logic that the service cannot accommodate. Either way, design your events and router layer cleanly so swapping the delivery backend is a contained change.

If you are building a product that needs a reliable, multi channel notification system, or you are dealing with a notification mess that needs to be untangled, reach out to us to talk through the architecture.
