Real time chat looks simple on the surface. A user types a message, and it appears on the other person's screen instantly. How hard can it be? Harder than almost any other feature you will build.
Chat involves persistent connections, message ordering guarantees, delivery receipts, presence detection, typing indicators, offline message queuing, push notifications, media attachments, and eventually search across millions of messages. Each of these is a meaningful engineering challenge on its own. Together, they form one of the most complex distributed systems a product team will tackle.
We have built chat systems for collaboration platforms, marketplace applications, and customer support tools. The architecture decisions you make in the first month determine whether the system scales gracefully or collapses under load at 10,000 concurrent users.
The Connection Layer: WebSockets and Beyond
HTTP is a request-response protocol. The client asks, the server answers. Chat needs bidirectional, persistent communication. The server needs to push messages to the client the instant they arrive, without the client asking for them.
WebSockets are the standard solution. A WebSocket connection starts as an HTTP request, upgrades to a persistent TCP connection, and stays open for the duration of the session. Both sides can send data at any time. Latency is typically under 50 milliseconds for message delivery.
The challenge is scale. Every connected user holds an open WebSocket connection. At 10,000 concurrent users, that is 10,000 persistent TCP connections your servers need to maintain simultaneously. At 100,000 users, you need a cluster of WebSocket servers with a message routing layer between them. A single Node.js process can handle roughly 50,000 to 100,000 concurrent WebSocket connections depending on message throughput, so you are looking at horizontal scaling fairly early. We cover the fundamentals of real time infrastructure in our real time architecture guide.
Server Sent Events (SSE) are a simpler alternative for scenarios where the server pushes to the client but the client sends messages via standard HTTP POST. SSE uses regular HTTP, works through most proxies and load balancers without special configuration, and automatically reconnects on failure. For simple notification or feed update use cases, SSE is often the better choice. For full bidirectional chat with typing indicators and presence, WebSockets are still the right tool.
Connection management is critical. Users will lose connectivity, switch between WiFi and cellular, close their laptop lid and reopen it an hour later. Your WebSocket layer needs heartbeat mechanisms (ping/pong frames every 30 seconds), automatic reconnection with exponential backoff, and session resumption so users do not miss messages sent while they were briefly disconnected.
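The reconnection part of that checklist comes down to a backoff schedule. Here is a minimal sketch of exponential backoff with jitter; the function name, the one-second base, and the 30-second cap are illustrative choices, not values from any particular library.

```typescript
// Exponential backoff with full jitter for WebSocket reconnection.
// Attempt 0 waits up to ~1s, doubling per attempt, capped at 30s.
// All names and constants here are illustrative.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

function reconnectDelay(attempt: number, random: () => number = Math.random): number {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  // "Full jitter": pick uniformly in [0, ceiling) so clients that dropped
  // together do not all hammer the server in the same instant.
  return Math.floor(random() * ceiling);
}
```

The jitter matters as much as the exponential growth: after a server restart, thousands of clients reconnect at once, and without randomization they all retry on the same schedule.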
Message Storage and Ordering
Storing chat messages seems straightforward until you realize the edge cases. Two users send messages at the exact same millisecond. A message is sent while the recipient is offline and needs to be delivered later. A user edits a message that has already been read. A message includes an image that is still uploading when the text portion is sent.
Message ordering is the first hard problem. Wall clock time is unreliable because client clocks drift. Server timestamps are better but still problematic when messages arrive at different servers in a cluster. The solution we use is a combination of server assigned sequence numbers per conversation and logical timestamps. Each conversation has a monotonically increasing sequence counter. When a message is persisted, it gets the next sequence number. Clients render messages in sequence order, which guarantees consistent ordering for all participants.
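The sequence-counter idea can be sketched in a few lines. This in-memory version is purely illustrative: in production the counter lives in the database and is incremented in the same transaction that persists the message, so it stays monotonic across servers.

```typescript
// Per-conversation sequence assignment, sketched in memory.
// Illustrative names; production state belongs in the database.
interface Message {
  id: string;
  conversationId: string;
  sequence: number;
  content: string;
}

const counters = new Map<string, number>();

function assignSequence(conversationId: string): number {
  const next = (counters.get(conversationId) ?? 0) + 1;
  counters.set(conversationId, next);
  return next;
}

// Clients render in sequence order, never wall-clock order.
function renderOrder(messages: Message[]): Message[] {
  return [...messages].sort((a, b) => a.sequence - b.sequence);
}
```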
Storage architecture depends on your scale. For products with up to a few million messages, PostgreSQL handles chat storage well. A messages table with conversation_id, sender_id, sequence_number, content, and created_at, indexed properly, can query the last 50 messages in a conversation in under 5 milliseconds. We discuss efficient database schema patterns in detail separately.
At higher scale (hundreds of millions of messages), you may need to partition messages across multiple database instances, typically sharded by conversation_id. Some teams move to purpose built databases like ScyllaDB or Cassandra for message storage while keeping PostgreSQL for user data and conversation metadata. We have found that most products do not need this complexity until they are well past 50 million messages.
Message delivery guarantees matter. Your system should guarantee at least once delivery, meaning a message might be delivered twice but never zero times. The client deduplicates using the message ID. This is simpler and more reliable than trying to achieve exactly once delivery, which is notoriously difficult in distributed systems.
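Client-side deduplication can be as simple as a bounded set of seen message IDs. A hedged sketch, with an invented class name and a FIFO eviction policy chosen only to keep memory bounded:

```typescript
// Client-side dedup for at-least-once delivery: drop any message whose ID
// has already been seen. The capacity and eviction policy are illustrative.
class Deduplicator {
  private seen = new Set<string>();
  constructor(private capacity = 10_000) {}

  // Returns true if the message is new and should be processed.
  accept(messageId: string): boolean {
    if (this.seen.has(messageId)) return false;
    this.seen.add(messageId);
    if (this.seen.size > this.capacity) {
      // Evict the oldest entry (Sets iterate in insertion order).
      const oldest = this.seen.values().next().value;
      this.seen.delete(oldest!);
    }
    return true;
  }
}
```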
Presence and Typing Indicators
Presence (showing who is online) and typing indicators are features users expect but that create significant architectural complexity.
Presence requires tracking the connection state of every user across every WebSocket server in your cluster. When User A connects to Server 1, Server 2 needs to know about it so it can show User A as online to User B who is connected to Server 2. This requires a shared presence store, typically Redis with pub/sub.
The naive approach is to update presence on every WebSocket connect and disconnect event. The problem is that connections drop and reconnect frequently, creating a "flicker" effect where users appear to go offline and come back online every few seconds. The fix is a grace period: when a disconnect is detected, wait 15 to 30 seconds before marking the user as offline. If they reconnect within that window (which they usually do on mobile networks), the offline event is never published.
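The grace-period logic is easier to see in code. This sketch injects a clock so the behavior is testable; in a real deployment the state would live in Redis, and the 20-second window, names, and in-memory maps are all illustrative.

```typescript
// Presence with a disconnect grace period. A user is only reported offline
// if no reconnect happens within GRACE_MS of the disconnect.
// Illustrative sketch; production state belongs in a shared store.
const GRACE_MS = 20_000;

class PresenceTracker {
  private disconnectedAt = new Map<string, number>();
  private online = new Set<string>();

  constructor(private now: () => number = Date.now) {}

  connect(userId: string): void {
    this.online.add(userId);
    this.disconnectedAt.delete(userId); // cancel any pending offline event
  }

  disconnect(userId: string): void {
    this.disconnectedAt.set(userId, this.now());
  }

  // Is the user currently shown as online?
  isOnline(userId: string): boolean {
    if (!this.online.has(userId)) return false;
    const dropped = this.disconnectedAt.get(userId);
    if (dropped !== undefined && this.now() - dropped >= GRACE_MS) {
      this.online.delete(userId);
      this.disconnectedAt.delete(userId);
      return false;
    }
    return true;
  }
}
```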
Typing indicators are ephemeral; they do not need persistence. When a user starts typing, the client sends a "typing" event to the server, which broadcasts it to other participants in the conversation. The event includes a timeout (typically 3 to 5 seconds). If no new typing event arrives within that window, the indicator disappears. This is a fire and forget system that does not need the reliability guarantees of message delivery. If a typing indicator is missed, nothing bad happens.
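The timeout behavior can be sketched with a refreshing expiry per user. The class name and the 4-second TTL are illustrative; the clock is injected so the logic is testable.

```typescript
// Ephemeral typing state: each "typing" event refreshes the expiry.
// No persistence, no retries. Names and the TTL are illustrative.
const TYPING_TTL_MS = 4_000;

class TypingState {
  private lastTyped = new Map<string, number>();
  constructor(private now: () => number = Date.now) {}

  onTypingEvent(userId: string): void {
    this.lastTyped.set(userId, this.now());
  }

  // Users whose indicator should currently be shown.
  activeTypers(): string[] {
    const cutoff = this.now() - TYPING_TTL_MS;
    return [...this.lastTyped.entries()]
      .filter(([, at]) => at > cutoff)
      .map(([userId]) => userId);
  }
}
```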
Push Notifications and Offline Delivery
When a user is not connected via WebSocket, messages still need to reach them. This is where push notifications and offline message queuing come in.
Offline message queue. When a message is sent and the recipient has no active WebSocket connection, the message is persisted to the database (it always is, regardless of connection status) and a push notification is triggered. When the user reconnects, the client requests all messages with a sequence number higher than the last one it received. This "catch up" mechanism ensures no messages are lost during offline periods.
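The catch-up request reduces to one filtered, ordered query. Sketched here against an in-memory array for clarity; in production this is an indexed database query on (conversation_id, sequence_number), and all names are illustrative.

```typescript
// Catch-up on reconnect: fetch everything after the last sequence number
// the client saw. In-memory stand-in for an indexed DB query; illustrative.
interface StoredMessage {
  conversationId: string;
  sequence: number;
  content: string;
}

function catchUp(
  store: StoredMessage[],
  conversationId: string,
  lastSeenSequence: number
): StoredMessage[] {
  return store
    .filter(m => m.conversationId === conversationId && m.sequence > lastSeenSequence)
    .sort((a, b) => a.sequence - b.sequence);
}
```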
Push notification delivery uses APNs for iOS and FCM for Android. The notification payload should be lightweight, containing the conversation ID, sender name, and a truncated message preview. The full message content loads when the user opens the app and the client syncs from the server. Rate limiting push notifications is important: if a user receives 50 messages while offline, they should get one summary notification ("12 new messages from 3 conversations"), not 50 individual notifications.
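The summarization step is a small aggregation over the queued messages. A hedged sketch; the function name, interface, and exact summary wording are illustrative choices, not APNs or FCM API calls.

```typescript
// Collapse queued offline messages into one summary notification
// instead of one push per message. Illustrative sketch.
interface QueuedMessage {
  conversationId: string;
  senderName: string;
  preview: string;
}

function summarizePush(queued: QueuedMessage[]): string {
  if (queued.length === 1) {
    const m = queued[0];
    return `${m.senderName}: ${m.preview}`;
  }
  const conversations = new Set(queued.map(m => m.conversationId)).size;
  const suffix = conversations === 1 ? "" : "s";
  return `${queued.length} new messages from ${conversations} conversation${suffix}`;
}
```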
Scaling to Hundreds of Thousands of Users
At scale, the architecture needs several additional layers:
WebSocket server cluster with a message bus. Each WebSocket server handles its own set of connections. When a message needs to be delivered, the server publishes it to a message bus (Redis pub/sub, NATS, or Kafka). Every WebSocket server subscribes to channels for the conversations its connected users are part of. This decouples message production from message delivery and lets you scale WebSocket servers horizontally. We discuss cloud and scaling architecture in detail on our services page.
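The fan-out pattern can be modeled with an in-memory bus: each WebSocket server subscribes to channels for the conversations its users belong to, and publishing a message delivers it to every subscribed server, which forwards it to its own local connections. In production the bus is Redis pub/sub, NATS, or Kafka; everything in this sketch is an illustrative stand-in.

```typescript
// Minimal in-memory model of the pub/sub fan-out between WebSocket servers.
// Illustrative only; a real deployment uses Redis, NATS, or Kafka.
type Handler = (payload: string) => void;

class MessageBus {
  private channels = new Map<string, Set<Handler>>();

  subscribe(channel: string, handler: Handler): void {
    if (!this.channels.has(channel)) this.channels.set(channel, new Set());
    this.channels.get(channel)!.add(handler);
  }

  publish(channel: string, payload: string): number {
    const handlers = this.channels.get(channel) ?? new Set<Handler>();
    for (const h of handlers) h(payload);
    return handlers.size; // how many subscribers received it
  }
}
```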
Connection routing. A load balancer distributes WebSocket connections across your server cluster. Unlike HTTP load balancing, WebSocket connections are sticky: they stay on one server for their duration. You need a load balancer that supports WebSocket upgrades and connection draining (gracefully moving connections during deployments).
Read receipts and message status. Tracking whether a message was delivered and read adds bidirectional state management. The pattern we use: the server marks a message as "delivered" when the WebSocket server confirms the client received it. The client marks it as "read" when the message enters the viewport. These status updates are batched and sent every few seconds to avoid overwhelming the server with individual acknowledgments.
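The batching side of that pattern looks roughly like this. The sketch keeps only the highest read sequence per conversation and lets a periodic timer call flush(); the names are illustrative, and flush is invoked directly here to keep the logic testable.

```typescript
// Batch read acknowledgments: one compact update per conversation every
// few seconds, instead of one ack per message. Illustrative sketch.
class ReceiptBatcher {
  private pending = new Map<string, number>(); // conversationId -> highest read seq

  markRead(conversationId: string, sequence: number): void {
    const current = this.pending.get(conversationId) ?? 0;
    if (sequence > current) this.pending.set(conversationId, sequence);
  }

  // Called by a timer every few seconds; drains the buffer.
  flush(): Array<{ conversationId: string; readUpTo: number }> {
    const batch = [...this.pending.entries()].map(([conversationId, readUpTo]) => ({
      conversationId,
      readUpTo,
    }));
    this.pending.clear();
    return batch;
  }
}
```

Because sequence numbers are monotonic per conversation, "read up to sequence N" subsumes every individual receipt below N, which is what makes the batch so compact.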
Build Versus Buy
Before building chat from scratch, consider whether an existing solution fits. Products like Stream, Sendbird, and PubNub provide chat SDKs that handle WebSocket management, message storage, presence, and push notifications. They cost $0.01 to $0.08 per monthly active user at scale.
The build versus buy decision depends on how central chat is to your product. If chat is a supporting feature (like a messaging tab in a marketplace app), a managed SDK saves months of development. If chat is your core product (a team collaboration tool, a customer support platform), you likely need the architectural control that comes with building it yourself.
On projects we have shipped, we have done both. For our marketplace projects, we have integrated managed chat SDKs to save time. For products where messaging is the primary experience, we built custom systems that gave us full control over the data model, the UX, and the scaling characteristics.
Getting Started
If you are adding real time chat to your product, start with the minimum viable implementation: WebSocket connections (using a library like Socket.IO or native WebSocket with a reconnection wrapper), a messages table in your existing database, and basic presence tracking with Redis. This gets you to a working chat feature in 3 to 5 weeks.
Scale the architecture as usage demands it. You do not need Kafka and a sharded message database at 1,000 users. But you do need the abstractions in place so you can swap in those systems later without rewriting the client.
If you need help building chat infrastructure that will scale with your product, reach out to us. We have shipped real time systems at scale and we know where the complexity hides.