Video is no longer optional for most products. Whether you are building a telehealth platform, an e-learning app, a marketplace with product demos, or a social feed with user-generated content, media streaming is a core infrastructure decision that affects everything from your cloud bill to your user retention. Getting it wrong means buffering, dropped frames, and users who leave before the first second of playback finishes.
We have built streaming into web and mobile applications across industries, and the patterns that work at scale are well established. This post walks through the architecture decisions that matter, from ingestion to playback.
The Core Pipeline: Ingest, Transcode, Deliver
Every media streaming system follows the same basic pipeline. A video file or live stream enters the system (ingest), gets processed into multiple formats and quality levels (transcoding), gets stored on distributed infrastructure, and then gets delivered to end users through a content delivery network.
The ingestion layer handles uploads from users or cameras. For user-generated content, this means handling large file uploads reliably. Chunked uploads using the tus protocol or a similar resumable upload standard are essential. A user on a mobile network should not lose a 500 MB upload because they walked through a dead zone. We typically use presigned URLs to upload directly to object storage (S3 or equivalent), bypassing the application server entirely. This keeps your API servers free from handling multi-gigabyte file transfers.
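The resumable part of a chunked upload comes down to tracking byte offsets. Below is a minimal sketch (the function name and default chunk size are our own choices, not part of any upload standard): the client splits the file into fixed-size ranges, and after a dropped connection it asks the server for the last persisted offset and resumes from the matching chunk.

```python
def chunk_ranges(file_size: int, chunk_size: int = 8 * 1024 * 1024):
    """Yield (offset, length) byte ranges for a resumable chunked upload.

    On retry after a network drop, the client asks the server which
    offset was last persisted and resumes from that chunk onward.
    """
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length
```

Each range maps directly onto an HTTP request with a `Content-Range`-style header (or a tus `Upload-Offset`), so a 20 MB file becomes three independent, retryable requests rather than one fragile one.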
Transcoding is where most of the complexity lives. Raw uploaded video needs to be converted into multiple renditions at different resolutions and bitrates: 1080p, 720p, 480p, and 360p at minimum. Each rendition gets segmented into small chunks, typically 2 to 6 seconds long, packaged as HLS (HTTP Live Streaming) or DASH (Dynamic Adaptive Streaming over HTTP). HLS has won the format war for practical purposes. Safari requires it, and every other platform supports it. We use HLS as the primary format on nearly every project.
For transcoding infrastructure, you have three realistic options. AWS MediaConvert or Mux handle transcoding as a managed service, charging per minute of video processed. MediaConvert costs roughly $0.015 per minute for basic transcoding. Mux adds a layer of intelligence with automatic quality optimization and runs around $0.02 per minute of encoded video. The third option is running FFmpeg on your own infrastructure, which gives you full control but requires managing queues, scaling workers, and handling failures yourself. For most teams, managed transcoding is the right call until you are processing tens of thousands of hours monthly.
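If you do go the self-hosted FFmpeg route, a single rendition of the HLS ladder boils down to a command like the one built below. This is a simplified sketch (the helper name and the exact bitrate values are ours; a production pipeline would also pin keyframe intervals so segments align across renditions):

```python
def hls_rendition_cmd(src: str, out_dir: str, height: int,
                      video_bitrate: str, segment_seconds: int = 4) -> list[str]:
    """Build an FFmpeg command for one HLS rendition (illustrative)."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",        # keep aspect ratio, even width
        "-c:v", "libx264", "-b:v", video_bitrate,
        "-c:a", "aac", "-b:a", "128k",
        "-hls_time", str(segment_seconds),  # target segment duration
        "-hls_playlist_type", "vod",
        f"{out_dir}/{height}p.m3u8",
    ]
```

Run once per rendition (1080p at ~5000k, 720p at ~3000k, and so on), and you get one media playlist plus its segments per quality level, ready to be referenced from a master manifest.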
Adaptive Bitrate Streaming
Adaptive bitrate (ABR) streaming is what makes modern video feel seamless. Instead of picking a single quality level, the player monitors available bandwidth in real time and switches between renditions mid-stream. A user on Wi-Fi gets 1080p. They switch to cellular and the player drops to 480p without interrupting playback. The bandwidth improves and they get bumped back up.
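The core selection logic inside an ABR player is simpler than it sounds. Here is a minimal sketch (the ladder values and the 0.8 safety margin are illustrative assumptions, not what hls.js or ExoPlayer literally do; real players also smooth throughput estimates and account for buffer level):

```python
RENDITIONS = [  # (name, bitrate in bits/sec), sorted highest first
    ("1080p", 5_000_000),
    ("720p", 3_000_000),
    ("480p", 1_500_000),
    ("360p", 800_000),
]

def pick_rendition(measured_bps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition that fits within a safety margin of
    measured throughput; fall back to the lowest if nothing fits."""
    budget = measured_bps * safety
    for name, bitrate in RENDITIONS:
        if bitrate <= budget:
            return name
    return RENDITIONS[-1][0]
```

The safety margin is the design choice that matters: selecting a rendition at exactly the measured bandwidth leaves no headroom for throughput variance, which is what causes rebuffering.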
This requires your transcoding pipeline to produce an HLS manifest file (m3u8) that references all available renditions. The video player on the client reads this manifest, measures download speed of each segment, and selects the appropriate rendition for the next segment. Apple's native AVPlayer handles this automatically on iOS. On the web and Android, hls.js and ExoPlayer are the standard libraries.
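For reference, a master manifest for the four-rendition ladder above looks like this (the BANDWIDTH values are illustrative; they should reflect your actual peak encoded bitrates including audio):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5500000,RESOLUTION=1920x1080
1080p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3300000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1700000,RESOLUTION=854x480
480p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=900000,RESOLUTION=640x360
360p.m3u8
```

Each referenced file is a media playlist listing that rendition's segments; the player picks a row based on measured throughput and fetches segments from the corresponding playlist.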
The key architectural decision here is segment duration. Shorter segments (2 seconds) allow faster quality switching but increase the number of HTTP requests and the overhead of segment headers. Longer segments (6 seconds) are more efficient but slower to adapt. We default to 4-second segments as a balance that works well across network conditions. This is one of the topics we cover in our system architecture practice, because getting segment duration wrong compounds at scale.
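The compounding effect is easy to quantify. A one-line calculation (manifest refreshes and separate audio tracks excluded for simplicity) shows how request volume scales with segment duration:

```python
def segment_requests_per_hour(segment_seconds: int) -> int:
    """Media-segment requests one viewer issues per hour of playback."""
    return 3600 // segment_seconds
```

At 2-second segments a single viewer makes 1,800 segment requests per hour versus 600 at 6 seconds; multiplied across a large concurrent audience, that triples CDN request counts and per-request fees for the same watch time.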
CDN Strategy and Edge Caching
Video delivery without a CDN is not viable for any real user base. A single 1080p stream consumes roughly 5 to 8 Mbps. Multiply that by concurrent viewers and your origin server bandwidth bill becomes unsustainable within days. CDNs like CloudFront, Cloudflare Stream, or Fastly cache your video segments at edge locations close to viewers, reducing origin load by 95% or more.
The caching strategy for HLS content is straightforward but important to get right. Video segments are immutable once created, so they should be cached aggressively with long TTLs (24 hours or more). Manifest files need shorter TTLs (1 to 5 seconds for live streams, longer for VOD) because they update as new segments become available. Setting cache headers incorrectly on manifest files is one of the most common causes of live stream latency and stale playback.
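A minimal sketch of that policy, expressed as a header-selection function (the function name and exact TTL values are illustrative choices under the guidelines above, not a standard):

```python
def cache_control(path: str, live: bool) -> str:
    """Cache-Control for HLS assets: immutable segments cache long,
    manifests cache short (shorter still for live streams)."""
    if path.endswith((".ts", ".m4s", ".mp4")):
        return "public, max-age=86400, immutable"  # segments never change
    if path.endswith(".m3u8"):
        return "public, max-age=2" if live else "public, max-age=60"
    return "no-store"
```

The asymmetry is the point: segments get a 24-hour TTL with `immutable` so the CDN never revalidates them, while live manifests expire within seconds so new segments show up promptly.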
For global audiences, multi-CDN strategies reduce risk and improve performance. Tools like Mux Data or Conviva measure real-user playback quality across CDN providers and can switch traffic dynamically. This is overkill for most startups but becomes important once you are serving millions of hours monthly.
Live Streaming Architecture
Live streaming adds real-time constraints that change the architecture significantly. The standard approach uses RTMP for ingest (the broadcaster sends their stream via RTMP to your ingest server), which then transcodes and packages the stream into HLS for delivery. This introduces 15 to 30 seconds of latency in the standard HLS pipeline due to segment buffering.
If you need lower latency, Low-Latency HLS (LL-HLS) reduces this to 2 to 5 seconds by publishing partial segments that players can fetch before a full segment is complete. For sub-second latency (live auctions, interactive broadcasts), WebRTC is the only viable option, but it does not scale the same way. WebRTC requires a Selective Forwarding Unit (SFU) like LiveKit or mediasoup to handle more than a handful of viewers. We covered related patterns in our real-time architecture guide, and the same principles of connection management and state synchronization apply.
The choice between HLS latency and WebRTC latency should be driven by the product, not the technology. A fitness class with instructor interaction needs sub-second latency (WebRTC). A sports broadcast with chat reactions works fine at 5-second latency (LL-HLS). A recorded lecture with no live interaction should be pure VOD with no live infrastructure at all.
Mobile Considerations
Mobile streaming adds constraints around battery, bandwidth, and background playback. On iOS, AVPlayer handles HLS natively with hardware-accelerated decoding. On Android, ExoPlayer (now Media3) is the standard. For React Native applications, react-native-video wraps these native players and supports HLS out of the box.
Background audio playback, picture-in-picture, AirPlay, and Chromecast support all require platform-specific implementation. These features are expected by users but often forgotten in architecture planning. Budget 20 to 30% additional development time for mobile media features beyond basic playback.
Offline playback of protected content requires DRM. Widevine (Android, Chrome), FairPlay (iOS, Safari), and PlayReady (Windows, Edge) are the three DRM systems you need to support for broad coverage. Implementing DRM from scratch is painful. Services like BuyDRM or the DRM built into Mux handle the licensing server complexity.
Cost Architecture
Video infrastructure costs scale with three variables: storage, transcoding, and delivery bandwidth. Storage is cheap ($0.023 per GB per month on S3 Standard). Transcoding is moderate (roughly $1 per hour of source video for multi-rendition HLS output). Delivery bandwidth is where the bill gets serious. CloudFront charges $0.085 per GB for the first 10TB, and a single 1080p viewer consumes about 2 to 3 GB per hour.
The math matters. 1,000 concurrent viewers watching 1 hour of 1080p content generates roughly 2.5TB of bandwidth, costing around $212 just in CDN fees. At 10,000 concurrent viewers, you are looking at $2,100 per hour. These are the kinds of numbers we help teams plan for when building production systems, because a viral moment can turn into a five-figure bill overnight. Our guide on reducing AWS cloud costs covers strategies that apply directly to video delivery optimization.
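That arithmetic is worth encoding in your capacity-planning spreadsheet or dashboard. A sketch, using the per-viewer consumption and CloudFront pricing figures above as defaults:

```python
def cdn_cost_per_hour(viewers: int,
                      gb_per_viewer_hour: float = 2.5,
                      price_per_gb: float = 0.085) -> float:
    """Rough CDN egress cost in USD for one hour of concurrent viewing.

    Defaults assume ~2.5 GB/hour for a 1080p stream and CloudFront's
    first-tier rate of $0.085/GB; substitute your own CDN's pricing.
    """
    return viewers * gb_per_viewer_hour * price_per_gb
```

At 1,000 viewers this returns about $212.50 per hour; at 10,000 it is ten times that, which is the "viral moment" scenario worth modeling before it happens.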
Smart cost management includes aggressive caching policies, using cheaper storage tiers for older content (S3 Intelligent-Tiering), and implementing client-side bandwidth detection to avoid serving 1080p to users on small mobile screens.
Monitoring Playback Quality
Shipping a video player is not enough. You need visibility into what your users actually experience. Key metrics to track include time to first frame (should be under 2 seconds), rebuffering ratio (percentage of playback time spent buffering, target under 0.5%), startup failure rate, and average bitrate delivered.
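Rebuffering ratio in particular is simple to compute from player stall events, and worth instrumenting even before adopting a full analytics product. A minimal sketch (the function name is ours; event plumbing from your player is assumed):

```python
def rebuffering_ratio(stall_seconds: float, playback_seconds: float) -> float:
    """Fraction of total watch time spent stalled; target under 0.5%."""
    total = playback_seconds + stall_seconds
    return stall_seconds / total if total else 0.0
```

For example, 3 seconds of stalling across a 10-minute session is a 0.5% ratio, right at the threshold where users begin to notice and abandon playback.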
Tools like Mux Data, Bitmovin Analytics, or custom logging against your video player events provide this data. Without it, you are blind to the experience of users who silently leave because playback was poor.
When to Build vs Buy
For most products, the right answer is a combination. Use a managed transcoding service (Mux or MediaConvert) for the processing pipeline. Use a CDN for delivery. Build custom logic around upload handling, access control, and player integration that ties into your specific product. Going fully custom with FFmpeg and bare metal servers only makes sense when you are processing enormous volume and have a dedicated media engineering team.
If you are building a product with video or media streaming at its core and want architecture that scales without surprise bills, reach out to us. We have shipped these systems across verticals and can help you avoid the expensive mistakes.