File Upload Architecture: From Browser to Storage

Veld Systems | 8 min read

File uploads are deceptively simple to prototype and surprisingly hard to get right in production. Accepting a small image in a demo takes 20 minutes of code. Handling 500MB video files from unreliable mobile connections, processing them into multiple formats, storing them cost effectively, and serving them back fast to users across the world is an entirely different engineering challenge.

We have built file upload systems for document management platforms, media heavy consumer apps, and enterprise tools where files are the core product. The architecture decisions you make early determine whether your upload system scales gracefully or becomes the bottleneck that limits your entire product. This post covers the end to end architecture, from the browser to long term storage.

The Naive Approach and Why It Breaks

The simplest file upload implementation sends the file from the browser to your API server, which reads the entire file into memory (or streams it to disk) and then uploads it to cloud storage. This works for small files on fast connections, and it is how most tutorials teach file uploads.

It breaks in at least four ways at scale.

1. Memory pressure. Each concurrent upload consumes server memory proportional to the file size. Ten users uploading 100MB files simultaneously require 1GB of server memory just for upload buffers.

2. Timeout risk. Large files over slow connections take minutes to upload. Your API server keeps the connection open, consuming a worker slot that cannot serve other requests.

3. Single point of failure. If the upload fails at 95%, the user must start over from scratch.

4. Cost. Bandwidth flows through your API server before reaching cloud storage, so you pay for compute and egress on traffic that your server does not actually need to process.

The production architecture eliminates the API server from the upload path entirely.

Signed URL Uploads: The Right Pattern

The correct architecture for file uploads uses signed URLs (also called presigned URLs). The flow works like this:

1. The client requests an upload URL from your API, providing the file name, size, and MIME type.

2. Your API validates the request (file type allowed, size within limits, user has permission), generates a signed URL pointing directly to cloud storage (S3, GCS, or Supabase Storage), and returns it to the client.

3. The client uploads the file directly to cloud storage using the signed URL. Your API server is completely out of the data path.

4. On upload completion, the client notifies your API, or a cloud storage event triggers a webhook, and your backend records the file metadata and kicks off any processing.

This pattern solves all four problems. No memory pressure on your API servers. No timeout risk because cloud storage handles the connection. Resumability is possible with multipart uploads. And bandwidth goes directly to storage without passing through your compute layer.

The signed URL should include constraints: maximum file size, allowed content type, and a short expiration (5 to 15 minutes). This prevents misuse of the upload URL even if it is intercepted. The URL is useless after expiration and cannot be used to upload files that violate your constraints.
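The API side of this flow can be sketched in a few lines. This is an illustrative HMAC-based signing scheme, not the real S3 SigV4 or GCS signing protocol (in practice you would call your SDK's presigned URL helper); the secret, host, allowed types, and size cap are all assumptions.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"server-side-signing-key"  # hypothetical secret, never sent to clients

ALLOWED_TYPES = {"image/jpeg", "image/png", "video/mp4"}
MAX_SIZE_BYTES = 500 * 1024 * 1024  # example cap

def create_upload_url(file_name: str, size: int, mime_type: str,
                      ttl_seconds: int = 900) -> str:
    """Validate the request, then return a short-lived signed upload URL."""
    if mime_type not in ALLOWED_TYPES:
        raise ValueError(f"type not allowed: {mime_type}")
    if size > MAX_SIZE_BYTES:
        raise ValueError("file too large")

    expires = int(time.time()) + ttl_seconds  # 5 to 15 minute expiry
    params = {"file": file_name, "size": size, "type": mime_type,
              "expires": expires}
    # Sign the constraints themselves, so a tampered or expired URL
    # fails verification at the storage layer.
    payload = urlencode(sorted(params.items())).encode()
    params["signature"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return "https://storage.example.com/upload?" + urlencode(params)
```

The key design point is that the constraints (size, type, expiry) are baked into the signature, so the client cannot loosen them without invalidating the URL.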

Chunked and Resumable Uploads

For files larger than 50MB, or when your users are on unreliable connections (mobile networks, developing regions), single request uploads are not reliable enough. A network interruption at 80% of a 200MB upload means the user loses all progress and must start over. That is an unacceptable user experience.

Multipart uploads solve this by splitting the file into chunks (typically 5 to 10MB each) and uploading each chunk independently. S3 and GCS both support multipart uploads natively. The flow is:

1. Your API initiates a multipart upload and returns an upload ID.

2. The client uploads each chunk separately, receiving an ETag for each successful chunk.

3. After all chunks are uploaded, the client or your API sends a completion request that assembles the chunks into the final file.

If any chunk fails, only that chunk needs to be retried. If the user closes their browser and returns later, the upload can resume from where it left off (assuming you persist the upload ID and completed chunks list). This turns a fragile all or nothing operation into a resilient incremental one.
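The bookkeeping that makes this resumable is small. A minimal sketch (the 8MB chunk size sits within the 5 to 10MB range above; the helper names are ours):

```python
import math

CHUNK_SIZE = 8 * 1024 * 1024  # 8MB, within the typical 5 to 10MB range

def plan_chunks(file_size: int,
                chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) for each chunk of a multipart upload."""
    chunks = []
    for i in range(math.ceil(file_size / chunk_size)):
        offset = i * chunk_size
        chunks.append((offset, min(chunk_size, file_size - offset)))
    return chunks

def remaining_chunks(file_size: int, completed: set[int],
                     chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Chunk indices still to upload. Persisting `completed` (along with
    the upload ID) is exactly what makes the upload resumable after the
    user closes their browser."""
    total = math.ceil(file_size / chunk_size)
    return [i for i in range(total) if i not in completed]
```

A 200MB file splits into 25 chunks at this size; if two chunks finished before an interruption, only the remaining 23 are retried.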

On the frontend, implement a progress indicator that updates per chunk completion, not just per byte. Byte level progress requires XMLHttpRequest progress events, which work but can be noisy. Chunk level progress gives clean, predictable updates: "5 of 12 chunks uploaded." Both approaches have value, and in our experience, showing both (a smooth progress bar from byte tracking and a chunk counter) provides the best user experience.
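The blended display described above reduces to one small function: completed chunks give the stable baseline and bytes from the in-flight chunk smooth the bar between chunk boundaries. A sketch, assuming uniform chunk sizes (the final short chunk makes this slightly approximate):

```python
CHUNK_SIZE = 8 * 1024 * 1024  # assumed chunk size

def progress(completed_chunks: int, total_chunks: int,
             bytes_sent_current: int = 0,
             chunk_size: int = CHUNK_SIZE) -> float:
    """Return upload progress in [0, 1]: chunk count as the coarse
    signal, in-flight bytes as the smoothing term."""
    total_bytes = total_chunks * chunk_size
    done = completed_chunks * chunk_size + bytes_sent_current
    return min(done / total_bytes, 1.0)
```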

Processing Pipeline

Most uploaded files need processing before they are ready to serve. Images need resizing and format conversion. Videos need transcoding. Documents need thumbnail generation. PDFs need text extraction. This processing should happen asynchronously, never in the upload request path.

The architecture we use is an event driven processing pipeline:

1. File lands in cloud storage, triggering a storage event notification.

2. The event fires a processing job into a queue (SQS, Cloud Tasks, or a database backed queue).

3. A worker picks up the job, downloads the file from storage, processes it, and uploads the results back to storage.

4. The worker updates the file record in your database with processing results (thumbnail URL, dimensions, duration, text content).
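The four steps above can be sketched with an in-memory queue standing in for SQS or Cloud Tasks, and a dict standing in for the database; the real worker would download the file and run the actual processing tool in the marked spot.

```python
import queue

jobs: "queue.Queue[dict]" = queue.Queue()  # stand-in for SQS / Cloud Tasks

def on_storage_event(event: dict) -> None:
    """Storage notification handler: enqueue a processing job (steps 1-2)."""
    jobs.put({"file_id": event["file_id"], "kind": event["content_type"]})

def worker_step(db: dict) -> bool:
    """One worker iteration: take a job, process it, record results
    (steps 3-4). Returns False when the queue is empty."""
    try:
        job = jobs.get_nowait()
    except queue.Empty:
        return False
    # A real worker downloads the file from storage here, runs the
    # processing tool (resize, transcode, extract), and uploads the
    # derived files back to storage before updating the record.
    db[job["file_id"]] = {"status": "processed", "kind": job["kind"]}
    return True
```

The important property is that the upload request path never blocks on this loop; workers drain the queue at their own pace and can be scaled independently.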

Image processing typically generates multiple variants: a thumbnail (150px), a medium size for feeds (800px), and the original. Convert to WebP for browsers that support it, keeping a JPEG fallback. Store all variants with predictable naming: `{file_id}/original.jpg`, `{file_id}/thumb.webp`, `{file_id}/medium.webp`. This lets your frontend construct URLs without additional API calls.
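The payoff of predictable naming is that the frontend can construct any variant URL locally. A minimal sketch (the CDN host is a placeholder):

```python
CDN_BASE = "https://cdn.example.com"  # hypothetical CDN host

VARIANTS = {
    "original": "original.jpg",
    "thumb": "thumb.webp",     # 150px
    "medium": "medium.webp",   # 800px
}

def variant_url(file_id: str, variant: str) -> str:
    """Build a variant URL from the naming convention above;
    no API round trip needed."""
    return f"{CDN_BASE}/{file_id}/{VARIANTS[variant]}"
```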

Video processing is the most resource intensive. Transcode to H.264/AAC in MP4 container for maximum compatibility. Generate multiple quality levels (480p, 720p, 1080p) if you are building a streaming experience. Extract a thumbnail frame at the 2 second mark. Video transcoding can take minutes for long files, so the user experience must clearly communicate that processing is in progress and update when complete.

For the processing infrastructure, serverless functions work well for lightweight jobs like image resizing. For heavy processing like video transcoding, use dedicated compute instances or managed services like AWS MediaConvert. The cost difference is significant: transcoding a 1 hour video on a Lambda function would be roughly 10x more expensive than on a reserved GPU instance, and Lambda's 15 minute execution limit rules out long transcodes entirely.

Storage Architecture

Where and how you store files has long term cost and performance implications.

Hot storage (S3 Standard, GCS Standard) is for files that are actively accessed. Recent uploads, profile photos, active document attachments. Access is fast and pricing is based on storage volume plus request count.

Warm storage (S3 Infrequent Access, GCS Nearline) is for files that are kept but rarely accessed. Old user uploads, archived documents, previous versions. Storage cost is 40 to 60% lower, but retrieval has a per GB fee. Move files to warm storage automatically after 30 to 90 days without access.

Cold storage (S3 Glacier, GCS Coldline) is for compliance and backup. Files you must keep for legal reasons but will almost never retrieve. Storage cost is 70 to 90% lower, but retrieval takes hours and costs significantly more per GB.

Implement lifecycle policies that automatically transition files between tiers based on access patterns. This is a set it and forget it optimization that consistently reduces storage costs by 30 to 50% for applications with growing file archives. We covered broader cost optimization strategies in our cloud cost reduction guide.
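In production this lives in the bucket's lifecycle configuration rather than application code, but the decision logic is worth making explicit. A sketch of one possible policy (the 60 day threshold is a midpoint of the 30 to 90 day range above; the legal-hold rule is our assumption):

```python
def storage_tier(days_since_access: int, compliance_only: bool = False) -> str:
    """Pick a storage tier from access recency, per the tiers above."""
    if compliance_only:
        return "cold"   # S3 Glacier / GCS Coldline: kept for legal reasons
    if days_since_access >= 60:
        return "warm"   # S3 Infrequent Access / GCS Nearline
    return "hot"        # S3 Standard / GCS Standard
```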

Serving Files to Users

Uploading and storing files is half the challenge. Serving them back efficiently is the other half.

CDN distribution is mandatory for any application serving files to users. Put a CDN (CloudFront, Cloudflare, Fastly) in front of your storage bucket. The CDN caches files at edge nodes worldwide, so a user in Singapore gets the file from a Singapore edge node instead of your US East S3 bucket. Latency drops from 300ms to 30ms.

Signed download URLs for private files. Not every file should be publicly accessible. For files that require authorization (private documents, paid content, user specific exports), generate signed download URLs with short TTLs. The client requests a download URL from your API, which validates permissions and returns a signed URL that expires in 15 to 60 minutes.

On the fly image transformations. Services like Imgix, Cloudinary, or Supabase image transformations let you request specific sizes and formats via URL parameters. Instead of pre generating every possible image variant, you store the original and request `/image.jpg?width=400&format=webp` at serve time. The transformation service caches the result, so subsequent requests for the same variant are fast. This reduces storage costs and processing complexity at the expense of slightly higher first request latency.
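Constructing those URLs is a one-liner worth pinning down. Parameter names vary by service (Imgix, Cloudinary, and Supabase each use different ones); the generic `width` and `format` names below follow the example above:

```python
from urllib.parse import urlencode

def transform_url(original_url: str, **params: "str | int") -> str:
    """Append transformation parameters to a stored original's URL;
    the transformation service resizes and converts at serve time."""
    return f"{original_url}?{urlencode(params)}" if params else original_url
```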

Security Considerations

File uploads are one of the most common attack vectors in web applications. Your architecture must address several security concerns.

File type validation. Never trust the client's content type header. Validate the actual file content server side by checking magic bytes (the first few bytes of the file that identify the format). A file named "photo.jpg" with content type "image/jpeg" might actually be an executable.
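A magic byte check is only a few lines. The signatures below are the standard ones for these formats; extend the table for whatever types your application accepts:

```python
MAGIC_BYTES = {
    "image/jpeg": [b"\xff\xd8\xff"],
    "image/png": [b"\x89PNG\r\n\x1a\n"],
    "image/gif": [b"GIF87a", b"GIF89a"],
    "application/pdf": [b"%PDF-"],
}

def matches_claimed_type(head: bytes, claimed: str) -> bool:
    """Check the file's leading bytes against the claimed MIME type,
    rather than trusting the client-supplied header."""
    return any(head.startswith(sig) for sig in MAGIC_BYTES.get(claimed, []))
```

Read the first few bytes server side (or in the processing pipeline) and reject the file if the content does not match the declared type.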

File size limits. Enforce maximum file sizes both in the signed URL constraints and in your API validation. Without limits, an attacker can fill your storage bucket with arbitrary data.

Malware scanning. For applications that accept documents or files from untrusted users, run uploaded files through a malware scanner before making them available. AWS has native integration with malware scanning for S3 uploads, or you can integrate ClamAV as a step in your processing pipeline.

Content isolation. Serve user uploaded content from a different domain than your application. If a user uploads an HTML file and you serve it from your main domain, the browser will execute any JavaScript in that file with access to your cookies and local storage. Use a separate domain (like `uploads.yourapp.com`) or set strict `Content-Disposition: attachment` headers to prevent inline rendering. We covered broader security patterns in our web app security checklist.

Monitoring Upload Health

Track these metrics to maintain a healthy upload system. Upload success rate measures the percentage of initiated uploads that complete successfully. If this drops below 95%, investigate connection issues, timeout configurations, or storage capacity limits. Upload latency by file size shows you whether large file uploads are taking disproportionately long, indicating possible chunking issues. Processing queue depth tells you whether your workers are keeping up with upload volume. A growing queue means you need more processing capacity. Storage growth rate helps you forecast costs and plan lifecycle policy adjustments.

Building a production grade file upload system touches cloud infrastructure, security, performance, and user experience. It is one of those features where the difference between a prototype and a production system is enormous. If you are building a product where file uploads are a core feature, or you are struggling with an upload system that is unreliable at scale, reach out to us to discuss the right architecture for your requirements.

Ready to Build?

Let us talk about your project

We take on 3-4 projects at a time. Get an honest assessment within 24 hours.