Caching is the single highest-leverage performance optimization available to most applications. A well-implemented caching strategy can reduce response times by 90%, cut database load by 80%, and slash infrastructure costs dramatically. A poorly implemented one causes stale-data bugs that erode user trust and create debugging nightmares that consume entire engineering weeks.
We have implemented caching layers for applications at every scale, from early-stage products running on a single server to distributed systems handling hundreds of millions of requests per day. The principles are consistent even when the tools change. This post covers how to think about caching at each layer, when to apply each strategy, and how to avoid the pitfalls that turn caching from a performance win into a reliability problem.
The Caching Layers
Modern applications have four distinct caching layers, each serving a different purpose and operating at a different timescale.
Browser cache is the closest to the user. Static assets like JavaScript bundles, CSS files, images, and fonts are cached in the browser based on HTTP cache headers. When configured correctly, a returning user loads your app almost entirely from their local cache, making no network requests for assets that have not changed. This is essentially free performance.
CDN cache sits at the edge of the network, geographically close to users. CDNs like Cloudflare, Fastly, and CloudFront cache responses at edge nodes around the world. A user in Tokyo hits the Tokyo edge node instead of your origin server in Virginia, cutting latency from 200ms to 20ms. CDNs work best for static content and semi-dynamic content that can tolerate short staleness windows.
Application cache (typically Redis or Memcached) lives in your infrastructure, between your application servers and your database. This is where you cache database query results, API responses from third-party services, computed values, and session data. Application caching reduces database load and speeds up response times for dynamic content.
Database cache is the built-in query cache and buffer pool in your database engine. Postgres and MySQL both maintain in-memory caches of frequently accessed data and query plans. While you do not configure this layer as directly as the others, understanding it helps you write queries that benefit from it.
Redis: The Application Cache Workhorse
Redis is the de facto standard for application caching, and for good reason. It is fast (sub-millisecond reads), versatile (strings, hashes, lists, sets, sorted sets), and reliable when operated correctly. Here is how we use it in practice.
Cache database query results. This is the most common pattern. Your API endpoint fetches user profile data, which requires joining three tables. Instead of running that join on every request, cache the result in Redis with a key like `user:profile:{user_id}` and a TTL of 5 minutes. Subsequent requests read from Redis in under 1ms instead of hitting the database for 15ms.
Cache computed values. Dashboard metrics, report aggregations, leaderboard rankings, anything that requires heavy computation should be cached. Compute once, serve from cache until the TTL expires or the underlying data changes.
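The compute-once-until-expiry pattern can be sketched as a TTL-bound memoization decorator. This is a minimal in-memory illustration (a production version would store the values in Redis with `SETEX`); the `dashboard_metric` function and its formula are hypothetical stand-ins for an expensive aggregation.

```python
import time
from functools import wraps

def cached(ttl_seconds):
    """Cache a function's results in memory until the TTL expires."""
    def decorator(fn):
        store = {}  # args -> (value, expires_at)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[1] > now:
                return hit[0]                      # fresh cached value
            value = fn(*args)                      # recompute on miss or expiry
            store[args] = (value, now + ttl_seconds)
            return value
        return wrapper
    return decorator

calls = 0

@cached(ttl_seconds=300)
def dashboard_metric(team_id):
    global calls
    calls += 1
    return team_id * 42  # stand-in for a heavy report aggregation

dashboard_metric(3)
dashboard_metric(3)  # served from cache; the underlying function ran only once
```

The same shape works for leaderboards and report aggregations: the expensive work runs once per TTL window regardless of request volume.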
Rate limiting and counters. Redis INCR with TTL is the standard approach for rate limiting API endpoints. Increment a counter per user per time window and reject requests that exceed the limit. This is fast enough to run on every single request without measurable overhead.
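The counter-per-window idea can be sketched like this. An in-memory dict stands in for Redis here (the real version would call `INCR` on the key and `EXPIRE` it for the window length); the class and key layout are illustrative, not a specific library's API.

```python
import time

class FixedWindowRateLimiter:
    """Fixed-window rate limiting, mirroring the Redis INCR-with-TTL pattern.
    An in-memory dict stands in for Redis for the sake of a runnable sketch."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (user_id, window_number) -> request count

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        window_number = int(now // self.window)     # which time window we are in
        key = (user_id, window_number)              # mirrors "rate:{user}:{window}"
        count = self.counters.get(key, 0) + 1       # Redis: INCR key
        self.counters[key] = count                  # Redis: EXPIRE key window
        return count <= self.limit

limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
results = [limiter.allow("user-1", now=1000.0) for _ in range(4)]
# first three requests in the window pass, the fourth is rejected
```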
Session storage. If your application uses server-side sessions, Redis is the right backend. It is faster than database session storage and supports TTL natively, so expired sessions clean themselves up.
The critical decision with Redis caching is your invalidation strategy. TTL-based invalidation is the simplest. Set a TTL of 300 seconds and accept that data might be up to 5 minutes stale. This works for most dashboard data, profile information, and non-critical content.
For data that must be fresh immediately after changes, use active invalidation. When a user updates their profile, your update handler explicitly deletes the `user:profile:{user_id}` cache key. The next read misses the cache, fetches fresh data from the database, and repopulates the cache. This is more complex but eliminates staleness for data where freshness matters.
The pattern we use most often is a cache-aside strategy: read from cache first, on cache miss read from database and write to cache, and actively invalidate on writes. This gives you the best balance of performance and freshness. We covered the database query patterns that complement this approach in our database schema design guide.
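The cache-aside loop can be sketched in a few lines. To keep the example runnable, a tiny in-memory class stands in for a Redis client and a dict stands in for the database; the key format and TTL match the profile example above, but the function names are our own.

```python
class FakeRedis:
    """Minimal in-memory stand-in for a Redis client (TTLs ignored for brevity)."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def setex(self, key, ttl, value):
        self.data[key] = value
    def delete(self, key):
        self.data.pop(key, None)

db = {"42": {"name": "Ada"}}   # stand-in for the real database
cache = FakeRedis()

def get_profile(user_id):
    key = f"user:profile:{user_id}"
    cached_value = cache.get(key)
    if cached_value is not None:             # cache hit: skip the database
        return cached_value
    profile = db[user_id]                    # cache miss: read from the database
    cache.setex(key, 300, profile)           # repopulate with a 5-minute TTL
    return profile

def update_profile(user_id, profile):
    db[user_id] = profile                    # write to the source of truth
    cache.delete(f"user:profile:{user_id}")  # active invalidation on write

get_profile("42")                      # miss, then populate
update_profile("42", {"name": "Grace"})
fresh = get_profile("42")              # miss after invalidation: update is visible
```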
CDN Caching: Beyond Static Assets
Most teams configure their CDN to cache static assets and call it done. That is leaving massive performance gains on the table. CDNs can cache dynamic content too, and the results are transformative.
API response caching. If your API serves data that does not change per user, like a product catalog, a list of categories, or public content, cache those responses at the CDN edge. Set a `Cache-Control: public, max-age=60` header and the CDN serves the response from the nearest edge node. Your origin server handles the request once per minute instead of thousands of times per minute.
Stale-while-revalidate. This is the most powerful CDN caching directive for dynamic content. The header `Cache-Control: public, max-age=60, stale-while-revalidate=300` tells the CDN: serve the cached version for up to 60 seconds, and for the next 300 seconds, serve the stale version while fetching a fresh one in the background. Users always get an instant response, and the data is never more than 6 minutes stale.
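The edge node's decision under that header reduces to comparing the cached response's age against two windows. This sketch expresses that logic directly, with defaults matching the header above; the function name and return labels are our own, not any CDN's API.

```python
def swr_state(age_seconds, max_age=60, stale_while_revalidate=300):
    """Classify a cached response's age under
    Cache-Control: public, max-age=60, stale-while-revalidate=300."""
    if age_seconds <= max_age:
        return "fresh"        # serve directly from cache, no origin traffic
    if age_seconds <= max_age + stale_while_revalidate:
        return "stale-serve"  # serve the stale copy now, revalidate in background
    return "expired"          # must fetch from origin before responding
```

At 30 seconds the response is fresh; at 2 minutes it is served stale while a background refresh runs; past 6 minutes the edge must wait on origin.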
Cache key design. The CDN caches based on the full URL by default. If your API returns different data based on query parameters, those parameters become part of the cache key automatically. But be careful with parameters that do not affect the response. A tracking parameter like `?utm_source=google` should be stripped before caching, otherwise you get duplicate cached entries for the same content.
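Normalizing the URL before it becomes a cache key is straightforward. This sketch strips common tracking parameters and sorts the rest so that equivalent URLs collapse to one cached entry; the parameter list is illustrative and would be tuned per application.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative set of parameters that never affect the response body.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

def cache_key(url):
    """Normalize a URL into a cache key: drop tracking parameters and
    sort the remaining ones so parameter order does not split the cache."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query)
        if k not in TRACKING_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

key = cache_key("https://example.com/products?page=2&utm_source=google")
# both the tracked and untracked URL now map to the same cached entry
```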
Cache purging. When content changes and you cannot wait for TTL expiry, most CDNs support purge APIs. Update a product in your admin panel, fire a purge request for that product's URL, and the CDN drops the cached version. The next request fetches fresh data from origin.
For teams evaluating Vercel versus AWS for hosting, edge caching behavior is a significant differentiator. Vercel's edge network integrates tightly with Next.js ISR (Incremental Static Regeneration), while CloudFront requires more manual configuration but offers finer control.
Edge Computing: Caching Meets Logic
Edge computing goes a step beyond CDN caching by letting you run code at the edge, not just cache static responses. Cloudflare Workers, Vercel Edge Functions, and AWS CloudFront Functions execute JavaScript at edge nodes, enabling patterns that neither pure CDN caching nor application caching can achieve.
Personalized edge responses. A product listing page might be 95% identical for every user but show a different "recommended for you" section. Instead of serving a fully dynamic page from origin or a fully cached page from CDN, an edge function can serve the cached page shell and inject the personalized section based on data stored in a lightweight edge key-value store.
Geo-specific responses. An edge function can read the request's geographic information and serve region-appropriate content, currency, language, or compliance notices, without routing to origin. The decision logic runs in 1 to 2ms at the edge instead of adding a 100 to 200ms round trip to your origin server.
A/B testing at the edge. Route users to different cached variants of a page based on a cookie or random assignment, entirely at the edge. This eliminates the performance cost of server-side A/B testing while maintaining consistent user experiences.
The caveat with edge computing is that you are running in a constrained environment. Edge functions have limited CPU time, limited memory, and restricted APIs. They are excellent for request routing, light transformations, and cache orchestration. They are not suitable for heavy computation or complex database queries.
Common Caching Pitfalls
Caching bugs are among the most frustrating to debug because the system works correctly most of the time. Here are the pitfalls we see most often.
Cache stampede. A popular cache key expires and 1,000 concurrent requests all miss the cache simultaneously, all hitting the database at once. The solution is lock-based recomputation: the first request acquires a lock and recomputes the value while subsequent requests wait or receive the stale value. Redis supports this pattern with `SET NX` for lock acquisition.
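The lock-based recomputation idea can be sketched with local threads. A `threading.Lock` per key stands in for the Redis `SET key value NX` lock (in a distributed setup the lock would live in Redis with a short TTL so a crashed holder cannot wedge the key); the class and counter are illustrative.

```python
import threading

class StampedeGuard:
    """Lock-based recomputation: on a miss, one caller recomputes while the
    others wait, mirroring the Redis SET-NX lock pattern with local locks."""

    def __init__(self):
        self.cache = {}
        self.locks = {}                        # one lock per hot key
        self.mutex = threading.Lock()
        self.recomputes = 0

    def get(self, key, compute):
        value = self.cache.get(key)
        if value is not None:
            return value
        with self.mutex:                       # like SET lock NX: one winner per key
            lock = self.locks.setdefault(key, threading.Lock())
        with lock:                             # everyone else blocks here
            value = self.cache.get(key)        # re-check after waiting on the lock
            if value is None:
                self.recomputes += 1
                value = compute()              # only the lock winner recomputes
                self.cache[key] = value
        return value

guard = StampedeGuard()
threads = [threading.Thread(target=lambda: guard.get("hot", lambda: "value"))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 100 concurrent misses on the same key, but the value is computed once
```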
Unbounded cache growth. Without memory limits or eviction policies, your Redis instance fills up and either crashes or starts evicting unpredictably. Always configure a `maxmemory` limit and an eviction policy. We recommend `allkeys-lru` (least recently used) for general caching workloads.
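In `redis.conf`, that comes down to two directives; the 2gb figure here is illustrative and should be sized to your instance.

```
# redis.conf: cap memory use and evict the least recently used keys
maxmemory 2gb
maxmemory-policy allkeys-lru
```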
Inconsistent invalidation. Your user update endpoint invalidates the profile cache, but the settings update endpoint does not, because a different developer wrote it. The result is intermittent stale data that depends on which field was changed. The fix is centralizing cache invalidation logic in your data access layer rather than scattering it across endpoints.
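Centralizing invalidation can look like this: a repository class that owns every cache key derived from a user row, so no endpoint can forget one. The class, key names, and in-memory dicts standing in for the database and Redis are all illustrative.

```python
class UserRepository:
    """Data access layer that owns cache invalidation for user rows, so every
    write path (profile, settings, ...) invalidates the same set of keys."""

    def __init__(self, db, cache):
        self.db = db        # dict standing in for the database
        self.cache = cache  # dict standing in for Redis

    def _invalidate(self, user_id):
        # The one place that knows every cache key derived from a user row.
        for key in (f"user:profile:{user_id}", f"user:settings:{user_id}"):
            self.cache.pop(key, None)

    def update_profile(self, user_id, profile):
        self.db[user_id] = {**self.db.get(user_id, {}), **profile}
        self._invalidate(user_id)

    def update_settings(self, user_id, settings):
        self.db[user_id] = {**self.db.get(user_id, {}), "settings": settings}
        self._invalidate(user_id)  # same path: no endpoint can skip it

db = {}
cache = {"user:profile:7": {"name": "stale"}}  # a stale entry from an earlier read
repo = UserRepository(db, cache)
repo.update_settings(7, {"theme": "dark"})
# the settings write also dropped the stale profile entry
```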
Caching errors. If your database query fails and you cache the error response, every subsequent request gets the error from cache until the TTL expires. Never cache error states. Always check for successful results before writing to cache.
Over caching. Not everything should be cached. Data that changes on every request, data that is unique per request context, and data that must be strongly consistent (like account balances during transactions) should bypass the cache entirely. Over caching creates more staleness bugs than it solves performance problems.
Putting It All Together
A well-designed caching strategy layers these approaches. Browser cache handles static assets with long TTLs and content hashing for cache busting. CDN cache serves static pages, public API responses, and semi-dynamic content with stale-while-revalidate. Redis caches database query results, computed values, and session data, with active invalidation for write-sensitive data. Database cache handles the rest through proper indexing and query optimization.
The result is a system where the vast majority of requests never touch your application servers, let alone your database. We have seen this approach reduce origin traffic by 85% and cut average response times from 400ms to under 50ms. That kind of performance improvement translates directly into better user experience, higher conversion rates, and lower infrastructure costs, a theme we explored in depth in our cloud cost reduction guide.
If your application is struggling with performance, or if you are building something new and want to get caching right from the start, reach out to us. We will help you design a caching strategy that is effective without being fragile.