Cache Invalidation Strategies: The Second Hardest Problem in Computer Science
There is a famous quote attributed to Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things." The joke lands because cache invalidation is genuinely hard — not technically, but architecturally. The code to invalidate a cache entry is trivial. Knowing when to do it, and being confident all copies are invalidated, is where complexity lives.
What Is Cache Invalidation?
Invalidation is the process of marking a cached value as stale — either deleting it, replacing it, or flagging it to be revalidated before the next use. The difficulty is that data can be cached in multiple places simultaneously:
- HTTP caches (browser, proxy, CDN)
- Service Worker caches
- In-memory stores (React state, TanStack Query)
- localStorage or IndexedDB
- Server-side caches (Redis, Memcached)
An update at the source does not automatically propagate to any of these layers.
Strategy 1: TTL (Time to Live)
The simplest strategy: every cached entry has an expiration time. After that time, the entry is considered stale and will be refreshed on next access.
{
data: { username: 'alex' },
cachedAt: 1714000000000,
ttl: 60 * 60 * 1000 // 1 hour
}
Pros: Simple to implement, no external coordination needed.
Cons: After the source changes, stale data can be served for up to the full remaining TTL. Set the TTL too low and you eliminate the caching benefit; set it too high and staleness grows.
In HTTP caching: Cache-Control: max-age=3600 sets a 1-hour TTL. After it expires, the browser re-requests the resource.
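On the application side, the entry shape shown above implies a freshness check like the following. This is a minimal sketch; `isFresh` and `readThrough` are illustrative names, not a library API.

```javascript
// Minimal TTL sketch, assuming entries shaped like the example
// above: { data, cachedAt, ttl } with times in milliseconds.
function isFresh(entry, now = Date.now()) {
  // Fresh while less than `ttl` ms have passed since caching.
  return now - entry.cachedAt < entry.ttl
}

function readThrough(entry, fetchFresh, now = Date.now()) {
  // Serve the cached value while fresh; otherwise refetch.
  return isFresh(entry, now) ? entry.data : fetchFresh()
}
```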
Strategy 2: Stale-While-Revalidate
Serve the stale cached data immediately while fetching an updated version in the background. The user gets instant response; the cache is updated for the next request.
// HTTP header
Cache-Control: max-age=60, stale-while-revalidate=600
This means: serve from cache for 60 seconds without checking; then, for up to a further 600 seconds, serve the stale copy while revalidating in the background. TanStack Query uses this pattern as its default behavior: stale data is shown immediately while a background refetch runs silently.
Pros: Fast user experience, data converges on freshness over time.
Cons: Users see stale data at least once after the cache becomes stale. Not suitable for data that must be exact.
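In application code, the same pattern can be sketched in a few lines. Everything here is a stand-in: `cache` is a plain Map and `fetcher` any async function, not a real library's API.

```javascript
// Stale-while-revalidate sketch: return what we have, refresh behind
// the scenes once the entry is older than maxAgeMs.
const cache = new Map()

async function swr(key, fetcher, maxAgeMs) {
  const entry = cache.get(key)
  const revalidate = () =>
    fetcher().then((data) => cache.set(key, { data, cachedAt: Date.now() }))

  if (!entry) {
    // Cold cache: the very first request has to wait for the fetch.
    await revalidate()
    return cache.get(key).data
  }
  if (Date.now() - entry.cachedAt > maxAgeMs) {
    // Stale: kick off a background refresh, but don't await it.
    revalidate()
  }
  return entry.data
}
```

Note that only the cold-cache path blocks; every later caller gets an instant answer, which is exactly the trade-off the Cons above describe.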
Strategy 3: Cache-Aside (Lazy Loading)
The application code manages the cache explicitly. On a read:
- Check cache — if hit, return cached value
- If miss, fetch from source, store in cache, return value
On a write: update the source and delete the cache entry (not update it). The next read will repopulate it.
async function getUser(userId) {
const cached = await cache.get(`user:${userId}`)
if (cached) return cached
const user = await db.users.findById(userId)
await cache.set(`user:${userId}`, user, { ttl: 300 })
return user
}
async function updateUser(userId, data) {
await db.users.update(userId, data)
await cache.del(`user:${userId}`) // invalidate, don't update
}
Deleting is safer than updating because it avoids a race condition: if two writes interleave, an update-based cache can end up with the older value written last, and it will keep serving that wrong value until something else evicts it. A delete, at worst, costs one extra cache miss.
Strategy 4: Write-Through
On every write to the source, immediately update the cache as well. The cache is always written in sync with the source.
async function updateUser(userId, data) {
await db.users.update(userId, data)
await cache.set(`user:${userId}`, data) // always keep in sync
}
Pros: Cache is always fresh after a write; no stale reads.
Cons: Every write hits both the source and the cache, increasing write latency. Adds cache entries for data that may never be read again.
Strategy 5: Event-Driven Invalidation
Instead of time-based expiry, invalidate cache entries in response to events. Common in distributed systems with a message bus:
User updated → publish UserUpdated event → cache subscriber invalidates user:{id}
In the frontend, this is achieved via WebSockets or Server-Sent Events that push invalidation signals from the server:
const ws = new WebSocket('wss://api.example.com/events')
ws.onmessage = (e) => {
const { type, id } = JSON.parse(e.data)
if (type === 'USER_UPDATED') {
queryClient.invalidateQueries({ queryKey: ['user', id] })
}
}
Pros: Cache is invalidated precisely when data changes, not after an arbitrary delay.
Cons: Requires real-time infrastructure. If the event is missed (network blip, dropped connection), the cache is never invalidated, which is why event-driven invalidation is usually paired with a conservative TTL as a backstop.
Cache Keys and Versioning
A common HTTP pattern for static assets is content-addressable caching: the file hash is embedded in the URL.
<script src="/app.3f4a1b.js"></script>
Cache-Control: max-age=31536000, immutable can be set safely because that URL will never serve different content. When the file changes, the build produces a new URL with a new hash. Clients that cached the old file keep a version that is still correct for the old URL.
The Frontend Cache Hierarchy
When thinking about frontend caching, the layers matter:
- Memory cache (browser): in-process, fastest, per-tab
- Service Worker cache: programmable, survives tab close, per-origin
- HTTP cache (disk): managed by browser, per-origin
- CDN cache: shared across users, globally distributed
- Server cache (Redis): shared across servers
Invalidation at layer 4 (the CDN) does not automatically invalidate layers 1–3. A common mistake is purging the CDN while forgetting that users still hold the old response in their browser HTTP cache.
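Layer 2 is the one layer the application can invalidate programmatically. A common pattern on deploy is to drop every Service Worker cache except the current version. A minimal sketch, with the CacheStorage object taken as a parameter so it can be exercised outside a worker:

```javascript
// Drop every named Service Worker cache except the current version.
// `storage` defaults to the global CacheStorage (`caches`) inside a
// worker; passing it explicitly keeps the sketch testable.
async function dropOldCaches(currentName, storage = caches) {
  const names = await storage.keys()
  await Promise.all(
    names.filter((n) => n !== currentName).map((n) => storage.delete(n))
  )
}
```

In a real Service Worker this typically runs in the `activate` event handler, so old caches disappear as soon as the new worker takes over.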
Conclusion
Cache invalidation is hard because caches are everywhere and decoupled from the data they represent. The right strategy depends on how stale the data can be, how frequently it changes, and how the architecture is structured. TTL is the floor — always useful. Stale-while-revalidate balances speed and freshness for most UI data. Event-driven invalidation delivers precision but requires real-time infrastructure. Most production systems use several of these in combination.
