Thrashing!!

Thrashing occurs when a system spends more time swapping data between memory and disk than actually performing useful work. In simple terms, the system becomes busy “managing memory” instead of processing tasks efficiently.

A simple real-life analogy is a small study table with space for only three books while you need ten books to study. You constantly remove one book and bring another from the bookshelf again and again. Instead of studying, most of your time is wasted managing the books. That situation is similar to thrashing.

In operating systems, thrashing usually happens when RAM becomes full and too many processes compete for memory. The operating system uses virtual memory and page replacement techniques, but if the working sets of processes cannot fit into RAM, continuous page faults occur. As a result, pages are constantly swapped between RAM and disk, drastically reducing CPU utilization and slowing down the entire system.

Thrashing can also occur in database systems and caching layers such as Redis. If the cache size is too small for the workload, the system keeps evicting old data to make space for new data, only to immediately need the old data again. For example, if a cache can hold 1000 items but users repeatedly request 2000 items in rotation, the cache continuously reloads and evicts the same entries. This repeated cache miss cycle is called cache thrashing.
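The rotation example above can be reproduced with a small simulation. This is only a sketch, assuming the cache uses a plain LRU eviction policy (the text does not say which policy is in use); the function names `lru_hit_ratio` and `rotating_requests` are made up for illustration.

```python
from collections import OrderedDict


def lru_hit_ratio(capacity, requests):
    """Replay `requests` against an LRU cache and return hits / total requests."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    total = 0
    for key in requests:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used key
            cache[key] = True
    return hits / total if total else 0.0


def rotating_requests(distinct_keys, rounds):
    """Keys 0..distinct_keys-1 requested in the same order, `rounds` times."""
    return [key for _ in range(rounds) for key in range(distinct_keys)]


requests = rotating_requests(distinct_keys=2000, rounds=5)
small_cache_ratio = lru_hit_ratio(1000, requests)  # cache smaller than the rotation
large_cache_ratio = lru_hit_ratio(2000, requests)  # cache fits the whole rotation

print(f"capacity 1000: hit ratio {small_cache_ratio:.2f}")
print(f"capacity 2000: hit ratio {large_cache_ratio:.2f}")
```

Under these assumptions the 1000-item cache never gets a single hit, because every key is evicted before its turn comes around again. The 2000-item cache misses only during the first round and hits on every request after that.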

Consider an operating system example where four processes are running while RAM can hold only four page frames. If each process requires at least two pages to execute properly, the system needs eight pages in total. Since only four fit in memory, the OS constantly swaps pages between RAM and disk. The CPU then spends most of its time waiting for disk I/O rather than executing instructions, resulting in severe performance degradation.
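The arithmetic in this example can be written as a small check. The sketch below assumes each process's memory need is known up front as a fixed number of pages, which a real OS can only estimate; the helper names `total_demand` and `admit_processes` are hypothetical.

```python
def total_demand(working_set_sizes):
    """Total pages needed if every process runs at the same time."""
    return sum(working_set_sizes)


def admit_processes(working_set_sizes, frames):
    """Admit processes in order while their combined working sets still fit.

    Returns the indices of the processes allowed to run; the rest would be
    suspended until memory frees up.
    """
    admitted = []
    used = 0
    for pid, size in enumerate(working_set_sizes):
        if used + size <= frames:
            admitted.append(pid)
            used += size
    return admitted


working_sets = [2, 2, 2, 2]  # four processes, two pages each
frames = 4                   # RAM holds four pages

demand = total_demand(working_sets)
overcommitted = demand > frames
runnable = admit_processes(working_sets, frames)

print(f"demand={demand}, frames={frames}, overcommitted={overcommitted}")
print(f"processes that fit together: {runnable}")
```

With the numbers from the example, demand is eight pages against four frames, so only two of the four processes can run together without their pages pushing each other out.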

A similar situation can happen in applications like Instagram using Redis caching. Suppose Redis can store 1000 user profiles while 3000 active users are requesting profiles repeatedly within a short interval. Redis continuously evicts older profiles and reloads new ones, but the evicted profiles are requested again almost immediately. This creates continuous cache misses and unnecessary database fetches, leading to cache thrashing.

Some common symptoms of thrashing include low CPU utilization for useful work (in the operating system case, the CPU mostly sits waiting on disk I/O), reduced throughput, high page-fault rates, heavy disk I/O activity, and low cache hit ratios. Even though the system appears busy, most of its time goes to moving data rather than computing.
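A low cache hit ratio is the easiest of these symptoms to measure. Redis, for instance, reports `keyspace_hits` and `keyspace_misses` counters in its `INFO` output. The sketch below works on two such counters; the function names and the thresholds (`min_ratio`, `min_requests`) are illustrative assumptions, not recommended values.

```python
def cache_hit_ratio(hits, misses):
    """Return hits / (hits + misses), or None when there is no traffic yet."""
    total = hits + misses
    return hits / total if total else None


def looks_like_thrashing(hits, misses, min_ratio=0.5, min_requests=1000):
    """Flag a cache whose hit ratio is low despite meaningful traffic.

    Both thresholds are arbitrary placeholders; a sensible cutoff depends
    on the workload.
    """
    if hits + misses < min_requests:
        return False  # too few requests to judge
    return cache_hit_ratio(hits, misses) < min_ratio


print(cache_hit_ratio(50, 950))
print(looks_like_thrashing(50, 950))
print(looks_like_thrashing(900, 100))
```

A low ratio alone does not prove thrashing, since a cold cache or a workload with little reuse looks the same. It is a signal to compare cache capacity against the set of keys that is actually being requested.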

There are several ways to prevent thrashing. In operating systems, reducing the degree of multiprogramming, increasing physical memory, using a better page replacement algorithm such as LRU, and allocating frames according to the working-set model can help. In caching systems, increasing cache size, tuning TTL values, segmenting cache data, and choosing an eviction policy that matches the access pattern (for example, LFU instead of LRU when a small set of hot keys is mixed with large one-off scans) can reduce cache thrashing.

In summary, thrashing is a condition where a system spends more time swapping or replacing data than performing actual computation. It is typically caused by insufficient memory or cache capacity relative to the workload and results in high latency, poor throughput, and heavy I/O overhead.

DEV Community
