Programmers Quickie

💥 Spark DataFrame Cache

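In Apache Spark, you can cache a DataFrame using the cache() or persist() method. For a DataFrame, cache() is shorthand for persist() with the default storage level of MEMORY_AND_DISK (MEMORY_ONLY is the default for RDDs, not DataFrames). Use persist() to specify a different storage level, such as MEMORY_ONLY or, in the Scala and Java APIs, MEMORY_ONLY_SER. Caching is lazy: the data is stored the first time an action runs on the DataFrame, and you can release it with unpersist().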
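
As a rough illustration of the calls described above, here is a minimal Scala sketch. The local session settings, the application name "cache-sketch", and the generated DataFrame are assumptions made for the example, not part of the episode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("cache-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical DataFrame; any DataFrame behaves the same way.
    val df = spark.range(0, 1000000).toDF("id")

    // cache() only marks the DataFrame for caching; nothing is stored yet.
    df.cache()
    // Prints the storage level Spark assigned for cache().
    println(df.storageLevel)

    // The first action materializes the cache; the second can reuse it.
    df.count()
    df.count()

    // To switch levels, release the cached data first, then persist again
    // with an explicit storage level.
    df.unpersist()
    df.persist(StorageLevel.MEMORY_ONLY_SER)
    df.count()

    // Clean up.
    df.unpersist()
    spark.stop()
  }
}
```

Calling persist() on a DataFrame that is already cached does not change its storage level, which is why the sketch calls unpersist() before choosing a new one.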
