How can I persist cached tables in memory after the program ends (Apache Spark)?
I'm new to Apache Spark and have a simple question about DataFrame caching.

When I cache a DataFrame in memory using df.cache() in Python, I found that the data is removed after the program terminates.

Can I keep the cached data in memory so that I can access it on the next run without calling df.cache() again?
The cache created by cache() is tied to the current Spark context; its purpose is to avoid recalculating intermediate results multiple times within the current application. Once the context is closed, the cache is gone. Nor can you share a cache between different running Spark contexts.

To reuse the data in a different context, you have to save it to a file system. If you would prefer the results to be in memory (or at least to have a chance of being in memory when you try to reload them), you can do that using Tachyon (now Alluxio).