
Re: Using RocksDB as State Backend over a Distributed File System


I agree with Stefan. I think you can look at this document: Apache Flink 1.4 Documentation: Checkpointing

Qingxiang Ma.

2018-04-26 20:00 GMT+08:00 Stefan Richter <s.richter@xxxxxxxxxxxxxxxxx>:

I think there is a misunderstanding. The RocksDB state backend always operates on the local disk of the node that runs your task, to give you optimal performance. You can think of this as a transient working area that does not require any durability. Durability always happens through checkpoints (or savepoints) which, in turn, go to distributed storage. Checkpoints and savepoints are like a consistent point-in-time image of the backend's content and can be used to recover after a failure (checkpoints) or to manually resume your job (savepoints).
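
As a rough sketch of that split, assuming the configuration key names from the Flink 1.4 documentation and purely hypothetical host, port, and paths, a flink-conf.yaml could look something like this:

```
# Sketch only: the HDFS address and both paths are hypothetical placeholders.
state.backend: rocksdb

# Durable checkpoint data is written to distributed storage.
state.backend.fs.checkpointdir: hdfs://namenode:8020/flink/checkpoints

# RocksDB's working files stay on the local disk of each TaskManager.
state.backend.rocksdb.checkpointdir: /data/flink/rocksdb
```

Only the checkpoint directory needs to live on a distributed file system; the RocksDB directory should stay on local disk. Key names have changed between Flink releases, so please check the configuration page for the version you run.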


On 26.04.2018 at 13:16, Chirag Dewan <chirag.dewan22@xxxxxxxx> wrote:


I am working on a use case where I need to store a large amount of data in state. I am using RocksDB as my state backend. Now to ensure data replication, I want to store the RocksDB files in some distributed file system.

From the documentation I can see that Flink recommends a list of file systems to be used for the state backend. Given here:

But I cannot figure out which file system to use for RocksDB. What are the recommended file systems to use with RocksDB?

Thanks in advance.