We have large images (about 20 MB each) arriving at high frequency and volume from the video streams of many cameras.
The design idea is to write each image to EFS (NAS) in the 1st step of the Flink dataflow and read it back from EFS in the 3rd step (which may run on a completely different TaskManager node), without using RocksDBStateBackend (aka the slow Hadoop version 1 pattern, which Spark solved with in-memory computation).
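To make the idea concrete, here is a minimal sketch of what we have in mind, written in plain Java with no Flink API involved. It assumes EFS is mounted at the same path on every TaskManager node. `ImageClaimCheck`, `store`, `load`, and the temporary directory standing in for the EFS mount are hypothetical names used only for illustration. Step 1 would write the image and emit only the returned path downstream; step 3 would read the bytes back from that path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical helper: not part of Flink, just the store-then-reference idea.
final class ImageClaimCheck {

    // Step 1: write the image under the shared directory and return its
    // path, which is the only thing passed along the dataflow.
    public static String store(Path sharedDir, String cameraId, byte[] image) throws IOException {
        Path cameraDir = Files.createDirectories(sharedDir.resolve(cameraId));
        Path file = cameraDir.resolve(UUID.randomUUID() + ".img");
        Files.write(file, image);
        return file.toString();
    }

    // Step 3: read the image back using only the path reference. This works
    // on another node only if the same shared mount is visible there (assumption).
    public static byte[] load(String reference) throws IOException {
        return Files.readAllBytes(Paths.get(reference));
    }

    public static void main(String[] args) throws IOException {
        // A temp directory stands in for the EFS mount point (assumption).
        Path sharedDir = Files.createTempDirectory("efs-sketch");
        byte[] image = new byte[1024]; // placeholder for a ~20 MB frame

        String reference = store(sharedDir, "camera-1", image);
        byte[] roundTrip = load(reference);
        System.out.println(roundTrip.length);
    }
}
```

With this approach only a short path string would travel between operators, so the 20 MB payload would never be part of a Flink record or of operator state.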
1. Can we use RocksDBStateBackend configured with file:///efsendpoint/checkpoints to store this image data in EFS and access it from the 3rd step?
2. Does the checkpointing interval need to be shorter than the time it takes to reach step 3 after the data is stored in EFS in step 1? Would this allow step 3, running on a different TaskManager node, to read the data stored in EFS via RocksDBStateBackend, assuming local task storage is configured?
3. Can I use the Metrics tab of the Flink dashboard to see how long each step in the dataflow pipeline/graph takes?