First off, I am new to using HDFS to store things, so expect stupid questions.
I am working on hardening our Flink cluster for production use. This includes setting up an HA Flink cluster and saving checkpoints and savepoints to a central location. I have a functioning HDFS setup inside an HA Kubernetes cluster, and we have successfully stored checkpoint data in an HDFS directory.
When we specify the HDFS locations for savepoints, checkpoints, and HA storage, we specify a single namenode in the URL. My question is: how do we implement failover in the event that this namenode fails? We looked at putting the namenodes behind a load balancer, but then the backup namenodes also receive write requests, try to respond to them, and fail. I figure I am missing something simple.
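For reference, here is a rough sketch of the kind of settings I mean. The hostname `namenode-0.example`, the port, and the paths are placeholders rather than our real values, and I am assuming the standard `flink-conf.yaml` keys here. The point is only that every location names one specific namenode:

```yaml
# Illustrative only: host, port, and paths are placeholders.
# Each URI points at one specific namenode, so if that node goes down
# all three locations become unreachable.
state.checkpoints.dir: hdfs://namenode-0.example:8020/flink/checkpoints
state.savepoints.dir: hdfs://namenode-0.example:8020/flink/savepoints
high-availability.storageDir: hdfs://namenode-0.example:8020/flink/ha
```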