Hi Miki,

it looks as if you did not submit a job to the cluster of which you shared the logs. At least I could not see a submit job call.

Cheers,
Till

On Mon, Jun 4, 2018 at 12:31 PM miki haiat <miko5054@xxxxxxxxx> wrote:

Hi Till,
I've managed to reproduce it.
Full log: faild_jm.log

On Mon, Jun 4, 2018 at 10:33 AM Till Rohrmann <trohrmann@xxxxxxxxxx> wrote:

Hmmm, Flink should not delete the stored blobs on the HA storage. Could you try to reproduce the problem and then send us the logs on DEBUG level? Please also check, before shutting the cluster down, that the files were there.

Cheers,
Till

On Sun, Jun 3, 2018 at 1:10 PM miki haiat <miko5054@xxxxxxxxx> wrote:

Hi Till,
- The files no longer exist in HDFS.
- Yes, I stopped and started the cluster with the bin commands.
- Unfortunately, I deleted the log.. :(

I wondered if this code could cause the issue, given the way I'm using checkpoints:

StateBackend sb = new FsStateBackend("hdfs://***/flink/my_city/checkpoints");
env.getCheckpointConfig().setCheckpointInterval(60000);

On Fri, Jun 1, 2018 at 6:19 PM Till Rohrmann <trohrmann@xxxxxxxxxx> wrote:

Hi Miki,

could you check whether the files are really no longer stored on HDFS? How did you terminate the cluster? Simply calling `bin/stop-cluster.sh`? I just tried it locally and it could recover the job after calling `bin/start-cluster.sh` again.

What would be helpful are the logs from the initial run of the job. So if you can reproduce the problem, then this log would be very helpful.

Cheers,
Till

On Thu, May 31, 2018 at 6:14 PM, miki haiat <miko5054@xxxxxxxxx> wrote:

Hi,

I'm having a weird issue with the JM recovery.
I'm using HDFS and ZooKeeper for HA on a standalone cluster.

I stopped the cluster and changed some parameters in the Flink conf (memory). But now, when I start the cluster again, I get an error that prevents the JM from starting. Somehow the checkpoint file doesn't exist in Hadoop, and the JM won't start.

Full log: JM log file

2018-05-31 11:57:05,568 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
Caused by: java.lang.Exception: Cannot set up the user code libraries: File does not exist: /flink1.5/ha/default/blob/job_5c545fc3f43d69325fb9966b8dd4c8f3/blob_p-5d9f3be555d3b05f90b5e148235d25730eb65b3d-ae486e221962f7b96e36da18fe1c57ca
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)