
[jira] [Created] (FLINK-10866) Queryable state can prevent cluster from starting

Till Rohrmann created FLINK-10866:

             Summary: Queryable state can prevent cluster from starting
                 Key: FLINK-10866
                 URL: https://issues.apache.org/jira/browse/FLINK-10866
             Project: Flink
          Issue Type: Improvement
          Components: Local Runtime
    Affects Versions: 1.6.2, 1.5.5, 1.7.0
            Reporter: Till Rohrmann

The {{KvStateServerImpl}} can currently prevent the {{TaskExecutor}} from starting. 

Currently, the queryable state (QS) server starts by default on port {{9067}}. If this port is not free, the server fails to start and aborts the whole initialization of the {{TaskExecutor}}. I think the QS server should not stop the {{TaskExecutor}} from starting.

We should at least change the default port to {{0}} to avoid port conflicts. However, this will break all setups that don't explicitly set the QS port, because the port then either needs to be configured explicitly or extracted from the logs.
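
To illustrate the trade-off with plain JDK sockets (this is not Flink code; the class and method names are hypothetical): binding to a port that is already taken fails, while binding to port {{0}} lets the OS pick a free port, which then has to be read back from the socket before anyone can connect to it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical illustration, not part of Flink.
public class EphemeralPortSketch {

    /** Returns true if a second bind to an already used fixed port fails. */
    static boolean fixedPortConflicts() {
        try (ServerSocket first = new ServerSocket(0)) {
            int takenPort = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(takenPort)) {
                return false; // unexpected: both sockets bound to the same port
            } catch (IOException bindFailure) {
                return true; // the port is in use, so the second bind is rejected
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if two binds to port 0 both succeed on different ports. */
    static boolean ephemeralPortsAreDistinct() {
        try (ServerSocket a = new ServerSocket(0);
             ServerSocket b = new ServerSocket(0)) {
            // The actual port is only known after binding.
            return a.getLocalPort() > 0
                    && b.getLocalPort() > 0
                    && a.getLocalPort() != b.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fixed port conflicts: " + fixedPortConflicts());
        System.out.println("port 0 avoids conflicts: " + ephemeralPortsAreDistinct());
    }
}
```

The second method shows why clients would need another way to discover the port: the value returned by {{getLocalPort()}} differs from run to run.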

Additionally, we should think about whether a QS server startup failure should lead to a {{TaskExecutor}} failure or simply be logged. Both approaches have pros and cons. Currently, a failing QS server also affects users who don't want to use QS. If we tolerate failures in the QS server, then a user who wants to use QS might run into problems with state not being reachable.
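
A minimal sketch of the two behaviours discussed above, assuming a hypothetical {{failOnError}} switch (this is not the actual {{TaskExecutor}} or {{KvStateServerImpl}} code; all names below are made up for illustration):

```java
import java.util.logging.Logger;

// Hypothetical illustration, not part of Flink.
public class OptionalServiceStartSketch {

    private static final Logger LOG =
            Logger.getLogger(OptionalServiceStartSketch.class.getName());

    /** Stand-in for any service whose startup may fail, e.g. on a port conflict. */
    interface Service {
        void start() throws Exception;
    }

    /**
     * Starts the service and returns true on success.
     * On failure it either propagates the error (failOnError = true)
     * or logs a warning and returns false (failOnError = false).
     */
    static boolean startOptional(String name, Service service, boolean failOnError) {
        try {
            service.start();
            return true;
        } catch (Exception e) {
            if (failOnError) {
                // Strict mode: the caller's initialization is aborted.
                throw new IllegalStateException("Could not start " + name, e);
            }
            // Tolerant mode: the caller keeps starting, the service stays unavailable.
            LOG.warning(name + " failed to start and will be unavailable: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Service failing = () -> {
            throw new java.io.IOException("port already in use");
        };
        boolean started = startOptional("queryable state server", failing, false);
        System.out.println("started = " + started);
    }
}
```

In the tolerant mode the returned flag would have to be surfaced somewhere visible, otherwise a user who relies on QS only learns about the failure when a query cannot reach the state.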

This message was sent by Atlassian JIRA