
Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in k8s environment

When I deploy a Flink 1.7 job to Kubernetes, the job itself runs, but the Flink UI shows no metrics and the jobmanager's log contains WARN messages like this:

[flink-metrics-14] WARN akka.remote.ReliableDeliverySupervisor flink-metrics-akka.remote.default-remote-dispatcher-3 - Association with remote system [akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491]] Caused by: [adhoc-historical-taskmanager-d4b65dfd4-h5nrx: Name or service not known]

Note: adhoc-historical-taskmanager-d4b65dfd4-h5nrx is the hostname of the pod on which the taskmanager is running.

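So the jobmanager tries to reach the taskmanager's metrics endpoint by the pod's hostname (which it probably received from the taskmanager itself) on a random port, and that hostname does not resolve from the jobmanager's pod. How can this be mitigated?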
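
To check that the failure really is name resolution rather than something inside Flink, a minimal sketch like the one below could be run from inside the jobmanager pod. It assumes a Python interpreter is available in that container, which may not be the case for a stock Flink image; the helper name can_resolve is made up for illustration, and the hostname is the one from the WARN message above.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if this machine can resolve `hostname` to an address."""
    try:
        # getaddrinfo goes through the system resolver, so a failure here
        # corresponds to the "Name or service not known" error in the log.
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Pod hostname copied from the WARN message; replace with the current one.
    host = "adhoc-historical-taskmanager-d4b65dfd4-h5nrx"
    status = "resolvable" if can_resolve(host) else "NOT resolvable"
    print(f"{host}: {status}")
```

If this prints "NOT resolvable" in the jobmanager pod, the address the taskmanager advertises is simply not one the jobmanager can look up through cluster DNS.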