
Re: Taskmanager JVM crash


That looks like a known issue where Flink did not wait for the timer service to shut down before disposing the state backends. This problem is fixed in the >= 1.4 branches.
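
To illustrate the ordering problem in general terms, here is a minimal sketch. It is not Flink's actual code: ShutdownOrderSketch, FakeStateBackend and runAndShutDown are hypothetical names made up for this example, and a plain ScheduledExecutorService stands in for the timer service. The point is only that the timer threads are stopped and awaited before the backend is disposed, so no timer callback can touch resources that have already been released.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownOrderSketch {

    /** Hypothetical stand-in for a state backend that owns native resources. */
    static final class FakeStateBackend {
        private final AtomicBoolean disposed = new AtomicBoolean(false);
        private final AtomicBoolean usedAfterDispose = new AtomicBoolean(false);

        /** Called from timer callbacks; records any access made after dispose(). */
        void access() {
            if (disposed.get()) {
                usedAfterDispose.set(true);
            }
        }

        void dispose() {
            disposed.set(true);
        }

        boolean wasUsedAfterDispose() {
            return usedAfterDispose.get();
        }
    }

    /** Runs a periodic timer against the backend, then shuts down in the safe order. */
    static boolean runAndShutDown() {
        FakeStateBackend backend = new FakeStateBackend();
        // Assumption: a plain scheduled executor plays the role of the timer service.
        ScheduledExecutorService timerService = Executors.newSingleThreadScheduledExecutor();
        timerService.scheduleAtFixedRate(backend::access, 0, 1, TimeUnit.MILLISECONDS);

        try {
            // Let a few timer callbacks fire.
            Thread.sleep(20);

            // Step 1: stop the timer service and wait until its thread has finished.
            timerService.shutdownNow();
            if (!timerService.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timer service did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while shutting down", e);
        }

        // Step 2: only now release the backend; no timer callback can still be running.
        backend.dispose();
        return backend.wasUsedAfterDispose();
    }

    public static void main(String[] args) {
        System.out.println("used after dispose: " + runAndShutDown());
    }
}
```

If dispose() were called before awaitTermination(), a callback that is still running could reach the backend after its resources are gone. With real native memory, that kind of access can end in a SIGSEGV rather than a Java exception.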


On 14.05.2018 at 14:12, Flavio Pompermaier <pompermaier@xxxxxxxx> wrote:

Hi to all,
I have a Flink 1.3.1 job that runs multiple times.
Everything goes well for some time (e.g. 10 jobs). Then, one or more TMs suddenly die.

In the .out file I find something like this:

# A fatal error has been detected by the Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0x00007f6f3897712f, pid=18794, tid=140110535448320
# JRE version: Java(TM) SE Runtime Environment (8.0_72-b15) (build 1.8.0_72-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.72-b15 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x7f12f]
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
# An error report file with more information is saved as:
# /home/user/hs_err_pid18794.log
# If you would like to submit a bug report, please visit:

I have attached the produced error report. Do you see anything useful in it?
I can also send you the job's jar with the data, but it is about 200 MB.