Re: Taskmanager JVM crash


My job is a batch one, not a streaming job. Is it possible that the cause is the one you mentioned?

On Mon, 14 May 2018, 14:23 Stefan Richter, <s.richter@xxxxxxxxxxxxxxxxx> wrote:
Hi,

that looks like a known issue where Flink did not wait for the shutdown of the timer service before disposing state backends. This problem is fixed in the >= 1.4 branches.
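
As a minimal illustration of that ordering problem (a sketch only, not Flink's actual code; `FakeBackend`, `ShutdownOrderSketch` and `runAndDispose` are hypothetical names invented for this example): a timer thread keeps touching a resource, so the resource should only be disposed after the timer executor has fully terminated. In a real state backend with native memory, touching it after disposal could plausibly surface as a SIGSEGV rather than a flag being set.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownOrderSketch {

    /** Hypothetical stand-in for a state backend; not a Flink class. */
    static final class FakeBackend {
        private final AtomicBoolean disposed = new AtomicBoolean(false);
        private final AtomicBoolean usedAfterDispose = new AtomicBoolean(false);

        /** Called by the timer thread; records any access after disposal. */
        void access() {
            if (disposed.get()) {
                usedAfterDispose.set(true);
            }
        }

        void dispose() {
            disposed.set(true);
        }

        boolean wasUsedAfterDispose() {
            return usedAfterDispose.get();
        }
    }

    /** Stops the timers, waits for them to finish, then disposes the backend. */
    static boolean runAndDispose() {
        FakeBackend backend = new FakeBackend();
        ScheduledExecutorService timers = Executors.newSingleThreadScheduledExecutor();
        timers.scheduleAtFixedRate(backend::access, 0, 1, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(20); // let a few timer callbacks fire
            timers.shutdownNow();
            // The important step: block until the timer thread has really stopped.
            if (!timers.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timer thread did not stop in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for timers", e);
        }
        // Only now is it safe to release the resource the timers were using.
        backend.dispose();
        return backend.wasUsedAfterDispose();
    }

    public static void main(String[] args) {
        System.out.println("used after dispose: " + runAndDispose());
    }
}
```

Calling `dispose()` before `awaitTermination(...)` returns would leave a window in which a timer callback can still run against the already released backend.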

Best,
Stefan 

On 14.05.2018 at 14:12, Flavio Pompermaier <pompermaier@xxxxxxxx> wrote:

Hi to all,
I have a Flink 1.3.1 job that runs multiple times.
Everything goes well for some time (e.g. 10 jobs). Then, one or more TMs suddenly die.

In the .out file I find something like this:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f6f3897712f, pid=18794, tid=140110535448320
#
# JRE version: Java(TM) SE Runtime Environment (8.0_72-b15) (build 1.8.0_72-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.72-b15 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x7f12f]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/user/hs_err_pid18794.log
#
# If you would like to submit a bug report, please visit:
#


I have attached the error report that was produced. Do you see anything useful in it?
I can even send you the job's jar with the data, but it is about 200 MB.

Best,
Flavio
<hs_err_pid18794.log>