Re: Taskmanager JVM crash


No, the problem I mentioned does not affect batch jobs, so this must be something different. Unfortunately, the dump does not look very helpful to me because of the "error occurred during error reporting (printing native stack)" message.

On 14.05.2018 at 14:26, Flavio Pompermaier <pompermaier@xxxxxxxx> wrote:

My job is a batch one, not a streaming job. Is it possible that the cause is the one you mentioned?

On Mon, 14 May 2018, 14:23 Stefan Richter, <s.richter@xxxxxxxxxxxxxxxxx> wrote:
Hi,

that looks like a known issue where Flink did not wait for the timer service to shut down before disposing of the state backends. This problem is fixed in the 1.4 and later branches.

Best,
Stefan 
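
For readers of the archive, the ordering problem described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Flink's code or its actual fix: `ShutdownOrdering`, `FakeStateBackend` and `shutdownInOrder` are hypothetical names, and the "backend" is a simple flag rather than native memory. The point is that the timer threads are stopped and awaited before the resource they touch is disposed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownOrdering {

    /** Hypothetical stand-in for a state backend that owns a resource. */
    static final class FakeStateBackend {
        private final AtomicBoolean disposed = new AtomicBoolean(false);
        private final AtomicInteger accessesAfterDispose = new AtomicInteger();

        /** Called from timer threads. */
        void access() {
            if (disposed.get()) {
                // With real native memory this could be a use-after-free.
                accessesAfterDispose.incrementAndGet();
            }
        }

        void dispose() {
            disposed.set(true);
        }

        int accessesAfterDispose() {
            return accessesAfterDispose.get();
        }
    }

    /**
     * Stops the timer threads, waits for them to finish, and only then
     * disposes the backend. Returns the number of accesses seen after dispose.
     */
    static int shutdownInOrder() {
        FakeStateBackend backend = new FakeStateBackend();
        ScheduledExecutorService timers = Executors.newScheduledThreadPool(2);
        try {
            // Timer callback that touches the backend once per millisecond.
            timers.scheduleAtFixedRate(backend::access, 0, 1, TimeUnit.MILLISECONDS);
            Thread.sleep(20);

            // Step 1: stop the timers and wait until no callback is running.
            timers.shutdownNow();
            if (!timers.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timer threads did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while shutting down", e);
        }

        // Step 2: only now is it safe to release the resource.
        backend.dispose();
        return backend.accessesAfterDispose();
    }

    public static void main(String[] args) {
        System.out.println("accesses after dispose: " + shutdownInOrder());
    }
}
```

If `dispose()` were called before `awaitTermination` returned, a callback still in flight could touch the backend after it was released, which is the kind of race that can crash a JVM when the resource is native memory.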

On 14.05.2018 at 14:12, Flavio Pompermaier <pompermaier@xxxxxxxx> wrote:

Hi to all,
I have a Flink 1.3.1 job that runs multiple times.
Everything goes well for some time (e.g. 10 jobs). Then one or more TMs suddenly die.

In the .out file I find something like this:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f6f3897712f, pid=18794, tid=140110535448320
#
# JRE version: Java(TM) SE Runtime Environment (8.0_72-b15) (build 1.8.0_72-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.72-b15 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x7f12f]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/user/hs_err_pid18794.log
#
# If you would like to submit a bug report, please visit:
#


I have attached the generated error report. Do you see anything useful in it?
I can also send you the job's jar with the data, but it is about 200 MB.

Best,
Flavio
<hs_err_pid18794.log>