On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET<christophe.jaillet@xxxxxxxxxx> wrote:
Bug describes something else IIUC. Because the watchdog calls us 10 times per second, it continuously sees that the worker hasn't been health checked within the desired interval and queues up a check, it doesn't know one is queued.
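To make the scenario above concrete, here is a minimal simulation sketch. It is a toy model, not the mod_proxy_hcheck code: the function name, the millisecond values, and the assumption that the "last checked" timestamp is only updated when a check completes are all hypothetical. It counts how many checks get queued when the periodic callback does or does not know that one is already pending.

```python
def count_queued_checks(tick_ms, interval_ms, check_ms, total_ms, track_pending):
    """Count checks queued by a periodic callback (toy model, hypothetical names).

    tick_ms       -- how often the callback fires (100 ms = 10 times per second)
    interval_ms   -- desired interval between health checks
    check_ms      -- how long one check takes to complete
    total_ms      -- length of the simulated run
    track_pending -- if True, do not queue while a check is already pending
    """
    last_checked = 0   # assumed: timestamp updated only when a check completes
    pending = []       # completion times of checks that are queued or running
    queued = 0
    for now in range(tick_ms, total_ms + 1, tick_ms):
        # Retire checks that have finished and record the latest completion.
        finished = [t for t in pending if t <= now]
        pending = [t for t in pending if t > now]
        if finished:
            last_checked = max(finished)
        # The callback only looks at the timestamp to decide if a check is due.
        due = now - last_checked >= interval_ms
        if due and not (track_pending and pending):
            pending.append(now + check_ms)
            queued += 1
    return queued


if __name__ == "__main__":
    naive = count_queued_checks(100, 1000, 500, 10_000, track_pending=False)
    guarded = count_queued_checks(100, 1000, 500, 10_000, track_pending=True)
    print("queued without pending flag:", naive)
    print("queued with pending flag:   ", guarded)
```

In this model the unguarded variant queues a new check on every tick between "due" and "first check finished", which is the pile-up described above.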
On 24/08/2018 at 16:40, Jim Jagielski wrote:
I was wondering if someone wanted to provide a sanity check
on the above PR and what's "expected" by the health check code.
It would be very easy to adjust so that hcinterval was not
the time between successive checks but the interval between
the end of one and the start of another, but I'm not sure that
is as useful. In other words, I think the current behavior
is right (but think the docs need to be updated), but am
willing to have my mind changed :)
Hi Jim,
The current behavior is also what I would expect.
If I configure a check every 10s, I would expect 6 checks each minute,
even if the test itself takes time to perform.
But that is only an issue, afaict, if the time taken to do the health check is
greater than the interval chosen... Or am I misunderstanding? That is,
if the interval is 200ms, and the health check takes 100ms, all is fine, we
get 5 checks a second.
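The arithmetic behind the two readings of hcinterval can be sketched as follows. The helper names are made up for illustration, and this only counts whole cycles in a period; it says nothing about how the module actually schedules checks.

```python
def checks_start_to_start(interval_ms, period_ms):
    """Whole cycles when the interval is measured between successive starts."""
    return period_ms // interval_ms


def checks_end_to_start(interval_ms, duration_ms, period_ms):
    """Whole cycles when the interval runs from the end of one check
    to the start of the next, so each cycle costs interval + duration."""
    return period_ms // (interval_ms + duration_ms)


def checks_overlap(interval_ms, duration_ms):
    """With start-to-start timing, checks overlap once a check takes
    longer than the interval."""
    return duration_ms > interval_ms


if __name__ == "__main__":
    # 10 s interval over one minute: 6 checks, whatever the check duration.
    print(checks_start_to_start(10_000, 60_000))
    # 200 ms interval, 100 ms check, over one second.
    print(checks_start_to_start(200, 1_000))      # start-to-start: 5
    print(checks_end_to_start(200, 100, 1_000))   # end-to-start: 3
    print(checks_overlap(200, 100))               # False: check fits in interval
```

With start-to-start timing the rate depends only on the interval, which matches the "6 checks each minute" expectation; the check duration only matters once it exceeds the interval.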
I guess what we could do is emit a warning if, when a check is queued, we
already have one queued or in process. This would give some info to the sysadmin.
We could also track the time taken to perform a check and have that available
via mod_status as well. But these all assume that the underlying logic, and
how it's implemented, is sane.
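A rough sketch of the two ideas above (warn when a check is already queued or in process, and record how long each check takes) could look like this. The class, its fields, and the logger name are all hypothetical; nothing here is the mod_proxy_hcheck or mod_status API, it only illustrates the bookkeeping.

```python
import logging

log = logging.getLogger("hcheck_sketch")  # hypothetical logger name


class WorkerCheckState:
    """Per-worker bookkeeping for health checks (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.started_at = None     # None means nothing queued or in process
        self.last_duration = None  # seconds taken by the most recent check
        self.skipped = 0           # how often a check was due but one was pending

    def try_queue(self, now):
        """Mark a check as queued; warn and refuse if one is already pending."""
        if self.started_at is not None:
            self.skipped += 1
            log.warning(
                "health check for %s still pending after %.3fs; not queuing another",
                self.name, now - self.started_at,
            )
            return False
        self.started_at = now
        return True

    def finish(self, now):
        """Record the duration of the pending check and clear the pending mark."""
        if self.started_at is None:
            raise ValueError("finish() called with no check pending")
        self.last_duration = now - self.started_at
        self.started_at = None
        return self.last_duration
```

The `skipped` counter and `last_duration` are the kind of values that could be surfaced to the sysadmin, for example on a status page.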