
[Bug 63010] New: mod_proxy_hcheck when health checks configured, esp. vs down service, consumes large memory


            Bug ID: 63010
           Summary: mod_proxy_hcheck when health checks configured, esp.
                    vs down service, consumes large memory
           Product: Apache httpd-2
           Version: 2.4.37
          Hardware: PC
            Status: NEW
          Severity: major
          Priority: P2
         Component: mod_proxy_hcheck
          Assignee: bugs@xxxxxxxxxxxxxxxx
          Reporter: carltonf@xxxxxxxxx
  Target Milestone: ---

Defect related to mod_proxy_hcheck

Found and reproduced in Apache httpd versions 2.4.33, 2.4.35, and 2.4.37
(mod_proxy_hcheck module) on Windows.

We added the hcheck module and then noticed that httpd started consuming large
amounts of memory. Before configuring the module, httpd was stable at about
35 MB of RAM. With this module, memory grew to 3 GB before stabilizing. In one
repro it actually grew to 17 GB, at which point the machine ran out of pagefile
space and crashed. We noticed that in the default configuration we were
health-checking a service that was not running. After starting that service,
memory usage dropped significantly: instead of 3 GB, it grew to 150 MB and
stabilized.

Even with the service running, there is a period shortly after startup during
which the health check runs hundreds of times per second for each service, and
memory consumption grows steadily. This lasts about 5 seconds, generating
thousands of lines of trace-level logging and running the health check over a
thousand times per service. Then, for some reason, it stabilizes and checks are
seen at the expected 30-second interval. But the memory is not released, and
the process retains an elevated footprint. Some of this might simply be the
health-check module requiring more memory than running without it. What is
concerning is that the increased memory consumption appears tied to checking a
service that is down. If a service is down, you have one problem; if the health
check itself starts consuming large amounts of resources on top of that, the
problem is compounded.
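To make the burst behavior above concrete, here is a purely illustrative model
(not httpd's actual code) of an interval-based check loop. If the "last
checked" timestamp is not advanced after a check runs, the worker is due on
every watchdog tick, producing a hundreds-per-second burst like the one
observed; when the timestamp is updated, the check fires once per interval as
expected. All names here are hypothetical:

```python
# Hypothetical model of an interval-driven health-check scheduler.
# A worker is "due" when now >= last_check + interval. If last_check
# is never advanced (the broken path), every tick re-runs the check.

def due_checks(workers, now):
    """Return workers whose next check time has arrived."""
    return [w for w in workers if now >= w["last_check"] + w["interval"]]

def run_tick(workers, now, update_timestamp=True):
    """One watchdog tick: run every due check, optionally record the time."""
    ran = 0
    for w in due_checks(workers, now):
        ran += 1
        if update_timestamp:
            w["last_check"] = now  # healthy path: next check in `interval` s
    return ran

# Healthy path: one check at t=31, then quiet until t=61.
workers = [{"last_check": 0.0, "interval": 30.0}]
assert run_tick(workers, 31.0) == 1
assert run_tick(workers, 32.0) == 0

# Broken path: timestamp never updated, so 100 ticks run 100 checks.
stuck = [{"last_check": 0.0, "interval": 30.0}]
burst = sum(run_tick(stuck, 31.0 + i * 0.01, update_timestamp=False)
            for i in range(100))
assert burst == 100
```

Under this model, each extra check would also do its own connection setup and
logging, which matches the repeated acquire/connect/release cycles in the log
excerpt below.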

278 times in one second:
Condition ok1234 for 6af324730 (http://localhost:8843): passed

This sequence repeated (this is logging from one single thread checking one
endpoint) (please see attached file for a full one-second interval of logging,
15k+ lines):
[Thu Dec 13 17:58:40.618694 2018] [proxy_hcheck:debug] [pid 21288:tid 748]
mod_proxy_hcheck.c(829): AH03256: Threaded Health checking
[Thu Dec 13 17:58:40.618694 2018] [proxy:debug] [pid 21288:tid 748]
proxy_util.c(2313): AH00942: HCOH: has acquired connection for (localhost)
[Thu Dec 13 17:58:40.619694 2018] [proxy:trace2] [pid 21288:tid 748]
proxy_util.c(3010): HCOH: fam 2 socket created to connect to localhost
[Thu Dec 13 17:58:40.619694 2018] [proxy:debug] [pid 21288:tid 748]
proxy_util.c(3042): AH02824: HCOH: connection established with
[Thu Dec 13 17:58:40.620696 2018] [proxy:debug] [pid 21288:tid 748]
proxy_util.c(3212): AH00962: HCOH: connection complete to
[Thu Dec 13 17:58:40.649654 2018] [proxy_hcheck:debug] [pid 21288:tid 748]
mod_proxy_hcheck.c(644): AH03254: HTTP/1.1 200 OK
[Thu Dec 13 17:58:40.649654 2018] [proxy_hcheck:trace2] [pid 21288:tid 748]
mod_proxy_hcheck.c(797): Condition ok1234 for 6af324730
(http://localhost:8843): passed
[Thu Dec 13 17:58:40.649654 2018] [proxy:debug] [pid 21288:tid 748]
proxy_util.c(2328): AH00943: HCOH: has released connection for (localhost)
[Thu Dec 13 17:58:40.649654 2018] [proxy_hcheck:debug] [pid 21288:tid 748]
mod_proxy_hcheck.c(573): AH03251: Health check GET Status (0) for 6af324730.
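For reference, the per-second rate quoted above (278 checks in one second) can
be counted from such a log with a short script. This is a sketch: the
timestamp and message formats follow the excerpt above (which is line-wrapped
here by email; real log entries are single lines), and the input source is up
to the reader:

```python
import re
from collections import Counter

# Count "Condition ... passed" lines per second in an httpd error log.
# Pattern follows the log format shown above: "[Thu Dec 13 17:58:40.649654 2018] ...".
LINE = re.compile(
    r"^\[(\w+ \w+ \d+ \d+:\d+:\d+)\.\d+ \d+\].*Condition \S+ for .*: passed"
)

def checks_per_second(lines):
    """Map 'Day Mon DD HH:MM:SS' -> number of passed checks in that second."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "[Thu Dec 13 17:58:40.649654 2018] [proxy_hcheck:trace2] [pid 21288:tid 748] "
    "mod_proxy_hcheck.c(797): Condition ok1234 for 6af324730 (http://localhost:8843): passed",
    "[Thu Dec 13 17:58:40.701000 2018] [proxy_hcheck:trace2] [pid 21288:tid 748] "
    "mod_proxy_hcheck.c(797): Condition ok1234 for 6af324730 (http://localhost:8843): passed",
]
assert checks_per_second(sample)["Thu Dec 13 17:58:40"] == 2
```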

Excerpts from httpd.conf:

# This defines a template for use by proxy_hcheck_module.
# Some other specific parameters are set directly on the BalancerMember lines.
ProxyHCExpr ok1234 {%{REQUEST_STATUS} =~ /^[1234]/}
ProxyHCTemplate hcsimple hcmethod=GET hcexpr=ok1234 hcuri=/favicon.ico
<Proxy balancer://samlservice-cluster>
BalancerMember http://localhost:8843 route=1 hcinterval=30 hcfails=2 hcpasses=1
</Proxy>
LogLevel debug
LogLevel proxy_hcheck:TRACE2
LogLevel proxy:TRACE2
LogLevel watchdog:TRACE2
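One detail of the ProxyHCExpr above worth spelling out: matching
%{REQUEST_STATUS} against /^[1234]/ treats any 1xx-4xx response as a pass, so
even a 404 for /favicon.ico keeps the member healthy; only 5xx (or no response
at all) fails the check. A small sketch of that condition's semantics, with a
hypothetical helper name:

```python
import re

# Same pattern as the ok1234 ProxyHCExpr: the first digit of the
# HTTP status code must be 1, 2, 3, or 4.
OK1234 = re.compile(r"^[1234]")

def passes(status):
    """True when the status code's first digit is 1-4, mimicking ok1234."""
    return OK1234.search(str(status)) is not None

assert passes(200)      # normal success
assert passes(404)      # a missing favicon still "passes"
assert not passes(503)  # server errors fail the check
```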

We have a fix for this issue in place.

This related issue has similarities (note the comments about consuming 20GB of
heap space):

We also noticed this issue reported on Reddit.
