Subject: Re: Adding host & services while Nagios daemon is running - msg#00125

List: network.nagios.devel


Hi,

FWIW:

To be honest, the only ways I know of to add objects or events to
Nagios 2.x dynamically (without modifying the core) are:

1. Writing to the command pipe, or
2. Writing a NEB module.

As for adding new hosts or services, I don't think that can be done via
the command pipe, so I'm guessing that a NEB module is the only way.
(This applies only to Nagios 2.x; I'm not sure about 3.x.)
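
For reference, here is a minimal sketch of what "writing to the command pipe" looks like for objects that already exist. It assumes the usual external command line format, "[epoch] COMMAND;arg;arg", and the default pipe location of a source install; the real path is whatever command_file is set to in nagios.cfg. The host and service names are hypothetical.

```python
import os
import time

# Assumed default location; the real path is the `command_file` setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_external_command(name, args=(), now=None):
    """Build one external command line: "[epoch] NAME;arg1;arg2\\n"."""
    timestamp = int(time.time() if now is None else now)
    fields = ";".join(str(a) for a in args)
    body = f"{name};{fields}" if fields else name
    return f"[{timestamp}] {body}\n"


def submit(line, command_file=COMMAND_FILE):
    """Write one command line to the Nagios command pipe (a FIFO)."""
    # O_NONBLOCK makes open() fail right away instead of hanging
    # when no Nagios process has the pipe open for reading.
    fd = os.open(command_file, os.O_WRONLY | os.O_NONBLOCK)
    try:
        os.write(fd, line.encode())
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Force an immediate check of an existing service (hypothetical names).
    check_time = int(time.time())
    line = format_external_command(
        "SCHEDULE_FORCED_SVC_CHECK", ["gridnode01", "PING", check_time]
    )
    print(line, end="")  # call submit(line) on a host where Nagios is running
```

Note that this only acts on hosts and services Nagios already knows about, which is exactly the limitation described above.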

The reason is that in order to add hosts and services to the in-core data
structures, you have to be in the memory space of the core. And the way
to do this without modifying the core (that I know of) is to write a NEB
module.

One way I've done this is to write a NEB module that creates a new thread
at initialization time; this thread then reads object create/delete
commands from its own pipe. This would allow you to add/delete hosts
and/or services on the fly without necessarily degrading Nagios
performance.
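
To illustrate the pattern only (this is not a NEB module, which would be a C shared object loaded by the core), here is a small Python sketch of a worker thread that reads create/delete commands from its own pipe and applies them to an in-memory registry. The ObjectRegistry class and the ADD_/DEL_ command grammar are hypothetical stand-ins for the in-core data structures and for whatever protocol a real module would define; the lock stands in for whatever synchronization with the core a real module would need.

```python
import os
import threading


class ObjectRegistry:
    """Stand-in for the in-core host/service lists; not a Nagios API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.hosts = set()
        self.services = set()  # (host, service) pairs

    def apply(self, line):
        """Apply one command line; returns False if it is not understood."""
        parts = line.strip().split(";")
        verb, args = parts[0], parts[1:]
        with self._lock:
            if verb == "ADD_HOST" and len(args) == 1:
                self.hosts.add(args[0])
            elif verb == "DEL_HOST" and len(args) == 1:
                self.hosts.discard(args[0])
                # Dropping a host also drops the services defined on it.
                self.services = {s for s in self.services if s[0] != args[0]}
            elif verb == "ADD_SERVICE" and len(args) == 2 and args[0] in self.hosts:
                self.services.add((args[0], args[1]))
            elif verb == "DEL_SERVICE" and len(args) == 2:
                self.services.discard((args[0], args[1]))
            else:
                return False
        return True


def reader_loop(read_fd, registry):
    """Worker thread body: read newline-terminated commands until EOF."""
    with os.fdopen(read_fd, "r") as pipe:
        for line in pipe:
            registry.apply(line)


def start_reader(registry):
    """Create a private pipe and start the worker; returns (thread, write_fd)."""
    read_fd, write_fd = os.pipe()
    worker = threading.Thread(
        target=reader_loop, args=(read_fd, registry), daemon=True
    )
    worker.start()
    return worker, write_fd
```

The point of the separate thread is that the blocking read on the pipe happens outside the main scheduling loop, which is why the approach need not slow down regular checks.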

I used a similar technique for creating clusters of "check nodes" -
dnx.sf.net.

Hope this helps.

Bob

> Hi guys,
> I'm trying to use Nagios to monitor a Grid environment. As you know, a
> Grid is highly dynamic, with new hosts coming into and going out of the
> scope that I need to monitor.
> So, what I need is to add new hosts (and maybe new services) to be
> monitored while Nagios is running.
> Looking at the NEB documentation, I've seen that, even though NEB is a
> read-only interface (or subscribe-only, in pub-sub terminology), you can
> call the Nagios core engine's addservice() function within a callback,
> before giving control back to Nagios.
> I was wondering if it's possible (without modifying the current Nagios
> engine implementation) to define new hosts to be monitored, as well as
> new services on those hosts, without restarting the Nagios daemon. Is
> this possible in any way other than through the NEB callbacks?
> Best regards,
>
> Sebastian Ganame







Thread at a glance:

Previous Message by Date:

Re: How does nagios handle plugin exit not in [0, 1, 2, 3]?

In message <4628F199.8000502@xxxxxxxxxxxxxxx>, Hendrik Baecker writes:
>John Rouillard wrote:
>> How does nagios handle plugins that don't exit with an errorcode of 0,
>> 1, 2, or 3? If the plugin exits with say 127, is the host check logic
>> triggered?
>
>It's triggered by every service check that does not return with return
>code 0.

That's pretty much what I figured.

>> The reason I ask is that I just fixed ~10 services (that were run
>> every 3 minutes) that were failing (java was missing from the system),
>> and the average latency went from 2 minutes to less than 6 seconds
>> (usually in the .6 range), with the max at 25 seconds. This seems a
>> huge difference given the fix.
>>
>> I could almost buy it if each failure triggered a halt to polling and
>> forced it to do host checks, but even then the magnitude of the change
>> is a bit unbelievable.
>
>Host checks go to a high prio scheduling queue and are checked before
>other service checks to determine if nagios should write x service alerts
>or just a single host alert.
>Because of the high prio host checks your service checks may go into latency.

I agree there should be an increase in latency, but 24x the latency
for 10 services out of 2200+ (on 130 or so hosts) is what is weird.
The host check would return almost immediately since the host was up,
so there wasn't a big delay there.

Hmm, now that starts me thinking, but I think I am walking down the
wrong path. The host check can occur in parallel with the outstanding
service checks, right? So if I have 12 outstanding checks, one of which
fails, nagios doesn't wait for those 12 outstanding checks to finish
(which could take up to a minute) before it does the host check, finds
out the host is fine and starts the next cycle of checks?

When I first started I had fewer service checks (1900 or so) and the
latency was larger, around 10-15 seconds, but not in the 2 minute
range. Then I synced my test install with the current production
nagios install and ran the 2200 checks. Then the latency jumped
through the roof to 2 minutes, which is 66% of the median polling
interval.

Maybe it's an artifact of the scheduling process and how the service
check interleaving occurs. I can't see nagios3's host polling changes
making a difference though, because in my scenario it only took one fast
ping to verify that the host was up, and all the nagios3 polling
changes do is run a number of host checks in parallel, so the delay
would be the same.

>> If the failure (w/ exit code 127) would trigger host checking, should
>> the logic change to do host checking only when the plugin exits with a
>> status in the [0-3] range since it is an invalid exit code?
>
>Until now the nagios law is: A possible failure is a non-OK State.
>The exit codes are under control of each plugin.

Well yes, but the only valid exit codes for a plugin that have any
meaning to nagios are 0, 1, 2 and 3. Any plugin that returns a value
outside that range is broken.

>As long as each plugin exits in the defined return code range all is ok.
>Why do you think there should be an exception for exit code 127?

Because 127 is well outside of the "defined return code range", and I
propose the host check logic be disabled not just for exit code 127
but for any exit code > 3. However, I am not wedded to the idea.

That said, I think having my 10 processes fail with exit code 2 would
also throw the latency through the roof, which is worrisome. It seems
the average latency should be some predictable function of:

  total number of services
  number of services in non-ok state

This problem/example is making me wonder how predictable that function
is.

-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.
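
To make the two positions in this exchange concrete, here is a small sketch. It encodes only what the thread states: the four defined plugin return codes, the current rule as Hendrik describes it (any non-zero return code triggers a host check), and John's proposal (codes above 3 do not). The function names are hypothetical and nothing here is taken from the Nagios source.

```python
# The four return codes that have a defined meaning for plugins.
DEFINED_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_exit_code(code):
    """Name a plugin return code, flagging anything outside 0..3."""
    return DEFINED_STATES.get(code, "OUT_OF_RANGE")


def triggers_host_check_current(code):
    """Rule as described in the thread: every non-zero return code."""
    return code != 0


def triggers_host_check_proposed(code):
    """Proposal from the thread: only defined non-OK codes (1..3)."""
    return code in (1, 2, 3)


if __name__ == "__main__":
    for code in (0, 2, 127):
        print(
            code,
            describe_exit_code(code),
            triggers_host_check_current(code),
            triggers_host_check_proposed(code),
        )
```

Under the proposal, a missing interpreter (exit code 127) would no longer cause host checks, while a genuine CRITICAL (exit code 2) still would.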

Next Message by Date:

Re: How does nagios handle plugin exit not in [0, 1, 2, 3]?

John Rouillard wrote:
> In message <4628F199.8000502@xxxxxxxxxxxxxxx>, Hendrik Baecker writes:
>
> I agree there should be an increase in latency, but 24x the latency
> for 10 services out of 2200+ (on 130 or so hosts) is what is weird.
> The host check would return almost immediately since the host was up,
> so there wasn't a big delay there.
>
> Hmm, now that starts me thinking, but I think I am walking down the
> wrong path. The host check can occur in parallel with the outstanding
> service checks right? So if I have 12 outstanding checks, one of which
> fails, nagios doesn't wait for those 12 outstanding checks to finish
> (which could take up to a minute) before it does the host check, finds
> out the host is fine and starts the next cycle of checks?

Are we talking about Nagios 2.x or 3.x?

In Nagios 2.x your 12 outstanding checks were scheduled for their normal
time. If check 1 of 12 returns a non-OK state, the other 11 scheduled
checks are put on "hold", because nagios has to immediately execute a
host check for the first. AFAIK nagios doesn't care about the remaining
eleven checks until the host check reaches a HARD state (by reaching
max_check_attempts).

Some math:
Host check command based on the check_ping plugin, with a host check
timeout of 5 seconds and max_check_attempts set to 4. The host has no
parent!

In that case your remaining 11 service checks are held for up to 20
seconds if the host is really down, because check_ping takes the full
timeout for an unreachable host (check_icmp is much faster in that case).

In my opinion nagios is doing nothing other than waiting for the 5
second timeout, max_check_attempts times over.

If you are using just a single parent host, the time for checking a
single host will be doubled by checking the parent too.

> When I first started I had fewer service checks (1900 or so) and the
> latency was larger, around 10-15 seconds, but not in the 2 minute
> range. Then I synced my test install with the current production
> nagios install and ran the 2200 checks. Then the latency jumped
> through the roof to 2 minutes which is 66% of the median polling
> interval.

Yes. There seems to be a magic borderline at around 2000 service checks
in Nagios 2.x.

> Maybe it's an artifact of the scheduling process and how the service
> check interleaving occurs. I can't see nagios3's host polling changes
> making a difference though because in my scenario, it only took one fast
> ping to verify that the host was up, and all the nagios3 polling
> changes do is to run a number of host checks in parallel, so the delay
> would be the same.

Did you test this? Up to now I haven't had the chance to test the new
logic for real. But the changes in how host checks are handled and how
host parents and children are informed should speed the whole thing up,
I think.

Hendrik
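
The arithmetic in this message can be written out as a tiny helper. It assumes, as the message does, that every attempt against a down host runs into the full timeout; extending the "doubled for one parent" remark to more than one parent is an extrapolation, not something stated in the thread.

```python
def worst_case_hold_seconds(timeout_s, max_check_attempts, parent_hosts=0):
    """Upper bound on how long service checks are held while a host is verified.

    Each attempt is assumed to use the full timeout (host really down).
    Each parent host is assumed to cost the same again (extrapolated from
    the single-parent case described in the message).
    """
    per_host = timeout_s * max_check_attempts
    return per_host * (1 + parent_hosts)


if __name__ == "__main__":
    print(worst_case_hold_seconds(5, 4))                  # no parent
    print(worst_case_hold_seconds(5, 4, parent_hosts=1))  # one parent
```

With a 5 second timeout and 4 attempts this gives the 20 seconds quoted above, and 40 seconds with a single parent host.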

Previous Message by Thread:

Re: Adding host & services while Nagios daemon is running

On Apr 20, 2007, at 11:50 AM, Sebastian Ganame wrote:

> So, what I would need is to add new hosts (and maybe new services) to be
> monitored while Nagios is running.

What about sending the HUP signal to nagios?

Modify the configuration, check it for errors using nagios -v nagios.cfg,
then run kill -HUP `cat /your/nagios.pid`

That will reload the configuration, adding or removing services or hosts
in the current nagios instance.

--
Sergio Guzmán
San José, Costa Rica
Tel: (506) 258-5757
http://www.gridshield.net/
Gridshield: Network Protection and Monitoring
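
The reload procedure suggested in this message could be scripted roughly as follows. This is a sketch under assumptions: the pre-flight check uses nagios -v, which verifies the configuration, and the binary name, config path and pid file path are placeholders to be replaced with the values of your own install.

```python
import os
import signal
import subprocess


def read_pid(pid_file):
    """Return the process id stored in a pid file."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def reload_nagios(config_path, pid_file, nagios_bin="nagios"):
    """Verify the configuration, then send SIGHUP to the running daemon.

    Returns False (and sends nothing) if the verification step fails.
    """
    check = subprocess.run(
        [nagios_bin, "-v", config_path], capture_output=True, text=True
    )
    if check.returncode != 0:
        return False
    os.kill(read_pid(pid_file), signal.SIGHUP)
    return True


if __name__ == "__main__":
    # Placeholder paths; adjust to your installation.
    ok = reload_nagios("/your/nagios.cfg", "/your/nagios.pid")
    print("reloaded" if ok else "configuration check failed, not reloading")
```

Checking the configuration first matters because a reload with a broken configuration can leave you without a running daemon.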

Next Message by Thread:

CALL FOR PAPERS : The monitoring day at the rmll 2007 Amiens France

Hello,

This year at the RMLL 2007 there is a Monitoring Day on 12 July 2007.

For those who don't know it, the RMLL is the "LIBRE SOFTWARE MEETING",
organized each year in a different French city. For 4 days people share
knowledge, discuss with other people, look at technical achievements,
and listen to speakers on various topics in a cool and friendly
atmosphere. Those 4 days are free to attend for everybody.

The website is at http://www.rmll.info/?lang=en

I already have the nareto project and the oreon project, who are going
to speak.

Everybody is invited to take part, and we would be happy to be flooded
with as many papers as possible to make it even better than last year :).

Have a nice day
--
Benoit Mortier
CEO
www.opensides.be
Contributor to Gosa Project : http://gosa.gonicus.de/
Contributor to Nagios Plugins : http://nagiosplug.sourceforge.net/