
RE: FW: [TEC] How to monitor the availability of TEC: msg#00470


Subject: RE: FW: [TEC] How to monitor the availability of TEC

We are preparing to implement the rules below, which 'touch' a file every XXX seconds, but only if events were received during that interval. To know whether events were received, we count every new event. I didn't want to have to count every event, but I have observed situations where events stop being processed (e.g., if we lose DB comm) while the timer still fires, so the timer alone does not show that the TEC is working.
 
While the rules 'touch' the file, a crontab entry runs a Perl script that does a stat() on the file to see how long ago it was last modified, i.e.:
 
...
    $now = time();
    # Element 9 of the list returned by stat() is mtime (last modification, epoch seconds).
    $mtime = (stat($Tec_check_file))[9];
    # Treat a missing or unreadable file as stale.
    $diff = defined($mtime) ? $now - $mtime : $threshold + 1;
    if ($diff > $threshold) {
        do_something();    # send emails, pages
    }
...
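 
For comparison, the same staleness check can be sketched as a self-contained script. This version is in Python rather than Perl. The function names `heartbeat_age` and `is_stale` and the 300-second threshold are assumptions made for illustration, not part of the cron script described above; the path is the one the rule further down writes to.

```python
import os
import time

# Path written by the tec_hb_touch rule; the threshold is an assumed value.
HEARTBEAT_FILE = "/Tivoli/custom/log/dm_hb/heartbeat.tec"
THRESHOLD_SECONDS = 300


def heartbeat_age(path, now=None):
    """Return seconds since path was last modified, or None if stat fails."""
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        return None
    return now - mtime


def is_stale(path, threshold, now=None):
    """True when the file is missing/unreadable or older than threshold seconds."""
    age = heartbeat_age(path, now)
    return age is None or age > threshold


if __name__ == "__main__":
    if is_stale(HEARTBEAT_FILE, THRESHOLD_SECONDS):
        # Placeholder for the real notification (emails, pages).
        print("TEC heartbeat is stale: %s" % HEARTBEAT_FILE)
```

Treating a missing file as stale means the check also alerts when the heartbeat file has never been written, which is probably what you want on a freshly started server.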
 
(I haven't looked at the other methods posted yet; they might be more efficient than mine.)
 
 
-James
 
 
--------------------------------------------------
 
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% This rule counts each event. Used by tec_hb.
%
rule: init_count_each_evt: (
   event: _event of_class _class,
 
   reception_action: (
      get_global_var('TEC_HB', 'COUNT', _old_count, 0),
      _new_count is _old_count + 1,
      set_global_var('TEC_HB', 'COUNT', _new_count)
   )
).
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% This rule schedules a tec_hb timer.
%
rule: init_tec_start_schedule_hb_timer: (
   event: _event of_class 'TEC_Start',
 
   reception_action: (
      get_global_var('TEC_HB', 'TIMER_STARTED', _started, 'NOPE'),
      _started == 'NOPE',
      set_global_var('TEC_HB', 'TIMER_STARTED', 'YEP'),
      first_instance(event: _tic of_class 'TEC_Tick' where []),
      set_timer(_tic, 30, 'tec_hb')
   )
).
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Every _duration seconds we'll 'touch' a file for tec hb.
%
timer_rule: tec_hb_touch: (
   event: _tic of_class 'TEC_Tick' where [],
   timer_info: equals 'tec_hb',
   timer_duration: _duration,
 
   action: (
 
      get_global_var('TEC_HB', 'COUNT', _count, 0),
      get_global_var('TEC_HB', 'LAST_COUNT', _last_count, 0),
      set_global_var('TEC_HB', 'LAST_COUNT', _count),
      _interval_count is _count - _last_count,
 
      % only continue w/ 'touch' if _interval_count > 0.
      _interval_count > 0,
 
      get_local_time(_time_local_struct),
      resolve_time(_time_local_struct, _seconds, _minutes, _hours, _day_of_month, _month0, _year0, _day_of_week, _day_of_year, _daylight_savings),
      _year4 is _year0 + 1900,
      _month is _month0 + 1,
 
      sprintf(_log_entry, '%04d-%02d-%02d/%02d:%02d:%02d Events(Total/Interval):%d/%d', [_year4, _month, _day_of_month,_hours,_minutes,_seconds, _count, _interval_count ]),
      % Probably want to change the file mode from a (append) to w (overwrite) so the file does not keep growing.
      fopen(_hbfile, '/Tivoli/custom/log/dm_hb/heartbeat.tec', a),
      fprintf(_hbfile,'%s\n',[_log_entry]),
      fclose(_hbfile)
   ),
 
   action: (
      set_timer(_tic, _duration, 'tec_hb')
   )
 
).
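 
The interplay of the three rules above can be modelled outside TEC. The following Python sketch is only an illustration of the counting logic, not TEC rule code; the class name `HeartbeatState` and its method names are hypothetical, and the two attributes stand in for the COUNT and LAST_COUNT global variables.

```python
class HeartbeatState:
    """In-memory stand-in for the TEC_HB global variables."""

    def __init__(self):
        self.count = 0       # COUNT: events received so far
        self.last_count = 0  # LAST_COUNT: value of COUNT at the previous tick

    def on_event(self):
        """Mirrors init_count_each_evt: count every received event."""
        self.count += 1

    def on_timer(self):
        """Mirrors the first action of tec_hb_touch.

        Returns the number of events seen since the previous tick.
        The heartbeat file would be touched only if this is > 0.
        """
        interval_count = self.count - self.last_count
        self.last_count = self.count
        return interval_count


state = HeartbeatState()
state.on_event()
state.on_event()
print(state.on_timer())  # 2: events arrived, so the file would be touched
print(state.on_timer())  # 0: nothing arrived, so the file is left alone
```

Note that, as written, an interval with no events leaves the file untouched even when the TEC is healthy, so the threshold in the cron check probably needs to be longer than the longest quiet period you expect. The timer itself is re-armed in a separate action, so it keeps firing either way.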
 
 
 
 
 
 
 
 
 
 
-----Original Message-----
From: owner-tme10@xxxxxxxxxxxxxxxx [mailto:owner-tme10@xxxxxxxxxxxxxxxx]On Behalf Of Nes van, P (Peter)
Sent: Tuesday, March 22, 2005 5:32 AM
To: tme10@xxxxxxxxxxxxxxxx
Subject: [tme10] FW: [TEC] How to monitor the availability of TEC

Hi list,
 
Just curious...
 
How do you monitor the availability of your Tivoli environment?
 
When you have a single TMR environment with separate TMR and TEC servers, your automated incident registration is connected to your TMR. Monitoring the availability of your TEC is then essential. What we need is an indication when the TEC server is unavailable.
When a TEC server is shut down using the wstopesvr command, a TEC_Stop event is generated, which is visible on the TEC console. In this case you get a notification that the event server is unavailable. When the tec_* processes are killed (or abort with a core dump), or the event server is flooded with events, the console is unable to detect the unavailability.
This is because the TEC (Java) console queries the DB directly and does not communicate with the tec_ui_server unless an operator changes something through the interface (acknowledging or closing events).
 
Has anyone found the ultimate solution, or does anyone know about future developments concerning the TEC console which will deal with this problem?
 
Cheers,
 
 
Peter
 
 
 

================================================
The information contained in this message may be confidential
and is intended to be exclusively for the addressee. Should you
receive this message unintentionally, please do not use the contents
herein and notify the sender immediately by return e-mail.
