We are preparing to implement the rules below to 'touch' a file every XXX seconds if events were received. To know whether events were received, we count every new event. I didn't want to have to count every event, but I have observed situations where events quit being processed (e.g., if we lose DB comm) and the timer still fires.

While the file is being 'touched' from the rules, a cron job runs a Perl script that does a stat() on the file to see how long it has been since it was last modified. I.e.,
...
$now = time();
# Element 9 of the list returned by stat() is the last-modified time (mtime).
$mtime = (stat($Tec_check_file))[9];
# A missing file yields an undefined mtime; treat that as stale too.
$diff = defined($mtime) ? $now - $mtime : $threshold + 1;
if ($diff > $threshold)
{
    do_something();  # send emails, pages
}
...
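
For anyone who wants to try the staleness check outside of cron first, here is the same idea as a minimal Python sketch. It is not the script we run: the function name heartbeat_is_stale, the example path, and the 300-second threshold are assumptions for illustration only. A missing file is treated as stale.

```python
import os
import time


def heartbeat_is_stale(path, threshold_seconds, now=None):
    """Return True if `path` is missing or was last modified more than
    `threshold_seconds` ago. `now` can be passed in to make testing easy."""
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # No heartbeat file at all counts as a failed heartbeat.
        return True
    return (now - mtime) > threshold_seconds


if __name__ == "__main__":
    # Hypothetical path and threshold; substitute your own.
    check_file = "/tmp/heartbeat.tec"
    if heartbeat_is_stale(check_file, threshold_seconds=300):
        print("heartbeat stale: send emails, pages")
```

The threshold should be comfortably larger than the timer duration used in the rules, otherwise a single quiet interval will page someone.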
(I haven't looked at the other methods posted yet; they might be more efficient than mine.)

-James
--------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% This rule counts each event. Used by tec_hb.
%
rule: init_count_each_evt: (
    event: _event of_class _class,
    reception_action: (
        get_global_var('TEC_HB', 'COUNT', _old_count, 0),
        _new_count is _old_count + 1,
        set_global_var('TEC_HB', 'COUNT', _new_count)
    )
).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% This rule schedules a tec_hb timer.
%
rule: init_tec_start_schedule_hb_timer: (
    event: _event of_class 'TEC_Start',
    reception_action: (
        get_global_var('TEC_HB', 'TIMER_STARTED', _started, 'NOPE'),
        _started == 'NOPE',
        set_global_var('TEC_HB', 'TIMER_STARTED', 'YEP'),
        first_instance(event: _tic of_class 'TEC_Tick' where []),
        set_timer(_tic, 30, 'tec_hb')
    )
).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Every _duration seconds we'll 'touch' a file for tec hb.
%
timer_rule: tec_hb_touch: (
    event: _tic of_class 'TEC_Tick' where [],
    timer_info: equals 'tec_hb',
    timer_duration: _duration,
    action: (
        get_global_var('TEC_HB', 'COUNT', _count, 0),
        get_global_var('TEC_HB', 'LAST_COUNT', _last_count, 0),
        set_global_var('TEC_HB', 'LAST_COUNT', _count),
        _interval_count is _count - _last_count,
        % only continue w/ 'touch' if _interval_count > 0.
        _interval_count > 0,
        get_local_time(_time_local_struct),
        resolve_time(_time_local_struct, _seconds, _minutes, _hours,
            _day_of_month, _month0, _year0, _day_of_week, _day_of_year,
            _daylight_savings),
        _year4 is _year0 + 1900,
        _month is _month0 + 1,
        sprintf(_log_entry,
            '%04d-%02d-%02d/%02d:%02d:%02d Events(Total/Interval):%d/%d',
            [_year4, _month, _day_of_month, _hours, _minutes, _seconds,
             _count, _interval_count]),
        % Probably want to change file mode from a->w.
        fopen(_hbfile, '/Tivoli/custom/log/dm_hb/heartbeat.tec', a),
        fprintf(_hbfile, '%s\n', [_log_entry]),
        fclose(_hbfile)
    ),
    action: (
        set_timer(_tic, _duration, 'tec_hb')
    )
).
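
To make the bookkeeping in the rules above easier to follow, here is the same logic as a small Python sketch. It only illustrates the counting and the "touch only if events arrived" decision; it is not TEC rule language, and the class and method names are made up for this example.

```python
import time


class HeartbeatCounter:
    """Mirrors the two TEC_HB globals used by the rules: COUNT and LAST_COUNT."""

    def __init__(self):
        self.count = 0
        self.last_count = 0

    def on_event(self):
        # Same job as init_count_each_evt: bump COUNT for every event.
        self.count += 1

    def on_tick(self, when=None):
        """Called each time the timer fires. Returns the log line to write,
        or None if no events arrived since the previous tick.
        `when` is seconds since the epoch; None means the current time."""
        interval_count = self.count - self.last_count
        self.last_count = self.count
        if interval_count <= 0:
            # No 'touch', so the cron-side check will eventually see a stale file.
            return None
        stamp = time.strftime("%Y-%m-%d/%H:%M:%S", time.localtime(when))
        return "%s Events(Total/Interval):%d/%d" % (
            stamp, self.count, interval_count)
```

With two events before the first tick and none before the second, the first on_tick() returns a line ending in Events(Total/Interval):2/2 and the second returns None, so the heartbeat file is left untouched for that interval.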
Hi list,

Just curious...

How do you monitor the availability of your Tivoli environment?

When you have a single-TMR environment with separate TMR and TEC servers, your automated incident registration is connected to your TMR. Monitoring the availability of your TEC is then essential. What we need is an indication when the TEC server is unavailable.

When a TEC server is shut down using the wstopesvr command, a TEC_Stop event is generated, which is visible on the TEC console. In this case you get a notification that the event server is unavailable. When the tec_* processes are killed (or aborted by a core dump), or the event server is flooded with events, the console is unable to detect the unavailability.

This is because the TEC (Java) console queries the DB directly and does not communicate with the tec_ui_server unless a human changes something through the interface (acknowledging or closing events).

Has anyone found the ultimate solution, or does anyone know about future developments concerning the TEC console that will deal with this problem?

Cheers,
Peter