Updated: killbaduser, a tool to clean up rogue user processes: msg#00227

Subject: Updated: killbaduser, a tool to clean up rogue user processes
Dear Torque users,

We've been using killbaduser, a tool to clean up rogue user processes,
for a while now and it seems to do the job well.  I've made some
minor improvements to the bash script "killbaduser" version 1.3
(attached file, or available from ftp://ftp.fysik.dtu.dk/pub/PBS/).

This script should be executed on each individual Torque compute node,
either from a cron job, perhaps in the job prologue script (?), or from
the master server in a loop over all compute nodes.
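
The "loop over all compute nodes" option could look roughly like the sketch below. It is only an illustration: the node-list file, the install path in KILLBADUSER, the REMOTE_CMD override and the function name run_on_nodes are all assumptions, not part of the attached script, and it presumes passwordless ssh from the master server to each node.

```shell
#!/bin/bash
# Hypothetical helper: run killbaduser on every node named in a file
# (one hostname per line; blank lines and lines starting with # are skipped).
# Assumed install path on the compute nodes; adjust to your site.
KILLBADUSER=${KILLBADUSER:-/usr/local/sbin/killbaduser}

run_on_nodes() {
    local nodefile=$1
    shift
    local node
    while read -r node; do
        case $node in
            ''|'#'*) continue ;;
        esac
        # "ssh -n" keeps ssh from reading the rest of the node list on stdin.
        # Set REMOTE_CMD=echo for a dry run that only prints what would be run.
        ${REMOTE_CMD:-ssh -n} "$node" "$KILLBADUSER" "$@"
    done < "$nodefile"
}

# Example (hypothetical file name):
#   run_on_nodes /etc/torque-nodes.txt -k
# A cron alternative on each node might be (schedule and path are assumptions):
#   */15 * * * * root /usr/local/sbin/killbaduser -k -s
```

When every node starts the script at the same minute from cron, the -s option matters; in a serial loop from the master it is not needed.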

--
Ole Holm Nielsen
Department of Physics, Technical University of Denmark
#!/bin/bash

#
# On a Torque/PBS compute node, list and kill any user processes not belonging
# to batch jobs.
#
# Usage: killbaduser [-k] [-s] [-v]
#    -k will execute the kill command 
#    -s will sleep a random number of seconds so the pbs_server doesn't get
#       overloaded
#    -v verbose output for debugging
# Author: Ole Holm Nielsen, Department of Physics, Technical University of
#         Denmark
# Version: 1.3
#

###  CONFIGURE:  ###
# The list of OK system user-ids:
USERLIST="root rpc rpcuser daemon ntp smmsp sshd hpsmh named dbus"
# Don't kill processes with UID < UIDMIN
UIDMIN=250

###  CONFIGURE:  ###
# Commands which we use:
PBSNODES=/usr/local/bin/pbsnodes
QSTAT=/usr/local/bin/qstat

#
# Process command options
#
DOKILL=0
DOSLEEP=0
VERBOSE=0
while getopts "ksv" options; do
        case $options in
                k ) DOKILL=1;;
                s ) DOSLEEP=1;;
                v ) VERBOSE=1;;
                * ) echo Usage: $0 "[-k] [-s] [-v]"
                        exit 1;;
        esac
done

# Get the Torque nodename for this node.
# Strip the domain name (would be nice if there existed a Torque function for
# the current nodename)
NODENAME=`echo $HOSTNAME | awk -F. '{print $1}'`
if test ${VERBOSE} -eq 1
then
        echo This node has name: $NODENAME
fi

#
# Sleep a random number of seconds so Torque server doesn't get overloaded
# if all nodes run this script simultaneously.
#
if test ${DOSLEEP} -eq 1
then
        # Initialize /bin/bash built-in random number generator with PID
        RANDOM=$$
        MAXSLEEP=10
        INTERVAL=$(($RANDOM % $MAXSLEEP))
        if test ${VERBOSE} -eq 1
        then
                echo Sleeping $INTERVAL seconds
        fi
        sleep $INTERVAL
fi

#
# Get job list on this node and write one line for each unique job.
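# Redirect stderr for pbsnodes because it complains if this node isn't part of
# the cluster.
# Strip the whole "<slot>/" prefix (slot numbers may have several digits) and
# sort before removing duplicates, since uniq alone only drops adjacent ones.
#
JOBLIST=`$PBSNODES -a $NODENAME 2>&1 | grep 'jobs = ' |
        sed -e 's/,//g' -e 's/^ *jobs = //' -e 's/[0-9]*\///g' |
        tr ' ' '\n' | sort -u`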
if test ${VERBOSE} -eq 1
then
        echo Torque job list for node $NODENAME: $JOBLIST
fi

# Get batch job user-ids and append to USERLIST
for job in $JOBLIST
do
        # Get the user-id from the Job_Owner attribute
        # (the "euser" variable seems to be unavailable on Torque compute
        # nodes).
        EUSER=`$QSTAT -f $job | grep 'Job_Owner =' | awk '{print $3}' | awk -F@ '{print $1}'`
        if test ${VERBOSE} -eq 1
        then
                echo Job $job with user-id $EUSER
        fi
        USERLIST="$USERLIST $EUSER"
done
if test ${VERBOSE} -eq 1
then
        echo List of OK users: $USERLIST
fi

#
# Print the process list, deselecting acceptable user-ids.
#
if test ${VERBOSE} -eq 1
then
        echo List of rogue processes:
fi
PSFLAGS="--no-headers -o pid,state,uid,user,command"
ps --deselect -u "$USERLIST" $PSFLAGS

#
# Kill rogue user processes
#
if test ${DOKILL} -eq 1
then
        PIDLIST=`ps --deselect -u "$USERLIST" $PSFLAGS | awk -v UIDMIN=$UIDMIN '
        {
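                PID=$1; UID=$3
                # UIDMIN is the awk variable set with -v, so it takes no dollar sign.
                # Processes with UID below UIDMIN are left alone.
                if (UID >= UIDMIN) PIDLIST = PIDLIST sprintf("%d ", PID)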
        } END {
                if (length(PIDLIST) > 0) print PIDLIST
        }'`
        # Kill rogue processes, if any
        if test -n "$PIDLIST"
        then
                echo Killing rogue processes $PIDLIST
                # Troy Baer safe version: SIGCONT; sleep; SIGTERM; sleep;
                # SIGKILL
                if test ${VERBOSE} -eq 1
                then
                        echo Sending CONT signal
                fi
                kill -s CONT $PIDLIST
                sleep 1
                if test ${VERBOSE} -eq 1
                then
                        echo Sending TERM signal
                fi
                kill -s TERM $PIDLIST
                sleep 5
                if test ${VERBOSE} -eq 1
                then
                        echo Sending KILL signal
                fi
                kill -s KILL $PIDLIST
        fi
fi
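
Two of the text-processing steps in the script can be tried on canned input without touching a live cluster. The sketch below is an illustration, not part of the attached script: the function names extract_jobs and pids_from_uid are made up here, and the sample lines in the usage comments are hypothetical data shaped like the "jobs = " line of pbsnodes and the "pid,state,uid,user,command" columns the script asks ps for.

```shell
#!/bin/bash
# Hypothetical helper: read pbsnodes-style text on stdin and print one line
# per unique job id found on the "jobs = " line.
extract_jobs() {
    sed -n 's/^ *jobs = //p' |      # keep only the job list itself
        tr -s ', ' '\n\n' |         # one "<slot>/<jobid>" entry per line
        sed 's/^[0-9]*\///' |       # drop the "<slot>/" prefix
        sort -u                     # unique job ids
}

# Hypothetical helper: read "pid state uid user command" lines on stdin and
# print the PIDs whose numeric UID is at least the given minimum.
pids_from_uid() {
    # The shell value is handed to awk with -v, so inside the awk program
    # it is referenced as plain uidmin, without a dollar sign.
    awk -v uidmin="$1" '$3 + 0 >= uidmin + 0 { print $1 }'
}

# Usage with made-up input:
#   printf '     jobs = 0/123.srv, 1/123.srv, 10/124.srv\n' | extract_jobs
#   printf '202 S 500 alice ./a.out\n' | pids_from_uid 250
```

Keeping these steps as small filters makes it easy to check the parsing against the output of a particular Torque version before enabling the -k option.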