Re: $PBS_NODEFILE: msg#00086

Subject: Re: $PBS_NODEFILE
Hi Aaron,
  Something happened with the pbs_server, pbs_scheduler or pbs_mom.
We restarted all three and it started working again.

But to answer your question, the job script requested:
#PBS -l walltime=00:10:00
#PBS -l nodes=2:pinecone:ppn=2

It would run a two-processor job or a serial job, but only on the first
compute node, which in this case is pc02.  With the above resource
request, the $PBS_NODEFILE would contain only "pc02 pc02".
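
For anyone hitting the same symptom, a quick check is to count the entries in $PBS_NODEFILE per host from inside the job. The following is only a minimal sketch and was not part of the original exchange: the function name summarize_nodefile is hypothetical, and it assumes the usual layout of one hostname per line, repeated once per allocated processor. With nodes=2:ppn=2 one would expect two hosts with two entries each, rather than pc02 alone.

```shell
#!/bin/sh
# Hypothetical helper: print "host count" for each host in a node file.
# Assumes one hostname per line, repeated once per allocated processor.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $2 " " $1 }'
}

# PBS_NODEFILE is only set inside a running job; do nothing otherwise.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

Placed after the #PBS lines of a job script, this shows at a glance whether the second node was ever assigned.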


Chris Bording
Application Analyst
High Performance Computing Group Information Technology
The College of William and Mary
(757)-221-3488
rcbord@xxxxxx

On Wed, 24 Oct 2007, Aaron Knister wrote:

So jobs aren't running? What syntax are you using to request resources for submitted jobs?

-Aaron

On Oct 24, 2007, at 11:18 AM, rcbord@xxxxxx wrote:

Hi all,
OK, we installed torque-2.1.9 and had it running on Monday, but now it is not working correctly. The $PBS_NODEFILE only has two processors in it.

The server_priv/nodes file has

pc02 np=2
pc03 np=2
.
.
pc13 np=2

all the nodes are "free" according to the pbsnodes -a output

pc12
     state = free
     np = 2
     properties = pinecone,v20z,score
     ntype = cluster
     status = opsys=linux,uname=Linux pc12 2.6.16.53-0.8-smp #1 SMP Fri Aug 31 13:07:27 UTC 2007 x86_64,sessions=4144,nsessions=1,nusers=1,idletime=167833,totmem=7938936kb,availmem=7732980kb,physmem=3737980kb,ncpus=2,loadave=0.00,netload=247513913,state=free,jobs=? 15201,rectime=1193238080
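
With twelve nodes, reading the full pbsnodes -a listing by eye is tedious. The sketch below condenses it to one line per node; it is an illustration only, not something from the original message. The function name node_states is hypothetical, and it assumes node names appear alone on a line and attributes are printed as "name = value", as in the pc12 entry above.

```shell
#!/bin/sh
# Hypothetical helper: reduce "pbsnodes -a" output (read from stdin)
# to one "node state np" line per node.
node_states() {
    awk '
        function emit() { if (node != "") print node, state, np }
        # A line holding a single word without "=" starts a new node entry.
        NF == 1 && $1 !~ /=/ { emit(); node = $1; state = "?"; np = "?"; next }
        $1 == "state" && $2 == "=" { state = $3 }
        $1 == "np"    && $2 == "=" { np = $3 }
        END { emit() }
    '
}

# Usage on the server host:
#   pbsnodes -a | node_states
```

Any node that does not show "free" with the expected np stands out immediately.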


#
# Set server attributes.
#
set server scheduling = True
set server default_queue = submit
set server log_events = 127
set server mail_from = adm
set server max_running = 24
set server max_user_run = 24
set server max_group_run = 24
set server acl_host_enable = True
set server acl_hosts = pinecone.cwm.edu
set server acl_hosts += pc02
set server acl_hosts += pc03
set server acl_hosts += pc04
set server acl_hosts += pc05
set server acl_hosts += pc06
set server acl_hosts += pc07
set server acl_hosts += pc08
set server acl_hosts += pc09
set server acl_hosts += pc10
set server acl_hosts += pc11
set server acl_hosts += pc12
set server acl_hosts += pc13
set server query_other_jobs = True
set server acl_roots = root@xxxxxxxxxxxxxxxx
set server managers = manager1@xxxxxxxxxxxxxxxx
set server operators = manager2@xxxxxxxxxxxxxxxx
set server operators += manager1@xxxxxxxxxxxxxxxx
set server resources_available.ncpus = 24
set server resources_available.nodect = 12
set server resources_max.nodect = 12
set server scheduler_iteration = 30
set server node_check_rate = 150
set server tcp_timeout = 6
set server log_level = 5
set server pbs_version = 2.1.9

I don't think we have changed anything; it just quit!



Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water

(301) 595-7001
aaron@xxxxxxxx





