
start intel mpi in pbs: msg#00085

Subject: start intel mpi in pbs

Hi all,

In the PBS script file I can't start mpd (Intel MPI) using the following command:

****************************************************************************

mpdboot  --rsh=ssh -v -n `cat mpd.hosts|wc -l`  -f mpd.hosts

****************************************************************************

It gives:

--------------------------------------------------------------------------------------------------

totalnum=4  numhosts=3

there are not enough hosts on which to start all processes

--------------------------------------------------------------------------------------------------

But I can manually start mpd using the same command.

-------------------------------------------------------------------------------------------------

[mpp@cluster std]$  mpdboot --rsh=ssh -v -n 4 -f mpd.hosts

running mpdallexit on cluster

LAUNCHED mpd on cluster  via

RUNNING: mpd on cluster

LAUNCHED mpd on c0-0  via  cluster

LAUNCHED mpd on c0-1  via  cluster

LAUNCHED mpd on c0-2  via  cluster

RUNNING: mpd on c0-0

RUNNING: mpd on c0-1

RUNNING: mpd on c0-2

-------------------------------------------------------------------------------------------------

 

Does anyone know how to fix this? Many thanks!

Best wishes,

Chaucer
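
The error above reports totalnum=4 against numhosts=3, where the value given to -n comes from `wc -l` on mpd.hosts. One thing worth checking, as an assumption rather than a diagnosis, is whether the raw line count matches the number of distinct host names in the file. The sketch below is a hypothetical pre-check (the name `check_hosts` is made up here); it does not reproduce mpdboot's own counting rule, which the thread does not show.

```shell
#!/bin/sh
# Hypothetical helper: compare the raw line count of a host file with the
# number of distinct, non-blank host names it contains.
check_hosts() {
    hostfile=$1
    # Same count the original command passes to -n.
    lines=$(wc -l < "$hostfile" | tr -d ' ')
    # Distinct host names, ignoring blank lines and extra columns.
    distinct=$(awk 'NF { print $1 }' "$hostfile" | sort -u | wc -l | tr -d ' ')
    echo "lines=$lines distinct=$distinct"
    # Succeed only when both counts agree.
    [ "$lines" -eq "$distinct" ]
}

# Only run the check when a host file is actually present.
if [ -f mpd.hosts ]; then
    echo "running on: $(hostname)"
    check_hosts mpd.hosts || echo "warning: line count and distinct host count differ"
fi
```

Running this from inside the PBS job and again from an interactive shell would show whether the two environments see the same file and the same local host name.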

 


From: Chaucer Cao [mailto:ccao@xxxxxxx]
Sent: June 26, 2007 14:12
To: 'Krause, Roland'
Subject: Re: [torqueusers] how to get Environment Variables

 

Hi Roland,

pbsnodes gives the ntype info; every node shows ntype = cluster:

c0-2

     state = free

     np = 4

     ntype = cluster

     status = opsys=linux,uname=Linux compute-0-2.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=14316,nsessions=1,nusers=1,idletime=105210,totmem=5045676kb,availmem=4608468kb,physmem=4025560kb,ncpus=4,loadave=4.00,netload=483100398328,state=free,jobs=,varattr=,rectime=1182836318

 

c0-1

     state = free

     np = 4

     ntype = cluster

     status = opsys=linux,uname=Linux compute-0-1.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=26709,nsessions=1,nusers=1,idletime=234995,totmem=5045672kb,availmem=4592532kb,physmem=4025556kb,ncpus=4,loadave=4.00,netload=697953068235,state=free,jobs=,varattr=,rectime=1182836316

 

c0-0

     state = free

     np = 4

     ntype = cluster

     status = opsys=linux,uname=Linux compute-0-0.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=28348,nsessions=1,nusers=1,idletime=220618,totmem=5045676kb,availmem=4557852kb,physmem=4025560kb,ncpus=4,loadave=4.00,netload=588068945521,state=free,jobs=,varattr=,rectime=1182836318

 

cluster

     state = free

     np = 4

     ntype = cluster

     status = opsys=linux,uname=Linux cluster.hpc.org 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=2993 24894 25052 25158 25307,nsessions=5,nusers=3,idletime=92734,totmem=5045676kb,availmem=4130016kb,physmem=4025560kb,ncpus=4,loadave=4.48,netload=678702222035,state=free,jobs=,varattr=,rectime=1182836315

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

It seems the head node gets a different domain (its uname reports cluster.hpc.org, while the compute nodes report .local names). In /etc/hosts:

#

# Do NOT Edit (generated by dbreport)

#

127.0.0.1       localhost.localdomain   localhost

10.1.1.1        cluster.local cluster # originally frontend-0-0

10.255.255.254  compute-0-0.local compute-0-0 c0-0

10.255.255.253  compute-0-1.local compute-0-1 c0-1

10.255.255.252  compute-0-2.local compute-0-2 c0-2

192.168.1.1     cluster.hpc.org

But I don't know how to tell pbs_server that it should use cluster.local. Thanks!

Best wishes,

Chaucer
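
To make the name mismatch described above easier to see, here is a minimal sketch that looks up which address a hosts-format file assigns to a given name. The function name `lookup_host` is hypothetical, and the sketch only reads the file; it does not consult DNS or change any PBS setting.

```shell
#!/bin/sh
# Hypothetical helper: print the address a hosts-format file assigns to a name.
lookup_host() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }   # drop comments so they are not matched as names
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
}

# Compare the names the head node is known by.
if [ -r /etc/hosts ]; then
    for h in cluster cluster.local cluster.hpc.org; do
        echo "$h -> $(lookup_host "$h")"
    done
fi
```

With the /etc/hosts quoted in the message, cluster and cluster.local map to 10.1.1.1, while cluster.hpc.org maps to 192.168.1.1, so the name the head node reports sits on a different network from the compute nodes.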

 

 


From: Krause, Roland [mailto:Roland.Krause@xxxxxxxxxxxxxxxx]
Sent: June 25, 2007 19:37
To: Chaucer Cao
Subject: RE: [torqueusers] how to get Environment Variables

 

Hi Chaucer,

 

Besides our production system we have a test system with two nodes. One of them is the server, but I can still run jobs with qsub -l nodes=2.

Do all your nodes have the "ntype" "cluster"?

 

Regards,

Roland

 


From: Chaucer Cao [mailto:ccao@xxxxxxx]
Sent: Monday, June 25, 2007 10:33 AM
To: Krause, Roland
Subject: Re: [torqueusers] how to get Environment Variables

Hi Roland,

The environment variables problem is OK now, but I have run into another problem:

There are four nodes including the head node, but I can only submit a 3-node job with qsub. When I submit a 4-node job it gives:

c0-0

c0-1

c0-2

cluster

totalnum=4  numhosts=3

there are not enough hosts on which to start all processes

  1. no mpd is running on this host

  2. an mpd is running but was started without a "console" (-n option)

mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_ccao); possible causes:

mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_ccao); possible causes:

  1. no mpd is running on this host

  2. an mpd is running but was started without a "console" (-n option)

It seems I can't run the job on the head node (cluster) with PBS, but I can run a 4-node job directly (without qsub).

When I check with pbsnodes, all nodes appear to be in the free state. Can you help me with this? Many thanks!

Best wishes,

Chaucer
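
For reference, a common way to start mpd from inside a job is to build the host file from the node list PBS hands to the job instead of from a hand-written file. The sketch below assumes a Bourne-shell job script and uses the standard PBS_NODEFILE and PBS_O_WORKDIR variables; the helper name `write_mpd_hosts` is hypothetical, and the mpdboot options are the ones already used in this thread. It is not a confirmed fix for the error shown above.

```shell
#!/bin/sh
#PBS -l nodes=4
# Sketch only: build mpd.hosts from the node list PBS gives to the job.

# Write each allocated host once (PBS_NODEFILE may list a host several times).
write_mpd_hosts() {
    sort -u "$1" > "$2"
}

# Do nothing outside a PBS job or when mpdboot is not installed.
if [ -n "${PBS_NODEFILE:-}" ] && command -v mpdboot >/dev/null 2>&1; then
    cd "${PBS_O_WORKDIR:-.}" || exit 1
    write_mpd_hosts "$PBS_NODEFILE" mpd.hosts
    nhosts=$(wc -l < mpd.hosts | tr -d ' ')
    echo "host running the script: $(hostname), hosts in file: $nhosts"
    mpdboot --rsh=ssh -v -n "$nhosts" -f mpd.hosts
fi
```

Printing the local host name next to the host count makes it easy to compare what the job sees with what the interactive session on the head node sees.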

 

 


From: Krause, Roland [mailto:Roland.Krause@xxxxxxxxxxxxxxxx]
Sent: June 25, 2007 15:13
To: Chaucer Cao
Subject: RE: [torqueusers] how to get Environment Variables

 

Hi Chaucer,

 

Could you provide the part of your script that reads the PBS environment variables?

 

Regards,

Roland

 


From: torqueusers-bounces@xxxxxxxxxxxxxxxx [mailto:torqueusers-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Chaucer Cao
Sent: Wednesday, June 20, 2007 7:16 PM
To: torqueusers@xxxxxxxxxxxxxxxx
Subject: [torqueusers] how to get Environment Variables

Hi all,

Does anyone know how I can get the PBS environment variables in the run script file? When I qsub my script file it gives:

PBS_NODEFILE: Undefined variable.

PBS_ENVIRONMENT: Undefined variable.

Many thanks!

Chaucer
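
The "Undefined variable." wording above is what csh and tcsh print when a script references a variable that is not set, so the script was probably being run by a csh-family shell at a point where these variables were not defined. The thread does not say how the problem was resolved. As a minimal sketch, assuming a Bourne-shell job script instead, the following prints the variables without aborting when they are missing; `show_var` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print a variable's value, or a marker when it is unset.
show_var() {
    eval "val=\"\${$1-(unset)}\""
    echo "$1=$val"
}

show_var PBS_ENVIRONMENT
show_var PBS_NODEFILE

# List the allocated hosts only when the node file exists and is readable.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    cat "$PBS_NODEFILE"
fi
```

Submitted with qsub, this should show both variables set; run directly from a login shell, both are expected to print as (unset).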

_______________________________________________
torqueusers mailing list
torqueusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/torqueusers