On Jan 30, 2007, at 1:34 PM, Tony Schreiner wrote:
I want to upgrade a cluster from Fedora 4 to Fedora 6, but am
hung up on the OSC mpiexec part.
I have torque 2.1.6-1 from the Fedora repo installed.
mpiexec compiles fine; I used
./configure --with-default-comm=mpich-p4
My script, dompi, is basically
/path/to/mpiexec ./app
I submit the dompi script with
qsub -l nodes=nodeX dompi
On the node I upgraded (node5), I get this in the error log:
mpiexec: Error: get_hosts: pbs_connect: no error.
This is because pbs_connect(0) in get_hosts.c returns -1 for
me on this node; I guess it's supposed to return the number of
available nodes.
It still works on the other nodes, though.
Some sort of host resolution error? Everything seems fine to me.

If I may answer my own question: I got the vital clue from Pete
Wyckoff at OSC. The error pointed to problems with the pbs_iff program.
I had installed the torque, torque-mom and libtorque RPMs from
Fedora, but had not installed torque-client, which is where pbs_iff is
found. After I installed torque-client, the problem was solved.
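For anyone hitting the same error, a quick check along these lines might save some time. This is only a rough sketch: the function name check_client_tool is made up for illustration, and it assumes pbs_iff would be on the PATH of the user running the check, which may not hold if your package puts it in an sbin directory.

```shell
# Hypothetical helper: report whether a client-side tool is on PATH.
# Defaults to pbs_iff, which (per the above) comes from the
# torque-client RPM on Fedora.
check_client_tool() {
    tool="${1:-pbs_iff}"
    if path=$(command -v "$tool" 2>/dev/null); then
        echo "$tool found at $path"
        return 0
    fi
    echo "$tool not found on PATH" >&2
    return 1
}
```

Running check_client_tool with no argument on the compute node looks for pbs_iff. If it reports the tool as missing, checking whether the torque-client package is installed would be the next thing to try.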
Tony Schreiner