We have found a problem when running jobs through tm on only one
node, which is quite strange. If the MPI library (LAM or OpenMPI)
uses 2 nodes (nodes=2:ppn=2), the job starts just fine. But if it
uses 1 node (nodes=1:ppn=2), the job cannot start. This is not a
problem for serial jobs. We are also using the same versions of
Torque and LAM/OpenMPI on our Linux cluster with no problems. If I
build LAM without tm support, the jobs run fine.
I searched the archives and found some references to a similar
problem. I'm just wondering what I should do to test it, or whether
this is a known problem on OS X? The systems are G5s running 10.3
with torque-2.1.6.
Thanks.
Brock Palen
Center for Advanced Computing
brockp@xxxxxxxxx
(734)936-1985