Re: PBS_Server; Req; req_reject; Reject reply code=15001(Unknown Job Id), a
Hi,

I am now submitting the following RSL script (an MPI job) to Torque via Globus:

----------------------------------------------------------------
+
( &(resourceManagerContact="Server.eng4.shirazu.ac.ir/jobmanager-pbs")
   (count=2)
   (jobtype=mpi)
   (label="subjob 0")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                (LD_LIBRARY_PATH /usr/local/globus-4.0.3/lib/))
   (directory="/home/grid/globusTest/MPICH-G2")
   (executable="/home/grid/globusTest/MPICH-G2/ring")
   (stdout=TorqueOut)
   (stderr=TorqueErr)
)
----------------------------------------------------------------
According to the Torque logs everything seems to succeed, but I get the following output. There is no firewall in my environment. What is the problem?

---------------------------TorqueOut----------------------------
Submission of subjob (label = "subjob 0") failed because the connection to the server failed (check host and port) (error code 62)
Submission of subjob (label = "subjob 1") failed because the connection to the server failed (check host and port) (error code 62)
----------------------------------------------------------------

On 5/11/07, Garrick Staples <garrick@xxxxxxxxxxxxxxxxxxxx> wrote:
On Fri, May 11, 2007 at 10:23:32AM +0330, Mehdi Sheikhalishahi alleged:
> Hi,
> I submit the following simple RSL script via Globus to Torque.
> ----------------------------------------------------------------
> +
> ( &(resourceManagerContact="Server.eng4.shirazu.ac.ir/jobmanager-pbs")
>    (count=2)
>    (label="subjob 0")
>    (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
>                 (LD_LIBRARY_PATH /usr/local/globus-4.0.3/lib/))
>    (directory="/home/grid/Torque/simple")
>    (executable="/bin/hostname")
>    (stdout=hostnameOutput)
>    (stderr=hostnameError)
> )
> --------------------------My Report-----------------------------
> The job was executed successfully.
> --------------------------hostnameOutput------------------------
> localhost009
> localhost008
> ----------------------------------------------------------------
> But there are some strange error messages in the server logs:
>
> 05/11/2007 09:38:52;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=StatusJob, from grid@xxxxxxxxxxxxxxxxxxxxxxxxx
> 05/11/2007 09:38:52;0100;PBS_Server;Req;;Type AuthenticateUser request received from grid@xxxxxxxxxxxxxxxxxxxxxxxxx, sock=15
> 05/11/2007 09:38:52;0100;PBS_Server;Req;;Type LocateJob request received from grid@xxxxxxxxxxxxxxxxxxxxxxxxx, sock=14
> 05/11/2007 09:38:52;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=LocateJob, from grid@xxxxxxxxxxxxxxxxxxxxxxxxx
This is normal for Globus. The Globus job manager loops over
'qstat $jobid', waiting for the job to end. Once the server no longer knows the job, qstat returns the "Unknown Job Id" error, and Globus takes that as the sign that the job has exited.
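A minimal sketch of the polling pattern described above, assuming a hypothetical `status_query` callable that stands in for running `qstat <jobid>` and returns the server's reply code (0 while the job is still known, 15001 once it is not). It is not the Globus job manager's actual code; it only illustrates why a finished job leaves an Unknown Job Id reject in the server log.

```python
import time
from typing import Callable

# Torque's "Unknown Job Id" reply code, as it appears in the server log above.
UNKNOWN_JOB_ID = 15001


def wait_for_job_exit(status_query: Callable[[str], int], job_id: str,
                      interval: float = 1.0, max_polls: int = 3600) -> int:
    """Poll status_query(job_id) until it reports Unknown Job Id.

    status_query is a hypothetical stand-in for 'qstat <job_id>' that returns
    the server's reply code. Returns the number of polls made.
    """
    for poll in range(1, max_polls + 1):
        code = status_query(job_id)
        if code == UNKNOWN_JOB_ID:
            # The server no longer knows the job: treat it as finished.
            return poll
        if code != 0:
            raise RuntimeError(f"unexpected reply code {code} for {job_id}")
        time.sleep(interval)
    raise TimeoutError(f"{job_id} still known after {max_polls} polls")


# Example with a fake server that knows the job for three polls, then forgets it.
replies = iter([0, 0, 0, UNKNOWN_JOB_ID])
print(wait_for_job_exit(lambda _id: next(replies), "123.server", interval=0))  # prints 4
```

Treating "unknown job" as "finished" keeps the poller simple, at the cost of one rejected request per job in the server log, which matches the entries reported above.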
_______________________________________________
torqueusers mailing list
torqueusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/torqueusers
--
Best Regards,
S.Mehdi Sheikhalishahi
Web: http://www.cse.shirazu.ac.ir/~alishahi/
Bye.
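The TorqueOut error asks to "check host and port". One simple check is whether a plain TCP connection to the gatekeeper named in resourceManagerContact can be opened from the machine that produced TorqueOut (presumably the compute node running the job). The sketch below assumes the gatekeeper listens on 2119, the conventional pre-WS GRAM gatekeeper port, because the contact string names no port; `can_connect` is a hypothetical helper. A successful connection only shows that the port is reachable, not that authentication or submission will work.

```python
import socket

# Conventional pre-WS GRAM gatekeeper port (an assumption; change it if your
# resourceManagerContact names a different port).
DEFAULT_GATEKEEPER_PORT = 2119


def can_connect(host: str, port: int = DEFAULT_GATEKEEPER_PORT,
                timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False
```

For example, `can_connect("Server.eng4.shirazu.ac.ir")` returns False if the name does not resolve on that node or nothing accepts connections on the port.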