

Subject: Re: PBS_Server; Req; req_reject; Reject reply code=15001(Unknown Job Id), aux=0, type=LocateJob, from grid@xxxxxxxxxxxxxxxxxxxxxxxxx
Hi,
Now I am submitting the following RSL script (an MPI job) via Globus to Torque:
------------------------------------------------------------------------------------------
+
( &(resourceManagerContact="Server.eng4.shirazu.ac.ir/jobmanager-pbs")
   (count=2)
   (jobtype=mpi)
   (label="subjob 0")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                (LD_LIBRARY_PATH /usr/local/globus-4.0.3/lib/))
   (directory="/home/grid/globusTest/MPICH-G2")
   (executable="/home/grid/globusTest/MPICH-G2/ring")
   (stdout=TorqueOut)
   (stderr=TorqueErr)
)
-------------------------------------------------------------------------------------------------------------------------------------------
According to the Torque logs everything seems successful, but I got the following output.
There is no firewall in my environment. What's the problem?
------------------------------------------------------------TorqueOut----------------------------------------------------------------
    Submission of subjob (label = "subjob 0") failed because the connection to the server failed (check host and port) (error code 62)
    Submission of subjob (label = "subjob 1") failed because the connection to the server failed (check host and port) (error code 62)
------------------------------------------------------------------------------------------------------------------------------------------
On 5/11/07, Garrick Staples <garrick@xxxxxxxxxxxxxxxxxxxx> wrote:
On Fri, May 11, 2007 at 10:23:32AM +0330, Mehdi Sheikhalishahi alleged:
> Hi,
> I submit the following simple RSL script via Globus to Torque.
> ----------------------------------------------------------------------------------------
> +
> ( &(resourceManagerContact="Server.eng4.shirazu.ac.ir/jobmanager-pbs")
>   (count=2)
>   (label="subjob 0")
>   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
>                (LD_LIBRARY_PATH /usr/local/globus-4.0.3/lib/))
>   (directory="/home/grid/Torque/simple")
>   (executable="/bin/hostname")
>   (stdout=hostnameOutput)
>   (stderr=hostnameError)
> )
> --------------------------My Report--------------------------
> Job was successfully executed,
> --------------------------hostnameOutput------------------------------
> localhost009
> localhost008
> ----------------------------------------------------------------------------------------
> But there are some strange error messages in the server logs. The following are
> the strange errors:
>
> 05/11/2007 09:38:52;0080;PBS_Server;Req;req_reject;Reject reply
> code=15001(Unknown Job Id), aux=0, type=StatusJob, from
> grid@xxxxxxxxxxxxxxxxxxxxxxxxx
> 05/11/2007 09:38:52;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from grid@xxxxxxxxxxxxxxxxxxxxxxxxx, sock=15
> 05/11/2007 09:38:52;0100;PBS_Server;Req;;Type LocateJob request received
> from grid@xxxxxxxxxxxxxxxxxxxxxxxxx, sock=14
> 05/11/2007 09:38:52;0080;PBS_Server;Req;req_reject;Reject reply
> code=15001(Unknown Job Id), aux=0, type=LocateJob, from
> grid@xxxxxxxxxxxxxxxxxxxxxxxxx

This is normal for globus.  The globus job manager is looping over
'qstat $jobid' waiting for the job to end.  At some point, qstat returns
the "unknown job id" error and globus knows the job has exited.
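
The polling behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Globus job manager code; the function names, the poll interval, and the simplification of treating any non-zero exit status from 'qstat $jobid' as "job gone" are assumptions made for the example.

```python
import subprocess
import time


def job_has_exited(job_id, run=subprocess.run):
    """Return True once `qstat <job_id>` fails.

    Assumption: a non-zero exit status means the server no longer knows
    the job (the "Unknown Job Id" case). A real tool would also need to
    tell this apart from other failures, such as a connection error.
    """
    result = run(["qstat", job_id], capture_output=True, text=True)
    return result.returncode != 0


def wait_for_job(job_id, poll_seconds=10, run=subprocess.run, sleep=time.sleep):
    """Poll qstat until the job is unknown; return how many polls saw it alive."""
    polls = 0
    while not job_has_exited(job_id, run=run):
        polls += 1
        sleep(poll_seconds)
    return polls
```

Each poll of a job that has already finished produces one StatusJob or LocateJob reject entry like the ones quoted above, so those lines show up in the server log at the end of every Globus-submitted job.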

_______________________________________________
torqueusers mailing list
torqueusers@xxxxxxxxxxxxxxxx
http://www.supercluster.org/mailman/listinfo/torqueusers



--
Best Regards,
S.Mehdi Sheikhalishahi,
Web: http://www.cse.shirazu.ac.ir/~alishahi/
Bye.