On Fri, 2006-12-29 at 11:40 -0500, Tim Miller wrote:
> I'm running Torque 2.1.4. I would like all of the nodes and desktop
> computers on our internal network to be able to submit jobs, but only
> some of them are able to and I'm not seeing why.
>
> My setup is simple: a single routing queue that feeds into a single
> execution queue. The queues are configured as follows:
>
> routing:
> Queue entry
> queue_type = Route
> total_jobs = 0
> state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0
> Exiting:0
> acl_host_enable = False
> resources_default.nodes = 1:xeon306
> mtime = Fri Dec 29 11:19:27 2006
> route_destinations = xeon
> enabled = True
> started = True
>
> exec:
> Queue xeon
> queue_type = Execution
> total_jobs = 42
> state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:41
> Exiting:0
> acl_host_enable = False
> from_route_only = True
> mtime = Fri Dec 29 11:19:21 2006
> resources_assigned.nodect = 58
> enabled = True
> started = True
>
> Server setup:
> Server <name removed by me>
> server_state = Active
> scheduling = True
> total_jobs = 50
> state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:50
> Exiting:0
> managers = <manager list removed>
> default_queue = entry
> log_events = 511
> mail_from = adm
> query_other_jobs = True
> resources_assigned.nodect = 67
> scheduler_iteration = 600
> node_check_rate = 120
> tcp_timeout = 6
> pbs_version = 2.1.4
>
> As you can see, I've explicitly set acl_host_enable to false on both
> queues. Nonetheless, when I try to submit a job from certain hosts I get
> a "job rejected by all possible destinations" and the following in the
> server log:
>
> 12/29/2006 11:20:22;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from tim@xxxxxxxxxxxxxxxx, sock=10
> 12/29/2006 11:20:22;0100;PBS_Server;Req;;Type QueueJob request received
> from tim@xxxxxxxxxxxxxxxx, sock=9
> 12/29/2006 11:20:22;0100;PBS_Server;Req;;Type ReadyToCommit request
> received from tim@xxxxxxxxxxxxxxxx, sock=9
> 12/29/2006 11:20:22;0100;PBS_Server;Req;;Type Commit request received
> from tim@xxxxxxxxxxxxxxxx, sock=9
> 12/29/2006 11:20:22;0080;PBS_Server;Req;req_reject;Reject reply
> code=15039(Job rejected by all possible destinations), aux=0,
> type=Commit, from tim@xxxxxxxxxxxxxxxx
>
> It looks like the job is never even assigned a number and is rejected
> before it even hits the routing queue.
>
> I've scratched my head over this a little and just can't see what I'm
> doing wrong. Any ideas?
What does the job look like? It's hard to say why the job was rejected
without seeing what resources it requested.
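
If the job is submitted from a script, the resource requests are usually
in its #PBS directive lines (plus anything passed to qsub on the command
line). As a rough sketch of how to pull those lines out for posting, the
helper below collects the directives from a script's text. The function
name and the sample script are hypothetical, and the scanning rule (stop
at the first executable line, tolerate leading whitespace) is only an
approximation of what qsub does:

```python
def pbs_directives(script_text):
    """Return the option strings from the #PBS lines of a job script.

    Approximates qsub's scan: directives are read from the top of the
    script until the first line that is neither blank nor a comment.
    """
    directives = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#PBS"):
            # Keep only the options, e.g. "-l nodes=1:xeon306"
            directives.append(stripped[len("#PBS"):].strip())
        elif stripped and not stripped.startswith("#"):
            # First executable line: later #PBS lines are not collected
            break
    return directives


if __name__ == "__main__":
    # Hypothetical job script, for illustration only
    sample = """#!/bin/sh
#PBS -N example
#PBS -l nodes=1:xeon306
cd $PBS_O_WORKDIR
#PBS -l walltime=01:00:00
"""
    for directive in pbs_directives(sample):
        print(directive)
```

Posting that output along with the exact qsub command line would show
which resources the queues are being asked to satisfy.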
--Troy
--
Troy Baer troy@xxxxxxx
Science & Technology Support http://www.osc.edu/hpc/
Ohio Supercomputer Center 614-292-9701