MOM rejecting modify requests: msg#00078 (clustering.torque.user)
I am continuing to have problems at one site where jobs seem to get sent to a compute node to run, but then the mom seems to lose track of them somehow and starts rejecting requests from the scheduler. Any idea what kind of things I should be checking? The logs don't give any clue _why_ the requests are being refused.

This is what's in the mom_log:

05/23/2006 10:05:32;0080; pbs_mom;Req;req_reject;Reject reply code=15001(Unknown Job Id REJHOST=gridmon.cp.dias.ie MSG=modify job failed, unknown job 10332.gridgate.cp.dias.ie), aux=0, type=ModifyJob, from PBS_Server@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:29:50;0080; pbs_mom;Req;req_reject;Reject reply code=15001(Unknown Job Id REJHOST=gridmon.cp.dias.ie MSG=modify job failed, unknown job 10332.gridgate.cp.dias.ie), aux=0, type=ModifyJob, from PBS_Server@xxxxxxxxxxxxxxxxxxx

and the output of tracejob:

Job: 10332.gridgate.cp.dias.ie

05/23/2006 09:34:50 S enqueuing into test, state 1 hop 1
05/23/2006 09:34:50 S Job Queued at request of dtes@xxxxxxxxxxxxxxxxxxx, owner = dtes@xxxxxxxxxxxxxxxxxxx, job name = STDIN, queue = test
05/23/2006 09:34:50 A queue=test
05/23/2006 10:05:32 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:05:32 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:05:32 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:05:32 S MOM rejected modify request, error: 15001
05/23/2006 10:11:52 S enqueuing into test, state 1 hop 1
05/23/2006 10:11:52 S Requeueing job, substate: 37 Requeued in queue: test
05/23/2006 10:12:19 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:12:19 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:12:19 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:12:19 S MOM rejected modify request, error: 15001
05/23/2006 10:42:26 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:42:26 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:42:26 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:42:26 S MOM rejected modify request, error: 15001
05/23/2006 11:12:33 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:12:33 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:12:33 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:12:33 S MOM rejected modify request, error: 15001
05/23/2006 11:24:26 S enqueuing into test, state 1 hop 1
05/23/2006 11:24:26 S Requeueing job, substate: 37 Requeued in queue: test
05/23/2006 11:29:50 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:29:50 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:29:50 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:29:50 S MOM rejected modify request, error: 15001

--
Dr. Stephen Childs,
Research Fellow, EGEE Project, phone: +353-1-6081797
Computer Architecture Group, email: Stephen.Childs @ cs.tcd.ie
Trinity College Dublin, Ireland web: http://www.cs.tcd.ie/Stephen.Childs