
MOM rejecting modify requests: msg#00078

clustering.torque.user

Subject: MOM rejecting modify requests

I am still having problems at one site: jobs appear to be sent to a compute node to run, but the MOM then seems to lose track of them and starts rejecting requests from the scheduler. Any idea what I should be checking? The logs give no clue _why_ the requests are being refused.

This is what's in the mom_log:

05/23/2006 10:05:32;0080; pbs_mom;Req;req_reject;Reject reply code=15001(Unknown Job Id REJHOST=gridmon.cp.dias.ie MSG=modify job failed, unknown job 10332.gridgate.cp.dias.ie), aux=0, type=ModifyJob, from PBS_Server@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:29:50;0080; pbs_mom;Req;req_reject;Reject reply code=15001(Unknown Job Id REJHOST=gridmon.cp.dias.ie MSG=modify job failed, unknown job 10332.gridgate.cp.dias.ie), aux=0, type=ModifyJob, from PBS_Server@xxxxxxxxxxxxxxxxxxx
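
For anyone triaging similar entries, here is a minimal Python sketch that pulls the reply code, request type and job ID out of a req_reject line. It is not part of TORQUE: the helper name parse_reject and the regular expressions are hypothetical, and the field layout is inferred only from the two sample lines above, so other pbs_mom versions may need adjustments.

```python
import re

# Field layout inferred from the two sample lines above (an assumption,
# not a documented pbs_mom log format).
REJECT_RE = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2});"
    r".*?req_reject;Reject reply code=(?P<code>\d+)"
    r"\((?P<detail>.*)\), aux=(?P<aux>\d+), type=(?P<type>\w+)"
)
# The job ID appears inside the MSG text as "unknown job <id>".
JOB_RE = re.compile(r"unknown job (?P<job>[\w.\-]+)")


def parse_reject(line):
    """Return a dict describing one req_reject line, or None if it does not match."""
    m = REJECT_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["code"] = int(rec["code"])
    job = JOB_RE.search(rec["detail"])
    rec["job"] = job.group("job") if job else None
    return rec
```

Feeding each line of the mom_log through parse_reject and grouping on the job ID makes it easier to see whether the rejections are confined to one job or affect every job sent to that node.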

and the output of tracejob:


Job: 10332.gridgate.cp.dias.ie

05/23/2006 09:34:50 S enqueuing into test, state 1 hop 1
05/23/2006 09:34:50 S Job Queued at request of dtes@xxxxxxxxxxxxxxxxxxx,
                      owner = dtes@xxxxxxxxxxxxxxxxxxx, job name = STDIN,
                      queue = test
05/23/2006 09:34:50 A queue=test
05/23/2006 10:05:32 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:05:32 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:05:32 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:05:32 S MOM rejected modify request, error: 15001
05/23/2006 10:11:52 S enqueuing into test, state 1 hop 1
05/23/2006 10:11:52 S Requeueing job, substate: 37 Requeued in queue: test
05/23/2006 10:12:19 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:12:19 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:12:19 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:12:19 S MOM rejected modify request, error: 15001
05/23/2006 10:42:26 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:42:26 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:42:26 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 10:42:26 S MOM rejected modify request, error: 15001
05/23/2006 11:12:33 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:12:33 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:12:33 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:12:33 S MOM rejected modify request, error: 15001
05/23/2006 11:24:26 S enqueuing into test, state 1 hop 1
05/23/2006 11:24:26 S Requeueing job, substate: 37 Requeued in queue: test
05/23/2006 11:29:50 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:29:50 S Job Run at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:29:50 S Job Modified at request of root@xxxxxxxxxxxxxxxxxxx
05/23/2006 11:29:50 S MOM rejected modify request, error: 15001
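
To make the pattern in this trace easier to see, the following Python sketch counts run attempts, MOM rejections and requeues in tracejob output. It is only an illustration: summarize and EVENT_RE are hypothetical names, and the message prefixes it matches are taken from the output above rather than from any TORQUE specification.

```python
import re
from datetime import datetime

# "<date> <time> <source letter> <message>", as seen in the trace above.
EVENT_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+([A-Z])\s+(.*)$")


def summarize(tracejob_text):
    """Collect timestamps of run attempts, MOM rejections and requeues."""
    runs, rejects, requeues = [], [], []
    for line in tracejob_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m is None:
            continue  # "Job:" header, blank line, or wrapped continuation line
        when = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S")
        msg = m.group(3)
        if msg.startswith("Job Run"):
            runs.append(when)
        elif msg.startswith("MOM rejected modify request"):
            rejects.append(when)
        elif msg.startswith("Requeueing job"):
            requeues.append(when)
    # A run attempt counts as rejected if a rejection carries the same timestamp.
    rejected_runs = [t for t in runs if t in rejects]
    return {
        "runs": runs,
        "rejects": rejects,
        "requeues": requeues,
        "rejected_runs": rejected_runs,
    }
```

Applied to the trace above, this reports five run attempts, each followed by a MOM rejection within the same second, and two requeues.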


--
Dr. Stephen Childs,
Research Fellow, EGEE Project,    phone: +353-1-6081797
Computer Architecture Group,      email: Stephen.Childs @ cs.tcd.ie
Trinity College Dublin, Ireland   web:   http://www.cs.tcd.ie/Stephen.Childs

