Re: scp error (msg#00135, clustering.torque.user)
On Thu, Nov 30, 2006 at 03:15:44PM +0100, LEROY Christine alleged:
> Hello,
>
> We are using torque and maui beside our grid middleware, and users are
> complaining that their jobs are sometimes failing with no output.
>
> We had a look in our logs and we can see those errors:
>
> Nov 30 02:18:31 wn021 pbs_mom: sys_copy, command '/usr/bin/scp -rpB
> /var/spool/pbs/spool/87831.node0.OU
> atlp@xxxxxxxxxxxxxxxxxxxxxx:/home/atlp/.lcgjm/globus-cache-export.Y30406
> /batch.out' failed with status=1, giving up after 4 attempts
>
> Nov 30 02:18:36 wn021 pbs_mom: sys_copy, command '/usr/bin/scp -rpB
> /var/spool/pbs/spool/87831.node0.ER
> atlp@xxxxxxxxxxxxxxxxxxxxxx:/home/atlp/.lcgjm/globus-cache-export.Y30406
> /batch.err' failed with status=1, giving up after 4 attempts
>
> (node07.datagrid.cea.fr is our pbs server, and wn021 is one of our nodes
> where pbs_mom is running)
>
> Are those files "/var/spool/pbs/spool/87831.node0.OU" and
> "/var/spool/pbs/spool/87831.node0.ER" deleted too soon by the system on
> the pbs_mom node?
>
> Or is it possible to configure the number of attempts ?
>
> Thanks in advance for your help.
>
> Cheers
>
> Christine
>
> PS : We have also the same type of error but at the beginning of the job
> :
>
> Nov 30 04:40:21 wn021 pbs_mom: sys_copy, command '/usr/bin/scp -rpB
> fus176@xxxxxxxxxxxxxxxxxxxxxx:/home/fus176/.lcgjm/globus-cache-export.g2
> 2960/globus-cache-export.g22960.gpg globus-cache-export.g22960.gpg'
> failed with status=1, giving up after 4 attempts

The number of tries isn't configurable, and IMHO doesn't need to be because, generally speaking, any failure will just repeat until it gives up. Meaning that 4 tries is as good as 1 try, and is as good as 50 tries.

Make sure you are on 2.1.6; 2.1.4 and 2.1.5 have some things broken in this area.

Since this is likely an ssh configuration error, the exact error message should have been sent to the user in an email.

If /home is shared on your cluster, add suitable $usecp lines to your MOM config so that scp isn't used anymore.
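
As a rough sketch of what such a line could look like in the MOM config file (mom_priv/config under the spool directory, so presumably /var/spool/pbs/mom_priv/config given the paths in the logs above): the general form is "$usecp <host pattern>:<path prefix on that host> <local path prefix>". The host pattern below is only an assumption, since the real destination host is obscured in the quoted logs; substitute whatever host names actually appear in the failing scp targets.

```
# mom_priv/config on each compute node (path assumed from the log lines above)
# Host pattern is a guess; it must match the host part of the scp destination.
# With this mapping, files destined for <host>:/home/... are copied with a
# local cp into the shared /home instead of going through scp.
$usecp *.datagrid.cea.fr:/home /home
```

pbs_mom has to re-read its config before the mapping takes effect.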