osdir.com
mailing list archive

Subject: You need a new watch - msg#00023

List: linux.debian.devel.beowulf

AIM ? Rolex, Cartier and Breitling
http://Ferguson.ju7.net/rolex/vron/brotherliness.html
You need a new watch





Thread at a glance:

Previous Message by Date:

Re: distributed batch processing

On Thu, May 12, 2005 at 01:46:24AM +0000, Andrew M.A. Cater wrote:

Maybe you want to check OAR (http://oar.imag.fr/). OAR is a resource manager (or batch scheduler) for large clusters. It's an alternative to PBS (+MAUI), PBSPro, LSF, CCS or Condor. It's suitable for production platforms and research experiments. It's widely used here in France.

Gilles

> On Wed, May 11, 2005 at 06:12:20PM -0700, Dale Southard wrote:
> >
> > On May 11, 2005, at 4:03 PM, Drake Diedrich wrote:
> >
> > >Debian really ought to have at least one packaged
> > >implementation of this (DQS was, but I couldn't keep it working).
> >
> > As Josh mentioned, the Torque fork of OpenPBS is probably the
> > best bet. Torque is OpenPBS + FOSS extensions, including
> > re-implementation of PBSPro features. There are already some
> > debs available and there's at least one script floating around
> > to build deb packages.
> >
> > It's probably also worth looking at the maui scheduler (which
> > runs on top of torque or other batch systems). Someone wiser
> > than I should probably browse the licenses at
> > www.clusterresources.com to verify that torque and/or maui
> > meet the Debian standards before any work is started.
> >
> Debian Wiki on licence texts - OpenPBS and Torque both non-free.
> (Though this may not be authoritative, it is certainly persuasive
> evidence.)
>
> As someone who has actually downloaded OpenPBS from Altair for
> business use - it was a nightmare and required pre-registration.
> If you go there now, you are asked to fill out a webform - which
> still recommends strongly that you get PBSPro: you may even have
> to sign up for an evaluation licence for PBSPro in order to be
> redirected to the OpenPBS download location.
>
> Torque is certainly more freely available - and has been packaged as a
> .deb - but is not part of the Debian archive. Google for torque and .deb
> and you should find it. The only reason it is non-free is because the
> licence is a) controversial and b) unclear as to its effect. [Torque
> took an earlier version of OpenPBS which appeared to be more liberal in
> its licence terms and forked from there: it is unclear which licence
> terms still apply - see the thread on debian-legal cross referenced from
> the Wiki entry.]
>
> It's a shame: I'm almost tempted to write a letter to Altair to suggest
> that they should open source OpenPBS by removing the license restriction
> and then dual-license PBSPro such that commercial customers can buy
> support (in the same way that Aladdin did with Ghostscript et al.).
>
> I'd also volunteer to package Torque if it were feasible to have it in
> the distribution proper.
>
> Andy
>

--
Gilles Fedak
Researcher INRIA-Futurs
http://XtremWeb.net
http://www.lri.fr/~fedak

Next Message by Date:

Re: distributed batch processing

On Wed, May 11, 2005 at 06:12:20PM -0700, Dale Southard wrote:
>
> It's probably also worth looking at the maui scheduler (which
> runs on top of torque or other batch systems). Someone wiser
> than I should probably browse the licenses at
> www.clusterresources.com to verify that torque and/or maui
> meet the Debian standards before any work is started.
>

I've heard rumors of people plugging maui into SGE as well, but have never seen a writeup of it. My guess is it's not worth the work versus switching to PBS* if you really need maui.

As I understand maui, it's best when you have large parallel jobs (say 128 nodes of a 512 node cluster), which it handles by reserving those blocks of nodes for the maximum expected runtime. To do that, though, there have to be time limits on all jobs, else maui schedules things infinitely far away. If jobs finish early, or there are gaps between scheduled runs, smaller jobs can backfill the schedule to increase utilization. Jobs that don't finish on time get killed.

It's always been a complex solution for a problem I've never actually faced. Few people (3, only 1 actually running) in my cluster are writing parallel code at all; most don't know or don't care how long their programs will run (Monte Carlo and statistics tend to be like that). They just have lots of independent serial jobs to run.

In contrast, SGE has a much simpler pair of schedulers (FIFO and user-sort from DQS). It also has subordinate queues, which work really well for suspending non-owner jobs (if parts of the cluster "belong" to certain users) and for long jobs (suspend whenever something else wants to run) and test/interactive jobs (higher priority, suspend everything lower, very limited runtimes). A stack of low priority serial jobs makes getting 100% utilization easy. Subordinating part of a parallel job tends to be very non-productive though (making the rest of the job wait), so parallel jobs need to run in the higher priority queues to avoid idling large parts of the cluster.

On Thu, May 12, 2005 at 01:46:24AM +0000, Andrew M.A. Cater wrote:
> Debian Wiki on licence texts - OpenPBS and Torque both non-free.
> (Though this may not be authoritative, it is certainly persuasive
> evidence.)

On Thu, May 12, 2005 at 09:02:05AM +0200, Kenneth Geisshirt wrote:
>
> The Debian Cluster Components (DCC) project uses Torque as batch system.
> IMHO it would be better to include DCC in Debian.
>

Ideally we'd have both PBS/Torque/maui packages and SGE packages to fulfill both types of scheduling needs, and hopefully get at least one into Debian/main. SGE's new license (SISSL) is claimed by Sun to be a certified Open Source License. I've read it and suspect that may be true, but it would need careful examination on debian-legal (30K in HTML).

Alternatives or conflicts would both be adequate solutions to sharing the names - it doesn't really make sense to put multiple queueing systems on a compute node. DQS successfully held off quakestat from taking the qstat name at least. :)

-Drake
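
The backfill behaviour described for maui in the message above can be illustrated with a rough sketch. This is not maui's actual algorithm or API: it is a simplified "reserve for the head job, backfill behind it" pass, and every name in it (`Job`, `backfill`, `limit`, the job names) is hypothetical. It assumes each job declares a node count and a maximum runtime, and that running jobs are taken to hold their nodes until their limit expires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int   # nodes requested
    limit: int   # maximum expected runtime (the job's time limit)


def backfill(total_nodes, running, queue, now=0):
    """Return the names of queued jobs that may start at time `now`.

    running: list of (nodes, end_time) for jobs already on the cluster.
    queue:   list of Job in FIFO order.
    """
    running = list(running)
    pending = list(queue)
    free = total_nodes - sum(nodes for nodes, _ in running)
    started = []

    # Start jobs in FIFO order for as long as the head of the queue fits.
    while pending and pending[0].nodes <= free:
        job = pending.pop(0)
        free -= job.nodes
        running.append((job.nodes, now + job.limit))
        started.append(job.name)
    if not pending:
        return started

    # The head job does not fit: reserve nodes for it at the earliest time
    # enough running jobs will have hit their limits.
    head = pending[0]
    avail, reserve_at = free, None
    for nodes, end in sorted(running, key=lambda r: r[1]):
        avail += nodes
        if avail >= head.nodes:
            reserve_at = end
            break
    if reserve_at is None:
        return started  # head asks for more nodes than the cluster has

    # Nodes that stay free even once the head job starts.
    spare = avail - head.nodes

    # Smaller jobs may jump ahead only if they cannot delay the reservation.
    for job in pending[1:]:
        if job.nodes > free:
            continue
        if now + job.limit <= reserve_at:
            free -= job.nodes          # done before the reservation begins
            started.append(job.name)
        elif job.nodes <= spare:
            free -= job.nodes          # overlaps, but only on spare nodes
            spare -= job.nodes
            started.append(job.name)
    return started


if __name__ == "__main__":
    queue = [
        Job("parallel", 512, 50),      # needs the whole cluster
        Job("serial_short", 32, 60),   # ends before the reservation
        Job("serial_long", 32, 200),   # would delay the parallel job
    ]
    print(backfill(512, [(448, 100)], queue))
```

In this example only `serial_short` is started: it fits in the 64 idle nodes and its limit expires before the 448-node job frees the cluster at time 100. `serial_long` is held back because it would still be running when the reservation begins, which is also why every job needs a time limit for this kind of scheduling to work.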
