Subject: You need a new watch - msg#00023
List: linux.debian.devel.beowulf
Previous Message by Date:
Re: distributed batch processing
On Thu, May 12, 2005 at 01:46:24AM +0000, Andrew M.A. Cater wrote:
Maybe you want to check OAR (http://oar.imag.fr/).
OAR is a resource manager (or batch scheduler) for large
clusters. It's an alternative to PBS(+MAUI), PBSPro, LSF, CCS or
Condor. It's suitable for production platforms and research
experiments.
It's widely used, here in France.
Gilles
> On Wed, May 11, 2005 at 06:12:20PM -0700, Dale Southard wrote:
> >
> > On May 11, 2005, at 4:03 PM, Drake Diedrich wrote:
> >
> > >Debian really ought to have at least one packaged
> > >implementation of this (DQS was, but I couldn't keep it working).
> >
> > As Josh mentioned, the Torque fork of OpenPBS is probably the
> > best bet. Torque is OpenPBS + FOSS extensions, including
> > re-implementation of PBSPro features. There are already some
> > debs available and there's at least one script floating around
> > to build deb packages.
> >
> > It's probably also worth looking at the maui scheduler (which
> > runs on top of torque or other batch systems). Someone wiser
> > than I should probably browse the licenses at
> > www.clusterresources.com to verify that torque and/or maui
> > meet the Debian standards before any work is started.
> >
> Debian Wiki on licence texts - OpenPBS and Torque both non-free.
> (Though this may not be authoritative, it is certainly persuasive
> evidence.)
>
> As someone who has actually downloaded OpenPBS from Altair for
> business use - it was a nightmare and required pre-registration.
> If you go there now, you are asked to fill out a webform - which
> still recommends strongly that you get PBSPro: you may even have
> to sign up for an evaluation licence for PBSPro in order to be
> redirected to the OpenPBS download location.
>
> Torque is certainly more freely available - and has been packaged as a
> .deb - but is not part of the Debian archive. Google for torque and .deb
> and you should find it. The only reason it is non-free is because the
> licence is a) controversial and b) unclear as to its effect. [Torque
> took an earlier version of OpenPBS which appeared to be more liberal in
> its licence terms and forked from there: it is unclear which licence
> terms still apply - see the thread on debian-legal cross referenced from
> the Wiki entry.]
>
> It's a shame: I'm almost tempted to write a letter to Altair to suggest
> that they should open source OpenPBS by removing the license restriction
> and then dual-license PBSPro such that commercial customers can buy
> support (in the same way that Aladdin did with Ghostscript et al.).
>
> I'd also volunteer to package Torque if it were feasible to have it in
> the distribution proper.
>
> Andy
>
>
--
Gilles Fedak Researcher INRIA-Futurs
http://XtremWeb.net http://www.lri.fr/~fedak
Next Message by Date:
Re: distributed batch processing
On Wed, May 11, 2005 at 06:12:20PM -0700, Dale Southard wrote:
>
> It's probably also worth looking at the maui scheduler (which
> runs on top of torque or other batch systems). Someone wiser
> than I should probably browse the licenses at
> www.clusterresources.com to verify that torque and/or maui
> meet the Debian standards before any work is started.
>
I've heard rumors of people plugging maui into SGE as well, but have
never seen a writeup of it. My guess is it's not worth the work versus
switching to PBS* if you really need maui. As I understand maui, it's best
when you have large parallel jobs (say 128 nodes of a 512 node cluster),
which it handles by reserving those blocks of nodes for the maximum expected
runtime. To do that though there have to be time limits on all jobs, else
maui schedules things infinitely far away. If jobs finish early, or there
are gaps between scheduled runs, smaller jobs can backfill the schedule to
increase utilization. Jobs that don't finish on time get killed.
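The reservation-plus-backfill idea described above can be sketched roughly as follows. This is only a simplified illustration, not maui's actual algorithm: the names `Job`, `earliest_start` and `backfill` are made up for the example, time is in arbitrary units, and a queued job is only backfilled if its declared limit ends before the head job's reserved start.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int   # nodes requested
    limit: int   # declared maximum runtime (hypothetical time units)


def earliest_start(total_nodes, running, job, now=0):
    """Earliest time `job` fits, assuming running jobs use their full limit.

    `running` is a list of (end_time, nodes) pairs.
    """
    free = total_nodes - sum(n for _, n in running)
    if job.nodes <= free:
        return now
    for end, n in sorted(running):
        free += n  # these nodes are released at `end`
        if job.nodes <= free:
            return end
    raise ValueError("job needs more nodes than the cluster has")


def backfill(total_nodes, running, queue, now=0):
    """Names of queued jobs that may start now without delaying the head job."""
    if not queue:
        return []
    head, rest = queue[0], queue[1:]
    start = earliest_start(total_nodes, running, head, now)
    if start == now:
        return [head.name]  # the head job itself can run right away
    free = total_nodes - sum(n for _, n in running)
    started = []
    for job in rest:
        # Must fit in the idle nodes and finish before the reservation begins.
        if job.nodes <= free and now + job.limit <= start:
            started.append(job.name)
            free -= job.nodes
    return started


# 512-node cluster with 62 nodes idle; a 128-node job waits at the head.
running = [(60, 300), (120, 150)]
queue = [Job("big", 128, 240), Job("short", 32, 45),
         Job("long", 16, 90), Job("wide", 64, 30), Job("tiny", 8, 60)]
print(backfill(512, running, queue))  # -> ['short', 'tiny']
```

Without a declared limit on every job, `earliest_start` has nothing to work from, which is the "infinitely far away" problem mentioned above.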
It's always been a complex solution for a problem I've never actually
faced. Few people (3, only 1 actually running) in my cluster are writing
parallel code at all, most don't know or don't care how long their programs
will run (Monte Carlo and statistics tend to be like that). They just have
lots of independent serial jobs to run.
In contrast, SGE has a much simpler pair of schedulers (FIFO and
user-sort from DQS). It also has subordinate queues, which work really well
for suspending non-owner jobs (if parts of the cluster "belong" to certain
users) and for long jobs (suspend whenever something else wants to run) and
test/interactive jobs (higher priority, suspend everything lower, very
limited runtimes). A stack of low-priority serial jobs makes getting 100%
utilization easy. Subordinating part of a parallel job tends to be very
non-productive though (making the rest of the job wait), so parallel jobs
need to run in the higher priority queues to avoid idling large parts of the
cluster.
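As a rough illustration of the subordinate-queue behaviour on a single node, here is a toy model. It is not SGE's implementation or configuration syntax: the function name, the queue names and the dictionary layout are all invented for the example, and it ignores slot thresholds entirely. A queue that has any job on the node suspends every job in the queues listed as its subordinates.

```python
def suspended_jobs(node_jobs, subordinates):
    """Return the jobs on one node that should be suspended.

    node_jobs:    dict of queue name -> list of job names running on the node
    subordinates: dict of queue name -> list of queues it suspends when busy
    """
    suspended = set()
    for queue, subs in subordinates.items():
        if node_jobs.get(queue):  # the superior queue has work on this node
            for sub in subs:
                suspended.update(node_jobs.get(sub, []))
    return sorted(suspended)


# Hypothetical layout: test jobs outrank owner jobs, which outrank long jobs.
subordinates = {"test": ["owner", "long"], "owner": ["long"]}

busy = {"test": ["t1"], "owner": [], "long": ["mc1", "mc2"]}
idle = {"test": [], "owner": [], "long": ["mc1", "mc2"]}

print(suspended_jobs(busy, subordinates))  # -> ['mc1', 'mc2']
print(suspended_jobs(idle, subordinates))  # -> []
```

The low-priority serial jobs simply resume once the higher queues drain, which is why they soak up idle time so well.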
On Thu, May 12, 2005 at 01:46:24AM +0000, Andrew M.A. Cater wrote:
> Debian Wiki on licence texts - OpenPBS and Torque both non-free.
> (Though this may not be authoritative, it is certainly persuasive
> evidence.)
On Thu, May 12, 2005 at 09:02:05AM +0200, Kenneth Geisshirt wrote:
>
> The Debian Cluster Components (DCC) project uses Torque as batch system.
> IMHO it would be better to include DCC in Debian.
>
Ideally we'd have both PBS/Torque/maui packages and SGE packages to
fulfill both types of scheduling needs, and hopefully get at least one into
Debian/main. SGE's new license (SISSL) is claimed by Sun to be a certified
Open Source License. I've read it and suspect that may be true, but it
would need careful examination on debian-legal (30K in HTML). Alternatives
or conflicts would both be adequate solutions to sharing the names - it
doesn't really make sense to put multiple queueing systems on a compute
node. DQS successfully held off quakestat from taking the qstat name at
least. :)
-Drake