
Re: Problems with maui scalability: msg#00066

Subject: Re: Problems with maui scalability
I think we run jobs in the 900-node range daily with no problems, although
our average job size is smaller. I haven't looked in the logs to see whether
we're having communication problems that are masked by retries or something,
though.

-- pete


On 9/12/07 12:14 AM, "Lennart Karlsson" <Lennart.Karlsson@xxxxxxxxxx> wrote:

> meo@xxxxxxxxxxxxxx said:
>> Peter Wyckoff said...
>> 
>> |I'm wondering how big you've gotten maui and torque to scale, mostly
>> |interested in number of nodes?
>> |
>> |The docs say something like 1,000 but I think it scales well beyond that,
>> |no?
>> 
>> That's what I've heard.  Right now we're at about 300 nodes.
> 
> Are you able to start a parallel job spanning all 300 of these nodes,
> or does the mom-to-mom communication setup break down?
> 
> We have problems starting jobs wider than about 100 nodes, because
> that many moms have difficulty synchronizing among themselves
> at startup.
> 
> -- Lennart Karlsson <Lennart.Karlsson@xxxxxxxxxx>
>    National Supercomputer Centre in Linkoping, Sweden
>    http://www.nsc.liu.se
> 
> 

