Re: Why we need delayed allocation!: msg#00045

file-systems.ext2.devel

Subject: Re: Why we need delayed allocation!

On Dec 26, 2003 14:17 -0500, Theodore Ts'o wrote:
> One of the reasons why I created the filefrag program is that I'm
> interested in seeing how bad the filesystem fragmentation problem
> really is under ext2/3, and to see what the causes might be. The
> linker/BFD library just happened to be the first one I found. The
> $64,000 question is whether there might be some other more common-case
> scenarios where the userspace write ordering may be impacting
> filesystem performance in a measurable and significant way.

For sure - we've seen this in spades with Lustre. You have multiple
processes writing to the same filesystem at the same time and you get
terrible fragmentation with ext3. With ext2 at least one would get
8-block chunks because of prealloc, but even that doesn't really cut it.
The write impact isn't so bad because you get write merging in the block
layer, but at read time it goes catatonic (i.e. a 100MB/s disk array
drops to 4MB/s because of the fragmentation). There is a 2.6 heuristic
to colour the allocations based on pid, but that wouldn't help Lustre
(or NFS or Samba) where the pid of the service thread is randomly chosen
from a pool for each RPC.
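
To illustrate why pid-based colouring breaks down with a pool of service
threads, here is a toy model in Python. It is not the 2.6 allocator: the
per-pid "region", the cursor table and the names simulate() and pick_pid()
are all assumptions made up for this sketch. Each pid allocates from its
own region, and the only thing that changes between the two runs is how
the pid is chosen for each single-block write.

```python
import random

def count_extents(blocks):
    """Number of contiguous runs in a file's block list (file order)."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)

def simulate(num_files, blocks_per_file, pick_pid, num_pids, region=1 << 20):
    """Toy allocator (hypothetical): every pid owns a private region.

    pick_pid(file_id) decides which pid services each one-block write.
    Returns the extent count of every file.
    """
    cursor = {pid: pid * region for pid in range(num_pids)}
    files = {f: [] for f in range(num_files)}
    for _ in range(blocks_per_file):
        for f in range(num_files):      # writers take turns, one block each
            pid = pick_pid(f)
            files[f].append(cursor[pid])
            cursor[pid] += 1
    return [count_extents(b) for b in files.values()]

rng = random.Random(0)
# One long-lived process per file: the colouring keeps each file together.
stable = simulate(4, 256, lambda f: f, 4)
# Service thread picked at random per request, as with Lustre/NFS/Samba.
pooled = simulate(4, 256, lambda f: rng.randrange(4), 4)
print("stable pids:", stable)
print("pooled pids:", pooled)
```

In this model the stable case gives one extent per file, while the pooled
case scatters every file across all of the regions, so the colouring buys
nothing.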

What we did with Lustre to significant benefit is to grab a global semaphore
in the Lustre code during the block allocation stage so that we would get
a full RPC of contiguous allocation (up to 512KB) and then drop the lock
when we were doing the writes. This isn't as bad as it sounds, because
we still get parallelism during the RPC and during the disk IO, and only
single-thread during allocation, although allocation is sometimes slow
because of read-behind-write and seeking from the other IOs.
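
Roughly, the locking pattern looks like the sketch below. This is a
userspace Python model, not the Lustre code: the Allocator class, the
4KB block size and the thread/RPC counts are assumptions for the sake of
the example, and threading.Lock stands in for the global semaphore. The
lock is held only while the blocks for one RPC are allocated and is
dropped before the (here imaginary) writes are issued.

```python
import threading

BLOCK_SIZE = 4096                        # assumed filesystem block size
RPC_BLOCKS = 512 * 1024 // BLOCK_SIZE    # one 512KB RPC = 128 blocks

class Allocator:
    """Hypothetical bump allocator shared by all service threads."""

    def __init__(self):
        self.next_free = 0
        self.alloc_sem = threading.Lock()  # stand-in for the global semaphore

    def alloc_block(self):
        block = self.next_free
        self.next_free += 1
        return block

    def alloc_rpc(self, nblocks):
        # Hold the lock for the whole RPC so no other thread can
        # interleave its allocations with ours.
        with self.alloc_sem:
            return [self.alloc_block() for _ in range(nblocks)]

def service_thread(alloc, num_rpcs, results):
    for _ in range(num_rpcs):
        blocks = alloc.alloc_rpc(RPC_BLOCKS)
        # Lock is already dropped here; the writes would run in parallel.
        results.append(blocks)

alloc = Allocator()
results = []
threads = [threading.Thread(target=service_thread, args=(alloc, 8, results))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every RPC should have received one contiguous run of blocks.
contiguous = all(all(y == x + 1 for x, y in zip(b, b[1:])) for b in results)
print("RPCs:", len(results), "all contiguous:", contiguous)
```

The cost is that allocation is serialized across all threads, which is
the trade-off described above.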

Implementing this properly at the ext3 level would need per-group read/write
locks or the hashed group locks used currently in 2.6 to protect allocations
between threads in the same group, which is just the opposite of the current
parallel allocation code.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


