Re: Why we need delayed allocation! (msg#00045, file-systems.ext2.devel)
On Dec 26, 2003 14:17 -0500, Theodore Ts'o wrote:
> One of the reasons why I created the filefrag program is that I'm
> interested in seeing if how bad the filesystem fragmentation problem
> really is under ext2/3, and to see what the causes might be. The
> linker/BFD library just happened to be the first one I found. The
> $64,000 question is whether there might be some other more common-case
> scenarios where the userspace write ordering may be impacting
> filesystem performance in a measurable and significant way.

For sure - we've seen this in spades with Lustre. You have multiple processes writing to the same filesystem at the same time and you get terrible fragmentation with ext3. With ext2 at least one would get 8-block chunks because of prealloc, but even that doesn't really cut it. The write impact isn't so bad because you get write merging in the block layer, but at read time it goes catatonic (i.e. a 100MB/s disk array drops to 4MB/s because of the fragmentation).

There is a 2.6 heuristic to colour the allocations based on pid, but that wouldn't help Lustre (or NFS or Samba) where the pid of the service thread is randomly chosen from a pool for each RPC.

What we did with Lustre to significant benefit is to grab a global semaphore in the Lustre code during the block allocation stage so that we would get a full RPC of contiguous allocation (up to 512KB) and then drop the lock when we were doing the writes. This isn't as bad as it sounds, because we still get parallelism during the RPC and during the disk IO, and only single-thread during allocation, although it is sometimes slow because of read-behind-write and seeking from the other IOs.

To implement this properly at the ext3 level would need per-group read/write locks or the hashed group locks used currently in 2.6 to protect allocations between threads in the same group, which is just the opposite of the current parallel allocation code.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
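
As a rough illustration of the locking scheme described in the message (hold one global lock only while allocating an RPC's worth of blocks, then drop it before the data is written), here is a toy Python sketch. It is not Lustre or ext3 code: `BlockAllocator`, `alloc_rpc`, `count_extents`, and `interleaved_files` are hypothetical names, the allocator is a simple bump counter, and `BLOCKS_PER_RPC = 128` assumes a 4KB block size for the 512KB RPC mentioned above.

```python
import threading

# Assumption: 512KB RPC with 4KB blocks (block size is not stated in the post).
BLOCKS_PER_RPC = 128


class BlockAllocator:
    """Toy bump allocator: hands out the next free block number."""

    def __init__(self):
        self.next_free = 0
        # Stands in for the global semaphore taken during allocation.
        self.alloc_lock = threading.Lock()

    def alloc_rpc(self, nblocks):
        # Single-threaded only for the allocation stage, so one RPC
        # always receives one contiguous run of blocks.
        with self.alloc_lock:
            start = self.next_free
            self.next_free += nblocks
        return list(range(start, start + nblocks))


def count_extents(blocks):
    """Number of contiguous runs in a file's ordered block list."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


def writer(allocator, file_blocks, nrpcs):
    for _ in range(nrpcs):
        extent = allocator.alloc_rpc(BLOCKS_PER_RPC)
        # Lock is already dropped here; the data write would run in parallel.
        file_blocks.extend(extent)


def locked_files(nwriters=4, nrpcs=8):
    """Run concurrent writers that allocate one RPC at a time under the lock."""
    allocator = BlockAllocator()
    files = [[] for _ in range(nwriters)]
    threads = [
        threading.Thread(target=writer, args=(allocator, f, nrpcs))
        for f in files
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return files


def interleaved_files(nwriters=4, nblocks=1024):
    """Worst case for contrast: writers take turns allocating single blocks."""
    return [list(range(i, nwriters * nblocks, nwriters)) for i in range(nwriters)]


if __name__ == "__main__":
    for name, files in (("locked", locked_files()),
                        ("interleaved", interleaved_files())):
        print(name, [count_extents(f) for f in files])
```

With the lock, each file ends up with at most one extent per RPC no matter how the threads are scheduled; in the interleaved case every block is its own extent. The sketch ignores block groups, preallocation, and seek costs, so it only shows the contiguity effect, not the throughput numbers quoted in the message.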