Re: Why we need delayed allocation!: msg#00048

file-systems.ext2.devel

Subject: Re: Why we need delayed allocation!


1) Avoid it. Which, as you point out, is important for extents
to work really nice.

Agreed. FFS, for instance, always keeps 64K "clusters" of blocks contiguous on disk, even if it means reallocating the entire cluster on a write. However, beyond the obvious drawbacks for synchronous writes and large writes, this reallocation, while keeping a single file contiguous, may eliminate the spatial locality of related files (e.g. a group of configuration files).

2) Deal with it smart(ly).

I worked on a project called Virtual Contiguity, which allowed small blocks (1K or 4K) to be allocated non-contiguously within a larger logical region on disk (256K). We allocated blocks as close as possible to the original allocation, but did not enforce strict contiguity. Reads were then performed on these 256K regions (emulating the benefits of a large block), and the small blocks were filtered out of the track cache for good memory performance (4K blocks mean 4K pages). Deallocating blocks leaves room for files to grow "in place" or for related files to be placed nearby.
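To make the "as close as possible, but not strictly contiguous" policy concrete, here is a minimal sketch in Python. It is not the Virtual Contiguity code; the free-map representation, the names allocate_near and free_block, and the tie-breaking rule are all assumptions made for illustration. It models one 256K region of 4K blocks and hands out the free block nearest to a hint.

```python
# Illustrative model only: one 256K logical region made of 4K blocks.
REGION_BYTES = 256 * 1024
BLOCK_BYTES = 4 * 1024
BLOCKS_PER_REGION = REGION_BYTES // BLOCK_BYTES  # 64 blocks per region


def allocate_near(free_map, hint):
    """Claim the free block closest to `hint` and return its index.

    free_map is a list of bools (True means free). Returns None when the
    region has no free block. On a tie, the lower block number wins; that
    choice is arbitrary and only keeps the sketch deterministic.
    """
    n = len(free_map)
    for distance in range(n):
        for candidate in (hint - distance, hint + distance):
            if 0 <= candidate < n and free_map[candidate]:
                free_map[candidate] = False
                return candidate
    return None


def free_block(free_map, index):
    """Release a block so a growing or related file can reuse the spot."""
    free_map[index] = True
```

Because nothing forces contiguity, a freed block in the middle of the region is simply the next candidate for whichever file asks for space near it.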

-Zachary

On Dec 26, 2003, at 2:08 PM, Arjan van de Ven wrote:

On Fri, 2003-12-26 at 20:17, Theodore Ts'o wrote:

It might be interesting to see if this does make a difference for very
large executables, such as evolution or mozilla, which will not fit
inside a track cache.

from what I've seen it doesn't; what kills these is more the link time
and the fact that each opens a few gazillion small files (.so's and icons
and stuff) during startup (which all leads to atime updates etc).

There's two ways of approaching fragmentation:
1) Avoid it. Which, as you point out, is important for extents
to work really nice.
The most extreme form of this is an in-kernel-online-defragger
which would remap blocks of files which are very fragmented into
a new contiguous area on the fly. This is a bit like the garbage
collector in jffs2 and could use a "not fragmented" bit in the inode.
Obviously this is considered bloat once mentioned out loud...
Also the interactions with O_DIRECT and such a beast are beyond
funny.
Controlling fragmentation at the source can help, esp if the writes
from userspace are nicely sequential (eg tar/cpio/rpm/whatever) and
the filesize is known beforehand.
2) Deal with it smart.
For in-the-file ordering issues, in theory a physical readahead based
thing would do wonders. Physical readahead has its own downsides (eg
unlike file position readahead there is less guaranteed real usage of
the data you just read) but the actual IO itself is near free
anyway so it may in the end be the right thing.

There have been experiments with making ld.so call sys_readahead() on
the portions of binaries/libraries that it will actually use (which
in effect will read the entire logical file which also means the
entire physical block I suspect).
There may well be other similar things that can be done with the 2.6
BIO layer (like including dummy reads of blocks that would break a
big sequential read into smaller ones).
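As a rough userspace illustration of the readahead hint discussed above, the sketch below asks the kernel to prefetch a byte range of a file. It does not call sys_readahead(); it uses posix_fadvise with POSIX_FADV_WILLNEED, a related advisory interface that Python exposes as os.posix_fadvise. The helper name prefetch_range is hypothetical, and the call is only a hint that the kernel is free to ignore.

```python
import os


def prefetch_range(path, offset, length):
    """Hint that bytes [offset, offset + length) of `path` will be needed soon.

    Returns True if the hint was issued, or False on platforms where
    os.posix_fadvise is unavailable. This is advisory only: it makes no
    promise that the data ends up in the page cache.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
    return True
```

A loader-style caller would issue one such hint per region of a binary or library it expects to touch, then map or read the file as usual.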


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Zachary Peterson zachary@xxxxxxx
http://znjp.com




