On Thu, 2005-02-03 08:58:36 -0500, David Roundy <droundy@xxxxxxxxxxxxxxx>
wrote in message <20050203135830.GC21313@xxxxxxxxxxxxxxx>:
> I suspect that for the "fast patches" (which is what I'm most keen on
> optimizing) most of the time is spent "statting" and traversing the
> directory tree. This is an area where there is (I think) some low-hanging
> fruit as far as optimizations go. They won't change the scaling, but if we
> can speed things up by an order of magnitude, that wouldn't be bad.
> We stat each file a number of times, once to see if it's a file or
> directory, once to see if it's actually a symlink, once to find the file
> modification time, once to find its length, etc. Removing these redundant
> stats would be great. I'd like to do this by writing a wrapper routine in
> C that calls stat and extracts the relevant parts. Then we (read "someone
> other than me") can write an analogous routine for Windows, and we can
> eliminate the current hack of providing a "System.Posix" module for
> Windows, which is pretty ugly. The Haskell code would need to be
> refactored a bit so it calls this stat wrapper just once, but that
> shouldn't be hard.
Well, there are basically three approaches to this:
- Rework the code to ask for all relevant data within one stat call.
- Optimize darcs so that it simply omits the unneeded checks.
- Wrap the stat call and cache the result, with the possibility of
  returning stale data (though I think this is more of a theoretical
  problem for darcs).
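A minimal sketch of the first approach, assuming a POSIX system with
lstat(). The struct and function names (file_info, get_file_info) are made
up for illustration and are not taken from darcs; a Windows counterpart
would need a different implementation behind the same interface.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record holding everything the caller asks about a path.
 * Field names are illustrative only. */
struct file_info {
    int    is_file;     /* 1 if regular file, else 0 */
    int    is_dir;      /* 1 if directory, else 0 */
    int    is_symlink;  /* 1 if symbolic link, else 0 */
    off_t  size;        /* length in bytes */
    time_t mtime;       /* last modification time */
};

/* Fill *out from a single lstat() call.
 * lstat() is used instead of stat() so that a symlink is reported as a
 * symlink rather than as whatever it points to.
 * Returns 0 on success, -1 on failure (errno is left as set by lstat). */
int get_file_info(const char *path, struct file_info *out)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return -1;

    out->is_file    = S_ISREG(st.st_mode) ? 1 : 0;
    out->is_dir     = S_ISDIR(st.st_mode) ? 1 : 0;
    out->is_symlink = S_ISLNK(st.st_mode) ? 1 : 0;
    out->size       = st.st_size;
    out->mtime      = st.st_mtime;
    return 0;
}
```

The caller would invoke get_file_info() once per path and read the type,
length and modification time from the filled-in struct instead of asking
the kernel again for each question.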
I can't really work on the Haskell code, but I'd offer to...
> I'd also like to read the directory contents in C... in fact ultimately I'd
> like to move all the IO (except to stdout, etc) to C. That way we can keep
> the file names as C strings and never convert them to Haskell
> strings--which may be a large portion of the current pain. File paths are
> currently encapsulated in FileName, so it'll be pretty easy to switch to a
> CString (well, really PackedString) representation... I've done it before
> as an experimental optimization (saving space) which turned out to hurt
> more than it gained.
...write in C whatever you'd like me to write :-)
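For the directory reading, something along these lines might be a starting
point. It is only a sketch, assuming POSIX opendir()/readdir(); the names
list_dir and entry_cb are hypothetical. Entry names are handed to the
callback as plain C strings, so nothing is converted on the C side.

```c
#include <stddef.h>
#include <string.h>
#include <dirent.h>

/* Hypothetical callback type: receives each entry name as a C string.
 * The pointer is only valid during the call, so the callback must copy
 * the name if it wants to keep it. */
typedef void (*entry_cb)(const char *name, void *ctx);

/* Visit every entry of dirpath except "." and "..".
 * cb may be NULL, in which case the entries are only counted.
 * Returns the number of entries visited, or -1 if the directory cannot
 * be opened. A read error in the middle of the listing is not
 * distinguished from the end of the directory in this sketch. */
int list_dir(const char *dirpath, entry_cb cb, void *ctx)
{
    DIR *d = opendir(dirpath);
    struct dirent *e;
    int count = 0;

    if (d == NULL)
        return -1;

    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (cb != NULL)
            cb(e->d_name, ctx);
        count++;
    }

    closedir(d);
    return count;
}
```

The callback keeps the interface small; the Haskell side could pass a
function that appends each name to whatever packed representation it
prefers.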
> So for the fast cases... if we define N as the total number of files and M
> as the number of modified files, where N >> M, we should only need to
> optimize the directory reading (stat and reading directory listings) code.
> This also will speed up certain options like record --list-options, which
> is used by the bash completion code, and therefore *ought* to be lightning
> fast.
>
> In case you're wondering, I've gone into this much detail partly because
> this looks like an area where you'd be particularly effective as a
> contributor...
Thanks :-) As I said, just tell me what you'd like to see and I'll do it
for you. I won't touch the Haskell code yet, though (except maybe for
really trivial things that seem obvious to me).
Regards, JBG
--
Jan-Benedict Glaw jbglaw@xxxxxxxxxx . +49-172-7608481 _ O _
"A free opinion in a free mind | Against censorship | Against war _ _ O
for a free state full of free citizens" | on the Internet! | in Iraq! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
_______________________________________________
darcs-users mailing list
darcs-users@xxxxxxxxx
http://www.abridgegame.org/mailman/listinfo/darcs-users