|
Re: Disk cache: msg#00016network.bit-torrent.libtorrent
On 4/13/06, Radu Hociung <radu.libtorrent@xxxxxxxx> wrote: > Arvid Norberg wrote: > > > > On Apr 14, 2006, at 01:53, Radu Hociung wrote: > > > >> Greetings all, > >> > >> I have done some experiments on libtorrent and Transmission, comparing > >> the effectiveness of their client-side caching and download scheduling > >> algorithms. > >> > >> I put the results (with graphs) on Wikipedia at: > >> > >> http://en.wikipedia.org/wiki/BitTorrent_performance > >> > >> I hope that you'll find it a worthwhile read. > > > > Very nice graphs :) > > > > Caching blocks and only write whole pieces at once would definitely > > improve that measurement. But I'm not sure it would improve in reality > > (it probably would, a bit at least), since the OS's cache very likely is > > done in the kernel, which means that you don't know how many of those > > write-calls actually resulting in a write to the disk. > > I doubt that writing whole pieces would make much difference. Looking at > libtorrent's current download pattern, and at the strace, it looks like > it writes the blocks of a piece in the proper order. So they probably > get assembled into the respective pieces in the OS's cache. If they were > being downloaded out of order, I would see a case for caching whole pieces. > > However, you're right, about not knowing when the writes to disk > actually occur: > > Say that the kernel starts off with an empty cache and 100MB free RAM. > When downloading a 5GB file, after the 1st 100MB are received, the OS > will have to start writing. It's reasonable that it will start flushing > the oldest cache slots first, so it is likely that after the fist 100MB > received, it will start writing the data it received first. > > In the case of some advanced filesystems (such as XFS, which buffers > data as long as possible and allocates storage at the last possible > moment, when flushing buffers becomes innevitable), it is possible that > the cache won't always flush the first-received data, but it may try to > group buffers based on their relative closeness. > > > But in any case, when the file size is much larger than the available > cache memory, it is inevitable that the disk heads will jump around. > > In my test setup, I only had 1 seed and 1 client, so to the client, all > the pieces it didn't already have were equally rare in the 'swarm', and > the random download pattern really did not have any purpose. > > > I was thinking of another way of scheduling downloads. With 100MB of > cache, the client could "focus" on downloading 100MB chunks of the > torrent, and select the rarest first within that 100MB chunk before > moving on to another chunk. the chunk could be a sliding window or such. > Before moving the sliding window, the client could do a 'sync' to force > data to be flushed. I would assume that on sync, the data would be > written in the most contiguous way possible. > > I plan on experimenting with the scheduling routines of either > libtorrent or transmission to test out the various hypotheses. > > The more I play around, the more I convince myself that it's the > scheduling algorithm that can bring the most benefits, not the caching > algorithm. After all, in a case where most pieces are written to the > disk once, and never read back, the cache acts as little more than a > simple delayed-write buffer, whether it is an application cache or an OS > cache. > > BTW: Why are there 2 duplicate seeks before every write and 3 before > reads? Here's an example: > > 21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31, > 16384) = 16384 > 21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */ > SEEK_SET) = 0 > 21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */ SEEK_CUR) = 0 > 21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31, > 16384) = 16384 > 21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */ > SEEK_SET) = 0 > 21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0 > 21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0 > 21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008, > 1048576) = 1048576 > 21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */ > SEEK_SET) = 0 > 21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */ SEEK_CUR) = 0 > 21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008, > 1048576) = 1048576 > > > > Regards, > Radu. Windows has a way to control how much the OS should cache. This amount has a very conservative default on client versions - I'm curious to see how good an application-level cache would be if the OS-level cache was bumped up. Having libtorrent use gobs of memory in a useless attempt to out-cache the kernel is certainly undesirable. I imagine all modern operating systems optimize outstanding asynchronous disk i/o calls to be completed with the least amount of seeking - something which libtorrent doesn't use last I checked but could certainly benefit from even if only for the cpu usage benefits. I'm also curious as to how much app level cache really benefits the user. Either you have a low amount of traffic that can be easily cached by the OS or you have a high amount of traffic which neither cache can keep up with without using a lot of memory. In the worst case an application-level cache might be put into the swap file and cause even more disk thrashing, but this would undoubtedly never happen with an OS cache. IMHO this definately needs more real world testing! -- Cory Nelson http://www.int64.org ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid0944&bid$1720&dat1642 |
|
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| Previous by Date: | Re: Disk cache: 00016, MooPolice |
|---|---|
| Next by Date: | Re: Disk cache: 00016, Arvid Norberg |
| Previous by Thread: | Re: Disk cachei: 00016, Radu Hociung |
| Next by Thread: | Re: Disk cache: 00016, Arvid Norberg |
| Indexes: | [Date] [Thread] [Top] [All Lists] |
| News | FAQ | advertise |