|
Re: Disk cache: msg#00014network.bit-torrent.libtorrent
Arvid Norberg wrote: > > On Apr 14, 2006, at 01:53, Radu Hociung wrote: > >> Greetings all, >> >> I have done some experiments on libtorrent and Transmission, comparing >> the effectiveness of their client-side caching and download scheduling >> algorithms. >> >> I put the results (with graphs) on Wikipedia at: >> >> http://en.wikipedia.org/wiki/BitTorrent_performance >> >> I hope that you'll find it a worthwhile read. > > Very nice graphs :) > > Caching blocks and only write whole pieces at once would definitely > improve that measurement. But I'm not sure it would improve in reality > (it probably would, a bit at least), since the OS's cache very likely is > done in the kernel, which means that you don't know how many of those > write-calls actually resulting in a write to the disk. I doubt that writing whole pieces would make much difference. Looking at libtorrent's current download pattern, and at the strace, it looks like it writes the blocks of a piece in the proper order. So they probably get assembled into the respective pieces in the OS's cache. If they were being downloaded out of order, I would see a case for caching whole pieces. However, you're right, about not knowing when the writes to disk actually occur: Say that the kernel starts off with an empty cache and 100MB free RAM. When downloading a 5GB file, after the 1st 100MB are received, the OS will have to start writing. It's reasonable that it will start flushing the oldest cache slots first, so it is likely that after the fist 100MB received, it will start writing the data it received first. In the case of some advanced filesystems (such as XFS, which buffers data as long as possible and allocates storage at the last possible moment, when flushing buffers becomes innevitable), it is possible that the cache won't always flush the first-received data, but it may try to group buffers based on their relative closeness. But in any case, when the file size is much larger than the available cache memory, it is inevitable that the disk heads will jump around. In my test setup, I only had 1 seed and 1 client, so to the client, all the pieces it didn't already have were equally rare in the 'swarm', and the random download pattern really did not have any purpose. I was thinking of another way of scheduling downloads. With 100MB of cache, the client could "focus" on downloading 100MB chunks of the torrent, and select the rarest first within that 100MB chunk before moving on to another chunk. the chunk could be a sliding window or such. Before moving the sliding window, the client could do a 'sync' to force data to be flushed. I would assume that on sync, the data would be written in the most contiguous way possible. I plan on experimenting with the scheduling routines of either libtorrent or transmission to test out the various hypotheses. The more I play around, the more I convince myself that it's the scheduling algorithm that can bring the most benefits, not the caching algorithm. After all, in a case where most pieces are written to the disk once, and never read back, the cache acts as little more than a simple delayed-write buffer, whether it is an application cache or an OS cache. BTW: Why are there 2 duplicate seeks before every write and 3 before reads? Here's an example: 21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31, 16384) = 16384 21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */ SEEK_SET) = 0 21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */ SEEK_CUR) = 0 21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31, 16384) = 16384 21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */ SEEK_SET) = 0 21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0 21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0 21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008, 1048576) = 1048576 21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */ SEEK_SET) = 0 21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */ SEEK_CUR) = 0 21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008, 1048576) = 1048576 Regards, Radu. ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 |
|
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| Previous by Date: | Re: Disk cache: 00014, Arvid Norberg |
|---|---|
| Next by Date: | Re: Disk cache: 00014, MooPolice |
| Previous by Thread: | Re: Disk cachei: 00014, Arvid Norberg |
| Next by Thread: | Re: Disk cache: 00014, Cory Nelson |
| Indexes: | [Date] [Thread] [Top] [All Lists] |
| News | FAQ | advertise |