logo       

Re: Disk cache: msg#00014

network.bit-torrent.libtorrent

Subject: Re: Disk cache

Arvid Norberg wrote:
>
> On Apr 14, 2006, at 01:53, Radu Hociung wrote:
>
>> Greetings all,
>>
>> I have done some experiments on libtorrent and Transmission, comparing
>> the effectiveness of their client-side caching and download scheduling
>> algorithms.
>>
>> I put the results (with graphs) on Wikipedia at:
>>
>> http://en.wikipedia.org/wiki/BitTorrent_performance
>>
>> I hope that you'll find it a worthwhile read.
>
> Very nice graphs :)
>
> Caching blocks and only write whole pieces at once would definitely
> improve that measurement. But I'm not sure it would improve in reality
> (it probably would, a bit at least), since the OS's cache very likely is
> done in the kernel, which means that you don't know how many of those
> write-calls actually resulting in a write to the disk.

I doubt that writing whole pieces would make much difference. Looking at
libtorrent's current download pattern, and at the strace, it looks like
it writes the blocks of a piece in the proper order. So they probably
get assembled into the respective pieces in the OS's cache. If they were
being downloaded out of order, I would see a case for caching whole pieces.

However, you're right, about not knowing when the writes to disk
actually occur:

Say that the kernel starts off with an empty cache and 100MB free RAM.
When downloading a 5GB file, after the 1st 100MB are received, the OS
will have to start writing. It's reasonable that it will start flushing
the oldest cache slots first, so it is likely that after the fist 100MB
received, it will start writing the data it received first.

In the case of some advanced filesystems (such as XFS, which buffers
data as long as possible and allocates storage at the last possible
moment, when flushing buffers becomes innevitable), it is possible that
the cache won't always flush the first-received data, but it may try to
group buffers based on their relative closeness.


But in any case, when the file size is much larger than the available
cache memory, it is inevitable that the disk heads will jump around.

In my test setup, I only had 1 seed and 1 client, so to the client, all
the pieces it didn't already have were equally rare in the 'swarm', and
the random download pattern really did not have any purpose.


I was thinking of another way of scheduling downloads. With 100MB of
cache, the client could "focus" on downloading 100MB chunks of the
torrent, and select the rarest first within that 100MB chunk before
moving on to another chunk. the chunk could be a sliding window or such.
Before moving the sliding window, the client could do a 'sync' to force
data to be flushed. I would assume that on sync, the data would be
written in the most contiguous way possible.

I plan on experimenting with the scheduling routines of either
libtorrent or transmission to test out the various hypotheses.

The more I play around, the more I convince myself that it's the
scheduling algorithm that can bring the most benefits, not the caching
algorithm. After all, in a case where most pieces are written to the
disk once, and never read back, the cache acts as little more than a
simple delayed-write buffer, whether it is an application cache or an OS
cache.

BTW: Why are there 2 duplicate seeks before every write and 3 before
reads? Here's an example:

21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31,
16384) = 16384
21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */ SEEK_CUR) = 0
21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31,
16384) = 16384
21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */
SEEK_SET) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0
21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008,
1048576) = 1048576
21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */ SEEK_CUR) = 0
21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008,
1048576) = 1048576



Regards,
Radu.



-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642


<Prev in Thread] Current Thread [Next in Thread>
Google Custom Search

News | FAQ | advertise