|
|
Re: Disk cache: msg#00024
network.bit-torrent.libtorrent
On Apr 14, 2006, at 17:05, Radu Hociung wrote:
Arvid Norberg wrote:
On Apr 14, 2006, at 06:27, Radu Hociung wrote:
I doubt that writing whole pieces would make much difference.
Looking at
libtorrent's current download pattern, and at the strace, it
looks like
it writes the blocks of a piece in the proper order. So they
probably
get assembled into the respective pieces in the OS's cache. If
they were
being downloaded out of order, I would see a case for caching whole
pieces.
I thought you downloaded a "wild" torrent, with more than one
peer. If
you had done that, you would have noticed that the write operations
would have been scattered more. Because libtorrent is writing each
block
to disk, and even though blocks within a piece are downloaded
sequentially, it would still be more seeking.
In high speeds libtorrent prefers to download whole pieces from
single
peers, which means block writes to different pieces would be
interleaved.
Ah, of course. I should have thought of this.
In order to answer questions regarding the behaviour of OS caches,
there
is a library available to study physical disk access, written at the
Renaissance Computing Institute. It requires a patch to the kernel, as
well as some calls in the user-level application to start/stop
tracing.
If the user-level calls were put in strace, then physical analysis
of an
application would be easier to apply to any client, without modifying
the client itself.
Here's the URL to the (open source) library:
http://www.renci.org/research/cadre/CADRETraceLibraries/
CADREPhysicalIOLibrary.htm
The testbench sw I am using is essentially strace, with
modifications I
made. I plan to release this testbench so that you can run it
yourselves
also, which I think will allow you to easily experiment and draw
conclusions, without requiring guessing.
Do you think it would be possible to use it on a clean strace log
(without the modifications?). I'm asking because in that case maybe
it would be possible to use it on a ktrace log (which is used on
darwin).
However, you're right, about not knowing when the writes to disk
actually occur:
Say that the kernel starts off with an empty cache and 100MB free
RAM.
When downloading a 5GB file, after the 1st 100MB are received,
the OS
will have to start writing. It's reasonable that it will start
flushing
the oldest cache slots first, so it is likely that after the fist
100MB
received, it will start writing the data it received first.
In the case of some advanced filesystems (such as XFS, which buffers
data as long as possible and allocates storage at the last possible
moment, when flushing buffers becomes innevitable), it is
possible that
the cache won't always flush the first-received data, but it may
try to
group buffers based on their relative closeness.
Maybe it's possible to get timestamps in the trace log. Then it
would be
possible to estimate when actual writes were done.
It would be very easy to add timestamps, you can use the same command
line as I show in wikipedia, but add "-tt" (timestamp) and/or -T (call
duration) to the strace options.
You can this way profile any torrent.
But in any case, when the file size is much larger than the
available
cache memory, it is inevitable that the disk heads will jump around.
In my test setup, I only had 1 seed and 1 client, so to the
client, all
the pieces it didn't already have were equally rare in the
'swarm', and
the random download pattern really did not have any purpose.
I was thinking of another way of scheduling downloads. With 100MB of
cache, the client could "focus" on downloading 100MB chunks of the
torrent, and select the rarest first within that 100MB chunk before
moving on to another chunk. the chunk could be a sliding window
or such.
Before moving the sliding window, the client could do a 'sync' to
force
data to be flushed. I would assume that on sync, the data would be
written in the most contiguous way possible.
I plan on experimenting with the scheduling routines of either
libtorrent or transmission to test out the various hypotheses.
The more I play around, the more I convince myself that it's the
scheduling algorithm that can bring the most benefits, not the
caching
I'm a bit reluctant to change the random distribution + rarest first
algorithm, because it is much more efficient in keeping a torrent
alive
than if chunks would be downloaded. (with alive I mean that the
distributed copies >= 1).
I understand, but in some cases the random distribution does not do
anything. In the case I show of 1:1, it's no safer to download data
out
of order than it is to download sequentially I think.
Not in the case where you're guaranteed that the seed will stay
online, but it's very uncommon to have such a guarantee. Usually, the
seed could go offline and a half-finished peer will come online. The
optimal case would be that both peers would have 1 distributed copy
and be able to complete the file. And closest one can get to that
(without having any global knowledge) is to download pieces in random.
Also, in the case of a perfect torrent (more seeds than
downloaders, say
20:5), say that two pieces have availability of 20 and 24
respectively.
Does it really make a difference if you download the more available
piece first? Is it likely that the 20 seeds will leave the torrent,
rendering it dead? The client should constantly evaluate risk, and
switch to "survival" mode only when necessary, but it's overkill to
always run in "survival" mode.
This is very true (as mentioned in another branch of this thread). I
don't think the ratio is that important though. As long as a piece
has enough peers, it could be downloaded sequentially.
algorithm. After all, in a case where most pieces are written to the
disk once, and never read back, the cache acts as little more than a
simple delayed-write buffer, whether it is an application cache
or an OS
cache.
That is why I haven't made any real effort in implementing a cache in
libtorrent so far.
BTW: Why are there 2 duplicate seeks before every write and 3 before
reads? Here's an example:
21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31,
16384) = 16384
21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */
SEEK_CUR) = 0
21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31,
16384) = 16384
21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */
SEEK_SET) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */
SEEK_CUR) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */
SEEK_CUR) = 0
21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008,
1048576) = 1048576
21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */
SEEK_CUR) = 0
21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008,
1048576) = 1048576
Does this happen only with libtorrent, or with transmission too?
If it's
in both, I would guess that it's an implementation detail of the
POSIX
layer, that the POSIX calls don't translate directly to system calls.
Since libtorrent has a pool of open files, each read or write
operation
has to seek before reading/writing, because there may have been other
such operations since last time, which moved the file position. This
means that even when reading sequentially, libtorrent will seek to
the
current position because it doesn't know that the last read/write
operation left the file position at the place where it wants it
for the
next operation.
I can't imagine that there's any cost involved in doing that extra
seek,
the alternative would be to call tell() and the compare, and then
seek.
But the common case is that that seek is necessary anyway, it's just
that in your case you only had one peer.
This only happens in libtorrent; transmission seeks exactly once for
both reads and writes.
That is very strange, I'll have to investigate this.
So if you do the SEEK_CUR in order to ensure the cursor is where you
expect it, it means you don't have an atomic lock between the SEEK_SET
and the read/write? If you don't have such a lock and your app is
multi-threaded, what prevents another thread from seeking between your
SEEK_CUR and the read/write?
No, that's not why I do a seek before every read and write. All IO
operations are done in the same thread. But every read operation is
done without any context. The current file position is ignored. The
storage abstraction basically gives random access to the files. And
the queue for sending blocks may be interleaved with blocks from
different pieces.
--
Arvid Norberg
-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
|
|