|
|
Re: Some design questions: msg#00049
network.bit-torrent.libtorrent
|
Subject: |
Re: Some design questions |
On Apr 19, 2006, at 06:13, Radu Hociung wrote:
Arvid Norberg wrote:
On Sat, Apr 15, 2006 at 11:40:03AM -0400, Radu Hociung wrote:
Hello Arvid and all,
While studying libtorrent and transmission looking for caching
opportunities, I have a couple of questions, where I can't easily
find
the answer in the code:
1. How is the SHA1 hash of an incoming piece calculated if the
blocks
arrive out of order? Are the pieces read back from the disk when the
piece is complete? Or can sha1 context be saved even though the
blocks
arrive out of order?
Each block is written to disk when it arrives. When all blocks
have been written, the entire piece is read back and
checked. Most of the time the last read is just reading from the
cache, and in the cases when it's not, the download was
probably too slow to keep it in the cache.
2. Allocating disk space. When allocating disk space, libtorrent
writes
a block of {piece_size} zero's to the disk. In Unix and Windows,
this is
unnecessary. fseek and fwrite of the real data in the final resting
place is all that's needed; the OS allows seeking past the end of
the
file (in write mode at least), and creates sparse files when
there are
holes. I believe this is the way all OS's are intended to work,
but I
don't actually know if it is by POSIX requirement, convention or
established practice.
I should probably have written more precisely which problems and
which OS'es I found problems with. But to be honest I'm
not sure now, but I can imagine that windows 95/98/Me can't handle
sparse files.
I don't know what to imagine ;) Sparse files have to be supported by
both filesystem and OS. FAT filesystems do not support sparse files
even
under Linux, though I am pretty sure that Linux fills the blanks with
0's itself, as needed, even on FAT.
There may be ways to query the target OS from the jam/autoconf
utilities
to find out if seeking past end of file works correctly, and
setting up
the build accordingly?
In the code, there's a suggestions that not all OS's behave this
way.
Are there definite examples?
In any case, wouldn't it be better to code a conditional #define
where
the space is allocated by writing zeros only on OS's where this is
necessary (which I believe to be the minority), rather than incur
the
penalty on all OS's ? Are there any applications using libtorrent on
those OS's?
I think so. Did you try to remove the code that writes zeroes and
see if it still works?
Yes, I tried this. The client test works correctly under linux; I have
not tested other OS's.
The following snipped (seektest.c) can be illustrate:
#include <stdio.h>
main()
{
fseek(stdout, 1024*1024, SEEK_SET);
putchar('$');
}
compile and run as "seektest > testfile; ls -ls testfile". This will
show the file size, as well as the amount of disk space currently
allocated for the file.
3. On a local test copy, I removed the random_shuffle(pieces)
calls, in
effect making the downloads sequential. However, I am seeing
behaviour I
cannot explain:
After downloading 1/2 of the file in sequence, the client_test app
starts a download pattern like p-1,p+1,p-2,p+2,p-3,p+3, etc
(where p is
the middle point of the file). However, reads from the file are
never
done from the second half of the file. A picture is worth a
thousand words:
http://www.ohmi.org/~radu/torrentwork/sparse.png
In the first half of the file, the writes are 16K each, and reads
are
1M, while in the second half of the file, writes are 1M each, and
there
are no reads. My theory is that during the 1st half of the file,
blocks
are written, and pieces are read back to calculate hashes, while
in the
second half, entire pieces are written, after calculating the
hashes,
such that reads are not necessary.
Interestingly, in in the 1st half of the file, all the blocks are
written to disk in order, with no other data interleaved, so the
library
could as well buffer the entire piece in memory before writing to
the cache.
Hoever, what I cannot explain is why the download behaviour
changes at
half-full. Why could this be, and where is the code responsible?
Did you have just one peer in this test as well?
That's right. the testbench I'm working on is not ready for
multiple peers.
The thing is that there's no code that tries to keep the pieces
ordered in the piece picker (for obvious reasons), so
I'm not surprised that just removing the random_shuffle() didn't
make it completely ordered.
The piece picker works like this:
It has a vector of vectors. Each vector contains all the piece-
indices that has the corresponding number of peers. When
a peer is joining, some pieces will be promoted to have one more
peers, and moved from one vector to another. To save
time (this is an operation that is quite common, since there
usually are quite alot of pieces), the piece index that is
removed is replaced by the index at the end, and when it is
inserted into the other vector, it is appended at the end.
This makes the operation O(1). That is probably the reason why the
pieces aren't ordered.
I see. Storing the indeces in a B-tree would still allow an O(log2(N))
complexity, no? How much CPU utilization are we talking about? Your
index storage could then be a vector of BTrees.
Perhaps the number of cached blocks is variable, and is increased
such
that by the second half, it becomes big enough to store an entire
piece?
Right now there is no cache.
That's what I thought too, which is why I was puzzled by the behaviour
of the test client. After your explanation, I was compelled to have a
closer look:
27899 write(4, 0x8081fa9, 16384) = 16384
27899 _llseek(4, 1032192, [1032192], SEEK_SET) = 0
27899 _llseek(4, 0, [1032192], SEEK_CUR) = 0
27899 write(4, 0x8081fa9, 16384) = 16384
27899 _llseek(4, 0, [0], SEEK_SET) = 0
27899 _llseek(4, 0, [0], SEEK_CUR) = 0
27899 _llseek(4, 0, [0], SEEK_CUR) = 0
27899 read(4, 0x41b63008, 1048576) = 1048576
27899 _llseek(4, 1048576, [1048576], SEEK_SET) = 0
27899 _llseek(4, 0, [1048576], SEEK_CUR) = 0
^^^^^^^
27899 write(4, 0x41b63008, 1048576) = 1048576
^^^^^^^^^^ ^^^^^^^
27899 _llseek(4, 1048576, [1048576], SEEK_SET) = 0
27899 _llseek(4, 0, [1048576], SEEK_CUR) = 0
27899 write(4, 0x8081fa9, 16384) = 16384
27899 _llseek(4, 1064960, [1064960], SEEK_SET) = 0
27899 _llseek(4, 0, [1064960], SEEK_CUR) = 0
27899 write(4, 0x8081fa9, 16384) = 16384
27899 _llseek(4, 1081344, [1081344], SEEK_SET) = 0
27899 _llseek(4, 0, [1081344], SEEK_CUR) = 0
27899 write(4, 0x8081fa9, 16384) = 16384
The app has therefore two file buffers (0x8081fa9, 16K and
0x41b63008, 1M)
The second call to seek is just to make sure the first operation
succeeded, since seek will return the current file position. i.e. seek
(0, SEEK_CUR) is the implementation of tell() in the file abstraction.
An idea I had, was that pieces with more than a certain number of
peers could be ordered. That would have the effect
that "good" torrents would be downloaded sequencially. I'm
planning on implementing this when I come back from the
holiday (and also reply to your other mails). What do you think of
that?
It's a good idea, and your implementation can support this change with
little code change:
In your vector of Btrees, instead of using a peer count in the
vector's
index, you could use a function f(supply/demand). The function f would
make the rational number "supply/demand" into an integer useable as an
index:
say that f(x) = log2(x*10)
Then, if supply=1, demand>=10 (very rare piece), the Btree would be
stored in the vector at location 0
If supply = 5, demand = 5 (not a rare piece), the Btree would be
stored
in the vector at location 3
If supply = 20, demain = 5 (plentiful supply), the Btree would be
stored
at location 6
Then, you'd be downloading ring (vector location) 0 first, followed by
ring 1, then 2, and so on. You can have a different scheduling policy
for each ring (e.g. ring 0 downloaded randomly across the entire file,
rings 1-3 randomly across sections of the file, and rings 4+
sequentially). I believe a random schedule does not make much sense
for
any of the rings, but of course it's up to you.
The function I describe is only an idea, you may want to do some
arithmetic to find a similar function that makes more sense for your
implementation. The important thing is to use a supply-demand
metric for
decision making, not only supply as you do currently.
You mean to also count the number of peers that don't have the piece?
Right now I've just made a limit, any peer count >= this limit are
put in the same vector (as if they were equaly rare) and they are
also sorted. In a well seeded torrent, the pieces will be downloaded
in sequence.
I'll let you know once I've checked it in and maybe you could check
it out.
--
Arvid Norberg
-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
|
|