
Some design questions: msg#00020

network.bit-torrent.libtorrent

Subject: Some design questions

Hello Arvid and all,


While studying libtorrent and Transmission for caching opportunities,
I came up with a few questions whose answers I can't easily find in
the code:


1. How is the SHA1 hash of an incoming piece calculated if the blocks
arrive out of order? Are the pieces read back from the disk when the
piece is complete? Or can the SHA1 context be saved even though the
blocks arrive out of order?

2. Allocating disk space. When allocating disk space, libtorrent writes
a block of {piece_size} zeros to the disk. On Unix and Windows, this is
unnecessary. An fseek followed by an fwrite of the real data at its
final resting place is all that's needed; the OS allows seeking past the
end of the file (in write mode at least) and creates sparse files when
there are holes. I believe this is the way all OSes are intended to
work, but I don't actually know whether it is by POSIX requirement,
convention or established practice.

In the code, there's a suggestion that not all OSes behave this way.
Are there definite examples?

In any case, wouldn't it be better to code a conditional #define so that
the space is allocated by writing zeros only on OSes where this is
necessary (which I believe to be the minority), rather than incur the
penalty on all OSes? Are there any applications using libtorrent on
those OSes?

The following snippet (seektest.c) illustrates this:

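#include <stdio.h>

int main(void)
{
    /* Seek 1 MiB past the start of stdout, then write a single byte. */
    if (fseek(stdout, 1024L * 1024L, SEEK_SET) != 0) {
        perror("fseek");
        return 1;
    }
    putchar('$');
    return 0;
}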

Compile and run it as "seektest > testfile; ls -ls testfile". This will
show the file size, as well as the amount of disk space currently
allocated for the file.


3. On a local test copy, I removed the random_shuffle(pieces) calls, in
effect making the downloads sequential. However, I am seeing behaviour I
cannot explain:

After downloading 1/2 of the file in sequence, the client_test app
starts a download pattern like p-1,p+1,p-2,p+2,p-3,p+3, etc (where p is
the middle point of the file). However, reads from the file are never
done from the second half of the file. A picture is worth a thousand words:

http://www.ohmi.org/~radu/torrentwork/sparse.png

In the first half of the file, the writes are 16K each, and reads are
1M, while in the second half of the file, writes are 1M each, and there
are no reads. My theory is that during the 1st half of the file, blocks
are written, and pieces are read back to calculate hashes, while in the
second half, entire pieces are written, after calculating the hashes,
such that reads are not necessary.

Interestingly, in the 1st half of the file, all the blocks are
written to disk in order, with no other data interleaved, so the library
could as well buffer the entire piece in memory before writing to the cache.

However, what I cannot explain is why the download behaviour changes at
half-full. Why could this be, and where is the code responsible?

Perhaps the number of cached blocks is variable, and is increased such
that by the second half, it becomes big enough to store an entire piece?


Thanks,
Radu.




