|
Re: Some design questions: msg#00021network.bit-torrent.libtorrent
On Sat, Apr 15, 2006 at 11:40:03AM -0400, Radu Hociung wrote: > Hello Arvid and all, > > > While studying libtorrent and transmission looking for caching > opportunities, I have a couple of questions, where I can't easily find > the answer in the code: > > > 1. How is the SHA1 hash of an incoming piece calculated if the blocks > arrive out of order? Are the pieces read back from the disk when the > piece is complete? Or can sha1 context be saved even though the blocks > arrive out of order? Each block is written to disk when it arrives. When all blocks have been written, the entire piece is read back and checked. Most of the time the last read is just reading from the cache, and in the cases when it's not, the download was probably too slow to keep it in the cache. > 2. Allocating disk space. When allocating disk space, libtorrent writes > a block of {piece_size} zero's to the disk. In Unix and Windows, this is > unnecessary. fseek and fwrite of the real data in the final resting > place is all that's needed; the OS allows seeking past the end of the > file (in write mode at least), and creates sparse files when there are > holes. I believe this is the way all OS's are intended to work, but I > don't actually know if it is by POSIX requirement, convention or > established practice. I should probably have written more precisely which problems and which OS'es I found problems with. But to be honest I'm not sure now, but I can imagine that windows 95/98/Me can't handle sparse files. > In the code, there's a suggestions that not all OS's behave this way. > Are there definite examples? > > In any case, wouldn't it be better to code a conditional #define where > the space is allocated by writing zeros only on OS's where this is > necessary (which I believe to be the minority), rather than incur the > penalty on all OS's ? Are there any applications using libtorrent on > those OS's? I think so. Did you try to remove the code that writes zeroes and see if it still works? > > The following snipped (seektest.c) can be illustrate: > > #include <stdio.h> > > main() > { > fseek(stdout, 1024*1024, SEEK_SET); > putchar('$'); > } > > compile and run as "seektest > testfile; ls -ls testfile". This will > show the file size, as well as the amount of disk space currently > allocated for the file. > > > 3. On a local test copy, I removed the random_shuffle(pieces) calls, in > effect making the downloads sequential. However, I am seeing behaviour I > cannot explain: > > After downloading 1/2 of the file in sequence, the client_test app > starts a download pattern like p-1,p+1,p-2,p+2,p-3,p+3, etc (where p is > the middle point of the file). However, reads from the file are never > done from the second half of the file. A picture is worth a thousand words: > > http://www.ohmi.org/~radu/torrentwork/sparse.png > > In the first half of the file, the writes are 16K each, and reads are > 1M, while in the second half of the file, writes are 1M each, and there > are no reads. My theory is that during the 1st half of the file, blocks > are written, and pieces are read back to calculate hashes, while in the > second half, entire pieces are written, after calculating the hashes, > such that reads are not necessary. > > Interestingly, in in the 1st half of the file, all the blocks are > written to disk in order, with no other data interleaved, so the library > could as well buffer the entire piece in memory before writing to the cache. > > Hoever, what I cannot explain is why the download behaviour changes at > half-full. Why could this be, and where is the code responsible? Did you have just one peer in this test as well? The thing is that there's no code that tries to keep the pieces ordered in the piece picker (for obvious reasons), so I'm not surprised that just removing the random_shuffle() didn't make it completely ordered. The piece picker works like this: It has a vector of vectors. Each vector contains all the piece-indices that has the corresponding number of peers. When a peer is joining, some pieces will be promoted to have one more peers, and moved from one vector to another. To save time (this is an operation that is quite common, since there usually are quite alot of pieces), the piece index that is removed is replaced by the index at the end, and when it is inserted into the other vector, it is appended at the end. This makes the operation O(1). That is probably the reason why the pieces aren't ordered. > > Perhaps the number of cached blocks is variable, and is increased such > that by the second half, it becomes big enough to store an entire piece? Right now there is no cache. An idea I had, was that pieces with more than a certain number of peers could be ordered. That would have the effect that "good" torrents would be downloaded sequencially. I'm planning on implementing this when I come back from the holiday (and also reply to your other mails). What do you think of that? -- Arvid Norberg ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 |
|
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| Previous by Date: | Some design questions: 00021, Radu Hociung |
|---|---|
| Next by Date: | Re: Some design questions: 00021, Radu Hociung |
| Previous by Thread: | Some design questionsi: 00021, Radu Hociung |
| Next by Thread: | Re: Some design questions: 00021, Radu Hociung |
| Indexes: | [Date] [Thread] [Top] [All Lists] |
| News | FAQ | advertise |