|
Re: Disk cache: msg#00025network.bit-torrent.libtorrent
Arvid Norberg wrote: > > On Apr 14, 2006, at 17:05, Radu Hociung wrote: > >> Arvid Norberg wrote: >> >>> >>> On Apr 14, 2006, at 06:27, Radu Hociung wrote: >>> >>>> I doubt that writing whole pieces would make much difference. >>>> Looking at >>>> libtorrent's current download pattern, and at the strace, it looks >>>> like >>>> it writes the blocks of a piece in the proper order. So they probably >>>> get assembled into the respective pieces in the OS's cache. If they >>>> were >>>> being downloaded out of order, I would see a case for caching whole >>>> pieces. >>> >>> >>> I thought you downloaded a "wild" torrent, with more than one peer. If >>> you had done that, you would have noticed that the write operations >>> would have been scattered more. Because libtorrent is writing each >>> block >>> to disk, and even though blocks within a piece are downloaded >>> sequentially, it would still be more seeking. >>> >>> In high speeds libtorrent prefers to download whole pieces from single >>> peers, which means block writes to different pieces would be >>> interleaved. >> >> >> Ah, of course. I should have thought of this. >> >> In order to answer questions regarding the behaviour of OS caches, there >> is a library available to study physical disk access, written at the >> Renaissance Computing Institute. It requires a patch to the kernel, as >> well as some calls in the user-level application to start/stop tracing. >> >> If the user-level calls were put in strace, then physical analysis of an >> application would be easier to apply to any client, without modifying >> the client itself. >> >> Here's the URL to the (open source) library: >> >> http://www.renci.org/research/cadre/CADRETraceLibraries/ >> CADREPhysicalIOLibrary.htm >> >> The testbench sw I am using is essentially strace, with modifications I >> made. I plan to release this testbench so that you can run it yourselves >> also, which I think will allow you to easily experiment and draw >> conclusions, without requiring guessing. > > > Do you think it would be possible to use it on a clean strace log > (without the modifications?). I'm asking because in that case maybe it > would be possible to use it on a ktrace log (which is used on darwin). The CADRE library is not used on a log. Instead, it must be activated/deactivated somehow. The information it captures when activated is stored in a separate location, not in the strace log. Further, in order to use it, a patch must be applied to a linux kernel. This patch adds instrumentation to the device drivers. It's that instrumentation that is then started/stopped by making a call from the user level application. So, I don't think this library will run on darwin; I am not familiar with ktrace, but it looks to be an strace-like utility. Does the "-ti" option not log interractions of the kernel with the hardware? If it does, then you don't need CADRE in order to monitor the behaviour of the disk cache. >>>> However, you're right, about not knowing when the writes to disk >>>> actually occur: >>>> >>>> Say that the kernel starts off with an empty cache and 100MB free RAM. >>>> When downloading a 5GB file, after the 1st 100MB are received, the OS >>>> will have to start writing. It's reasonable that it will start >>>> flushing >>>> the oldest cache slots first, so it is likely that after the fist >>>> 100MB >>>> received, it will start writing the data it received first. >>>> >>>> In the case of some advanced filesystems (such as XFS, which buffers >>>> data as long as possible and allocates storage at the last possible >>>> moment, when flushing buffers becomes innevitable), it is possible >>>> that >>>> the cache won't always flush the first-received data, but it may >>>> try to >>>> group buffers based on their relative closeness. >>> >>> >>> Maybe it's possible to get timestamps in the trace log. Then it >>> would be >>> possible to estimate when actual writes were done. >> >> >> It would be very easy to add timestamps, you can use the same command >> line as I show in wikipedia, but add "-tt" (timestamp) and/or -T (call >> duration) to the strace options. >> >> You can this way profile any torrent. >> >> >>>> But in any case, when the file size is much larger than the available >>>> cache memory, it is inevitable that the disk heads will jump around. >>>> >>>> In my test setup, I only had 1 seed and 1 client, so to the client, >>>> all >>>> the pieces it didn't already have were equally rare in the 'swarm', >>>> and >>>> the random download pattern really did not have any purpose. >>>> >>>> >>>> I was thinking of another way of scheduling downloads. With 100MB of >>>> cache, the client could "focus" on downloading 100MB chunks of the >>>> torrent, and select the rarest first within that 100MB chunk before >>>> moving on to another chunk. the chunk could be a sliding window or >>>> such. >>>> Before moving the sliding window, the client could do a 'sync' to >>>> force >>>> data to be flushed. I would assume that on sync, the data would be >>>> written in the most contiguous way possible. >>>> >>>> I plan on experimenting with the scheduling routines of either >>>> libtorrent or transmission to test out the various hypotheses. >>>> >>>> The more I play around, the more I convince myself that it's the >>>> scheduling algorithm that can bring the most benefits, not the caching >>> >>> >>> I'm a bit reluctant to change the random distribution + rarest first >>> algorithm, because it is much more efficient in keeping a torrent alive >>> than if chunks would be downloaded. (with alive I mean that the >>> distributed copies >= 1). >> >> >> I understand, but in some cases the random distribution does not do >> anything. In the case I show of 1:1, it's no safer to download data out >> of order than it is to download sequentially I think. > > > Not in the case where you're guaranteed that the seed will stay online, > but it's very uncommon to have such a guarantee. Usually, the seed > could go offline and a half-finished peer will come online. The optimal > case would be that both peers would have 1 distributed copy and be able > to complete the file. And closest one can get to that (without having > any global knowledge) is to download pieces in random. Ah, I see... I had not thought of such a scenario. Very interesting. So, in a torrent like you described (1 seed, 1 hidden/offline peer, 1 blind peer), assuming downloaders get 1/2 of the file each, they would be able to finish only if they have complementary halves of the file. With a randomization at the piece level, with say, 1,000 pieces in total, each of the downloaders need the right 500 pieces to make the torrent healthy. The probability for the blind client downloading the right 500 pieces, not knowing which 500 the hidden peer has, are about 1/(2^1000), since each piece has 1/2 chance of being the right piece, and that there are 1000 guesses to be made. I still think sequential downloading gives the torrent a better chance for healing: Say that each client downloads quarters of the file in random order. The pieces in each quarter are downloaded sequentially. (ie, all the pieces in quarter 1 are downloaded in sequential order, but the file may be downloaded as Q2,Q4,Q3,Q1) In this case, if the torrent behaves like you described above, the chances for the torrent remaining healthy with the blind and hidden peers present, but seed absent, are about 1/(2^4), since there are 4 guesses to be made. If each of the clients has 1/2 of the file when the seed goes away, they have a 1/16 chance of finishing. So comparing a download strategy of piece-by-piece-randomly (PBPR) or quarter-by-quarter-randomly (QBQR), the QBQR strategy has a 6.25% chance of finishing, while PBPR has no (1/(2^1000)) chance of finishing. I think even the 6.25% probability can be improved: If the file is downloaded in halves instead, there is a 25% chance of finishing this download. Perhaps each client can decide whether to download halves, quarters, eights or whatever, depending on how many peers it encounters: 1 seed, 0 peers = download halves 1 seed, 1 peer = download quarters 2 seeds, 2 peers = download eights What do you think? >> Also, in the case of a perfect torrent (more seeds than downloaders, say >> 20:5), say that two pieces have availability of 20 and 24 respectively. >> Does it really make a difference if you download the more available >> piece first? Is it likely that the 20 seeds will leave the torrent, >> rendering it dead? The client should constantly evaluate risk, and >> switch to "survival" mode only when necessary, but it's overkill to >> always run in "survival" mode. > > > This is very true (as mentioned in another branch of this thread). I > don't think the ratio is that important though. As long as a piece has > enough peers, it could be downloaded sequentially. > >>>> algorithm. After all, in a case where most pieces are written to the >>>> disk once, and never read back, the cache acts as little more than a >>>> simple delayed-write buffer, whether it is an application cache or >>>> an OS >>>> cache. >>> >>> >>> That is why I haven't made any real effort in implementing a cache in >>> libtorrent so far. >>> >>>> BTW: Why are there 2 duplicate seeks before every write and 3 before >>>> reads? Here's an example: >>>> >>>> 21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31, >>>> 16384) = 16384 >>>> 21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */ >>>> SEEK_SET) = 0 >>>> 21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */ >>>> SEEK_CUR) = 0 >>>> 21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31, >>>> 16384) = 16384 >>>> 21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */ >>>> SEEK_SET) = 0 >>>> 21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ >>>> SEEK_CUR) = 0 >>>> 21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ >>>> SEEK_CUR) = 0 >>>> 21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008, >>>> 1048576) = 1048576 >>>> 21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */ >>>> SEEK_SET) = 0 >>>> 21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */ >>>> SEEK_CUR) = 0 >>>> 21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008, >>>> 1048576) = 1048576 >>> >>> >>> Does this happen only with libtorrent, or with transmission too? If >>> it's >>> in both, I would guess that it's an implementation detail of the POSIX >>> layer, that the POSIX calls don't translate directly to system calls. >>> Since libtorrent has a pool of open files, each read or write operation >>> has to seek before reading/writing, because there may have been other >>> such operations since last time, which moved the file position. This >>> means that even when reading sequentially, libtorrent will seek to the >>> current position because it doesn't know that the last read/write >>> operation left the file position at the place where it wants it for the >>> next operation. >>> >>> I can't imagine that there's any cost involved in doing that extra >>> seek, >>> the alternative would be to call tell() and the compare, and then seek. >>> But the common case is that that seek is necessary anyway, it's just >>> that in your case you only had one peer. >> >> >> >> This only happens in libtorrent; transmission seeks exactly once for >> both reads and writes. > > > That is very strange, I'll have to investigate this. > >> So if you do the SEEK_CUR in order to ensure the cursor is where you >> expect it, it means you don't have an atomic lock between the SEEK_SET >> and the read/write? If you don't have such a lock and your app is >> multi-threaded, what prevents another thread from seeking between your >> SEEK_CUR and the read/write? > > > No, that's not why I do a seek before every read and write. All IO > operations are done in the same thread. But every read operation is > done without any context. The current file position is ignored. The > storage abstraction basically gives random access to the files. And the > queue for sending blocks may be interleaved with blocks from different > pieces. Sounds like the SEEK_CUR is not needed then. Cheers, Radu. ------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 |
|
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| Previous by Date: | Re: Disk cache: 00025, Arvid Norberg |
|---|---|
| Next by Date: | Re: Disk cache: 00025, Radu Hociung |
| Previous by Thread: | Re: Disk cachei: 00025, Arvid Norberg |
| Next by Thread: | Re: Disk cache: 00025, Arvid Norberg |
| Indexes: | [Date] [Thread] [Top] [All Lists] |
| News | FAQ | advertise |