
Multithreaded compression/decompression library with python bindings?

On 05/10/17 20:38, Stephan Houben wrote:
> Op 2017-10-05, Thomas Nyberg schreef <tomuxiong at gmx.com>:
>> Btw if anyone knows a better way to handle this sort of thing, I'm all
>> ears. Given my current implementation I could use any compression that
>> works with stdin/stdout as long as I could sort out the waiting on the
>> subprocess. In fact, bzip2 is probably more than I need...I've half used
>> it out of habit rather than anything else.
> lzma ("xz" format) compression is generally both better and faster than
> bzip2. So that gives you already some advantage.
> Moreover, the Python lzma docs say:
> "When opening a file for reading, the input file may be the concatenation
> of multiple separate compressed streams. These are transparently decoded
> as a single logical stream."
> This seems to open the possibility to simply divide your input into,
> say, 100 MB blocks, compress each of them in a separate thread/process
> and then concatenate them. 

Perhaps, but

 - this will probably be less space-efficient (though only by a small
fraction, if your blocks are large enough)
 - this *might* be a Python implementation detail (I doubt that, but who
knows)
 - this obviously won't work for decompression unless you know a priori
that there are division points, and where they are.
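
For what it's worth, the block-wise idea quoted above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name compress_in_blocks, the worker count and the small block size are made up for the example, and it uses threads on the understanding that CPython's lzma module releases the GIL while compressing (if that turns out not to hold, a ProcessPoolExecutor should slot in the same way).

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_in_blocks(data: bytes, block_size: int, workers: int = 4) -> bytes:
    """Compress each block as an independent .xz stream and concatenate them.

    Hypothetical helper, not part of the lzma module.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams stay in order.
        streams = list(pool.map(lzma.compress, blocks))
    return b"".join(streams)


payload = b"some repetitive payload\n" * 50_000
# A small block size keeps the example quick; the thread suggested ~100 MB.
packed = compress_in_blocks(payload, block_size=256 * 1024)

# lzma.decompress() decodes concatenated streams as one logical stream.
restored = lzma.decompress(packed)
```

Note that decompression here is still serial: nothing in the output records where one stream ends and the next begins, which is the third point above.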