
[Python-Dev] Inclusion of lz4 bindings in stdlib?

On Thu, 29 Nov 2018 at 09:13, Gregory P. Smith <greg at krypto.org> wrote:
> Q: Are there other popular alternatives to fill that niche that we should strongly consider instead or as well?
> 5 years ago the answer would've been Snappy.  15 years ago the answer would've been LZO.

Today LZ4 hits a sweet spot for fast compression and decompression at
the lower compression ratio end of the spectrum, offering
significantly faster compression and decompression than zlib or bz2,
but not as high compression ratios (at usable speeds). It's also had
time to stabilize, and a standard frame format for compressed data has
been adopted by the community.
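
To make the speed/ratio trade-off concrete, here is a rough sketch of how one might compare the codecs on a given payload. It assumes the third-party python-lz4 package (`lz4.frame`) may or may not be installed and skips it if absent; the helper names `measure` and `compare` are made up for illustration, and the numbers depend heavily on the input data and machine.

```python
import bz2
import time
import zlib

try:
    # Third-party python-lz4 package; optional for this sketch.
    import lz4.frame as lz4_frame
except ImportError:
    lz4_frame = None


def measure(name, compress, decompress, data):
    """Return (name, ratio, compress_seconds, decompress_seconds)."""
    start = time.perf_counter()
    packed = compress(data)
    middle = time.perf_counter()
    unpacked = decompress(packed)
    end = time.perf_counter()
    # Sanity check: the round trip must reproduce the input exactly.
    assert unpacked == data
    return name, len(data) / len(packed), middle - start, end - middle


def compare(data):
    """Run every available codec over the same bytes (hypothetical helper)."""
    codecs = [
        ("zlib", zlib.compress, zlib.decompress),
        ("bz2", bz2.compress, bz2.decompress),
    ]
    if lz4_frame is not None:
        codecs.append(("lz4", lz4_frame.compress, lz4_frame.decompress))
    return [measure(name, c, d, data) for name, c, d in codecs]


if __name__ == "__main__":
    sample = b"The quick brown fox jumps over the lazy dog. " * 20000
    for name, ratio, c_time, d_time in compare(sample):
        print(f"{name:5s} ratio={ratio:8.1f} "
              f"compress={c_time:.4f}s decompress={d_time:.4f}s")
```

A single timing pass like this is noisy; for anything beyond a quick look, repeat the measurement and use representative data rather than a repeated string.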

The other main contenders in town are zstd, which was mentioned
earlier in the thread, and brotli. Both are based on dictionary
compression. Zstd is very impressive, offering high compression
ratios, but is being very actively developed at present, so is a bit
more of a moving target. Brotli is in the same ballpark as Zstd. Both
cover a higher compression ratio range than lz4. Some
nice visualizations are here (although the data is now a bit out of
date - lz4 has had some speed improvements at the higher compression
ratio end):


> I suggest not rabbit-holing this on whether we should adopt a top level namespace for these such as "compress".  A good question to ask, but we can resolve that larger topic on its own without blocking anything.

It's funny, but I had gone around in that loop in my head ahead of
sending my email. My thinking was: there's a real need for some
unification and simplification in the compression space, but I'll work
on integrating LZ4, and in the process look at opportunities for the
new interface design. I'm a fan of learning through iteration, rather
than spending 5 years designing the ultimate compression abstraction
and then finding a corner case that it doesn't fit.
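
As a purely illustrative sketch of the kind of unification being discussed, the one-shot functions of the existing stdlib modules can already be put behind a single pair of calls. The registry `_CODECS` and the `compress`/`decompress` wrappers below are hypothetical names, not a proposed or existing stdlib API, and the sketch ignores streaming, compression levels and file interfaces, which is where the real design questions are.

```python
import bz2
import lzma
import zlib

# Hypothetical registry mapping a codec name to its one-shot functions.
# An lz4 entry could be added in the same way if the module is available.
_CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress(data, codec="zlib"):
    """Compress bytes with the named codec (illustrative wrapper)."""
    return _CODECS[codec][0](data)


def decompress(data, codec="zlib"):
    """Decompress bytes produced by compress() with the same codec."""
    return _CODECS[codec][1](data)
```

For example, `decompress(compress(b"abc", "bz2"), "bz2")` returns the original bytes.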

> lz4 has claimed the global pypi lz4 module namespace today so moving it to the stdlib under that name is normal - A pretty transparent transition.  If we do that, the PyPI version of lz4 should remain for use on older CPython versions, but effectively be frozen, never to gain new features once lz4 has landed in its first actual CPython release.

Yes, that was what I was presuming would be the path forward.