
Re: zstd compression for packages

On Tue, Mar 13, 2018 at 1:43 AM, Balint Reczey <balint.reczey@xxxxxxxxxxxxx> wrote:
Hi Daniel,

On Mon, Mar 12, 2018 at 2:11 PM, Daniel Axtens
<daniel.axtens@xxxxxxxxxxxxx> wrote:
> Hi,
> I looked into compression algorithms a bit in a previous role, and to be
> honest I'm quite surprised to see zstd proposed for package storage. zstd,
> according to its own github repo, is "targeting real-time compression
> scenarios". It's not really designed to be run at its maximum compression
> level, it's designed to really quickly compress data coming off the wire -
> things like compressing log files being streamed to a central server, or I
> guess writing random data to btrfs where speed is absolutely an issue.
> Is speed of decompression a big user concern relative to file size? I admit
> that I am biased - as an Australian and with the crummy internet that my
> location entails, I'd save much more time if the file was 6% smaller and
> took 10% longer to decompress than the other way around.

Yes, decompression speed is a big issue in some cases. Consider, for
example, provisioning cloud/container instances: after booting the
image, plenty of packages need to be installed, and saving seconds
matters a lot.

The zstd format also allows parallel decompression, which can make
package installation even quicker in wall-clock time.
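As a rough illustration of why multiple frames help (a sketch, not the apt/dpkg code): pzstd splits its input into chunks and writes each one as an independent frame, so a decoder that knows where the frames start can hand them to separate workers. To keep the sketch dependency-free it swaps in gzip, whose format likewise allows independent members to be concatenated into one file. The chunk size and the function names are hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # hypothetical; real tools choose their own chunk size


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Compress each chunk of `data` into its own independently decodable gzip member."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(gzip.compress, chunks))


def decompress_chunks(members: list[bytes]) -> bytes:
    """Decompress members concurrently; map() returns results in input order."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(gzip.decompress, members))


payload = b"example package contents\n" * 50_000
members = compress_chunks(payload)
stream = b"".join(members)  # what would be stored on disk: members back to back

# A plain serial decoder still reads the concatenation as one stream...
assert gzip.decompress(stream) == payload
# ...while a decoder that knows the member boundaries can split the work.
assert decompress_chunks(members) == payload
print(f"{len(members)} members, {len(stream)} compressed bytes")
```

In a real archive format the boundaries have to be recorded somewhere the decoder can find them; the sketch sidesteps that by keeping the list of members in memory.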

Internet connection speed increases by ~50% per year on average
(according to this [3] study, which matches my experience), which is
more than 6% every two months.
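The compounding behind the "more than 6% every two months" figure can be checked quickly: a 50% annual increase spread over six two-month periods works out to about 7% per period. This only checks the arithmetic, not the ~50%/year premise itself.

```python
def per_period_growth(annual_rate: float, periods_per_year: int) -> float:
    """Per-period growth rate that compounds to `annual_rate` over one year."""
    return (1 + annual_rate) ** (1 / periods_per_year) - 1


two_months = per_period_growth(0.50, 6)  # six two-month periods per year
print(f"{two_months:.1%}")  # 7.0%
```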

The future is pretty unevenly distributed, and lots of the planet is stuck on really bad internet still.

AFAICT, [3] is anecdotal rather than a 'study': it's based on data from one person living in California, which is not really representative. The connection speed visualisation from the Akamai State of the Internet report [4] shows that lots and lots of countries (most of the world!) have significantly slower internet than that person.

(FWIW, anecdotally, I've never had a residential connection get faster (except when I moved), mostly because the speed of ADSL is pretty much fixed. Anecdotal reports from users in developing countries and in rural areas of developed countries are not encouraging either: [5].)

Having said that, I'm not unsympathetic to the use case you outline. I am just saddened to see the trade-offs fall against the interests of people with worse access to the internet. If I can find you ways of saving at least as much time without making the files bigger, would you be open to that?
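One way to frame the trade-off in this thread is as a break-even link speed: a package set that is 6% larger costs extra download time, while faster decompression saves install time, and below some bandwidth the first outweighs the second. The sketch uses made-up inputs (a 100 MB xz download and a 60 s install that becomes 10% faster); it ignores latency, mirrors, caching and parallel downloads, and only shows the shape of the calculation.

```python
def break_even_mbps(extra_bytes: float, seconds_saved: float) -> float:
    """Link speed (Mbit/s) at which downloading `extra_bytes` takes exactly `seconds_saved`."""
    return extra_bytes * 8 / 1e6 / seconds_saved


# Hypothetical inputs, not measurements from the thread.
xz_download_bytes = 100e6                # 100 MB of xz-compressed packages
extra_bytes = 0.06 * xz_download_bytes   # zstd output assumed ~6% larger
install_seconds = 60.0                   # assumed install time with xz
seconds_saved = 0.10 * install_seconds   # assumed ~10% faster install with zstd

threshold = break_even_mbps(extra_bytes, seconds_saved)
print(f"zstd comes out ahead above ~{threshold:.0f} Mbit/s")
```

With these inputs the threshold is 8 Mbit/s; on slower links the extra 6 MB costs more time than the faster install saves, which is the situation described for much of the world above.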


> Did you consider Google's Brotli?

We did consider it but it was less promising.


[3] http://xahlee.info/comp/bandwidth.html

> Regards,
> Daniel
> On Mon, Mar 12, 2018 at 9:58 PM, Julian Andres Klode
> <julian.klode@xxxxxxxxxxxxx> wrote:
>> On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:
>> > Hey folks,
>> >
>> > We had a coding day in Foundations last week and Balint and Julian added
>> > support for zstd compression to dpkg [1] and apt [2].
>> >
>> > [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892664
>> > [2] https://salsa.debian.org/apt-team/apt/merge_requests/8
>> >
>> > Zstd is a compression algorithm developed by Facebook that offers far
>> > higher decompression speeds than xz or even gzip (at roughly constant
>> > speed and memory usage across all levels), while offering 19 compression
>> > levels ranging from the lowest, roughly comparable to gzip in size (but
>> > much faster), to 19, roughly comparable to xz -6:
>> >
>> > In our configuration, we run zstd at level 19. For bionic main amd64,
>> > this causes a size increase of about 6%, from roughly 5.6 to 5.9 GB.
>> > Installs speed up by about 10%, or, if eatmydata is involved, by up to
>> > 40% - user time generally by about 50%.
>> >
>> > Our implementations for apt and dpkg support multiple frames as used by
>> > pzstd, so packages can eventually be compressed and decompressed in
>> > parallel.
>> More links:
>> PPA:
>> https://launchpad.net/~canonical-foundations/+archive/ubuntu/zstd-archive
>> APT merge request: https://salsa.debian.org/apt-team/apt/merge_requests/8
>> dpkg patches:      https://bugs.debian.org/892664
>> I'd also like to talk a bit more about libzstd itself: The package is
>> currently in universe, but btrfs recently gained support for zstd,
>> so we already have a copy in the kernel and we need to MIR it anyway
>> for btrfs-progs.
>> --
>> debian developer - deb.li/jak | jak-linux.org - free software dev
>> ubuntu core developer                              i speak de, en
>> --

Balint Reczey
Ubuntu & Debian Developer