
Re: [DISCUSS] Dropping support for CentOS 5 / RHEL5 in Python packages


Hello Wes,

I'm OK with option 2 if we use the not-yet-finished manylinux2010 image as the base. That way, our wheels will in the near future be based on a platform tag that is actually backed by a PEP. Also, as I already have some packaging nightmares, I would feel much better if we first got a release out that features parquet-cpp merged into the main Arrow tree before we switch the manylinux* base image.

Uwe

On Wed, Sep 5, 2018, at 1:22 AM, Ted Dunning wrote:
> Just as a point of reference, I don't think that we get any pushback at
> MapR for not supporting RHEL 5, and that has been our policy for a few
> years now.
> 
> That experience should be pretty similar for Arrow, except that I would
> expect that new adoptions might be even more canted towards current
> versions.
> 
> 
> 
> 
> On Tue, Sep 4, 2018 at 3:24 PM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> 
> > hi folks,
> >
> > Surfacing a JIRA discussion ([4]) to the mailing list for discussion.
> >
> > The manylinux1 ABI was developed to provide a mechanism for portable
> > Python packages with pre-compiled binary extensions supporting C and
> > C++, including C++11, on a wide variety of Linux distributions without
> > need for distribution-specific packages. This is accomplished using
> > Red Hat's devtoolset-2, which performs selective static linking of
> > symbols from libstdc++ that cause ABI conflicts when used on systems
> > with older standard libraries.
> >
> > The base image for producing these binaries is specified in a Dockerfile
> > [1].
> >
> > The problem that we are having is that some C++ libraries, notably
> > Google's Abseil C++ library, require a version of glibc that is too
> > new for RHEL5. By building with CentOS6 / RHEL6 as the base image, we
> > would get a new enough glibc (version 2.12). But building against
> > glibc 2.12 would leave behind the RHEL5 folks.
> >
> > The in-discussion manylinux2010 standard uses RHEL6 as its base, but
> > it is not yet finalized or in production.
> >
> > Some modern C++ projects shipping to Python have already left behind
> > the manylinux1 standard even though their Python binaries claim to
> > implement the standard. Both PyTorch and TensorFlow are tagged as
> > manylinux1 although they have a different ABI. See [2] and [3] for
> > examples.
> >
> > In my view there are two paths forward, neither perfect:
> >
> > 1) Stick with the manylinux1 ABI and do not use third-party libraries
> > requiring newer glibc
> > 2) "Cheat" on manylinux1 by using CentOS 6 instead of CentOS 5 as the
> > base image for the wheel builds. This is what PyTorch is doing.
> >
> > Since CentOS 5 / RHEL5 are already past EOL, their users would be the
> > primary casualties, but I'm not sure how many users would be affected. My
> > guess is that they represent a small minority of our users at this
> > point. RedHat is offering extended support for RHEL5 through end of
> > 2020 but those are probably fairly exceptional cases and unlikely
> > (IMHO) to be working on the bleeding edge of Python data engineering.
> >
> > Personally I would like to go with Option 2 and hope that this
> > particular Python packaging issue gets sorted out in the next 12-24
> > months, as we've already suffered problems due to TensorFlow and
> > PyTorch's non-conformity with the manylinux1 ABI.
> >
> > Interested in the opinions of others.
> >
> > - Wes
> >
> > [1]:
> > https://github.com/pypa/manylinux/blob/master/docker/Dockerfile-x86_64
> > [2]:
> > https://github.com/NVIDIA/nvidia-docker/issues/348#issuecomment-288875848
> > [3]: https://github.com/pypa/manylinux/issues/96
> > [4]: https://issues.apache.org/jira/browse/ARROW-2461
> >
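
As a rough illustration of the glibc constraint discussed above, the following Python sketch reports which manylinux tags a host's glibc could satisfy. It is only a sketch under stated assumptions: the floors are glibc 2.5 for manylinux1 (CentOS 5, per PEP 513) and glibc 2.12 for manylinux2010 (CentOS 6, as mentioned in the thread); the names GLIBC_FLOORS, parse_version, supported_tags and detect_glibc are hypothetical helpers, not part of pip or any packaging tool, and real installers perform additional checks.

```python
import platform
import re

# Assumed glibc floors: manylinux1 targets glibc 2.5 (CentOS 5),
# manylinux2010 targets glibc 2.12 (CentOS 6).
GLIBC_FLOORS = {"manylinux1": (2, 5), "manylinux2010": (2, 12)}


def parse_version(text):
    """Return (major, minor) from a string such as '2.12', or None."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supported_tags(glibc_version):
    """List the tags whose glibc floor the given version meets."""
    return sorted(
        tag for tag, floor in GLIBC_FLOORS.items() if glibc_version >= floor
    )


def detect_glibc():
    """Best-effort glibc detection; None on non-glibc systems."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return parse_version(version)


if __name__ == "__main__":
    found = detect_glibc()
    if found is None:
        print("glibc not detected")
    else:
        print("glibc %d.%d supports: %s" % (found + (supported_tags(found),)))
```

Under these assumptions, a RHEL5-era host with glibc 2.5 satisfies only manylinux1, so a wheel built on a CentOS 6 base but tagged manylinux1 would be accepted by the installer there and could then fail at import time.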