

Re: TensorFlow, PyTorch, and manylinux1


Reposting since I wasn't subscribed to developers@xxxxxxxxxxxxxx. I
also didn't see Soumith's response, since it didn't come through to
dev@xxxxxxxxxxxxxxxx.

In response to the non-conforming ABI in the TF and PyTorch wheels, we
have attempted to hack around the issue with some elaborate
workarounds [1] [2] that have ultimately proved not to work in all
cases. The bottom line is that this is burdening other projects
in the Python ecosystem and causing confusing application crashes.

First, to state what should hopefully be obvious to many of you, Python
wheels are not a robust way to deploy complex C++ projects, even
setting aside the compiler toolchain issue. If a project has
non-trivial third-party dependencies, you either have to statically
link them or bundle shared libraries with the wheel (we do a bit of
both in Apache Arrow). Neither solution is foolproof in all cases.
There are other downsides to wheels when it comes to numerical
computing: it is difficult to make use of a library like the Intel MKL,
which may be shared by multiple projects. If two projects have the same
third-party C++ dependency (to use gRPC or libprotobuf as a straw-man
example), it's hard to guarantee that their versions or ABIs will not
conflict with each other.
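As a toy illustration of that conflict, the sketch below takes a
hypothetical inventory of which third-party libraries each wheel bundles
(and at what version) and reports every library that appears at more
than one version. The inventory format and all names are assumptions; in
practice you would have to build it by inspecting the wheels themselves.

```python
from collections import defaultdict


def find_conflicts(bundled):
    """Report libraries bundled at more than one version across wheels.

    `bundled` maps wheel name -> {library name: version} (an assumed,
    hand-built inventory). Returns {library: {version: [wheels]}} for
    every library whose bundled versions disagree.
    """
    by_library = defaultdict(lambda: defaultdict(list))
    for wheel, libraries in bundled.items():
        for library, version in libraries.items():
            by_library[library][version].append(wheel)
    return {
        library: dict(versions)
        for library, versions in by_library.items()
        if len(versions) > 1
    }
```

Any library this reports is a candidate for the kind of crash described
above if both wheels are imported into the same process.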

In packaging with conda, we pin all dependencies when building the
projects that depend on them, then package and deploy the dependencies
as separate shared libraries instead of bundling them. To address the need
for newer compilers or a newer C++ standard library, libstdc++.so and
other system shared libraries are packaged and installed as
dependencies. In manylinux1, the Red Hat devtoolset compiler toolchain
is used because it performs selective static linking of symbols, which
enables C++11 libraries to be deployed on older Linux distributions like
RHEL5/6. A conda environment functions as a sort of portable, miniature
Linux distribution.
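The pinning idea can be sketched as choosing one version of each shared
dependency that every project accepts, then installing that single copy
once. This is only a toy, assuming each project lists its acceptable
versions explicitly as a set; conda's actual solver is considerably more
sophisticated. The function and package names are hypothetical.

```python
def _version_key(version):
    """Compare dotted versions numerically, so "3.10.0" sorts after "3.9.0"."""
    return tuple(int(part) for part in version.split("."))


def pin_shared_versions(requirements):
    """Pick one version per dependency that every project accepts.

    `requirements` maps project -> {dependency: set of acceptable versions}.
    Raises ValueError when the projects' constraints cannot all be met.
    """
    acceptable = {}
    for deps in requirements.values():
        for dep, versions in deps.items():
            versions = set(versions)
            # Intersect with what earlier projects allowed for this dependency.
            acceptable[dep] = acceptable.get(dep, versions) & versions
    pins = {}
    for dep, versions in acceptable.items():
        if not versions:
            raise ValueError(f"no single version of {dep} satisfies every project")
        # Prefer the newest version that every project can live with.
        pins[dep] = max(versions, key=_version_key)
    return pins
```

Because every project is built against the same pinned copy, there is
only one libprotobuf (or libstdc++) in the environment, which is the
property that bundling in wheels cannot guarantee.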

Given the current state of things, where using the TensorFlow and PyTorch
wheels in the same process as other conforming manylinux1 wheels is
unsafe, it's hard to see how one can continue to recommend pip as a
preferred installation path until the ABI problems are resolved. For
example, pip is the recommended way to install TensorFlow on
Linux [3]. It's unclear that non-compliant wheels should be allowed on
the package index at all (I'm aware that verifying policy compliance
was deemed not to be PyPI's responsibility [4]).
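For a sense of what checking policy compliance involves: manylinux1
limits the versioned glibc and libstdc++ symbols an extension may
require, and C++11 builds made with newer compilers typically need newer
GLIBCXX versions. A minimal sketch in the spirit of tools like
auditwheel, assuming the required symbol versions (e.g. the version names
that `objdump -T` prints for a shared library) have already been
collected into a list. The caps below are illustrative assumptions
loosely modeled on manylinux1; the authoritative lists are in PEP 513 and
auditwheel's policy data.

```python
# ASSUMED, illustrative caps loosely modeled on manylinux1; take the real
# values from PEP 513 / auditwheel. Only GLIBC and GLIBCXX are checked here.
ASSUMED_MANYLINUX1_CAPS = {
    "GLIBC": (2, 5),
    "GLIBCXX": (3, 4, 8),
}


def parse_version(text):
    """Turn "3.4.21" into (3, 4, 21); return None for names like "PRIVATE"."""
    try:
        return tuple(int(part) for part in text.split("."))
    except ValueError:
        return None


def violations(required_symbol_versions, caps=ASSUMED_MANYLINUX1_CAPS):
    """Return the symbol versions (e.g. "GLIBCXX_3.4.21") that exceed `caps`."""
    bad = []
    for entry in required_symbol_versions:
        family, _, version_text = entry.partition("_")
        cap = caps.get(family)
        version = parse_version(version_text)
        if cap is not None and version is not None and version > cap:
            bad.append(entry)
    return bad
```

A wheel whose extensions need, say, GLIBCXX_3.4.21 would be flagged,
which is the kind of mismatch that leads to the crashes described above
when a conforming wheel and a non-conforming one share a process.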

A couple possible paths forward (there may be others):

* Collaborate with the Python Packaging Authority to evolve the
manylinux ABI so that compliant wheels can support the
build and deployment requirements of these projects
* Create a new ABI tag for CUDA/C++11-enabled Python wheels so that
projects can ship packages that are guaranteed to work properly
with TF/PyTorch. This might require vendoring libstdc++ in some kind
of "toolchain" wheel that projects using this new ABI can depend on.
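On the installer side, manylinux tags occupy the platform-tag slot of a
wheel filename, and an installer only picks a wheel whose tag it
recognizes as supported. A simplified Python sketch of that check,
assuming the PEP 427 filename layout and ignoring the Python and ABI tags
that a real installer such as pip also considers; "manylinux_cuda_x86_64"
is a purely hypothetical stand-in for a new tag:

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components in {filename}")
    return WheelTags(distribution, version, build, python, abi, platform)


def platform_supported(filename, supported_platforms):
    """True if any of the wheel's platform tags is in `supported_platforms`.

    A platform tag may be a compressed set separated by "." (PEP 425).
    """
    tags = parse_wheel_filename(filename).platform.split(".")
    return any(tag in supported_platforms for tag in tags)
```

An installer that does not recognize a new tag skips such a wheel
instead of installing it next to wheels it may be ABI-incompatible with,
which is part of the appeal of a distinct tag.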

Note that these toolchain and deployment issues are absent when
building and deploying with conda packages, since build- and run-time
dependencies can be pinned and shared across all the projects that
depend on them, ensuring ABI cross-compatibility. It's great to have
the convenience of "pip install $PROJECT", but I believe that these
projects have outgrown the intended use for pip and wheel
distributions.

Until the ABI incompatibilities are resolved, I would encourage more
prominent user documentation about the non-portability and potential
for crashes with these Linux wheels.

Thanks,
Wes

[1]: https://github.com/apache/arrow/commit/537e7f7fd503dd920c0b9f0cef8a2de86bc69e3b
[2]: https://github.com/apache/arrow/commit/e7aaf7bf3d3e326b5fe58d20f8fc45b5cec01cac
[3]: https://www.tensorflow.org/install/
[4]: https://www.python.org/dev/peps/pep-0513/#id50
On Sat, Dec 15, 2018 at 11:25 PM Robert Nishihara
<robertnishihara@xxxxxxxxx> wrote:
>
> On Sat, Dec 15, 2018 at 8:43 PM Philipp Moritz <pcmoritz@xxxxxxxxx> wrote:
>
> > Dear all,
> >
> > As some of you know, there is a standard in Python called manylinux (
> > https://www.python.org/dev/peps/pep-0513/) to package binary executables
> > and libraries into a “wheel” in a way that allows the code to be run on a
> > wide variety of Linux distributions. This is very convenient for Python
> > users, since such libraries can be easily installed via pip.
> >
> > This standard is also important for a second reason: If many different
> > wheels are used together in a single Python process, adhering to manylinux
> > ensures that these libraries work together well and don’t step on each
> > other’s toes (this could easily happen if different versions of libstdc++
> > are used, for example). Therefore *even if support for only a single
> > distribution like Ubuntu is desired*, it is important to be manylinux
> > compatible to make sure everybody’s wheels work together well.
> >
> > TensorFlow and PyTorch unfortunately don’t produce manylinux compatible
> > wheels. The challenge is due, at least in part, to the need to use
> > nvidia-docker to build GPU binaries [10]. This causes various levels of
> > pain for the rest of the Python community, see for example [1] [2] [3] [4]
> > [5] [6] [7] [8].
> >
> > The purpose of this e-mail is to get a discussion started on how we can
> > make TensorFlow and PyTorch manylinux compliant. There is a new standard in
> > the works [9] so hopefully we can discuss what would be necessary to make
> > sure TensorFlow and PyTorch can adhere to this standard in the future.
> >
> > It would make everybody’s lives just a little bit better! Any ideas are
> > appreciated.
> >
> > @soumith: Could you cc the relevant list? I couldn't find a pytorch dev
> > mailing list.
> >
> > Best,
> > Philipp.
> >
> > [1] https://github.com/tensorflow/tensorflow/issues/5033
> > [2] https://github.com/tensorflow/tensorflow/issues/8802
> > [3] https://github.com/primitiv/primitiv-python/issues/28
> > [4] https://github.com/zarr-developers/numcodecs/issues/70
> > [5] https://github.com/apache/arrow/pull/3177
> > [6] https://github.com/tensorflow/tensorflow/issues/13615
> > [7] https://github.com/pytorch/pytorch/issues/8358
> > [8] https://github.com/ray-project/ray/issues/2159
> > [9] https://www.python.org/dev/peps/pep-0571/
> > [10] https://github.com/tensorflow/tensorflow/issues/8802#issuecomment-291935940
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "ray-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to ray-dev+unsubscribe@xxxxxxxxxxxxxxxx.
> > To post to this group, send email to ray-dev@xxxxxxxxxxxxxxxx.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/ray-dev/CAFs1FxUBAag6AThj34twiAB6KY3t5sJSJF3g70K3SvF-%2BzGGgw%40mail.gmail.com
> > <https://groups.google.com/d/msgid/ray-dev/CAFs1FxUBAag6AThj34twiAB6KY3t5sJSJF3g70K3SvF-%2BzGGgw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> > .
> > For more options, visit https://groups.google.com/d/optout.
> >