[Python-Dev] Compile-time resolution of packages [Was: Another update for PEP 394...]

On 2019-02-26, Gregory P. Smith wrote:
> On Tue, Feb 26, 2019 at 9:55 AM Barry Warsaw <barry at python.org> wrote:
> For an OS distro provided interpreter, being able to restrict its use to
> only OS distro provided software would be ideal (so ideal that people who
> haven't learned the hard distro maintenance lessons may hate me for it).

Interesting idea.  I remember when I was helping develop Debian
packaging guides for Python software.   I had to fight with people
to convince them that Debian packages should use 


rather than

    #!/usr/bin/env python

The situation is much better now but I still sometimes have
packaged software fail because it picks up my version of
/usr/local/bin/python.  I don't understand how people can believe
grabbing /usr/local/bin/python is going to be a way to build a
reliable system.

> Such a restriction could be implemented within the interpreter itself. For
> example: Say that only this set of fully qualified path whitelisted .py
> files are allowed to invoke it, with no interactive, stdin, or command line
> "-c" use allowed.

I think this is related to an idea I was tinkering with on the
weekend.  Why shouldn't we do more compile time linkage of Python
packages?  At least, I think we should give people the option to do it.
Obviously you still need to also support run-time import search
(interactive REPL, support for __import__(unknown_at_compiletime)).

Here is the sketch of the idea (probably half-baked, as most of my
ideas are):

- add a PYTHONPACKAGES envvar and a -p option to 'python'

- the argument for these options would be a colon separated list of
  Python package archives (crates, bales, bundles?).  The -p option
  could be a colon separated list or provided multiple times to
  specify more packages.

- the modules/packages contained in those archives become the
  preferred bytecode code source when those names are imported.  We
  look there first.  The crawling around behavior (dynamic import
  based on sys.path) happens only if a module is not found and could
  be turned off.

- the linking of the modules could be computed when the code is
  compiled and the package archive created, rather than when the
  'import' statement gets executed.  This would provide a number of
  advantages.  It would be faster.  Code analysis tools could
  statically determine which module imported code corresponds to.
  E.g. if your code calls module.foo, assuming no monkey patching,
  you know what code 'foo' actually is.

- to get extra fancy, the package archives could be dynamic
  link libraries containing "frozen modules" like this FB experiment:
  That way, you avoid the unmarshal step and just execute the module
  bytecode directly.  On startup, Python would dlopen all of the
  package archives specified by PYTHONPACKAGES.  On init, it would
  build an index of the package tree and it would have the memory
  location for the code object for each module.
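
To make the "look in the archives first" part of the list above a bit
more concrete, here is a minimal sketch using today's import machinery.
It is only an approximation of the idea: it assumes the package archives
are ordinary zip files (not dlopen'ed frozen modules), it builds the
index at startup rather than at compile time, and it always falls back
to the normal sys.path search when a name is not in the index.  The names
parse_package_list, build_index, ArchiveFirstFinder and
install_archive_finder are made up for this example; nothing like them
exists in CPython.

```python
import importlib.abc
import importlib.machinery
import os
import sys
import zipfile


def parse_package_list(value):
    """Split a PYTHONPACKAGES-style string (colon separated, as proposed)."""
    return [item for item in value.split(":") if item]


def build_index(archives):
    """Map each dotted module name to the archive that provides it.

    Earlier archives win, so the order of the list is the search order.
    """
    index = {}
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            for name in zf.namelist():
                if not name.endswith((".py", ".pyc")):
                    continue
                parts = name.rsplit(".", 1)[0].split("/")
                if parts[-1] == "__init__":
                    parts = parts[:-1]  # a package is named by its directory
                if parts:
                    index.setdefault(".".join(parts), archive)
    return index


class ArchiveFirstFinder(importlib.abc.MetaPathFinder):
    """Resolve names from a prebuilt index before sys.path is consulted."""

    def __init__(self, index):
        self.index = index

    def find_spec(self, fullname, path=None, target=None):
        archive = self.index.get(fullname)
        if archive is None:
            return None  # not ours: let the regular finders crawl sys.path
        parent = fullname.rpartition(".")[0]
        # zipimport accepts "archive.zip/pkg" as a path entry for submodules.
        entry = os.path.join(archive, *parent.split(".")) if parent else archive
        return importlib.machinery.PathFinder.find_spec(fullname, [entry])


def install_archive_finder(archives):
    """Put the archive finder at the front of sys.meta_path and return it."""
    finder = ArchiveFirstFinder(build_index(archives))
    sys.meta_path.insert(0, finder)
    return finder
```

With something like this, a startup hook could call
install_archive_finder(parse_package_list(os.environ.get("PYTHONPACKAGES", "")))
and every name found in the index would be loaded from its archive
without touching sys.path.  Turning off the fallback entirely would need
more care than this sketch takes, since builtin and frozen modules are
also found through sys.meta_path.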

That would seem like quite a useful thing.  For an application like
Mercurial, they could build all the modules/packages required into a
single package archive.  Or, there would be a small number of
archives (one for standard Python library, one for everything else
that Mercurial needs).

Now that I write this, it sounds a lot like the debate between
static linking and dynamic linking.  Golang does static linking and
people seem to like the single executable distribution.