osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Python-Dev] New Python Initialization API


Here is my first review of https://www.python.org/dev/peps/pep-0587/ and
in general I think it's very good. There are some things I'd like to
consider changing before we embed them permanently in a public API, and
as I'm still keen to have the ability to write Python code for
configuration (to replace getpath.c, for example) I have a bias towards
making that work more smoothly with these changes when we get to it.

I think my biggest point (about halfway down) is that I'd rather use
argv/environ/etc. to *initialize* PyConfig and then only use the config
for initializing the runtime. That way it's more transparent for users
and more difficult for us to add options that embedders can't access.

The appendix is excellent, by the way. Very useful detail to have
written down.

Cheers,
Steve


> ``PyWideCharList`` is a list of ``wchar_t*`` strings.

I always forget whether "const" is valid in C99, but if it is, can we
make this a list of const strings?

I also prefer a name like ``PyWideStringList``, since that's what it is
(the other places we use WideChar in the C API refer only to a single
string, as far as I'm aware).

> ``PyInitError`` is a structure to store an error message or an exit code
> for the Python Initialization.

I love this struct! Currently it's private, but I wonder whether it's
worth making it public as PyError (or PyErrorInfo)? We obviously can't
replace all uses of int as a return value throughout the API, but I
think it has uses elsewhere and we may as well protect against having to
rename in the future.

> * ``exitcode`` (``int``): if greater or equal to zero, argument passed to
>  ``exit()``

Windows is likely to need/want negative exit codes, as many system
errors are represented as 0x80000000|(source of error)|(specific code).

> * ``user_err`` (int): if non-zero, the error is caused by the user
>   configuration, otherwise it's an internal Python error.

Maybe we could just encode this as "positive exitcode is user error,
negative is internal error"? I'm pretty sure struct return values are
passed by reference in most C calling conventions, so the size of the
struct isn't a big deal, but without a proper bool type it may look like
this is a second error code (like errno/winerror in a few places).

> ``PyPreConfig`` structure is used to pre-initialize Python:
>
> * Set the memory allocator
> * Configure the LC_CTYPE locale
> * Set the UTF-8 mode

I think we should have the isolated flag in here - oh wait, we do - I
think we should have the isolated/use_environment options listed in this
list :)

> Functions to pre-initialize Python:
>
> * ``PyInitError Py_PreInitialize(const PyPreConfig *config)``
> * ``PyInitError Py_PreInitializeFromArgs( const PyPreConfig *config,
int argc, char **argv)``
> * ``PyInitError Py_PreInitializeFromWideArgs( const PyPreConfig
*config, int argc, wchar_t **argv)``

I hope to one day be able to support multiple runtimes per process - can
we have an opaque PyRuntime object exposed publicly now and passed into
these functions?

(FWIW, I think we're a long way from being able to support multiple
runtimes *simultaneously*, so the initial implementation here would be
to have a PyRuntime_Create() that returns our global one once and then
errors until it's finalised. The migration path is probably to enable
switching of the current runtime via a dedicated function (i.e. one
active at a time, probably with thread local storage), since we have no
"context" parameter for C API functions, and obviously there are still
complexities such as poorly written extension modules that nonetheless
can be dealt with in embedding scenarios by simply not using them. This
doesn't seem like an unrealistic future, *unless* we add a whole lot of
new APIs now that can't allow it :) )

> ``PyPreConfig`` fields:
>
> * ``coerce_c_locale_warn``: if non-zero, emit a warning if the C locale
>   is coerced.
> * ``coerce_c_locale``: if equals to 2, coerce the C locale; if equals to
>   1, read the LC_CTYPE to decide if it should be coerced.

Can we use another value for coerce_c_locale to determine whether to
warn or not? Save a field.

> * ``legacy_windows_fs_encoding`` (Windows only): if non-zero, set the
>   Python filesystem encoding to ``"mbcs"``.
> * ``utf8_mode``: if non-zero, enable the UTF-8 mode

Why not just set the encodings here? The "PreInitializeFromArgs"
functions can override it based on the other variables we have, and
embedders have a more obvious question to answer than "do I want legacy
behaviour in my app".

Obviously we are not ready to import most encodings after pre
initialization, but I think that's okay. Embedders who set something
outside the range of what can be used without importing encodings will
get an error to that effect if we try.

In fact, I'd be totally okay with letting embedders specify their own
function pointer here to do encoding/decoding between Unicode and the OS
preferred encoding. That would let them use any approach they like.
Similarly for otherwise-unprintable messages (On an earlier project when
I was embedding Python into Windows Store apps - back when their API was
more heavily restricted - this would have been very helpful.)

> Example of Python initialization enabling the isolated mode::
>
>     PyConfig config = PyConfig_INIT;
>     config.isolated = 1;

Haven't we already used extenal values by this point that should have
been isolated? I'd rather have the isolation up front. Or better yet,
make isolation the default unless you call one of the "FromArgs"
functions, and then we don't actually need the config setting at all.

> PyConfig fields:

Before I start discussing individual groups of fields, I would really
like to see the following significant change here (because it'll help
keep us honest and prevent us breaking embedders in the future).

Currently you have three functions, that take a PyConfig and optionally
also use the environment/argv to figure out the settings:

> * ``PyInitError Py_InitializeFromConfig(const PyConfig *config)``
> * ``PyInitError Py_InitializeFromArgs(const PyConfig *config, int
argc, char **argv)``
> * ``PyInitError Py_InitializeFromWideArgs(const PyConfig *config, int
argc, wchar_t **argv)``

I would much prefer to see this flipped around, so that there is one
initialize function taking PyConfig, and two functions that will fill
out the PyConfig based on the environment:

(note two of the "const"s are gone)

* ``PyInitError Py_SetConfigFromArgs(PyConfig *config, int argc, char
**argv)``
* ``PyInitError Py_SetConfigFromWideArgs(PyConfig *config, int argc,
wchar_t **argv)``
* ``PyInitError Py_InitializeFromConfig(const PyConfig *config)``

This means that callers who want to behave like Python will request an
equivalent config and then use it. Those callers who want to be *nearly*
like Python can change things in between. And the precedence rules get
simpler because Py_SetConfig* just overwrites anything:

PyConfig config = PyConfig_INIT;
Py_SetConfigFromWideArgs(&config, argc, argv);
/* optionally change any settings */
Py_InitializeFromConfig(&config);

We could even split out PyMainConfig here and have another function to
collect the settings to pass to Py_RunMain() (such as the script or
module name, things to print on exit, etc.).

Our python.c then uses this configuration, so it gets a few lines longer
than at present. But it becomes a more useful example for people who
want a nearly-like-Python version, and also ensures that any new
configuration we support will be available to embedders.

So with that said, here's what I think about the fields:

> * ``argv``: ``sys.argv``
> * ``base_exec_prefix``: ``sys.base_exec_prefix``
> * ``base_prefix``: ``sys.base_prefix``
> * ``exec_prefix``: ``sys.exec_prefix``
> * ``executable``: ``sys.executable``
> * ``prefix``: ``sys.prefix``
> * ``xoptions``: ``sys._xoptions``

I like all of these, as they nicely map to their sys members.

> * ``module_search_path_env``: ``PYTHONPATH`` environment variale value
> * ``module_search_paths``, ``use_module_search_paths``: ``sys.path``

Why not just have "path" to mirror sys.path? If we go to a Py_SetConfig
approach then all the logic for inferring the path is in there, and if
we don't then the FromArgs() function will either totally override or
ignore it. Either way, no embedder needs to set these individually.

> * ``home``: Python home
> * ``program_name``: Program name
> * ``program``: ``argv[0]`` or an empty string
> * ``user_site_directory``: if non-zero, add user site directory to
>   ``sys.path``

Similarly, these play a role when inferring the regular Python sys.path,
but are not something you should need to use when embedding.

> * ``dll_path`` (Windows only): Windows DLL path

I'd have to look up exactly how this is used, but I don't think it's
configurable (certainly not by this point where it's already been loaded).

> * ``filesystem_encoding``: Filesystem encoding,
>   ``sys.getfilesystemencoding()``
> * ``filesystem_errors``: Filesystem encoding errors,
>   ``sys.getfilesystemencodeerrors()``

See above. If we need these earlier in initialization (and I agree we
probably do), then let's just put them in the pre-initialize config.

> * ``dump_refs``: if non-zero, display all objects still alive at exit
> * ``inspect``: enter interactive mode after executing a script or a
>   command
> * ``interactive``: interactive mode
> * ``malloc_stats``: if non-zero, dump memory allocation statistics
>   at exit
> * ``quiet``: quiet mode (ex: don't display the copyright and version
>   messages even in interactive mode)
> * ``show_alloc_count``: show allocation counts at exit
> * ``show_ref_count``: show total reference count at exit
> * ``site_import``: import the ``site`` module at startup?
> * ``skip_source_first_line``: skip the first line of the source

These all seem to be flags specific to regular Python and not embedders
(providing we have ways for embedders to achieve the same thing through
their own API calls, but even if we don't, I'd still rather not codify
them as public runtime API). Having them be an optional flag for
Py_RunMain() or a PyMainConfig struct would keep them closer to where
they belong.

> * ``faulthandler``: if non-zero, call ``faulthandler.enable()``
> * ``install_signal_handlers``: install signal handlers?
> * ``tracemalloc``: if non-zero, call ``tracemalloc.start(value)``

Probably it's not a good idea to install signal handlers too early, but
I think the other two should be able to start any time between pre-init
and actual init, no? So make PyFaultHandler_Init() and
PyTraceMalloc_Init() public and say "if you want them, call them".

> * ``run_command``: ``-c COMMAND`` argument
> * ``run_filename``: ``python3 SCRIPT`` argument
> * ``run_module``: ``python3 -m MODULE`` argument

I think these should be in a separate configuration struct for
Py_RunMain(), probably along with the section above too.

(The other fields all seem fine to me.)

> This PEP adds a new ``Py_UnixMain()`` function which takes command line
> arguments as bytes::
>
>     int Py_UnixMain(int argc, char **argv)

I was part of the discussion about this, so I have some more context
than what's in the PEP.

Given your next example shows this function would be about six lines
long, why do we want to add it? Better to deprecate Py_Main() completely
and just copy/paste those six lines from elsewhere (which also helps in
the case where you need to do one little thing more and end up having to
find those six lines anyway, as in the same example).

> Open Questions
> ==============
>
> * Do we need to add an API for import ``inittab``?

I don't think so. Once embedders have a three-step initialization
(pre-init, init, runmain) there are two valid places they can call their
own init functions.

> * What about the stable ABI? Should we add a version into
>   ``PyPreConfig`` and ``PyConfig`` structures somehow? The Windows API
>   is known for its ABI stability and it stores the structure size into
>   the structure directly. Do the same?

Yeah, I think so. We already have the Py*_INIT macros, so we could just
use a version number that we increment manually, and that will let us
optionally ignore added members (or deprecated members that have not
been physically removed).

(Though I'd rather make it easier for applications to include a local
copy of Python rather than have to safely deal with whatever is
installed on the system. But that's a Windows-ism, so making both
approaches work is important :) )

> * The PEP 432 stores ``PYTHONCASEOK`` into the config. Do we need
>   to add something for that into ``PyConfig``? How would it be exposed
>   at the Python level for ``importlib``? Passed as an argument to
>   ``importlib._bootstrap._setup()`` maybe? It can be added later if
>   needed.

Could we convert it into an xoption? It's very rarely used, to my knowledge.

> * ``python._pth`` (Windows only)

I'd like to make this for all platforms, though my understanding was
that since Python is not really relocatable on other OS's it isn't so
important. And if the rest of this change happens it'll be easier to
implement anyway :)