


[Python-Dev] PEP 580/590 discussion


So, I spent another day pondering the PEPs.

I love PEP 590's simplicity and PEP 580's extensibility. As I hinted 
before, I hope they can be combined, and I believe we can achieve 
that by having PEP 590's (o+offset) point not just to a function pointer, 
but to a {function pointer; flags} struct with flags defined for two 
optimizations:
- "Method-like", i.e. compatible with LOAD_METHOD/CALL_METHOD.
- "Argument offsetting request", allowing PEP 590's 
PY_VECTORCALL_ARGUMENTS_OFFSET optimization.

This would mean one basic call signature (today's METH_FASTCALL | 
METH_KEYWORDS), with individual optimizations available if both the 
caller and callee support them.



In case you want to know my thoughts or details, let me indulge in some 
detailed comparisons and commentary that led to this.
I also give a more detailed proposal below.
Keep in mind I wrote this before I distilled it to the paragraph above, 
and though the distillation is written as a diff to PEP 590, I still 
think of this as merging both PEPs.


PEP 580 tries hard to work with existing call conventions (like METH_O, 
METH_VARARGS), making them fast.
PEP 590 just defines a new convention. Basically, any callable that 
wants performance improvements must switch to METH_VECTORCALL (fastcall).
I believe PEP 590's approach is OK. To stay as performant as possible, C 
extension authors will need to adapt their code regularly. If they 
don't, no harm -- the code will still work as before, and will still be 
about as fast as it was before.
In exchange for this, Python (and Cython, etc.) can focus on optimizing 
one calling convention, rather than a variety, each with its own 
advantages and drawbacks.

Extending PEP 580 to support a new calling convention will involve 
defining a new CCALL_* constant, and adding to existing dispatch code.
Extending PEP 590 to support a new calling convention will most likely 
require a new type flag, and either changing the vectorcall semantics or 
adding a new pointer.
To be a bit more concrete, I think of possible extensions to PEP 590 as 
things like:
- Accepting a kwarg dict directly, without copying the items to 
tuple/array (as in PEP 580's CCALL_VARARGS|CCALL_KEYWORDS)
- Prepending more than one positional argument, or appending positional 
arguments
- When an optimization like LOAD_METHOD/CALL_METHOD turns out to no 
longer be relevant, removing it to simplify/speed up code.
I expect we'll later find out that something along these lines might 
improve performance. PEP 590 would make it hard to experiment.

I mentally split PEP 590 into two pieces: formalizing fastcall, plus one 
major "extension" -- making bound methods fast.
When seen this way, this "extension" is quite heavy: it adds an 
additional type flag, Py_TPFLAGS_METHOD_DESCRIPTOR, and uses a bit in 
the "Py_ssize_t nargs" argument as an additional flag. Both type flags 
and nargs bits are very limited resources. If I were sure vectorcall is 
the final best implementation we'll have, I'd go and approve it -- but I 
think we still need room for experimentation, in the form of more such 
extensions.
PEP 580, with its collection of per-instance data and flags, is 
definitely more extensible. What I don't like about it is that it has 
the extensions built-in; mandatory for all callers/callees.

PEP 580 adds a common data struct to callable instances. Currently these 
are all data bound methods want to use (cc_flags, cc_func, cc_parent, 
cr_self). Various flags are consulted in order to deliver the needed 
info to the underlying function.
PEP 590 lets the callable object store data it needs independently. It 
provides a clever mechanism for pre-allocating space for bound methods' 
prepended "self" argument, so data can be provided cheaply, though it's 
still done by the callable itself.
Callables that would need to e.g. prepend more than one argument won't 
be able to use this mechanism, but y'all convinced me that is not worth 
optimizing for.
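
To make the argument offsetting idea concrete, here is a rough standalone 
sketch in C. It is only an illustration under assumptions: `obj`, 
`call_bound`, `first_arg`, `ARGS_OFFSET_BIT`, `REAL_NARGS` and `MAX_ARGS` 
are hypothetical names, a string pointer stands in for `PyObject *`, and 
none of this is code from either PEP. The only thing borrowed from 
PEP 590 is the idea itself: a high bit in nargs tells the callee that it 
may temporarily overwrite args[-1], provided it restores the old value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for PyObject *; an assumption so no Python.h is needed. */
typedef const char *obj;

/* Same idea as PY_VECTORCALL_ARGUMENTS_OFFSET: the top bit of nargs. */
#define ARGS_OFFSET_BIT ((size_t)1 << (8 * sizeof(size_t) - 1))
#define REAL_NARGS(n)   ((n) & ~ARGS_OFFSET_BIT)
#define MAX_ARGS 8      /* arbitrary limit for the copying path */

typedef obj (*target_fn)(obj *args, size_t nargs);

/* Hypothetical bound-method call: prepend self, then call the target. */
static obj call_bound(obj self, target_fn target, obj *args, size_t nargs)
{
    size_t n = REAL_NARGS(nargs);

    if (nargs & ARGS_OFFSET_BIT) {
        /* The caller reserved args[-1]: borrow it, then restore it. */
        obj saved = args[-1];
        args[-1] = self;
        obj result = target(args - 1, n + 1);
        args[-1] = saved;
        return result;
    }

    /* No reserved space: fall back to copying into a new array. */
    obj copy[MAX_ARGS + 1];
    assert(n <= MAX_ARGS);
    copy[0] = self;
    if (n > 0) {
        memcpy(copy + 1, args, n * sizeof(obj));
    }
    return target(copy, n + 1);
}

/* Example target: returns its first argument, or NULL if there is none. */
static obj first_arg(obj *args, size_t nargs)
{
    return nargs > 0 ? args[0] : NULL;
}
```

Both paths hand the target the same arguments; the bit only decides 
whether a copy is needed.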

PEP 580's goal seems to be that making a callable behave like a Python 
function/method is just a matter of the right set of flags. Jeroen 
called this "complexity in the protocol".
PEP 590, on the other hand, leaves much to individual callable types. 
This is "complexity in the users of the protocol".
I now don't see a problem with PEP 590's approach. Not all users will 
need the complexity. We need to give CPython and Cython the tools to 
make implementing "def"-like functions possible (and fast), but if other 
extensions need to match the behavior of Python functions, they should 
just use Cython. Emulating Python functions is a special-enough use case 
that it doesn't justify complicating the protocol, and the same goes for 
implementing Python's built-in functions (with all their historical 
baggage).



My fuller proposal for a compromise between PEP 580 and 590 would go 
something like below.

The type flag (Py_TPFLAGS_HAVE_VECTORCALL/Py_TPFLAGS_HAVE_CCALL) and 
offset (tp_vectorcall_offset/tp_ccalloffset; in tp_print's place) stay.

The offset identifies a per-instance structure with two fields:
- Function pointer (with the vectorcall signature)
- Flags
Storing any other per-instance data (like PEP 580's cr_self/cc_parent) 
is the responsibility of each callable type.
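
As a rough illustration of that layout, a minimal standalone sketch in C 
follows. All names here (`obj`, `vectorcall_fn`, `call_slot`, 
`my_callable`, `my_callable_offset`, `get_call_slot`) are hypothetical 
and appear in neither PEP; a plain `void *` stands in for `PyObject *` 
and a `long` for the object header so that it compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PyObject *; an assumption so no Python.h is needed. */
typedef void *obj;

/* One call signature: callable, args array, nargs, keyword names. */
typedef obj (*vectorcall_fn)(obj callable, obj const *args,
                             size_t nargs, obj kwnames);

/* Hypothetical per-instance structure that the type's offset locates. */
typedef struct {
    vectorcall_fn func;   /* function pointer, vectorcall signature */
    unsigned int flags;   /* optional optimizations the callee offers */
} call_slot;

/* A callable type stores any other per-instance data on its own. */
typedef struct {
    long header;          /* placeholder for the usual object header */
    call_slot slot;       /* located through the type's offset */
    obj self;             /* private data, e.g. a bound "self" */
} my_callable;

/* What the type would record as its offset. */
static const size_t my_callable_offset = offsetof(my_callable, slot);

/* Locate the slot the way a caller would: (o + offset). */
static call_slot *get_call_slot(void *o, size_t offset)
{
    assert(o != NULL);
    return (call_slot *)((char *)o + offset);
}
```

The caller only needs the offset and the two fields; `self` stays 
private to the callable type.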

Two flags are defined initially:
1. "Method-like" (like Py_TPFLAGS_METHOD_DESCRIPTOR in PEP 590, or 
non-NULL cr_self in PEP 580). Having the flag here instead of a type 
flag will prevent tp_call-only callables from taking advantage of 
LOAD_METHOD/CALL_METHOD optimization, but I think that's OK.

2. Request to reserve space for one argument before the args array, as 
in PEP 590's argument offsetting. If the flag is missing, nargs must not 
include PY_VECTORCALL_ARGUMENTS_OFFSET. A mechanism incompatible with 
offsetting may use the bit for another purpose.

Both flags may be simply ignored by the caller (or not be set by the 
callee in the first place), reverting to a more straightforward (but 
less performant) code path. This should also be the case for any flags 
added in the future.
Note how without these flags, the protocol (and its documentation) will 
be extremely simple.
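
The "flags may be ignored" rule could look roughly like the sketch below. 
It rests on assumptions: the flag names and values 
(`CALL_FLAG_METHOD_LIKE`, `CALL_FLAG_WANTS_OFFSET`), the `call_slot` 
struct and the helper functions are all hypothetical, and a string 
pointer stands in for `PyObject *`. The example callee merely reports 
which path it was called through.

```c
#include <assert.h>
#include <stddef.h>

typedef const char *obj;   /* stand-in for PyObject * (assumption) */

#define CALL_FLAG_METHOD_LIKE  0x1u   /* hypothetical flag values */
#define CALL_FLAG_WANTS_OFFSET 0x2u
#define NARGS_OFFSET_BIT ((size_t)1 << (8 * sizeof(size_t) - 1))

typedef obj (*vectorcall_fn)(obj callable, obj const *args,
                             size_t nargs, obj kwnames);

typedef struct {
    vectorcall_fn func;
    unsigned int flags;
} call_slot;

/* A caller that knows about offsetting; buf[0] is the spare entry. */
static obj call_with_offset_support(const call_slot *slot, obj callable,
                                    obj *buf, size_t nargs)
{
    assert(slot != NULL && slot->func != NULL);
    size_t n = nargs;
    if (slot->flags & CALL_FLAG_WANTS_OFFSET) {
        n |= NARGS_OFFSET_BIT;   /* only set when the callee asked */
    }
    return slot->func(callable, buf + 1, n, NULL);
}

/* A caller that ignores every flag; this must always remain valid. */
static obj call_plain(const call_slot *slot, obj callable,
                      obj const *args, size_t nargs)
{
    assert(slot != NULL && slot->func != NULL);
    return slot->func(callable, args, nargs, NULL);
}

static const char FAST[] = "fast";
static const char SLOW[] = "slow";

/* Example callee: reports whether the offset bit was passed. */
static obj report_path(obj callable, obj const *args,
                       size_t nargs, obj kwnames)
{
    (void)callable; (void)args; (void)kwnames;
    return (nargs & NARGS_OFFSET_BIT) ? FAST : SLOW;
}
```

A callee that never sets the flag, or a caller that never looks at it, 
ends up on the plain path with no further coordination.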
This mechanism would work with my examples of possible future extensions:
- "kwarg dict": A flag would enable the `kwnames` argument to be a dict 
instead of a tuple.
- prepending/appending several positional arguments: The callable's 
request for how much space to allocate would be stored right after the 
{func; flags} struct. As in argument offsetting, a bit in nargs would 
indicate that the request was honored. (If this was made incompatible 
with one-arg offsetting, it could reuse the bit.)
- removing an optimization: CPython would simply stop using the 
optimization (but not remove the flag). Extensions could continue to 
use the optimization between themselves.

As in PEP 590, any class that uses this mechanism shall not be usable as 
a base class. This will simplify implementation and tests, but hopefully 
the limitation will be removed in the future. (Maybe even in the initial 
implementation.)

The METH_VECTORCALL (aka CCALL_FASTCALL|CCALL_KEYWORDS) calling 
convention is added to the public API. The other calling conventions 
(PEP 580's CCALL_O, CCALL_NOARGS, CCALL_VARARGS, CCALL_KEYWORDS, 
CCALL_FASTCALL, CCALL_DEFARG) as well as argument type checking 
(CCALL_OBJCLASS) and self slicing (CCALL_SELFARG) are left up to the 
callable.

No equivalent of PEP 580's restrictions on the __name__ attribute. In my 
opinion, the PyEval_GetFuncName function should just be deprecated in 
favor of getting the __name__ attribute and checking if it's a string. 
It would be possible to add a public helper that returns a proper 
reference, but that doesn't seem worth it. Either way, I consider this 
out of scope of this PEP.

No equivalent of PEP 580's PyCCall_GenericGetParent and 
PyCCall_GenericGetQualname either -- again, if needed, they should be 
retrieved as normal attributes. As I see it, the operation doesn't need 
to be particularly fast.

No equivalent of PEP 580's PyCCall_Call, and no support for dict in 
PyCCall_FastCall's kwds argument. To be fast, extensions should avoid 
passing kwargs in a dict. Let's see how far that takes us. (FWIW, this 
also avoids subtle issues with dict mutability.)
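
For readers less familiar with the non-dict way of passing keywords, the 
fastcall convention puts keyword values in the same array right after 
the positional arguments, with the names in a separate tuple. A small 
standalone sketch, assuming strings in place of `PyObject *` and a plain 
array in place of the kwnames tuple (`find_kwarg` is a hypothetical 
helper, not an existing API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * args holds nargs positional values followed by nkw keyword values;
 * kwnames holds the nkw keyword names in the same order.
 * Returns the value passed for `name`, or NULL if it was not passed.
 */
static const char *find_kwarg(const char *const *args, size_t nargs,
                              const char *const *kwnames, size_t nkw,
                              const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < nkw; i++) {
        if (strcmp(kwnames[i], name) == 0) {
            return args[nargs + i];
        }
    }
    return NULL;
}
```

A call like f(1, 2, x=3, y=4) would then arrive as args = {1, 2, 3, 4}, 
nargs = 2 and kwnames = ("x", "y"), with no dict built along the way.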

Profiling stays as in PEP 580: only exact function types generate the 
events.

As in PEP 580, PyCFunction_GetFlags and PyCFunction_GET_FLAGS are 
deprecated.

As in PEP 580, nothing is added to the stable ABI.


Does that sound reasonable?