[Python-Dev] PEP 580 and PEP 590 comparison.
Thanks for spending time on this.
I think the comparison of the two PEPs falls into two broad categories,
performance and capability.
I'll address capability first.
Let's try a thought experiment.
Consider PEP 580. It uses the old `tp_print` slot as an offset to mark
the location of the CCall structure within the callable. Now suppose
instead that it uses a `tp_flags` bit to mark the presence of an offset field
and that the offset field is moved to the end of the TypeObject. This
would not impact the capabilities of PEP 580.
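As a rough illustration of the flag-plus-offset idea, here is a minimal sketch in plain C. Every name in it (demo_type, DEMO_HAVE_CALL_OFFSET, demo_find_call and so on) is hypothetical and is not taken from CPython or from either PEP. It only shows how a flag on the type can announce that an offset field is valid, and how that offset locates a call pointer inside the instance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from CPython or the PEPs. */
#define DEMO_HAVE_CALL_OFFSET (1UL << 0)  /* flag bit: the offset field is valid */

typedef struct {
    unsigned long flags;   /* plays the role of the type's flags word */
    size_t call_offset;    /* plays the role of the offset field on the type */
} demo_type;

typedef struct {
    const demo_type *type; /* plays the role of the object's type pointer */
} demo_object;

typedef int (*demo_call_fn)(demo_object *self, int arg);

/* A callable that embeds its call pointer somewhere inside the instance. */
typedef struct {
    demo_object base;
    int increment;
    demo_call_fn call;
} demo_adder;

static int adder_call(demo_object *self, int arg)
{
    return arg + ((demo_adder *)self)->increment;
}

static const demo_type demo_adder_type = {
    DEMO_HAVE_CALL_OFFSET,
    offsetof(demo_adder, call),
};

/* Return the call pointer stored in the instance,
   or NULL if the type has not set the flag. */
static demo_call_fn demo_find_call(demo_object *obj)
{
    assert(obj != NULL && obj->type != NULL);
    if (!(obj->type->flags & DEMO_HAVE_CALL_OFFSET)) {
        return NULL;
    }
    return *(demo_call_fn *)((char *)obj + obj->type->call_offset);
}

/* Usage: look up the call pointer through the type, then call through it. */
static int demo_example(void)
{
    demo_adder a = { { &demo_adder_type }, 5, adder_call };
    demo_call_fn fn = demo_find_call(&a.base);
    return fn ? fn(&a.base, 37) : -1;
}
```

Whether the offset is stored in a reused slot or in a new field guarded by a flag, the lookup in demo_find_call is the same, which is the point of the thought experiment.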
Now add a single line
nargs &= ~PY_VECTORCALL_ARGUMENTS_OFFSET;
which would make PyCCall_FastCall compatible with the PEP 590 vectorcall protocol.
Now rebase the PEP 580 reference code on top of PEP 590 minimal
implementation and make the vectorcall field of CFunction point to
The resulting hybrid is both a PEP 590 conformant implementation, and is
at least as capable as the reference PEP 580 implementation.
Therefore PEP 590 must be at least as capable as PEP 580.
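The masking step in the thought experiment can be sketched in isolation, without the CPython headers. The names below are hypothetical: DEMO_ARGUMENTS_OFFSET stands in for PY_VECTORCALL_ARGUMENTS_OFFSET and is assumed here to be the top bit of a size_t, and the two functions stand in for a fastcall-style callee and a vectorcall-style entry point. This is an illustration of the idea, not the reference implementation of either PEP.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PY_VECTORCALL_ARGUMENTS_OFFSET; assumed to be the top bit
   of a size_t for the purposes of this sketch. */
#define DEMO_ARGUMENTS_OFFSET ((size_t)1 << (8 * sizeof(size_t) - 1))

/* Hypothetical fastcall-style callee: expects a plain argument count. */
static long demo_fastcall_sum(const long *args, size_t nargs)
{
    long total = 0;
    for (size_t i = 0; i < nargs; i++) {
        total += args[i];
    }
    return total;
}

/* Hypothetical vectorcall-style entry point: the caller may set the flag
   bit in nargs, so it is cleared before forwarding to the callee. */
static long demo_vectorcall_sum(const long *args, size_t nargs)
{
    assert(args != NULL || (nargs & ~DEMO_ARGUMENTS_OFFSET) == 0);
    nargs &= ~DEMO_ARGUMENTS_OFFSET;  /* the single added line */
    return demo_fastcall_sum(args, nargs);
}
```

Calling demo_vectorcall_sum with a count of 3, or with 3 | DEMO_ARGUMENTS_OFFSET, sums the same three arguments, so a callee written for plain counts keeps working once the flag is stripped.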
Currently the PEP 590 implementation is intentionally minimal. It does
nothing for performance. The benchmark Jeroen provides is a
micro-benchmark that calls the same functions repeatedly. This is
trivial and unrealistic. So, there is no real evidence either way. I
will try to provide some.
The point of PEP 590 is that it allows performance improvements by
allowing callables more freedom of implementation. To repeat an example
from an earlier email, which may have been overlooked, this code reduces
the time to create ranges and small lists by about 30%
Speeding up calls to builtin functions by a measurable amount will need
some work on Argument Clinic. I plan to have that done before PyCon in May.