[Python-Dev] Benchmarks why we need PEP 576/579/580
I did exactly the same benchmark again with Python 3.7 and the results
are similar. I'm copying and editing the original post for completeness:
I finally managed to get some real-life benchmarks for why we need a
faster C calling protocol (see PEPs 576, 579, 580).
I focused on the Cython compilation of SageMath. By default, a function
in Cython is an instance of builtin_function_or_method (analogously,
method_descriptor for a method), which has special optimizations in the
CPython interpreter. But the option "binding=True" changes those to a
custom class which is NOT optimized.
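
For readers unfamiliar with these type names, here is a minimal sketch in plain CPython (no Cython or SageMath needed) showing which classes are meant. It only inspects built-in objects. The custom class that Cython uses with binding=True is not shown, and the helper name py_func is purely illustrative:

```python
# Functions implemented in C are instances of builtin_function_or_method.
builtin_name = type(len).__name__
print(builtin_name)

# Methods of C-implemented types, looked up on the class, are
# method_descriptor instances.
descriptor_name = type(str.upper).__name__
print(descriptor_name)

# A function written in Python is a different class altogether.
def py_func(x):  # hypothetical example function
    return x

python_name = type(py_func).__name__
print(python_name)
```

Any callable whose class is not one of the specially handled ones goes through the generic, slower calling path, and that is the case benchmarked here.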
I ran the full SageMath testsuite several times on Python 2.7 without
and with binding=True to find any significant differences. I then
checked if those differences could be reproduced on Python 3.7 (SageMath
has not been fully ported to Python 3 yet). The most dramatic difference
is multiplication for generic matrices. More precisely, with the command
python3 -m timeit -s "from sage.all import MatrixSpace, GF; M =
MatrixSpace(GF(9), 200).random_element()" "M * M"
With binding=False, I got
1 loop, best of 5: 1.19 sec per loop
With binding=True, I got
1 loop, best of 5: 1.83 sec per loop
This is a big regression (roughly 54% slower) which should be gone
completely with PEP 580.
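
To put a number on the regression, the two timings quoted above can be compared directly. This is just arithmetic on the reported figures; the function name relative_slowdown is made up for this sketch:

```python
def relative_slowdown(baseline_sec, regressed_sec):
    """Return how many times slower the regressed run is than the baseline."""
    return regressed_sec / baseline_sec

t_binding_false = 1.19  # sec per loop, binding=False (from the post)
t_binding_true = 1.83   # sec per loop, binding=True (from the post)

ratio = relative_slowdown(t_binding_false, t_binding_true)
print(f"{ratio:.2f}x slower")             # 1.54x slower
print(f"{(ratio - 1) * 100:.0f}% extra")  # 54% extra
```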
I used Python 3.7, SageMath 8.3.rc1 (plus a few patches to make it work
with binding=True and with Python 3.7) and Cython 0.28.4.
I hope that this finally shows that the problems mentioned in PEP 579