[Python-Dev] Benchmarks why we need PEP 576/579/580
I finally managed to get some real-life benchmarks for why we need a
faster C calling protocol (see PEPs 576, 579, 580).
I focused on the Cython compilation of SageMath. By default, a function
in Cython is an instance of builtin_function_or_method (analogously,
method_descriptor for a method), which has special optimizations in the
CPython interpreter. But the option "binding=True" changes those to a
custom class which is NOT optimized.
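
To make the two class names above concrete, here is a minimal sketch in plain CPython (no Cython or SageMath needed). It assumes that len and str.upper are fair stand-ins for a Cython function and method compiled with binding=False; it does not reproduce the custom class that binding=True generates.

```python
# Illustration only: inspect the classes CPython uses for C-level callables.
# `len` and `str.upper` are stand-ins chosen for this sketch; they are not
# taken from the SageMath benchmark.
builtin_name = type(len).__name__        # a C function
descriptor_name = type(str.upper).__name__  # a C method, looked up on the class

def python_level():
    """A pure-Python function, for comparison."""
    return None

function_name = type(python_level).__name__

print(builtin_name)
print(descriptor_name)
print(function_name)
```

The first two names are the classes the interpreter special-cases when making calls; any other callable class, including the one Cython generates with binding=True, goes through the generic call path.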
I ran the full SageMath test suite several times, both without and with
binding=True, to look for significant differences. The most dramatic
difference is in multiplication of generic matrices. More precisely, with
the following command:
python -m timeit -s "from sage.all import MatrixSpace, GF; M = MatrixSpace(GF(9), 200).random_element()" "M * M"
With binding=False, I got
10 loops, best of 3: 692 msec per loop
With binding=True, I got
10 loops, best of 3: 1.16 sec per loop
This is a big regression (about 68% slower), which should disappear
completely with PEP 580.
I should mention that this was done on Python 2.7.15 (SageMath is not
yet ported to Python 3) but I see no reason why the conclusions
shouldn't be valid for newer Python versions. I used SageMath 8.3.rc1
and Cython 0.28.4.
I hope that this finally shows that the problems mentioned in PEP 579