[Python-Dev] Fuzzing the Python standard library
I have been fuzzing various parts of Python standard library for Python 3.7 with python-afl to find out internal implementation issues that exist in the library. What I have been looking for are mainly following:
* Exceptions that are something else than the documented ones. These usually indicate an internal implementation issue. For example one would not expect an UnicodeDecodeError from netrc.netrc() function when the documentation promises netrc.NetrcParseError and there is no way to pass properly sanitized file object to the netrc.netrc().
* Differences between values returned by C and Python versions of some functions. quopri module may have these.
* Unexpected performance and memory allocation issues. These can be somewhat controversial to fix, if at all, but at least in some cases from end-user perspective it can be really nasty if for example fractions.Fraction("1.64E6646466664") results in hundreds of megabytes of memory allocated and takes very long to calculate. I gave up waiting for that function call to finish after 5 minutes.
As this is going to result in a decent amount of bug reports (currently I only filed one, although that audio processing area has much more issues to file), I would like to ask your opinion on filing these bug reports. Should I report all issues regarding some specific module in one bug report, or try to further split them into more fine grained reports that may be related? These different types of errors are specifically noticeable in zipfile module that includes a lot of different exception and behavioral types on invalid data <https://github.com/Barro/python-stdlib-fuzzers/tree/master/zipfile/crashes> . And in case of sndhdr module, there are multiple modules with issues (aifc, sunau, wave) that then show up also in sndhdr when they are used. Or are some of you willing to go through the crashes that pop up and help with the report filing?
The code and more verbose description for this is available from <https://github.com/Barro/python-stdlib-fuzzers>. It works by default on some GNU/Linux systems only (I use Debian testing), as it relies on /dev/shm/ being available and uses shell scripts as wrappers that rely on various tools that may not be installed on all systems by default.
As a bonus, as this uses coverage based fuzzing, it also opens up the possibility of automatically creating a regression test suite for each of the fuzzed modules to ensure that the existing functionality (input files under <fuzz-target>/corpus/ directory) does not suddenly result in additional exceptions and that it is more easy to test potential bug fixes (crash inducing files under <fuzz-target>/crashes/ directory).
As a downside, this uses two quite specific tools (afl, python-afl) that have further dependencies (Cython) inside them, I doubt the viability of integrating this type of testing as part of normal Python verification process. As a difference to libFuzzer based fuzzing that is already integrated in Python, this instruments the actual (and only the) Python code and not the actions that the interpreter does in the background. So this should result in better fuzzer coverage for Python code that is used with the downside that when C functions are called, they are complete black boxes to the fuzzer.
I have mainly run these fuzzer instances at most for several hours per module with 4 instances and stopped running no-issue modules after there have been no new coverage discovered after more than 10 minutes. Also I have not really created high quality initial input files, so I wouldn't be surprised if there are more issues lurking around that could be found with throwing more CPU and higher quality fuzzers at the problem.