[Python-Dev] Fuzzing the Python standard library
Quick answer: this is an undocumented attack of the billion laughs/exponential entity expansion type. It is reachable over the web through any library that uses the fractions module to parse user input, and such libraries do exist on GitHub. It could be mitigated by mentioning the behavior explicitly in the documentation, exposing a setting that limits the exponent in engineering notation, storing numbers in a non-naive way, or limiting by default the total size that a number representation can take to some large but finite value (for example 1 megabyte).
Which mitigation approach is preferred should be discussed in more detail in a bug report.
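
As a rough illustration of the exponent-limit idea, here is a minimal sketch of a wrapper around fractions.Fraction. The names parse_fraction, estimated_bytes and MAX_ABS_EXPONENT are hypothetical and not part of the fractions module; the cap of 10000 is an arbitrary assumption, and the regular expression only approximates the syntax that Fraction accepts.

```python
import math
import re
from fractions import Fraction

# Hypothetical cap chosen for illustration; the fractions module has no such setting.
MAX_ABS_EXPONENT = 10_000

# Trailing exponent part such as "E6646466664" (underscores tolerated).
_EXPONENT_RE = re.compile(r"[eE]([+-]?[\d_]+)\s*$")


def estimated_bytes(exponent: int) -> int:
    """Approximate bytes needed to hold 10**abs(exponent) as an exact integer."""
    return math.ceil(abs(exponent) * math.log2(10) / 8)


def parse_fraction(text: str, max_abs_exponent: int = MAX_ABS_EXPONENT) -> Fraction:
    """Parse text with Fraction, rejecting oversized exponents before any expansion."""
    match = _EXPONENT_RE.search(text)
    if match is not None:
        digits = match.group(1).lstrip("+-").replace("_", "").lstrip("0")
        # Compare lengths first so that an absurdly long exponent is never converted.
        too_long = len(digits) > len(str(max_abs_exponent))
        if too_long or int(digits or "0") > max_abs_exponent:
            raise ValueError("exponent too large in %r" % text[:40])
    return Fraction(text)


if __name__ == "__main__":
    print(parse_fraction("1.5e3"))          # small exponents still work
    print(estimated_bytes(664646666))       # size estimate for 10**664646666
    try:
        parse_fraction("1.64E6646466664")
    except ValueError as error:
        print("rejected:", error)
```

The size estimate is just exponent * log2(10) / 8, which for 10**664646666 comes out at a few hundred megabytes, in line with the remark quoted below. The check runs on the string before Fraction sees it, so the rejected input costs almost nothing.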
On Tue, Jul 17, 2018, at 20:26, Damian Shaw wrote:
> I'm not a core Python dev, but a quick question: why would you expect
> fractions.Fraction("1.64E6646466664") not to take hundreds of megabytes
> and hours to run?
> Simply evaluating: 164 * 10**664646666 will take hundreds of megabytes by
> On Tue, Jul 17, 2018, 12:54 Jussi Judin <jjudin+python at iki.fi> wrote:
> > Hi,
> > I have been fuzzing various parts of the Python standard library for
> > Python 3.7 with python-afl to find internal implementation issues in
> > the library. I have mainly been looking for the following:
> > * Exceptions other than the documented ones. These usually indicate an
> > internal implementation issue. For example, one would not expect a
> > UnicodeDecodeError from the netrc.netrc() function when the
> > documentation promises netrc.NetrcParseError and there is no way to pass
> > a properly sanitized file object to netrc.netrc().
> > * Differences between the values returned by the C and Python versions
> > of some functions. The quopri module may have these.
> > * Unexpected performance and memory allocation issues. These can be
> > somewhat controversial to fix, if they get fixed at all, but at least in
> > some cases it can be really nasty from the end-user perspective if, for
> > example, fractions.Fraction("1.64E6646466664") allocates hundreds of
> > megabytes of memory and takes very long to calculate. I gave up waiting
> > for that function call to finish after 5 minutes.
> > As this is going to result in a decent number of bug reports (so far I
> > have filed only one, although the audio processing area has many more
> > issues to file), I would like to ask your opinion on filing these bug
> > reports. Should I report all issues regarding a specific module in one
> > bug report, or try to split them further into more fine-grained reports
> > that may be related? These different types of errors are especially
> > noticeable in the zipfile module, which shows many different exception
> > and behavior types on invalid data
> > <https://github.com/Barro/python-stdlib-fuzzers/tree/master/zipfile/crashes>.
> > In the case of the sndhdr module, there are multiple modules with issues
> > (aifc, sunau, wave) that then also show up in sndhdr when they are used.
> > Or are some of you willing to go through the crashes that pop up and
> > help with the report filing?
> > The code and a more verbose description are available from
> > <https://github.com/Barro/python-stdlib-fuzzers>. By default it works
> > only on some GNU/Linux systems (I use Debian testing), as it relies on
> > /dev/shm/ being available and uses shell scripts as wrappers that rely on
> > various tools that may not be installed on all systems by default.
> > As a bonus, as this uses coverage-based fuzzing, it also opens up the
> > possibility of automatically creating a regression test suite for each of
> > the fuzzed modules, to ensure that the existing functionality (input files
> > under the <fuzz-target>/corpus/ directory) does not suddenly result in
> > additional exceptions, and to make it easier to test potential bug fixes
> > (crash-inducing files under the <fuzz-target>/crashes/ directory).
> > As a downside, this uses two quite specific tools (afl, python-afl) that
> > have further dependencies (Cython), so I doubt the viability of
> > integrating this type of testing into the normal Python verification
> > process. Unlike the libFuzzer-based fuzzing that is already integrated
> > in Python, this instruments the actual Python code (and only that), not
> > the actions that the interpreter performs in the background. So this
> > should result in better fuzzer coverage for the Python code that is
> > used, with the downside that when C functions are called, they are
> > complete black boxes to the fuzzer.
> > I have mainly run these fuzzer instances for at most several hours per
> > module with 4 instances, and stopped running modules with no issues once
> > no new coverage had been discovered for more than 10 minutes. Also, I
> > have not really created high-quality initial input files, so I wouldn't
> > be surprised if there are more issues lurking around that could be found
> > by throwing more CPU time and higher-quality fuzzers at the problem.
> > Fuzzing: https://en.wikipedia.org/wiki/Fuzzing
> > python-afl: https://github.com/jwilk/python-afl
> > netrc documentation: https://docs.python.org/3/library/netrc.html
> > Python issue 34088: https://bugs.python.org/issue34088
> > _xxtestfuzz module: https://github.com/python/cpython/tree/3.7/Modules/_xxtestfuzz
> > --
> > Jussi Judin
> > https://jjudin.iki.fi/
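
To illustrate the first point in the quoted message (exceptions other than the documented ones), a check for a single input could look roughly like the sketch below. It uses only the standard library; classify_netrc_input is a hypothetical helper name and not taken from the python-stdlib-fuzzers repository, where a fuzzer such as python-afl would supply the input bytes. Which inputs end up as "unexpected" depends on the Python version, so the sketch reports the exception name instead of assuming one.

```python
import netrc
import os
import tempfile


def classify_netrc_input(data: bytes) -> str:
    """Return 'ok', 'documented' or 'unexpected: <ExceptionName>' for one input."""
    # netrc.netrc() takes a file name, so the input goes through a temporary file.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        try:
            netrc.netrc(path)
        except netrc.NetrcParseError:
            # The exception that the netrc documentation promises for bad input.
            return "documented"
        except Exception as error:
            # Anything else hints at an internal implementation issue.
            return "unexpected: " + type(error).__name__
        return "ok"
    finally:
        os.unlink(path)


if __name__ == "__main__":
    samples = [
        b"machine example.com login user password secret\n",
        b"bogus\n",
        b"machine \xff\xfe login user\n",
    ]
    for sample in samples:
        print(repr(sample), "->", classify_netrc_input(sample))
```

Under a fuzzer, only the "unexpected" results would be saved as findings; "documented" results are the parser doing what the documentation says.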