osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Python-Dev] PEP 578: Python Runtime Audit Hooks


The implementation can be viewed as a PR at 
https://github.com/python/cpython/pull/12613

On 28Mar2019 1535, Steve Dower wrote:
> Hi all
> 
> Time is short, but I'm hoping to get PEP 578 (formerly PEP 551) into 
> Python 3.8. Here's the current text for review and comment before I 
> submit to the Steering Council.
> 
> The formatted text is at https://www.python.org/dev/peps/pep-0578/ 
> (update just pushed, so give it an hour or so, but it's fundamentally 
> the same as what's there)
> 
> No Discourse post, because we don't have a python-dev equivalent there 
> yet, so please reply here for this one.
> 
> Implementation is at https://github.com/zooba/cpython/tree/pep-578/ and 
> my backport to 3.7 (https://github.com/zooba/cpython/tree/pep-578-3.7/) 
> is already getting some real use (though this will not be added to 3.7, 
> unless people *really* want it, so the backport is just for reference).
> 
> Cheers,
> Steve
> 
> =====
> 
> PEP: 578
> Title: Python Runtime Audit Hooks
> Version: $Revision$
> Last-Modified: $Date$
> Author: Steve Dower <steve.dower at python.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 16-Jun-2018
> Python-Version: 3.8
> Post-History:
> 
> Abstract
> ========
> 
> This PEP describes additions to the Python API and specific behaviors
> for the CPython implementation that make actions taken by the Python
> runtime visible to auditing tools. Visibility into these actions
> provides opportunities for test frameworks, logging frameworks, and
> security tools to monitor and optionally limit actions taken by the
> runtime.
> 
> This PEP proposes adding two APIs to provide insights into a running
> Python application: one for arbitrary events, and another specific to
> the module import system. The APIs are intended to be available in all
> Python implementations, though the specific messages and values used
> are unspecified here to allow implementations the freedom to determine
> how best to provide information to their users. Some examples likely
> to be used in CPython are provided for explanatory purposes.
> 
> See PEP 551 for discussion and recommendations on enhancing the
> security of a Python runtime making use of these auditing APIs.
> 
> Background
> ==========
> 
> Python provides access to a wide range of low-level functionality on
> many common operating systems. While this is incredibly useful for
> "write-once, run-anywhere" scripting, it also makes monitoring of
> software written in Python difficult. Because Python uses native system
> APIs directly, existing monitoring tools either suffer from limited
> context or auditing bypass.
> 
> Limited context occurs when system monitoring can report that an
> action occurred, but cannot explain the sequence of events leading to
> it. For example, network monitoring at the OS level may be able to
> report "listening started on port 5678", but may not be able to
> provide the process ID, command line, parent process, or the local
> state in the program at the point that triggered the action. Firewall
> controls to prevent such an action are similarly limited, typically
> to process names or some global state such as the current user, and
> in any case rarely provide a useful log file correlated with other
> application messages.
> 
> Auditing bypass can occur when the typical system tool used for an
> action would ordinarily report its use, but accessing the APIs via
> Python do not trigger this. For example, invoking "curl" to make HTTP
> requests may be specifically monitored in an audited system, but
> Python's "urlretrieve" function is not.
> 
> Within a long-running Python application, particularly one that
> processes user-provided information such as a web app, there is a risk
> of unexpected behavior. This may be due to bugs in the code, or
> deliberately induced by a malicious user. In both cases, normal
> application logging may be bypassed resulting in no indication that
> anything out of the ordinary has occurred.
> 
> Additionally, and somewhat unique to Python, it is very easy to affect
> the code that is run in an application by manipulating either the
> import system's search path or placing files earlier on the path than
> intended. This is often seen when developers create a script with the
> same name as the module they intend to use - for example, a
> ``random.py`` file that attempts to import the standard library
> ``random`` module.
> 
> This is not sandboxing, as this proposal does not attempt to prevent
> malicious behavior (though it enables some new options to do so).
> See the `Why Not A Sandbox`_ section below for further discussion.
> 
> Overview of Changes
> ===================
> 
> The aim of these changes is to enable both application developers and
> system administrators to integrate Python into their existing
> monitoring systems without dictating how those systems look or behave.
> 
> We propose two API changes to enable this: an Audit Hook?and Verified
> Open Hook. Both are available from Python and native code, allowing
> applications and frameworks written in pure Python code to take
> advantage of the extra messages, while also allowing embedders or
> system administrators to deploy builds of Python where auditing is
> always enabled.
> 
> Only CPython is bound to provide the native APIs as described here.
> Other implementations should provide the pure Python APIs, and
> may provide native versions as appropriate for their underlying
> runtimes. Auditing events are likewise considered implementation
> specific, but are bound by normal feature compatibility guarantees.
> 
> Audit Hook
> ----------
> 
> In order to observe actions taken by the runtime (on behalf of the
> caller), an API is required to raise messages from within certain
> operations. These operations are typically deep within the Python
> runtime or standard library, such as dynamic code compilation, module
> imports, DNS resolution, or use of certain modules such as ``ctypes``.
> 
> The following new C APIs allow embedders and CPython implementors to
> send and receive audit hook messages::
> 
>  ?? # Add an auditing hook
>  ?? typedef int (*hook_func)(const char *event, PyObject *args,
>  ??????????????????????????? void *userData);
>  ?? int PySys_AddAuditHook(hook_func hook, void *userData);
> 
>  ?? # Raise an event with all auditing hooks
>  ?? int PySys_Audit(const char *event, PyObject *args);
> 
>  ?? # Internal API used during Py_Finalize() - not publicly accessible
>  ?? void _Py_ClearAuditHooks(void);
> 
> The new Python APIs for receiving and raising audit hooks are::
> 
>  ?? # Add an auditing hook
>  ?? sys.addaudithook(hook: Callable[[str, tuple]])
> 
>  ?? # Raise an event with all auditing hooks
>  ?? sys.audit(str, *args)
> 
> 
> Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time,
> including before ``Py_Initialize()``, or by calling
> ``sys.addaudithook()`` from Python code. Hooks cannot be removed or
> replaced.
> 
> When events of interest are occurring, code can either call
> ``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The
> string argument is the name of the event, and the tuple contains
> arguments. A given event name should have a fixed schema for arguments,
> which should be considered a public API (for each x.y version release),
> and thus should only change between feature releases with updated
> documentation.
> 
> For maximum compatibility, events using the same name as an event in
> the reference interpreter CPython should make every attempt to use
> compatible arguments. Including the name or an abbreviation of the
> implementation in implementation-specific event names will also help
> prevent collisions. For example, a ``pypy.jit_invoked`` event is clearly
> distinguised from an ``ipy.jit_invoked`` event.
> 
> When an event is audited, each hook is called in the order it was added
> with the event name and tuple. If any hook returns with an exception
> set, later hooks are ignored and *in general* the Python runtime should
> terminate. This is intentional to allow hook implementations to decide
> how to respond to any particular event. The typical responses will be to
> log the event, abort the operation with an exception, or to immediately
> terminate the process with an operating system exit call.
> 
> When an event is audited but no hooks have been set, the ``audit()``
> function should impose minimal overhead. Ideally, each argument is a
> reference to existing data?rather than a value calculated just for the
> auditing call.
> 
> As hooks may be Python objects, they need to be freed during
> ``Py_Finalize()``. To do this, we add an internal API
> ``_Py_ClearAuditHooks()`` that releases any Python hooks and any
> memory held. This is an internal function with no public export, and
> we recommend it raise its own audit event for all current hooks to
> ensure that unexpected calls are observed.
> 
> Below in `Suggested Audit Hook Locations`_, we recommend some important
> operations that should raise audit events.
> 
> Python implementations should document which operations will raise
> audit events, along with the event schema. It is intentional that
> ``sys.addaudithook(print)`` be a trivial way to display all messages.
> 
> Verified Open Hook
> ------------------
> 
> Most operating systems have a mechanism to distinguish between files
> that can be executed and those that can not. For example, this may be an
> execute bit in the permissions field, a verified hash of the file
> contents to detect potential code tampering, or file system path
> restrictions. These are an important security mechanism for preventing
> execution of data or code that is not approved for a given environment.
> Currently, Python has no way to integrate with these when launching
> scripts or importing modules.
> 
> The new public C API for the verified open hook is::
> 
>  ?? # Set the handler
>  ?? typedef PyObject *(*hook_func)(PyObject *path, void *userData)
>  ?? int PyImport_SetOpenForImportHook(hook_func handler, void *userData)
> 
>  ?? # Open a file using the handler
>  ?? PyObject *PyImport_OpenForImport(const char *path)
> 
> The new public Python API for the verified open hook is::
> 
>  ?? # Open a file using the handler
>  ?? importlib.util.open_for_import(path : str) -> io.IOBase
> 
> 
> The ``importlib.util.open_for_import()`` function is a drop-in
> replacement?for ``open(str(pathlike), 'rb')``. Its default behaviour is
> to open a file for raw, binary access. To change the behaviour a new
> handler should be set. Handler functions only accept ``str`` arguments.
> The C API ``PyImport_OpenForImport`` function assumes UTF-8 encoding.
> 
> A custom handler may be set by calling ``PyImport_SetOpenForImportHook()``
> from C at any time, including before ``Py_Initialize()``. However, if a
> hook has already been set then the call will fail. When
> ``open_for_import()`` is called with a hook set, the hook will be passed
> the path and its return value will be returned directly. The returned
> object should be an open file-like object that supports reading raw
> bytes. This is explicitly intended to allow a ``BytesIO`` instance if
> the open handler has already read the entire file into memory.
> 
> Note that these hooks can import and call the ``_io.open()`` function on
> CPython without triggering themselves. They can also use ``_io.BytesIO``
> to return a compatible result using an in-memory buffer.
> 
> If the hook determines that the file should not be loaded, it should
> raise an exception of its choice, as well as performing any other
> logging.
> 
> All import and execution functionality involving code from a file will
> be changed to use ``open_for_import()`` unconditionally. It is important
> to note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go
> through this function - an audit hook that includes the code from these
> calls is the best opportunity to validate code that is read from the
> file. Given the current decoupling between import and execution in
> Python, most imported code will go through both ``open_for_import()``
> and the log hook for ``compile``, and so care should be taken to avoid
> repeating verification steps.
> 
> There is no Python API provided for changing the open hook. To modify
> import behavior from Python code, use the existing functionality
> provided by ``importlib``.
> 
> API Availability
> ----------------
> 
> While all the functions added here are considered public and stable API,
> the behavior of the functions is implementation specific. Most
> descriptions here refer?to the CPython?implementation, and while other
> implementations should provide the functions, there is no requirement
> that they behave the same.
> 
> For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but
> may do nothing. This allows code to make calls to ``sys.audit()``
> without having to test for existence, but it should not assume that its
> call will have any effect. (Including existence tests in
> security-critical code allows another vector to bypass auditing, so it
> is preferable that the function always exist.)
> 
> ``importlib.util.open_for_import(path)`` should at a minimum always
> return ``_io.open(path, 'rb')``. Code using the function should make no
> further assumptions about what may occur, and implementations other than
> CPython are not required to let developers override the behavior of this
> function with a hook.
> 
> Suggested Audit Hook Locations
> ==============================
> 
> The locations and parameters in calls to ``sys.audit()`` or
> ``PySys_Audit()`` are to be determined by individual Python
> implementations. This is to allow maximum freedom for implementations
> to expose the operations that are most relevant to their platform,
> and to avoid or ignore potentially expensive or noisy events.
> 
> Table 1 acts as both suggestions of operations that should trigger
> audit events on all implementations, and examples of event schemas.
> 
> Table 2 provides further examples that are not required, but are
> likely to be available in CPython.
> 
> Refer to the documentation associated with your version of Python to
> see which operations provide audit events.
> 
> .. csv-table:: Table 1: Suggested Audit Hooks
>  ?? :header: "API Function", "Event Name", "Arguments", "Rationale"
>  ?? :widths: 2, 2, 3, 6
> 
>  ?? ``PySys_AddAuditHook``, ``sys.addaudithook``, "", "Detect when new
>  ?? audit hooks are being added.
>  ?? "
>  ?? ``PyImport_SetOpenForImportHook``, ``setopenforimporthook``, "", "
>  ?? Detects any attempt to set the ``open_for_import`` hook.
>  ?? "
>  ?? "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``,
>  ?? ``PyAST_obj2mod``", ``compile``, "``(code, filename_or_none)``", "
>  ?? Detect dynamic code compilation, where ``code`` could be a string or
>  ?? AST. Note that this will be called for regular imports of source
>  ?? code, including those that were opened with ``open_for_import``.
>  ?? "
>  ?? "``exec``, ``eval``, ``run_mod``", ``exec``, "``(code_object,)``", "
>  ?? Detect dynamic execution of code objects. This only occurs for
>  ?? explicit calls, and is not raised for normal function invocation.
>  ?? "
>  ?? ``import``, ``import``, "``(module, filename, sys.path,
>  ?? sys.meta_path, sys.path_hooks)``", "Detect when modules are
>  ?? imported. This is raised before the module name is resolved to a
>  ?? file. All arguments other than the module name may be ``None`` if
>  ?? they are not used or available.
>  ?? "
>  ?? "``open``", ``open``, "``(path, mode, flags)``", "Detect when a file
>  ?? is about to be opened. *path* and *mode* are the usual parameters to
>  ?? ``open`` if available, while *flags* is provided instead of *mode*
>  ?? in some cases.
>  ?? "
>  ?? ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
>  ?? injecting trace functions. Because of the implementation, exceptions
>  ?? raised from the hook will abort the operation, but will not be
>  ?? raised in Python code. Note that ``threading.setprofile`` eventually
>  ?? calls this function, so the event will be audited for each thread.
>  ?? "
>  ?? ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is
>  ?? injecting trace functions. Because of the implementation, exceptions
>  ?? raised from the hook will abort the operation, but will not be
>  ?? raised in Python code. Note that ``threading.settrace`` eventually
>  ?? calls this function, so the event will be audited for each thread.
>  ?? "
>  ?? "``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``,
>  ?? ``object_set_class``, ``func_set_code``, ``func_set_[kw]defaults``","
>  ?? ``object.__setattr__``","``(object, attr, value)``","Detect monkey
>  ?? patching of types and objects. This event
>  ?? is raised for the ``__class__`` attribute and any attribute on
>  ?? ``type`` objects.
>  ?? "
>  ?? "``_PyObject_GenericSetAttr``",``object.__delattr__``,"``(object,
>  ?? attr)``","Detect deletion of object attributes. This event is raised
>  ?? for any attribute on ``type`` objects.
>  ?? "
>  ?? "``Unpickler.find_class``",``pickle.find_class``,"``(module_name,
>  ?? global_name)``","Detect imports and global name lookup when
>  ?? unpickling.
>  ?? "
> 
> 
> .. csv-table:: Table 2: Potential CPython Audit Hooks
>  ?? :header: "API Function", "Event Name", "Arguments", "Rationale"
>  ?? :widths: 2, 2, 3, 6
> 
>  ?? ``_PySys_ClearAuditHooks``, ``sys._clearaudithooks``, "", "Notifies
>  ?? hooks they are being cleaned up, mainly in case the event is
>  ?? triggered unexpectedly. This event cannot be aborted.
>  ?? "
>  ?? ``code_new``, ``code.__new__``, "``(bytecode, filename, name)``", "
>  ?? Detect dynamic creation of code objects. This only occurs for
>  ?? direct instantiation, and is not raised for normal compilation.
>  ?? "
>  ?? ``func_new_impl``, ``function.__new__``, "``(code,)``", "Detect
>  ?? dynamic creation of function objects. This only occurs for direct
>  ?? instantiation, and is not raised for normal compilation.
>  ?? "
>  ?? "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
>  ?? ``(module_or_path,)``", "Detect when native modules are used.
>  ?? "
>  ?? ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "
>  ?? Collect information about specific symbols retrieved from native
>  ?? modules.
>  ?? "
>  ?? ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect
>  ?? when code is accessing arbitrary memory using ``ctypes``.
>  ?? "
>  ?? "``new_mmap_object``",``mmap.__new__``,"``(fileno, map_size, access,
>  ?? offset)``", "Detects creation of mmap objects. On POSIX, access may
>  ?? have been calculated from the ``prot`` and ``flags`` arguments.
>  ?? "
>  ?? ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect
>  ?? when code is accessing frames directly.
>  ?? "
>  ?? ``sys._current_frames``, ``sys._current_frames``, "", "Detect when
>  ?? code is accessing frames directly.
>  ?? "
>  ?? "``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
>  ?? ``socket.getaddrinfo``, ``socket.getnameinfo``, ``socket.sendmsg``,
>  ?? ``socket.sendto``", ``socket.address``, "``(address,)``", "Detect
>  ?? access to network resources. The address is unmodified from the
>  ?? original call.
>  ?? "
>  ?? "``member_get``, ``func_get_code``, ``func_get_[kw]defaults``
>  ?? ",``object.__getattr__``,"``(object, attr)``","Detect access to
>  ?? restricted attributes. This event is raised for any built-in
>  ?? members that are marked as restricted, and members that may allow
>  ?? bypassing imports.
>  ?? "
>  ?? "``urllib.urlopen``",``urllib.Request``,"``(url, data, headers,
>  ?? method)``", "Detects URL requests.
>  ?? "
> 
> Performance Impact
> ==================
> 
> The important performance impact is the case where events are being
> raised but there are no hooks attached. This is the unavoidable case -
> once a developer has added audit hooks they have explicitly chosen to
> trade performance for functionality. Performance impact with hooks added
> are not of interest here, since this is opt-in functionality.
> 
> Analysis using the Python Performance Benchmark Suite [1]_ shows no
> significant impact, with the vast majority of benchmarks showing
> between 1.05x faster to 1.05x slower.
> 
> In our opinion, the performance impact of the set of auditing points
> described in this PEP is negligible.
> 
> Rejected Ideas
> ==============
> 
> Separate module for audit hooks
> -------------------------------
> 
> The proposal is to add a new module for audit hooks, hypothetically
> ``audit``. This would separate the API and implementation from the
> ``sys`` module, and allow naming the C functions ``PyAudit_AddHook`` and
> ``PyAudit_Audit`` rather than the current variations.
> 
> Any such module would need to be a built-in module that is guaranteed to
> always be present. The nature of these hooks is that they must be
> callable without condition, as any conditional imports or calls provide
> opportunities to intercept and suppress or modify events.
> 
> Given it is one of the most core modules, the ``sys`` module is somewhat
> protected against module shadowing attacks. Replacing ``sys`` with a
> sufficiently functional module that the application can still run is a
> much more complicated task than replacing a module with only one
> function of interest. An attacker that has the ability to shadow the
> ``sys`` module is already capable of running arbitrary code from files,
> whereas an ``audit`` module could be replaced with a single line in a
> ``.pth`` file anywhere on the search path::
> 
>  ??? import sys; sys.modules['audit'] = type('audit', (object,),
>  ??????? {'audit': lambda *a: None, 'addhook': lambda *a: None})
> 
> Multiple layers of protection already exist for monkey patching attacks
> against either ``sys`` or ``audit``, but assignments or insertions to
> ``sys.modules`` are not audited.
> 
> This idea is rejected because it makes it trivial to suppress all calls
> to ``audit``.
> 
> Flag in sys.flags to indicate "audited" mode
> --------------------------------------------
> 
> The proposal is to add a value in ``sys.flags`` to indicate when Python
> is running in a "secure" or "audited" mode. This would allow
> applications to detect when some features are enabled or when hooks
> have been added and modify their behaviour appropriately.
> 
> Currently, we are not aware of any legitimate reasons for a program to
> behave differently in the presence of audit hooks.
> 
> Both application-level APIs ``sys.audit`` and
> ``importlib.util.open_for_import`` are always present and functional,
> regardless of whether the regular ``python`` entry point or some
> alternative entry point is used. Callers cannot determine whether any
> hooks have been added (except by performing side-channel analysis), nor
> do they need to. The calls should be fast enough that callers do not
> need to avoid them, and the program is responsible for ensuring that
> any added hooks are fast enough to not affect application performance.
> 
> The argument that this is "security by obscurity" is valid, but
> irrelevant. Security by obscurity is only an issue when there are no
> other protective mechanisms; obscurity as the first step in avoiding
> attack is strongly recommended (see `this article
> <https://danielmiessler.com/study/security-by-obscurity/>`_ for
> discussion).
> 
> This idea is rejected because there are no appropriate reasons for an
> application to change its behaviour based on whether these APIs are in
> use.
> 
> Why Not A Sandbox
> =================
> 
> Sandboxing CPython has been attempted many times in the past, and each
> past attempt has failed. Fundamentally, the problem is that certain
> functionality has to be restricted when executing the sandboxed code,
> but otherwise needs to be available for normal operation of Python. For
> example, completely removing the ability to compile strings into
> bytecode also breaks the ability to import modules from source code, and
> if it is not completely removed then there are too many ways to get
> access to that functionality indirectly. There is not yet any feasible
> way to generically determine whether a given operation is "safe" or not.
> Further information and references available at [2]_.
> 
> This proposal does not attempt to restrict functionality, but simply
> exposes the fact that the functionality is being used. Particularly for
> intrusion scenarios, detection is significantly more important than
> early prevention (as early prevention will generally drive attackers to
> use an alternate, less-detectable, approach). The availability of audit
> hooks alone does not change the attack surface of Python in any way, but
> they enable defenders to integrate Python into their environment in ways
> that are currently not possible.
> 
> Since audit hooks have the ability to safely prevent an operation
> occuring, this feature does enable the ability to provide some level of
> sandboxing. In most cases, however, the intention is to enable logging
> rather than creating a sandbox.
> 
> Relationship to PEP 551
> =======================
> 
> This API was originally presented as part of
> `PEP 551 <https://www.python.org/dev/peps/pep-0551/>`_ Security
> Transparency in the Python Runtime.
> 
> For simpler review purposes, and due to the broader applicability of
> these APIs beyond security, the API design is now presented separately.
> 
> PEP 551 is an informational PEP discussing how to integrate Python into
> a secure or audited environment.
> 
> References
> ==========
> 
> .. [1] Python Performance Benchmark Suite 
> `<https://github.com/python/performance>`_
> 
> .. [2] Python Security model - Sandbox 
> `<https://python-security.readthedocs.io/security.html#sandbox>`_
> 
> Copyright
> =========
> 
> Copyright (c) 2019 by Microsoft Corporation. This material may be
> distributed only subject to the terms and conditions set forth in the
> Open Publication License, v1.0 or later (the latest version is presently
> available at http://www.opencontent.org/openpub/).