osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Python-Dev] PEP 410, 3rd revision, Decimal timestamp


PEP: 410
Title: Use decimal.Decimal type for timestamps
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner at haypocalc.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-February-2012
Python-Version: 3.3


Abstract
========

Decimal becomes the official type for high-resolution timestamps to
make Python support new functions using a nanosecond resolution
without loss of precision.


Motivation
==========

Python 2.3 introduced float timestamps to support sub-second
resolutions.  os.stat() uses float timestamps by default since Python
2.5. Python 3.3 introduced functions supporting nanosecond
resolutions:

 * os module: futimens(), utimensat()
 * time module: clock_gettime(), clock_getres(), monotonic(), wallclock()

os.stat() reads nanosecond timestamps but returns timestamps as float.

The Python float type uses binary64 format of the IEEE 754 standard.
With a resolution of one nanosecond (10\ :sup:`-9`), float timestamps
lose precision for values bigger than 2\ :sup:`24` seconds (194 days:
1970-07-14 for an Epoch timestamp).

Nanosecond resolution is required to set the exact modification time
on filesystems supporting nanosecond timestamps (e.g. ext4, btrfs,
NTFS, ...).  It helps also to compare the modification time to check
if a file is newer than another file. Use cases: copy the modification
time of a file using shutil.copystat(), create a TAR archive with the
tarfile module, manage a mailbox with the mailbox module, etc.

An arbitrary resolution is preferred over a fixed resolution (like
nanosecond) to not have to change the API when a better resolution is
required. For example, the NTP protocol uses fractions of 2\ :sup:`32`
seconds (approximatively 2.3 ? 10\ :sup:`-10` second), whereas the NTP
protocol version 4 uses fractions of 2\ :sup:`64` seconds (5.4 ? 10\
:sup:`-20` second).

.. note::
   With a resolution of 1 microsecond (10\ :sup:`-6`), float
timestamps lose precision for values bigger than 2\ :sup:`33` seconds
(272 years: 2242-03-16 for an Epoch timestamp). With a resolution of
100 nanoseconds (10\ :sup:`-7`, resolution used on Windows), float
timestamps lose precision for values bigger than 2\ :sup:`29` seconds
(17 years: 1987-01-05 for an Epoch timestamp).


Specification
=============

Add decimal.Decimal as a new type for timestamps. Decimal supports any
timestamp resolution, support arithmetic operations and is comparable.
It is possible to coerce a Decimal to float, even if the conversion
may lose precision. The clock resolution can also be stored in a
Decimal object.

Add an optional *timestamp* argument to:

 * os module: fstat(), fstatat(), lstat(), stat() (st_atime, st_ctime
and st_mtime fields of the stat structure), sched_rr_get_interval(),
times(), wait3() and wait4()
 * resource module: ru_utime and ru_stime fields of getrusage()
 * signal module: getitimer(), setitimer()
 * time module: clock(), clock_gettime(), clock_getres(), monotonic(),
time() and wallclock()

The *timestamp* argument value can be float or Decimal, float is still
the default for backward compatibility. The following functions
support Decimal as input:

 * datetime module: date.fromtimestamp(), datetime.fromtimestamp() and
datetime.utcfromtimestamp()
 * os module: futimes(), futimesat(), lutimes(), utime()
 * select module: epoll.poll(), kqueue.control(), select()
 * signal module: setitimer(), sigtimedwait()
 * time module: ctime(), gmtime(), localtime(), sleep()

The os.stat_float_times() function is deprecated: use an explicit cast
using int() instead.

.. note::
   The decimal module is implemented in Python and is slower than
float, but there is a new C implementation which is almost ready for
inclusion in CPython.


Backwards Compatibility
=======================

The default timestamp type is unchanged, so there is no impact on
backward compatibility nor on performances. The new timestamp type,
decimal.Decimal, is only returned when requested explicitly.


Objection: clocks accuracy
==========================

Computer clocks and operating systems are inaccurate and fail to
provide nanosecond accuracy in practice. A nanosecond is what it takes
to execute a couple of CPU instructions.  Even on a real-time
operating system, a nanosecond-precise measurement is already obsolete
when it starts being processed by the higher-level application. A
single cache miss in the CPU will make the precision worthless.

.. note::
   Linux *actually* is able to measure time in nanosecond precision,
even though it is not able to keep its clock synchronized to UTC with
a nanosecond accuracy.


Alternatives: Timestamp types
=============================

To support timestamps with an arbitrary or nanosecond resolution, the
following types have been considered:

 * number of nanoseconds
 * 128-bits float
 * decimal.Decimal
 * datetime.datetime
 * datetime.timedelta
 * tuple of integers
 * timespec structure

Criteria:

 * Doing arithmetic on timestamps must be possible
 * Timestamps must be comparable
 * An arbitrary resolution, or at least a resolution of one nanosecond
without losing precision
 * It should be possible to coerce the new timestamp to float for
backward compatibility


A resolution of one nanosecond is enough to support all current C functions.

The best resolution used by operating systems is one nanosecond. In
practice, most clock accuracy is closer to microseconds than
nanoseconds. So it sounds reasonable to use a fixed resolution of one
nanosecond.


Number of nanoseconds (int)
---------------------------

A nanosecond resolution is enough for all current C functions and so a
timestamp can simply be a number of nanoseconds, an integer, not a
float.

The number of nanoseconds format has been rejected because it would
require to add new specialized functions for this format because it
not possible to differentiate a number of nanoseconds and a number of
seconds just by checking the object type.


128-bits float
--------------

Add a new IEEE 754-2008 quad-precision binary float type. The IEEE
754-2008 quad precision float has 1 sign bit, 15 bits of exponent and
112 bits of mantissa.  128-bits float is supported by GCC (4.3), Clang
and ICC compilers.

Python must be portable and so cannot rely on a type only available on
some platforms. For example, Visual C++ 2008 doesn't support 128-bits
float, whereas it is used to build the official Windows executables.
Another example: GCC 4.3 does not support __float128 in 32-bit mode on
x86 (but GCC 4.4 does).

There is also a license issue: GCC uses the MPFR library for 128-bits
float, library distributed under the GNU LGPL license. This license is
not compatible with the Python license.

.. note::
  The x87 floating point unit of Intel CPU supports 80-bit floats.
This format is not supported by the SSE instruction set, which is now
preferred over float, especially on x86_64. Other CPU vendors don't
support 80-bit float.



datetime.datetime
-----------------

The datetime.datetime type is the natural choice for a timestamp
because it is clear that this type contains a timestamp, whereas int,
float and Decimal are raw numbers. It is an absolute timestamp and so
is well defined. It gives direct access to the year, month, day,
hours, minutes and seconds. It has methods related to time like
methods to format the timestamp as string (e.g.
datetime.datetime.strftime).

The major issue is that except os.stat(), time.time() and
time.clock_gettime(time.CLOCK_GETTIME), all time functions have an
unspecified starting point and no timezone information, and so cannot
be converted to datetime.datetime.

datetime.datetime has also issues with timezone. For example, a
datetime object without timezone (unaware) and a datetime with a
timezone (aware) cannot be compared. There is also an ordering issues
with daylight saving time (DST) in the duplicate hour of switching
from DST to normal time.

datetime.datetime has been rejected because it cannot be used for
functions using an unspecified starting point like os.times() or
time.clock().

For time.time() and time.clock_gettime(time.CLOCK_GETTIME): it is
already possible to get the current time as a datetime.datetime object
using::

    datetime.datetime.now(datetime.timezone.utc)

For os.stat(), it is simple to create a datetime.datetime object from
a decimal.Decimal timestamp in the UTC timezone::

    datetime.datetime.fromtimestamp(value, datetime.timezone.utc)

.. note::
   datetime.datetime only supports microsecond resolution, but can be
enhanced to support nanosecond.

datetime.timedelta
------------------

datetime.timedelta is the natural choice for a relative timestamp
because it is clear that this type contains a timestamp, whereas int,
float and Decimal are raw numbers. It can be used with
datetime.datetime to get an absolute timestamp when the starting point
is known.

datetime.timedelta has been rejected because it cannot be coerced to
float and has a fixed resolution. One new standard timestamp type is
enough, Decimal is preferred over datetime.timedelta. Converting a
datetime.timedelta to float requires an explicit call to the
datetime.timedelta.total_seconds() method.

.. note::
   datetime.timedelta only supports microsecond resolution, but can be
enhanced to support nanosecond.


.. _tuple:

Tuple of integers
-----------------

To expose C functions in Python, a tuple of integers is the natural
choice to store a timestamp because the C language uses structures
with integers fields (e.g. timeval and timespec structures). Using
only integers avoids the loss of precision (Python supports integers
of arbitrary length). Creating and parsing a tuple of integers is
simple and fast.

Depending of the exact format of the tuple, the precision can be
arbitrary or fixed. The precision can be choose as the loss of
precision is smaller than an arbitrary limit like one nanosecond.

Different formats has been proposed:

 * A: (numerator, denominator)

   * value = numerator / denominator
   * resolution = 1 / denominator
   * denominator > 0

 * B: (seconds, numerator, denominator)

   * value = seconds + numerator / denominator
   * resolution = 1 / denominator
   * 0 <= numerator < denominator
   * denominator > 0

 * C: (intpart, floatpart, base, exponent)

   * value = intpart + floatpart / base\ :sup:`exponent`
   * resolution = 1 / base \ :sup:`exponent`
   * 0 <= floatpart < base \ :sup:`exponent`
   * base > 0
   * exponent >= 0

 * D: (intpart, floatpart, exponent)

   * value = intpart + floatpart / 10\ :sup:`exponent`
   * resolution = 1 / 10 \ :sup:`exponent`
   * 0 <= floatpart < 10 \ :sup:`exponent`
   * exponent >= 0

 * E: (sec, nsec)

   * value = sec + nsec ? 10\ :sup:`-9`
   * resolution = 10 \ :sup:`-9` (nanosecond)
   * 0 <= nsec < 10 \ :sup:`9`

All formats support an arbitrary resolution, except of the format (E).

The format (D) may not be able to store the exact value (may loss of
precision) if the clock frequency is arbitrary and cannot be expressed
as a power of 10.  The format (C) has a similar issue, but in such
case, it is possible to use base=frequency and exponent=1.

The formats (C), (D) and (E) allow optimization for conversion to
float if the base is 2 and to decimal.Decimal if the base is 10.

The format (A) is a simple fraction. It supports arbitrary precision,
is simple (only two fields), only requires a simple division to get
the floating point value, and is already used by
float.as_integer_ratio().

To simplify the implementation (especially the C implementation to
avoid integer overflow), a numerator bigger than the denominator can
be accepted.  The tuple may be normalized later.

Tuple of integers have been rejected because they don't support
arithmetic operations.

.. note::
   On Windows, the ``QueryPerformanceCounter()`` clock uses the
frequency of the processor which is an arbitrary number and so may not
be a power or 2 or 10. The frequency can be read using
``QueryPerformanceFrequency()``.


timespec structure
------------------

timespec is the C structure used to store timestamp with a nanosecond
resolution. Python can use a type with the same structure: (seconds,
nanoseconds). For convenience, arithmetic operations on timespec are
supported.

Example of an incomplete timespec type supporting addition,
subtraction and coercion to float::

    class timespec(tuple):
        def __new__(cls, sec, nsec):
            if not isinstance(sec, int):
                raise TypeError
            if not isinstance(nsec, int):
                raise TypeError
            asec, nsec = divmod(nsec, 10 ** 9)
            sec += asec
            obj = tuple.__new__(cls, (sec, nsec))
            obj.sec = sec
            obj.nsec = nsec
            return obj

        def __float__(self):
            return self.sec + self.nsec * 1e-9

        def total_nanoseconds(self):
            return self.sec * 10 ** 9 + self.nsec

        def __add__(self, other):
            if not isinstance(other, timespec):
                raise TypeError
            ns_sum = self.total_nanoseconds() + other.total_nanoseconds()
            return timespec(*divmod(ns_sum, 10 ** 9))

        def __sub__(self, other):
            if not isinstance(other, timespec):
                raise TypeError
            ns_diff = self.total_nanoseconds() - other.total_nanoseconds()
            return timespec(*divmod(ns_diff, 10 ** 9))

        def __str__(self):
            if self.sec < 0 and self.nsec:
                sec = abs(1 + self.sec)
                nsec = 10**9 - self.nsec
                return '-%i.%09u' % (sec, nsec)
            else:
                return '%i.%09u' % (self.sec, self.nsec)

        def __repr__(self):
            return '<timespec(%s, %s)>' % (self.sec, self.nsec)

The timespec type is similar to the format (E) of tuples of integer,
except that it supports arithmetic and coercion to float.

The timespec type was rejected because it only supports nanosecond
resolution and requires to implement each arithmetic operation,
whereas the Decimal type is already implemented and well tested.


Alternatives: API design
========================

Add a string argument to specify the return type
------------------------------------------------

Add an string argument to function returning timestamps, example:
time.time(format="datetime"). A string is more extensible than a type:
it is possible to request a format that has no type, like a tuple of
integers.

This API was rejected because it was necessary to import implicitly
modules to instantiate objects (e.g. import datetime to create
datetime.datetime).  Importing a module may raise an exception and may
be slow, such behaviour is unexpected and surprising.


Add a global flag to change the timestamp type
----------------------------------------------

A global flag like os.stat_decimal_times(), similar to
os.stat_float_times(), can be added to set globally the timestamp
type.

A global flag may cause issues with libraries and applications
expecting float instead of Decimal. Decimal is not fully compatible
with float. float+Decimal raises a TypeError for example. The
os.stat_float_times() case is different because an int can be coerced
to float and int+float gives float.


Add a protocol to create a timestamp
------------------------------------

Instead of hard coding how timestamps are created, a new protocol can
be added to create a timestamp from a fraction.

For example, time.time(timestamp=type) would call the class method
type.__fromfraction__(numerator, denominator) to create a timestamp
object of the specified type. If the type doesn't support the
protocol, a fallback is used: type(numerator) / type(denominator).

A variant is to use a "converter" callback to create a timestamp.
Example creating a float timestamp:

    def timestamp_to_float(numerator, denominator):
        return float(numerator) / float(denominator)

Common converters can be provided by time, datetime and other modules,
or maybe a specific "hires" module. Users can define their own
converters.

Such protocol has a limitation: the timestamp structure has to be
decided once and cannot be changed later. For example, adding a
timezone or the absolute start of the timestamp would break the API.

The protocol proposition was as being excessive given the
requirements, but that the specific syntax proposed
(time.time(timestamp=type)) allows this to be introduced later if
compelling use cases are discovered.

.. note::
   Other formats may be used instead of a fraction: see the tuple of integers
   section for example.


Add new fields to os.stat
-------------------------

To get the creation, modification and access time of a file with a
nanosecond resolution, three fields can be added to os.stat()
structure.

The new fields can be timestamps with nanosecond resolution (e.g.
Decimal) or the nanosecond part of each timestamp (int).

If the new fields are timestamps with nanosecond resolution,
populating the extra fields would be time consuming. Any call to
os.stat() would be slower, even if os.stat() is only called to check
if a file exists. A parameter can be added to os.stat() to make these
fields optional, the structure would have a variable number of fields.

If the new fields only contain the fractional part (nanoseconds),
os.stat() would be efficient. These fields would always be present and
so set to zero if the operating system does not support sub-second
resolution. Splitting a timestamp in two parts, seconds and
nanoseconds, is similar to the timespec type and tuple of integers,
and so have the same drawbacks.

Adding new fields to the os.stat() structure does not solve the
nanosecond issue in other modules (e.g. the time module).


Add a boolean argument
----------------------

Because we only need one new type (Decimal), a simple boolean flag can
be added. Example: time.time(decimal=True) or time.time(hires=True).

Such flag would require to do an hidden import which is considered as
a bad practice.

The boolean argument API was rejected because it is not "pythonic".
Changing the return type with a parameter value is preferred over a
boolean parameter (a flag).


Add new functions
-----------------

Add new functions for each type, examples:

 * time.clock_decimal()
 * time.time_decimal()
 * os.stat_decimal()
 * os.stat_timespec()
 * etc.

Adding a new function for each function creating timestamps duplicate
a lot of code and would be a pain to maintain.


Add a new hires module
----------------------

Add a new module called "hires" with the same API than the time
module, except that it would return timestamp with high resolution,
e.g. decimal.Decimal.  Adding a new module avoids to link low-level
modules like time or os to the decimal module.

This idea was rejected because it requires to duplicate most of the
code of the time module, would be a pain to maintain, and timestamps
are used modules other than the time module. Examples:
signal.sigtimedwait(), select.select(), resource.getrusage(),
os.stat(), etc. Duplicate the code of each module is not acceptable.


Links
=====

Python:

 * `Issue #7652: Merge C version of decimal into py3k
<http://bugs.python.org/issue7652>`_ (cdecimal)
 * `Issue #11457: os.stat(): add new fields to get timestamps as
Decimal objects with nanosecond resolution
<http://bugs.python.org/issue11457>`_
 * `Issue #13882: PEP 410: Use decimal.Decimal type for timestamps
<http://bugs.python.org/issue13882>`_
 * `[Python-Dev] Store timestamps as decimal.Decimal objects
<http://mail.python.org/pipermail/python-dev/2012-January/116025.html>`_

Other languages:

 * Ruby (1.9.3), the `Time class <http://ruby-doc.org/core-1.9.3/Time.html>`_
   supports picosecond (10\ :sup:`-12`)
 * .NET framework, `DateTime type
<http://msdn.microsoft.com/en-us/library/system.datetime.ticks.aspx>`_:
   number of 100-nanosecond intervals that have elapsed since 12:00:00
   midnight, January 1, 0001. DateTime.Ticks uses a signed 64-bit integer.
 * Java (1.5), `System.nanoTime()
<http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#nanoTime()>`_:
   wallclock with an unspecified starting point as a number of nanoseconds, use
   a signed 64 bits integer (long).
 * Perl, `Time::Hiref module <http://perldoc.perl.org/Time/HiRes.html>`_:
   use float so has the same loss of precision issue with nanosecond resolution
   than Python float timestamps


Copyright
=========

This document has been placed in the public domain.