[Python-Dev] bpo-34595: How to format a type name?

On 09/11/18 15:23, Victor Stinner wrote:
> Hi,
> Last week, I opened an issue to propose to add a new %T formatter to
> PyUnicode_FromFormatV() and so indirectly to PyUnicode_FromFormat()
> and PyErr_Format():
>     https://bugs.python.org/issue34595
> I merged my change, but then Serhiy Storchaka asked if we can add
> something to get the "fully qualified name" (FQN) of a type, e.g.
> "datetime.timedelta" (FQN) vs "timedelta" (what I call "short" name).
> I proposed a second pull request to add %t (short) in addition to %T
> (FQN).
> But then Petr Viktorin asked me to open a thread on python-dev to get
> a wider discussion. So here I am.
> The rationale for this change is to fix multiple issues:
> * C extensions use Py_TYPE(obj)->tp_name which returns a fully
> qualified name for C types, but only the name (without the module) for
> Python types. Python modules use type(obj).__name__, which always returns
> the short name.

That might be a genuine problem, but I wonder if "%T" is fixing the 
symptom rather than the cause here.
Or is this only an issue for PyUnicode_FromFormat()?

> * currently, many C extensions truncate the type name: use "%.80s"
> instead of "%s" to format a type name

That's an orthogonal issue -- you can change "%.80s" to "%s", and 
presumably you could use "%.80t" as well.

> * "%s" with Py_TYPE(obj)->tp_name is used more than 200 times in the C
> code, and I dislike this complex pattern. IMHO "%t" with obj would be
> simpler to read, write and maintain.

I consider `Py_TYPE(obj)->tp_name` much more understandable than "%t".
It's longer to spell out, but it's quite self-documenting.

> * I want C extensions and Python modules to have the same behavior:
> respect the PEP 399. Petr considers that error messages are not part
> of the PEP 399, but the issue is wider than only error messages.

The other major use is for __repr__, which AFAIK we also don't guarantee 
to be stable, so I don't think PEP 399 applies to it.
Having the same behavior between C and Python versions of a module is 
nice, but PEP 399 doesn't prescribe it. There are other differences as 
well -- for example, `_datetime.datetime` is immutable, and that's OK.

If error messages and __repr__s should be consistent between Python and 
the C accelerator, are you planning to write tests for all the affected 
modules when switching them to %T/%t?

> The main issue is that at the C level, Py_TYPE(obj)->tp_name is
> "usually" the fully qualified name for types defined in C, but it's
> only the "short" name for types defined in Python.
> For example, if you get the C accelerator "_datetime",
> Py_TYPE(obj)->tp_name of a datetime.timedelta object gives you
> "datetime.timedelta", but if you don't have the accelerator, tp_name
> is just "timedelta".
> Another example, this script displays "mytimedelta(0)" if you have the
> C accelerator, but "__main__.mytimedelta(0)" if you use the Python
> implementation:
> ---
> import sys
> #sys.modules['_datetime'] = None
> import datetime
> class mytimedelta(datetime.timedelta):
>      pass
> print(repr(mytimedelta()))
> ---
> So I would like to fix this kind of issue.
> Type names are mainly used for two purposes:
> * format an error message
> * obj.__repr__()
> It's unclear to me if we should use the "short" or the "fully
> qualified" name. It should maybe be decided on a case by case basis.
> There is also a 3rd usage: to implement __reduce__, here backward
> compatibility matters.
> Note: The discussion evolved since my first implementation of %T which
> just used the not well defined Py_TYPE(obj)->tp_name.
> --
> Petr asked me why not expose functions to get these names. For
> example, with my second PR (not merged), there are 3 (private)
> functions:
> /* type.__name__ */
> const char* _PyType_Name(PyTypeObject *type);
> /* type.__qualname__ */
> PyObject* _PyType_QualName(PyTypeObject *type);
> /* type.__module__ "." type.__qualname__ (but type.__qualname__ for
> builtin types) */
> PyObject * _PyType_FullName(PyTypeObject *type);
> My concern here is that each caller has to handle errors:
>    PyErr_Format(PyExc_TypeError, "must be str, not %.100s",
> Py_TYPE(obj)->tp_name);
> would become:
>    PyObject *type_name = _PyType_FullName(Py_TYPE(obj));
>    if (type_name == NULL) { /* do something with this error ... */ }
>    PyErr_Format(PyExc_TypeError, "must be str, not %U", type_name);
>    Py_DECREF(type_name);
> When I report an error, I dislike having to handle *new* errors... I
> prefer that the error handling is done inside PyErr_Format() for me,
> to reduce the risk of additional bugs.
> --
> Serhiy also asked if we could expose the same feature at the *Python*
> level: provide something to get the fully qualified name of a type.
> It's not just f"{type(obj).__module__}.{type(obj).__name__}", but you
> have to skip the module for builtin types like "str" (not return
> "builtins.str").
> Maybe we can have "name: {0:t}, FQN: {0:T}".format(type(obj)). "t" for
> name and "T" for fully qualified name. We would only have to modify
> type.__format__().
> I'm not sure if we need to add new formatters to str % args.
> Example of Python code:
>     raise TypeError("must be str, not %s" % type(fmt).__name__)
> I'm not sure about Python changes. My first concern was just to avoid
> Py_TYPE(obj)->tp_name at the C level. But again, we should keep C and
> Python consistent. If the behavior of C extensions changes, Python
> modules should be adapted as well, to get the same behavior.
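
A minimal sketch of the Python-level rule described above: join
type.__module__ and type.__qualname__, but skip the module for builtin
types. The name full_type_name is hypothetical, chosen only for
illustration. It is not an existing or proposed API, and using
__qualname__ rather than __name__ is an assumption.

```python
import datetime


def full_type_name(tp):
    """Hypothetical helper: "module.qualname", or just qualname for builtins."""
    module = tp.__module__
    qualname = tp.__qualname__
    # Skip the module for builtin types, so str gives "str" and not
    # "builtins.str". Also fall back to the bare qualname when __module__
    # is not a string (a class can override it with anything).
    if module == "builtins" or not isinstance(module, str):
        return qualname
    return f"{module}.{qualname}"


print(full_type_name(str))
print(full_type_name(datetime.timedelta))
```

With either the C accelerator or the pure-Python datetime module, this
should give "str" and "datetime.timedelta", since it never looks at
tp_name.
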
> Note: I reverted my change which added the %T formatter from
> PyUnicode_FromFormatV() to clarify the status of this issue.
> Victor