Questions about the IO modules and C-api
On 6/2/19, Windson Yang <wiwindson at gmail.com> wrote:
> f = open('myfile, 'a+b')
This is missing the closing quote around 'myfile'.
> I added a printf statement at the beginning of _io_open_impl
Repeatedly rebuilding the interpreter sounds like a frustrating
learning experience, IMO. Use a debugger such as gdb on Linux, or cdb,
WinDbg, or Visual Studio on Windows.
> 2. I'm not familiar with the C,
IMO, start with a good book or tutorial on C.
> I found `self->raw` is an empty PyObject and
> `_PyIO_str_write` is a global variable which is NULL. Why an empty
> PyObject have a write method?
The macro that defines _PyIO_str_write was already pointed out to you
by Thomas. A C programmer should know to look for a macro definition,
and someone experienced with the Python 3 C API should know to look in
the module initialization function PyInit_<module name>. Consider that
in terms of learning Python internals, you may be putting the cart
before the horse as they say.
Let's inspect these values in the console debugger (cdb) in Windows,
which is not the greatest debugging experience, but it works. Below I
have the debugger attached to an interactive Python 3.8 session with a
breakpoint set on _bufferedwriter_raw_write:
>>> f = open(filename, 'wb', buffering=10)
>>> n = f.write(b'spam' * 3)
00007ffd`1950ff50 4c8bdc mov r11,rsp
Let's see what self->raw is.
0:000> ?? self->raw->ob_type->tp_name
char * 0x00007ffd`197dd378
It's the FileIO object that we can reference in Python as the `raw`
attribute of the buffered writer. Next check the value of
_PyIO_str_write:
0:000> ?? ((PyASCIIObject *)_PyIO_str_write)->state
+0x000 interned : 0y01
+0x000 kind : 0y001
+0x000 compact : 0y1
+0x000 ascii : 0y1
+0x000 ready : 0y1
This is a compact ASCII string, meaning that it's a char string that
immediately follows the object. Use pointer arithmetic and cast to
char * to view the string:
0:000> ?? (char *)(((PyASCIIObject *)_PyIO_str_write) + 1)
char * 0x0000014c`823b4560
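The same layout can be checked from Python without a debugger. This is only a sketch and relies on CPython implementation details, not on any documented API: it assumes that id() returns the object's address, and that for a compact ASCII string sys.getsizeof() is the header size plus the length plus one byte for the trailing NUL. The names header_size and payload are just for illustration.

```python
import ctypes
import sys

s = 'write'  # a short ASCII literal, stored as a compact ASCII string

# Assumption (CPython detail): getsizeof == header + len(s) + 1 (NUL),
# so the header size can be derived without hard-coding a struct layout.
header_size = sys.getsizeof(s) - len(s) - 1

# Assumption (CPython detail): id(s) is the address of the object.
# The character data starts right after the header, which is what the
# "+ 1" pointer arithmetic on PyASCIIObject * does in the debugger.
payload = ctypes.string_at(id(s) + header_size, len(s))
print(payload)
```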
So we're calling the write method of our raw object, i.e. f.raw.write.
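This can also be observed from pure Python. The sketch below swaps the real FileIO for a hypothetical raw class, LoggingRaw (not part of the io module), that records every call to its write method. Wrapping it in io.BufferedWriter shows the buffered layer handing the data to raw.write.

```python
import io

class LoggingRaw(io.RawIOBase):
    """Hypothetical raw object that records each write() call."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def writable(self):
        return True

    def write(self, b):
        # The buffered layer may pass a memoryview; copy it to bytes.
        data = bytes(b)
        self.calls.append(data)
        return len(data)

raw = LoggingRaw()
f = io.BufferedWriter(raw, buffer_size=10)
n = f.write(b'spam' * 3)   # 12 bytes, more than the 10-byte buffer
f.flush()                  # make sure everything has reached raw.write

print(n, raw.calls)
```

How the 12 bytes are split across calls to raw.write is up to the buffered layer, so the sketch only relies on the concatenation of the recorded chunks.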
> Why we didn't just use `write()` system call directly?
The raw layer is a necessary abstraction in the I/O stack. For a
concrete example, consider the raw I/O type io._WindowsConsoleIO. For
writing, it transcodes from UTF-8 to UTF-16LE and calls the
wide-character function WriteConsoleW. For reading, it calls
ReadConsoleW and transcodes from UTF-16LE to UTF-8.
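To illustrate why a raw layer is useful, here is a rough Python sketch of a raw object that transcodes on write. Utf16Sink is a made-up class, not how io._WindowsConsoleIO is actually implemented: it accepts UTF-8 bytes from the layers above and stores UTF-16LE in a bytearray, where the real type would call WriteConsoleW.

```python
import codecs
import io

class Utf16Sink(io.RawIOBase):
    """Hypothetical raw object: takes UTF-8 bytes, stores UTF-16LE."""

    def __init__(self):
        super().__init__()
        self.data = bytearray()
        # An incremental decoder copes with a multibyte sequence that
        # happens to be split across two write() calls.
        self._decoder = codecs.getincrementaldecoder('utf-8')()

    def writable(self):
        return True

    def write(self, b):
        chunk = bytes(b)
        text = self._decoder.decode(chunk)
        self.data += text.encode('utf-16-le')
        # Report how many of the caller's UTF-8 bytes were consumed.
        return len(chunk)

sink = Utf16Sink()
out = io.TextIOWrapper(io.BufferedWriter(sink), encoding='utf-8')
out.write('h\u00e9llo')
out.flush()

print(bytes(sink.data))
```

The text and buffered layers stay unchanged; only the raw object knows what the underlying device expects.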