Tips or strategies to understanding how CPython works under the hood


On 9 January 2018 at 16:18, Chris Angelico <rosuav at gmail.com> wrote:
> On Wed, Jan 10, 2018 at 2:21 AM, Robert O'Shea
> <robertoshea2k11 at gmail.com> wrote:
>> Hey all,
>>
>> Been subscribed to this thread for a while but haven't contributed much.
>> One of my ultimate goals this year is to get under the hood of CPython and
>> get a decent understanding of the mechanics Guido and the rest of you wonderful
>> people have designed and implemented.
>>
>> I've been programming in python for nearly 10 years now and while I can
>> write a mean Python script, I've been becoming more and more interested in
>> low level operations or complex C programs so I thought I could spread my
>> love of both to make a difference for me and others.
>>
>> So besides just grabbing a chunk of CPython source code and digesting it, I
>> was wondering whether those of you who have read and understood the source
>> code have any tips or good starting points?
>
> Cool! Let's see.
>
> The first thing I'd do is to explore the CPython byte code. Use the
> 'dis' module to examine the compiled version of a function, and then
> look at the source code for dis.py (and the things it imports, like
> opcode.py) to get a handle on what's happening in that byte code.
> CPython is a stack-based interpreter, which means it loads values onto
> an (invisible) internal stack, processes values at the top of the
> stack, and removes them when it's done.
>
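
A minimal sketch of that first step, using only the standard 'dis'
module. The function add() is a made-up example, and the exact opcode
names differ between CPython versions (for instance BINARY_ADD in older
releases, BINARY_OP in newer ones), so treat any listing you get as
specific to your interpreter.

```python
import dis


def add(a, b):
    # Hypothetical example function; any small function will do.
    return a + b


# Human-readable listing: operands are pushed onto the value stack,
# an operation consumes them, and the result is returned.
dis.dis(add)

# The same information as data, handy for poking at it interactively.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

Reading the listing top to bottom shows the stack discipline Chris
describes: loads first, then the operation, then the return.
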
> Once you've gotten a handle on the bytecode, I'd next dive into one
> particular core data type. Pick one of dict, list, str (note that, for
> hysterical raisins, it's called "unicode" in the source), int (for
> similar hysterical raisins, it's called "long"), etc. In the CPython
> source code, there's an Objects/ directory. Explore the functionality
> of that one object type, keeping in mind the interpreter's stack. Get
> an understanding of the different things you can do with it at the low
> level; some of them will be the same as you're used to from the high
> level, but some won't (for instance, Python code is never aware of a
> call to list_resize). Especially, read all the comments; the top few
> pages of dictobject.c are pretty much entirely English, and give a lot
> of insight into how Python avoids potential pitfalls in dict
> behaviour.
>
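
Python code never calls list_resize directly, but its effect can be
observed indirectly. The sketch below assumes CPython, where
sys.getsizeof() reports the current allocation of a list;
observe_list_growth() is a hypothetical helper name, and the sizes you
see depend on the version and platform.

```python
import sys


def observe_list_growth(n=64):
    # Record the reported size of a list after each append.
    items = []
    sizes = [sys.getsizeof(items)]
    for i in range(n):
        items.append(i)
        sizes.append(sys.getsizeof(items))
    return sizes


sizes = observe_list_growth()

# Indices at which the allocation actually changed. On CPython this is
# far fewer than the number of appends, because the list over-allocates
# each time it has to grow.
growth_points = [i for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]
print(growth_points)
```

Comparing the growth points with the comments in Objects/listobject.c
is a reasonable way into that file.
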
> From there, it's all up to you! Don't hesitate to ask questions about
> stuff you see. Tinkering is strongly encouraged!
>
> Oh, one thing to keep an eye on is error handling. You might discover
> something because there's code to raise an exception if something
> happens... like raising ValueError("generator already executing"),
> which I found highly amusing. (I cannot imagine ANY sane code that
> would ever trigger that error!)
>
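
For the curious, that particular error can be provoked on purpose. This
is a contrived sketch, not something real code should do: the generator
tries to resume itself while it is already running.
trigger_reentrant_generator() is a hypothetical helper name.

```python
def trigger_reentrant_generator():
    def gen_func():
        # 'gen' is the generator object created from this very function,
        # so this asks a running generator to resume itself.
        yield next(gen)

    gen = gen_func()
    try:
        next(gen)
    except ValueError as exc:
        return str(exc)
    return None


message = trigger_reentrant_generator()
print(message)
```

Searching the CPython source for the message text then leads straight
to the code that guards against re-entry.
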
> Have fun with it!

In addition to Chris' suggestions, it would probably be good to look
at the documentation - the "Extending and Embedding" and "Python/C
API" manuals, although focused more on people writing C code to
interface with Python, nevertheless include a lot of information about
how the C code that implements Python works. And a lot of the core
data types (as well as anything in the standard library) are written
using the same C API as 3rd party extensions use, so the documentation
will give you a lot of help with those sections of the code.
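
A rough way to try a few of those documented C API functions without
writing any C is ctypes.pythonapi. This sketch assumes CPython with
ctypes available; the argtypes/restype declarations are my reading of
the signatures in the "Python/C API" manual, so check them against the
documentation for your version. It also shows the naming Chris
mentions: str is "unicode" and int is "long" at the C level.

```python
import ctypes

# Py_ssize_t PyList_Size(PyObject *list)
PyList_Size = ctypes.pythonapi.PyList_Size
PyList_Size.argtypes = [ctypes.py_object]
PyList_Size.restype = ctypes.c_ssize_t

# Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
PyUnicode_GetLength = ctypes.pythonapi.PyUnicode_GetLength
PyUnicode_GetLength.argtypes = [ctypes.py_object]
PyUnicode_GetLength.restype = ctypes.c_ssize_t

# long PyLong_AsLong(PyObject *obj)
PyLong_AsLong = ctypes.pythonapi.PyLong_AsLong
PyLong_AsLong.argtypes = [ctypes.py_object]
PyLong_AsLong.restype = ctypes.c_long

list_len = PyList_Size([10, 20, 30])
str_len = PyUnicode_GetLength("hello")
int_val = PyLong_AsLong(42)
print(list_len, str_len, int_val)
```

The same function names appear throughout Objects/ and the standard
library's C modules, which makes the API manual a useful index into the
source.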

Paul