
How should we use global variables correctly?

On 22Aug2019 11:12, Michael Torrie <torriem at gmail.com> wrote:
>On 8/22/19 10:00 AM, Windson Yang wrote:
>> I can 'feel' that global variables are evil. I also read lots of articles
>> proves that (http://wiki.c2.com/?GlobalVariablesAreBad). However, I found
>> CPython Lib use quite a lot of `global` keyword. So how should we use
>> `global` keyword correctly? IIUC, it's fine that we use `global` keyword
>> inside the Lib since most of the time the user just import the lib and call
>> the API. In any other situation, we should avoid using it. Am I right?
>The "global" keyword only refers to the current module as far as I know.
> Thus global variables are global only to the current python file, so
>the damage, as it were, is limited in scope.


>And it's only required if
>you plan to write to a global variable from another scope; you can
>always read from a parent scope if the name hasn't been used by the
>local scope.  I'm sure there are use cases for using the global keyword.
> It's not evil.  It's just not necessary most of the time.  I don't
>think I've ever used the "global" keyword.  If I need to share state, or
>simple configuration information, between modules, I place those
>variables in their own module file and import them where I need them.

I've used it a few times. Maybe a handful of times in thousands of lines 
of code.

As Michael says, "you can always read from a parent scope if the name 
hasn't been used by the local scope". What this means is this:


    def factors_of(n):
        factors = _MODULE_LEVEL_CACHE.get(n)
        if factors is None:
            factors = factorise(n)
            _MODULE_LEVEL_CACHE[n] = factors
        return factors

    def factorise(n):
        # ... expensive factorisation algorithm here ...
        ...

Here we access _MODULE_LEVEL_CACHE directly without bothering with the 
global keyword. Because the function "factors_of" does not _assign_ to 
the name _MODULE_LEVEL_CACHE, that name is not local to the function; 
the outer scopes will be searched in order to find the name.

Now, Python decides which variables are local to a function by statically 
inspecting the code and seeing which names are assigned to. So:

    x = 9
    y = 10
    z = 11

    def foo(x):
        y = 5
        print(x, y, z)

Within the "foo" function:

- x is local (it is bound to the function parameter when you call foo)

- y is local (it is assigned to in the function body)

- z is not local (it is not assigned to); the namespace searching finds 
  it in the module scope
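
If you want to check this static decision for yourself, here is a small 
sketch. It leans on CPython's code object attributes "co_varnames" and 
"co_names", which are an implementation detail I am assuming is available 
rather than something the example above depends on:

```python
x = 9
y = 10
z = 11

def foo(x):
    y = 5
    print(x, y, z)

# Names the compiler decided are local to foo (parameters come first).
local_names = foo.__code__.co_varnames

# Names foo will look up outside its own scope (module, then builtins).
outer_names = foo.__code__.co_names

print(local_names)
print(outer_names)
```

Note that "foo" is never called here: the local/nonlocal decision was 
already made when the function was compiled.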

Note that in the "factors_of" function we also do not _assign_ to 
_MODULE_LEVEL_CACHE. We do assign to one of its elements, but that is an 
access _via_ _MODULE_LEVEL_CACHE, not an assignment to the name itself.  
So the name is not local to the function and is found in the module namespace.

However, where you might want to use "global" (or its modern friend 
"nonlocal", which does the same job for an enclosing function's scope) is 
to avoid accidents and to make the globalness obvious. The same example code:

    x = 9
    y = 10
    z = 11

    def foo(x):
        y = 5
        print(x, y, z)

When you use a global, that is usually a very deliberate decision on 
your part, because using globals is _usually_ undesirable. When all 
variables are local, side effects are contained within the function and 
some surprises (== bugs) are prevented.

Let's modify "foo":

    def foo(x):
        y = 5
        z = y * 2
        print(x, y, z)

Suddenly "z" is a local variable because it is assigned to.
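
A quick sketch of the consequence, returning the values instead of 
printing them so they are easy to inspect (that change is mine, purely for 
illustration):

```python
z = 11

def foo(x):
    y = 5
    z = y * 2   # this assignment makes "z" local to foo
    return x, y, z

inside = foo(1)
# The function worked with its own "z"; the module-level "z" is untouched.
module_z = z
```

So the assignment inside the function silently stopped affecting the 
module-level variable.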

In this function it is all very obvious because the function is very 
short. In a longer function it might not be so obvious.

So: was "z" still intended to be global?

If yes then you need the global keyword:

    def foo(x):
        global z
        y = 5
        z = y * 2
        print(x, y, z)
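
As a sketch of what the statement buys you, here is that version with the 
values returned rather than printed (my change, just to make the effect 
easy to check):

```python
z = 11

def foo(x):
    global z    # "z" now refers to the module-level variable
    y = 5
    z = y * 2   # rebinds the module-level "z"
    return x, y, z

z_before = z
inside = foo(1)
z_after = z
```

This time the assignment inside "foo" is visible at module level after the 
call.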

And even if we were not assigning to "z", we might still use the 
"global" statement to make it obvious to the reader that "z" is a 
global; after all, it is not very visually distinctive: it looks a lot 
like "x" and "y".

So my advice after all of this is:

As you thought, globals are to be avoided most of the time. They invite 
unwanted side effects and also make it harder to write "pure functions", 
functions with no side effects. Pure functions (most Python functions) 
are much easier to reuse elsewhere.

However, if you have a good case for using a global, always use the 
"global" statement. It has the following benefits: it makes the 
globalness obvious to the person reading the code and it avoids a global 
variable suddenly becoming local if you assign to it. (NB: the "time" of 
that semantic change is when you change the code, _not_ when the 
assignment itself happens.)
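
A sketch of what that NB means in practice: because the decision is made 
for the whole function body, a read that comes _before_ the assignment 
fails instead of falling back to the global. The function here is a 
made-up example of mine:

```python
z = 11

def foo():
    seen = z   # "z" is local to the whole body, and not yet assigned
    z = 5
    return seen

error = None
try:
    foo()
except UnboundLocalError as e:
    # Raised at the read, even though the assignment comes later.
    error = e
```

With a "global z" statement at the top of "foo" the read would find the 
module-level value, and the assignment would rebind it.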

Cameron Simpson <cs at cskk.id.au>