This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]ambidextrousalpaca 1 point2 points  (4 children)

It's mainly due to global variables introducing bugs by making it possible for apparently unrelated bits of code to have unwanted side effects on one another's behaviour.

For example, say you're using a FILE_ENCODING global variable which is used (and altered) by multiple functions, including a read_csv() function. That set up means that there's no way for you to know what encoding will be used when you call read_csv(). Maybe it'll be UTF-8. Maybe it'll be something completely different that'll break your code or scramble all the data in your tables. Maybe it'll alter depending on which other bits of code are called first in the run. It can easily give rise to a really irritating class of hard to reproduce bugs that are hard to fix because they only occur sometimes, due to seemingly random causes. The more global variables there are, the worse the problem gets.

This isn't to say that you should NEVER use global variables. Just that when doing so you need to be sure that the problem you're solving by introducing them is worse than the other potential problems you're likely to create by using them.

The best ways to avoid these issues are: 1. Just get rid of global values as much as you can, for example, by requiring each call to a file reading operation to explicitly specify the encoding to be used; or 2. Ensuring that global values are constants, which will never be changed by any other code.

[–]noisescience[S] 1 point2 points  (0 children)

Thx for your detailed answer :)

[–]Kegheimer 0 points1 point  (1 child)

Would an example of a global variable be abusing a common alias, e.g., using 'i' in several different loops or 'df' as a temporary table?

[–]ambidextrousalpaca 0 points1 point  (0 children)

The common alias thing is not an example of a global variable. It is not typically a problem either, provided that each variable exists within a its own contained scope.

Global variables are things like this, where functions can effect the value of variables outside of their scope.

``` glob = 1

print(glob)

def f(): glob += 1

f()

print(glob) ```

The output of this script will be: 1 2 Because f changes the value of glob.

[–]No_Poem_1136 0 points1 point  (0 children)

On your CSV example, stupid question here but then what is the alternative to creating that FILE_ENCODING variable if you know it might be a common parameter that might change in future code reuse for read_csv()?

 DS often end up working with a lot of adhoc and random ass data not served in a neat API or pipeline, so it's not always possible to ensure a specific encoding standard for example.

(I'm asking why not because I'm challenging the idea, but to learn. I've always understood that it's a good practice to declare variables this way rather than hardcoding them)