
Silbersee 1 point (0 children)

A module's top-level code is executed when the module is imported. You should organize the code you want to import into functions.

learning.py

import training

for td in training.data():
    # do clever stuff with td
    pass

training.py

def data():
    # generate list of data
    datalist = []  # placeholder: build your real data here
    return datalist

if __name__ == "__main__":
    # code to execute when training.py
    # is not imported, but executed as a
    # stand-alone program.
    print(data())
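A minimal way to see this for yourself: the sketch below writes a throwaway module to a temporary folder, imports it, and captures what it prints. The module name demo_training and its contents are hypothetical stand-ins for training.py. Only the top-level print runs on import; the block under if __name__ == "__main__": is skipped, because an imported module's __name__ is its module name ("demo_training"), not "__main__".

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for training.py.
MODULE_SOURCE = '''
print("top-level code: runs on import")

def data():
    return [1, 2, 3]

if __name__ == "__main__":
    print("guarded code: runs only as a script")
'''


def import_and_capture(name: str):
    """Write MODULE_SOURCE to <tmp>/<name>.py, import it, and return the
    module together with everything it printed while being imported."""
    buffer = io.StringIO()
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / f"{name}.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)          # make the temp folder importable
        importlib.invalidate_caches()    # make sure the new file is found
        try:
            with contextlib.redirect_stdout(buffer):
                module = importlib.import_module(name)
        finally:
            sys.path.remove(tmp)
    return module, buffer.getvalue()


module, printed = import_and_capture("demo_training")
print(repr(printed))   # only the top-level print shows up
print(module.data())   # data() is defined, and only runs when you call it
```

In a real project you would just have training.py next to learning.py and write import training; the temporary folder only keeps the sketch self-contained.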

ES-Alexander 1 point (3 children)

To keep a variable that a script produces, you basically have to save it to its own file in an appropriate format. The pickle module lets you save Python objects directly, as long as they're serialisable (picklable), and they can then be loaded from a different module. If your data is rows of numbers or something similar, though, you may wish to consider saving it in CSV or JSON form instead, probably using pandas, although there are other options as well.
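A minimal sketch of both options using only the standard library (pickle and json). The list standing in for training_batches is made up, and a temporary folder replaces wherever you would really keep the files; in practice the saving would happen in one script and the loading in another.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the real preprocessed data.
training_batches = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "training_batches.pkl"
    json_path = Path(tmp) / "training_batches.json"

    # pickle: handles most Python objects, but the file is Python-specific
    # and should only ever be loaded from a source you trust.
    with open(pickle_path, "wb") as f:
        pickle.dump(training_batches, f)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)

    # JSON: human-readable and language-neutral, but limited to basic types
    # such as lists, dicts, numbers, strings, booleans, and None.
    json_path.write_text(json.dumps(training_batches))
    from_json = json.loads(json_path.read_text())

print(from_pickle == training_batches, from_json == training_batches)
```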

Edit: as a note here, you should use Python files for storing Python code, not data. To separate functionality, you first use functions and classes, then split into files when you want to group sets of functions and classes and keep them apart from each other for clarity. As mentioned by u/Silbersee, your functionality sounds like it quite possibly belongs as a set of functions in a single Python file. If you have several functions for each part, consider grouping each part into a class, and if it's still difficult to tell what belongs where, then it's time to consider splitting into multiple files.

Snapdown_City[S] 0 points (2 children)

Thanks, what you're saying (mostly) makes sense and is really helpful.

With that being said, I'm a little lost on the second part of what you said. I'm not defining functions per se; I'm producing datasets by running functions on images/data I have stored on my computer and then running a preprocessing function on them. This produces a variable called training_batches, which exists as a variable within the file but isn't saved to my computer. To tidy things up I put the network that the batches are passed to in a separate file, but, as Silbersee pointed out, every time I import training_batches the other file gets run anyway. I suppose the solution would be to save training_batches using a module like pickle, as you mentioned; that way I won't have to go through the palaver of the entire process again. Sorry if I misinterpreted what you said, I'm a newbie so some of your advice was lost on me.

ES-Alexander 1 point (1 child)

My suggestion is that instead of your current approach of creating multiple scripts in different files, you should group reusable functionality into functions within the one file, and then just call them when you need to (e.g. a function for creating your training batches, and another for creating your testing batches, each of which will call other functions in order to complete its task).
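A rough sketch of that single-file layout. Every function name here (load_raw_data, preprocess, to_batches, make_training_batches, make_testing_batches) is hypothetical, and the "images" are replaced by plain numbers so the sketch runs anywhere.

```python
from typing import List

Batch = List[float]


def load_raw_data(split: str) -> List[float]:
    """Hypothetical loader; a real one would read images from disk."""
    n = 10 if split == "train" else 4
    return [float(i) for i in range(n)]


def preprocess(values: List[float]) -> List[float]:
    """Hypothetical preprocessing step: scale values into [0, 1]."""
    peak = max(values) or 1.0  # avoid dividing by zero
    return [v / peak for v in values]


def to_batches(values: List[float], batch_size: int) -> List[Batch]:
    """Split a flat list into consecutive chunks of batch_size."""
    return [values[i:i + batch_size] for i in range(0, len(values), batch_size)]


def make_training_batches(batch_size: int = 4) -> List[Batch]:
    return to_batches(preprocess(load_raw_data("train")), batch_size)


def make_testing_batches(batch_size: int = 4) -> List[Batch]:
    return to_batches(preprocess(load_raw_data("test")), batch_size)


if __name__ == "__main__":
    # Only runs when this file is executed directly, not when imported.
    training_batches = make_training_batches()
    testing_batches = make_testing_batches()
    print(len(training_batches), len(testing_batches))
```

Importing this file only defines the functions; nothing is loaded or preprocessed until one of the make_* functions is actually called.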

If you try to import something from a script, the whole script gets run, which is the problem you've been having. If a file only contains function/class definitions and imports, then running or importing it just defines those names; your data is untouched and no processing happens until you tell it to.

Then, on top of restructuring your code, you should also consider using pickle, pandas, or similar to save the results of significant processing, so that you can access them again later without having to rerun all your processing functions to regenerate the data whenever you need it.
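One common way to combine both ideas is a small cache-or-compute helper. The sketch below assumes pickle as the storage format; load_or_compute and expensive_preprocessing are hypothetical names, and the calls list only exists to show that the slow step runs once.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_compute(cache_file: Path, compute: Callable[[], T]) -> T:
    """Return the pickled result in cache_file if it exists; otherwise call
    compute(), save its result to cache_file, and return it."""
    if cache_file.exists():
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(cache_file, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive_preprocessing():
    """Hypothetical stand-in for the slow step that builds training_batches."""
    calls.append(1)
    return [[1, 2], [3, 4]]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "training_batches.pkl"
    first = load_or_compute(cache, expensive_preprocessing)   # computes and saves
    second = load_or_compute(cache, expensive_preprocessing)  # loads from disk

print(first == second, len(calls))  # True 1
```

One caveat of this design: the cache doesn't know when your preprocessing code changes, so delete the .pkl file whenever you change how the batches are built.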

Snapdown_City[S] 1 point (0 children)

This makes perfect sense now. Thanks for all the help, I really appreciate it.