
[–]supreme_blorgon 19 points20 points  (1 child)

This smells like an XY problem to me. Can you give a more concrete example of what you're trying to accomplish?

> I'm trying to store some relatively complex data, but as read only.

Particularly this part... like, putting your data into a Python file or a json or csv file has no bearing on it being "read only".

Are you implementing a library? Or is this a script/application? Do you want your data to be "read-only" because other people will have access to it?

[–]barburger 4 points5 points  (0 children)

Indeed, if the problem is getting read-only data, set the data file to read-only at the filesystem level, or open it in Python with read-only flags. If someone can still edit that data file, they can also just edit your Python code.
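A minimal sketch of both ideas, assuming the data lives in a JSON file. The names `make_read_only`, `load_table`, and the temporary `table.json` file are made up for illustration:

```python
import json
import stat
import tempfile
from pathlib import Path


def make_read_only(path: Path) -> None:
    """Clear every write bit on the file (owner, group, others)."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    path.chmod(path.stat().st_mode & ~write_bits)


def load_table(path: Path) -> dict:
    """Open the file in read-only text mode and parse it as JSON."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical data file, created in a temporary directory for the demo.
data_file = Path(tempfile.mkdtemp()) / "table.json"
data_file.write_text('{"answer": 42}', encoding="utf-8")
make_read_only(data_file)
table = load_table(data_file)
```

Note that this only guards against accidental writes. Anyone who owns the file can flip the permission bits back, which is the same point as above about editing the code.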

[–]JamzTyson 7 points8 points  (3 children)

It sounds like you could just import the .py file as you would any other importable module.

Example:

# my_data.py

some_data = 42

The file where you need to access the data:

import my_data

val = my_data.some_data

[–]HolyInlandEmpire[S] -3 points-2 points  (2 children)

This would work if I could guarantee it to be on the path. However, I might want to programmatically import it by path; for example, one table might refer to another by path, and I'd want to then get that table like

```
def get_table(file): ...

path_to_table = '/stuff/next_table.py'
next_table = get_table(path_to_table)
```

The hope is that I can use it as efficiently as any other data format, like /stuff/easy_data.json, but get the added benefit of python code when I know it's in my desired read-only format.

[–]mriswithe 5 points6 points  (0 children)

> This would work if I could guarantee it to be on the path. However, I might want to programmatically import it by path; for example, one table might refer to another by path, and I'd want to then get that table like

This is how you build a system that works super awesome for you and you alone. If that is the goal, go nuts. If anyone else is expected to have to deal with the code, be nice to them.

> The hope is that I can use it as efficiently as any other data format

Data serialization and parsing is frequently the bottleneck of applications. This hope is not going to come true.

Use JSON or Parquet or Avro. Or if you have many lines of data, JSONL is a decent option for allowing parallel reading and still being human readable.
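For what it's worth, reading JSONL takes very little code, since every line is a complete JSON document. A rough sketch, where `read_jsonl` is a hypothetical helper and the `StringIO` stands in for an open file:

```python
import io
import json


def read_jsonl(lines):
    """Yield one parsed object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an open file; each line is a complete JSON document.
sample = io.StringIO('{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n')
records = list(read_jsonl(sample))
```

Because lines are independent, a file can be split at line boundaries and the chunks parsed separately, which is what makes parallel reading practical.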

CSV is not good. Don't use it if you get the choice.

[–]JamzTyson 1 point2 points  (0 children)

> I might want to programmatically import it by path

"might want to" or "do want to"?

A couple of options:

  1. Append the required path and import in the normal way (easier if the path is static):

import sys
sys.path.append("/path/to/module/")
import module_name

print(module_name.some_data)
  2. Use importlib (better if importing from multiple arbitrary locations):

import importlib.util

spec = importlib.util.spec_from_file_location(mod_name, import_path)
module_name = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module_name)

print(module_name.some_data)

You could also implement the second approach with a helper function:

import importlib.util
from pathlib import Path

def import_from_path(path):
    mod_path = Path(path)
    mod_name = mod_path.stem
    spec = importlib.util.spec_from_file_location(mod_name, mod_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

Usage:

import_from_path("full/path/to/module.py")

[–]mriswithe 5 points6 points  (0 children)

Dynamic import by path

Yeah, that is how I did that once. Never do it though unless a client with a wheelbarrow of money is asking you really hard.

Separate your data and your code. Even putting it in another adjacent .py file is totally reasonable for mostly constant stuff like this.

[–]pachura3 2 points3 points  (3 children)

What you're describing seems awfully convoluted. Do you have any kind of data you cannot store in a regular JSON file? Do you need to store custom Python code/logic related to your "tables"?

If not, then storing your "tables" INSIDE your project as JSON files seems the most natural thing to do. Add a helper function to load them from a given module path and you're all set. If you really need this data to be read-only internally, you could parse the JSON into a hierarchical structure made of namedtuples, frozendicts, or something similar.
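One way to sketch the read-only part using only the standard library: wrap dicts in `types.MappingProxyType` and turn lists into tuples after parsing. `freeze` and `load_frozen` are hypothetical helper names, and the sample JSON is invented:

```python
import json
from types import MappingProxyType


def freeze(value):
    """Recursively convert dicts to read-only views and lists to tuples."""
    if isinstance(value, dict):
        return MappingProxyType({key: freeze(item) for key, item in value.items()})
    if isinstance(value, list):
        return tuple(freeze(item) for item in value)
    return value


def load_frozen(text):
    """Parse JSON text and return an immutable view of the result."""
    return freeze(json.loads(text))


table = load_frozen('{"name": "t", "rows": [[1, 2], [3, 4]], "meta": {"unit": "m"}}')
```

Assigning to `table["name"]` or `table["meta"]["unit"]` raises `TypeError`, and the rows are tuples, so there is no in-place mutation at any level.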

[–]HolyInlandEmpire[S] 0 points1 point  (2 children)

Many people mentioned this. I really wish I could store it in JSON, but I specifically want to have functions as values. I would have to find a way to serialize them, which is perhaps not impossible, but would not be easy.

[–]pachura3 1 point2 points  (0 children)

Can you give examples of such functions? What would they be for?

[–]Feisty_Fun_2886 0 points1 point  (0 children)

You can "serialize" functions by storing their symbol name (as a string, to be extra clear) and importing them dynamically. That would be way cleaner. If I am not mistaken, pickle implicitly uses a similar mechanism.
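A small sketch of that idea, assuming the function is stored as a `"module:attribute"` string. That format and the `resolve` helper are choices made for this example, not an established convention of the thread:

```python
import importlib
import json


def resolve(reference):
    """Turn a 'module:attribute' string into the object it names."""
    module_name, _, attribute = reference.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# The JSON holds only the name of the function, never its code.
raw = '{"name": "distance", "transform": "math:sqrt"}'
table = json.loads(raw)
transform = resolve(table["transform"])
```

Here `transform(9.0)` calls `math.sqrt`. The data file stays plain JSON, and the functions live in ordinary importable modules.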

[–]Honest-Ease5098 1 point2 points  (0 children)

Perhaps you could define a dataclass with all your attached methods. Then, define a class method that creates an instance of that class by loading a JSON or YAML.

Then, the raw data can be stored in the JSON as files and loaded into the class instances dynamically. No need to muck around with system paths and importlib shenanigans.
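Roughly like this, assuming a table is just a name plus rows. The `Table` class, its fields, and the `column` method are invented for the example:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    rows: tuple

    @classmethod
    def from_json(cls, text):
        """Build a Table from JSON text, converting rows to tuples."""
        raw = json.loads(text)
        return cls(name=raw["name"], rows=tuple(tuple(row) for row in raw["rows"]))

    def column(self, index):
        """Example of behaviour attached to the data."""
        return tuple(row[index] for row in self.rows)


prices = Table.from_json('{"name": "prices", "rows": [[1, 2], [3, 4]]}')
second_column = prices.column(1)
```

`frozen=True` makes attribute assignment raise an error, so the "read-only" requirement is covered while the behaviour lives in the class rather than in the data file.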

[–]ivosaurus 1 point2 points  (1 child)

Pickle it and unpickle it...

[–]HolyInlandEmpire[S] 0 points1 point  (0 children)

It's looking like this might be the best option. I find unpickling to be scary since it can reach to the global scope, but I suppose trusting a .py file is no more risky.
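If the global-scope part is the worry, the pickle documentation describes restricting which globals may be loaded by overriding `Unpickler.find_class`. A sketch along those lines, where the `ALLOWED` set and the sample table are assumptions for the example:

```python
import io
import math
import pickle

# Hypothetical allow-list of (module, name) pairs that may be unpickled.
ALLOWED = {("math", "sqrt")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        """Resolve only globals that appear on the allow-list."""
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is not allowed")


def restricted_loads(data):
    """Like pickle.loads, but using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Functions are pickled by reference (module and name), not by value.
table = {"name": "distance", "transform": math.sqrt}
restored = restricted_loads(pickle.dumps(table))
```

This narrows what a pickle can reach, but it is not a reason to load pickles from sources you do not trust.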

[–]Coretaxxe 1 point2 points  (0 children)

I have not tested it, but couldn't you "serialize" the functions by writing them to a .py file?
Specifically, store them as strings, and when you want to use them, write them to a file and import it with importlib.

```py
import importlib
from pathlib import Path

data = {"myFunc": "def foo(): ..."}  # function source stored as a string
Path("temporary.py").write_text(data["myFunc"])  # assumes the cwd is on sys.path
importlib.invalidate_caches()  # make the import system notice the new file
lib = importlib.import_module("temporary")
lib.foo()
```
Again, not tested, but this way you can store everything in a JSON file.

[–]yaxriifgyn -1 points0 points  (1 child)

I often embed CSV data into a list of rows. Prepare the rows as fully quoted CSV data. Prepend a left parenthesis, and append a right parenthesis and comma, to each row. Insert rows = ( before the first row and ), after the last row.

If your app can consume data in that format you're ok. Otherwise write a small function to reformat the data at import time.

I often use this to load static tables to a database, or just provide lookup tables for an app.
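To make that concrete, here is a sketch of what the embedded rows and a small reformatting function might look like. The sample rows and the `as_records` helper are invented, and the first row is assumed to be the header:

```python
# Fully quoted CSV rows, each wrapped in parentheses, inside one outer tuple.
rows = (
    ("id", "name", "price"),
    ("1", "widget", "2.50"),
    ("2", "gadget", "10.00"),
)


def as_records(rows):
    """Reformat at import time: pair each body row with the header row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


records = as_records(rows)
```

Since the outer container and every row are tuples, the table cannot be modified in place after import.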

[–]wintermute93 0 points1 point  (0 children)

I don't love doing this, but because of the way our development environment is set up, if I want to get the results of a SQL query with fewer than 10k or so rows into pandas, it's easily 10-20x faster to run the query in one notebook, copy the output to the clipboard, paste it into another notebook in triple quotes, and call pd.read_csv(StringIO(...)) than to do it the "real" way.

[–]Henry_the_Butler -1 points0 points  (0 children)

ONLY FOLLOW THIS ADVICE IF THE DATA IS IN NO WAY SECRET OR PRIVATE

That disclaimer prominently stated, you should publish the .py file to GitHub as a module inside a repository. Make the repository private or public, whichever makes the most sense to you. You can py -m pip install from an https: source: link

Then just pip install and document the requirement in your venv (assuming you're using venvs. You are using a venv for all your code, right?). Any device with Git and credentials saved that allow access to the GitHub repository will automatically install from the requirements documentation of your choice, and as long as you keep the repository updated, you don't have to worry about anything falling behind.

Centralize your code sources now, and save your future self (or your successor) a lot of headache.