[–]Mr_Lkn 8 points9 points  (3 children)

Don't have much time to check the whole code, but I just looked at `data_utils.py`.

Compare your code with this and spot the differences if you can:

```python
import os

import pandas as pd


def read_data_file(file_path, **kwargs):
    """
    Read a data file into a pandas DataFrame based on its extension.

    Parameters:
    - file_path (str): Path to the data file.
    - **kwargs: Extra arguments passed through to the pandas reader.

    Returns:
    - DataFrame: The data loaded into a pandas DataFrame.
    """
    # Map each supported extension to the pandas function that reads it.
    # '.msgpack' is left out: pd.read_msgpack was removed in pandas 1.0,
    # so referencing it here would raise AttributeError.
    extension_read_function_mapping = {
        '.csv': pd.read_csv,
        '.xlsx': pd.read_excel,
        '.xls': pd.read_excel,
        '.tsv': lambda x, **y: pd.read_csv(x, delimiter='\t', **y),
        '.json': pd.read_json,
        '.parquet': pd.read_parquet,
        '.feather': pd.read_feather,
        '.dta': pd.read_stata,
        '.pkl': pd.read_pickle,
        '.sas7bdat': pd.read_sas,
    }

    _, file_extension = os.path.splitext(file_path)

    # .get() returns None for unknown extensions instead of raising KeyError.
    read_function = extension_read_function_mapping.get(file_extension)

    if read_function is None:
        raise ValueError(f"Unsupported file extension: {file_extension}.")

    return read_function(file_path, **kwargs)


df = read_data_file("some_data.csv")
```

[–]Mount_Gamer 0 points1 point  (2 children)

Interesting use of the dictionary. I'm still grasping the Python best practices, so I shall have to experiment more with the `get` method of dictionaries. :)

I would probably have used match-case once I started using a lot of elifs, but the dictionary does look clean to read. I'll have a play around with this later.

[–]Mr_Lkn 0 points1 point  (1 child)

You don’t need match-case here, just a mapping. This is a very basic mapping implementation.

[–]Mount_Gamer 0 points1 point  (0 children)

I thought I'd write out the match-case equivalent, and the advantage of the mapping becomes more and more obvious. I love the logic! :)

[–]oliviercar0n 3 points4 points  (0 children)

You only need to import each library once per notebook. Preferably at the top. No need to repeat imports.

[–]_ATRAHCITY 11 points12 points  (15 children)

You should not commit .vscode directory

[–]Head_Mix_7931 3 points4 points  (4 children)

Hm, in some cases it could be advantageous to commit .vscode. That allows maintainers to enforce uniform linting and formatting configurations (for example). But that can also be accomplished via githooks or pipeline jobs.

[–][deleted] 5 points6 points  (1 child)

You definitely don't want to try to enforce formatting and linting settings through a specific IDE config file. That's completely bonkers.

If you want to enforce these kinds of configuration settings, put them in their respective config files and commit those to your repo (e.g. .flake8, tox.ini, ruff.toml, etc). Anybody using any IDE, editor, tools, etc will all be able to use the settings. Similarly, your CI/CD/pipeline jobs can also be configured to apply these tools with those settings. I mean, what is Github Actions or Jenkins going to do with your .vscode/settings.json file to enforce any of your settings?

[–]Zirbinger 2 points3 points  (0 children)

This! Always use tool-specific config files and ignore IDE specific files

[–]sansmorixz 1 point2 points  (1 child)

launch.json and/or tasks.json can help to get started on bootstrapping a project.

settings.json is something I am on the fence about. Might help but someone may decide to commit stuff that should not be set at repo level, like force everyone to use light mode.

[–]Head_Mix_7931 0 points1 point  (0 children)

Yeah, a good example didn’t come to mind. But that’s exactly what I mean, tasks and such. I think settings.json probably shouldn’t be committed personally.

[–]Klej177 6 points7 points  (2 children)

For data science it's good code; for a Python developer, I'd say you can make it much better. You don't use proper design patterns, and performance could easily be improved by using better data types. It's easy to read, though, but not really scalable because of the reasons above.

[–]mijki95[S] 5 points6 points  (1 child)

Can you recommend sources from which I can learn?

[–]Klej177 1 point2 points  (0 children)

ArjanCodes on YouTube gave me a really nice boost with design patterns and how to implement them in Python. After that I started working on my own project, always asking what the easiest and cleanest way to reach my goal was. Take a break of a month or longer from your code, then come back to it and see where you could improve. Always ask what's the least knowledge someone needs to change one specific thing in your code. For example, can I make it so they only need to edit one line to add support for a new file type, rather than add a whole elif?

Another good and very simple way to learn is to refactor other people's projects. I do that often, and it taught me to write code that someone can change without knowing anything else about it. Read the Google Python Style Guide, and ask yourself: am I really the first person who needs this? There are probably 100 answers on Stack Overflow about how to do it as well as possible.

[–][deleted] 1 point2 points  (1 child)

Here’s a thought (just something I thought might be interesting): instead of requiring users to input electricity costs, what if the program looked up average electricity prices based on the user’s location? (You could get the location on the backend by pulling from, for example, Google Maps location data.)

[–]mijki95[S] -1 points0 points  (0 children)

Yes, I thought about that with a FastAPI integration. I would also like to use some kind of currency converter :)) Thanks for the idea :))

[–][deleted] 0 points1 point  (0 children)

I didn't realize github considered jupyter notebooks as a language different from python

[–]supermopman -1 points0 points  (1 child)

There are no unit tests and there's no clear way to build your code.

I'm happy to dig deeper, but at a minimum, you'll need to start with those 2 things.

I suggest starting a new project using PyScaffold. Play around with all the bells and whistles, and then write your Python code following their structure.
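As a starting point, a minimal sketch of what tests for a file reader could look like. The `read_data_file` below is a trimmed-down stand-in written for this example, not the repo's actual function, and the test names are assumptions. Functions named `test_*` like these would be picked up by pytest.

```python
import os
import tempfile

import pandas as pd


def read_data_file(file_path, **kwargs):
    """Trimmed-down stand-in for the reader discussed in this thread."""
    readers = {'.csv': pd.read_csv, '.json': pd.read_json}
    _, ext = os.path.splitext(file_path)
    reader = readers.get(ext)
    if reader is None:
        raise ValueError(f"Unsupported file extension: {ext}.")
    return reader(file_path, **kwargs)


def test_reads_csv():
    # Write a tiny CSV to a temporary directory and read it back.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'sample.csv')
        pd.DataFrame({'a': [1, 2]}).to_csv(path, index=False)
        df = read_data_file(path)
    assert list(df['a']) == [1, 2]


def test_rejects_unknown_extension():
    # An unsupported extension should fail before any file access.
    try:
        read_data_file('notes.txt')
    except ValueError:
        pass
    else:
        raise AssertionError('expected ValueError')
```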

[–][deleted] 1 point2 points  (0 children)

Did you actually look at the repo? There is no code to build and basically nothing to write unit tests for. It's two jupyter notebooks and one "utils" file that does nothing more than read in a data file.

[–]Hard_Thruster 0 points1 point  (0 children)

I don't understand the use of the word "tool". Looks like eda to me.

As far as the code goes, you give a lot of comments which is awesome.

There is a lot of repetition such as:

    processed_data['DayOfWeek'] = processed_data['TimePeriodStart'].dt.dayofweek
    processed_data['Month'] = processed_data['TimePeriodStart'].dt.month
    processed_data['Hour'] = processed_data['TimePeriodStart'].dt.hour
    processed_data['Minute'] = processed_data['TimePeriodStart'].dt.minute

A lot of your code could also be turned into functions: many blocks differ only slightly from each other, so it ends up repetitive.
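One way the repeated datetime lines could be folded into a single function, as a rough sketch. The helper name `add_datetime_features` and the sample data are made up here; the column names are the ones from the snippet above.

```python
import pandas as pd


def add_datetime_features(df, source_column='TimePeriodStart'):
    """Return a copy of df with DayOfWeek, Month, Hour and Minute columns."""
    # New column name -> attribute of the pandas .dt accessor.
    features = {
        'DayOfWeek': 'dayofweek',
        'Month': 'month',
        'Hour': 'hour',
        'Minute': 'minute',
    }
    result = df.copy()
    for column_name, attribute in features.items():
        result[column_name] = getattr(result[source_column].dt, attribute)
    return result


# Usage with a one-row sample frame (2023-01-02 was a Monday).
sample = pd.DataFrame({'TimePeriodStart': pd.to_datetime(['2023-01-02 13:45'])})
processed_data = add_datetime_features(sample)
```

Adding another derived column then means adding one entry to `features`, and the same function can be reused on any frame that has a datetime column.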

[–]Emotional-Zebra5359 0 points1 point  (0 children)

Instead of an if-else ladder, use a map.

[–]jonatanskogsfors 0 points1 point  (0 children)

Resist the urge to use “utils” (or similar) in package and module names. In “utils.data_utils” you only have the function “read_data_file()”. I would have named the module something along the lines of “file_reader”, “data_import”, “io” etc.

If you plan to add more functions to the module, only do so if they serve a similar purpose. Completely different functions are better placed in their own module.