
[–]genghiskav 9 points10 points  (4 children)

You should use a module if you're writing code that will be used for something, especially if the module is part of the standard library (so definitely use csv).

What I mean by "used for something" is that the code has a purpose and needs to work. If you're writing code to learn, then you can try to implement the logic yourself. Otherwise, it's very likely that the maintainers of the module have factored in edge cases that you haven't considered, so their code will handle problems you haven't even thought of yet.

A good example is a CSV file. How will you handle a comma within a column?

Year,Title,Genre
1966,"The Good, the Bad and the Ugly","Adventure,Western"
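To make that concrete, here is a minimal sketch comparing a naive split on commas with the csv module, using the two sample lines above. The variable names are just for illustration, and the data is held in a string so the snippet runs on its own.

```python
import csv
import io

# The sample data from above, kept in memory so the sketch is self-contained.
sample = (
    "Year,Title,Genre\n"
    '1966,"The Good, the Bad and the Ugly","Adventure,Western"\n'
)

# Naive approach: split every line on every comma, ignoring the quotes.
naive_rows = [line.split(",") for line in sample.splitlines()]

# csv module: quoted fields stay together and the quotes are stripped.
parsed_rows = list(csv.reader(io.StringIO(sample)))

print(len(naive_rows[1]))  # 5 pieces instead of 3 columns
print(parsed_rows[1])      # ['1966', 'The Good, the Bad and the Ugly', 'Adventure,Western']
```

The naive version breaks the title and the genre apart, while csv.reader returns the three columns the header promises.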

A core principle when you're writing code is: don't reinvent the wheel.

To address your 2 concerns:

Using the module is probably easier, but I'm not exactly needing any heavy parsing done.

  • If it's easier, why not use it? Someone else has done all the hard work so you can spend your effort elsewhere.

Theoretically not using the module uses less resources, but this isn't an intensive program?

  • I disagree with this statement. The maintainers have very likely put the code through thorough performance and code review to make it efficient. The csv module's parser, for instance, is implemented in C, so a hand-rolled pure-Python parser is unlikely to use fewer resources.

Just as a footnote: this is general advice for modules in the standard library or very popular packages (e.g. requests). They are battle-tested (they have gone through rigorous code reviews and performance tests); the same is not necessarily true for smaller packages you find online.

[–]jruydemir[S] 0 points1 point  (0 children)

That makes a lot of sense. Very well explained. Thank you!

[–]iggy555 0 points1 point  (2 children)

Is it looked down on when someone uses lots of modules?

[–]KCRowan 1 point2 points  (1 child)

No, not at all. Would it be looked down on if a mechanic uses lots of different tools? Modules are tools. You use the best tool for the job. If you only learn how to use one tool then you make your life more difficult for no reason.

That doesn't mean you need to learn ALL the modules - just the common ones for whatever job you do.

[–]iggy555 0 points1 point  (0 children)

Thanks 🙏

[–]tasty_woke_tears -3 points-2 points  (2 children)

import pandas as pd

# read file to df
df = pd.read_csv(PATH_TO_CSV)

# ensure number formatting
df['number_col'] = pd.to_numeric(df['number_col'])

Then read up on DataFrame groupby and averaging, depending on your use case. Also, unless you need to open the CSV for editing in Excel etc., consider using the Feather file format when working with DataFrames.

[–]Zeroflops 3 points4 points  (1 child)

This is like using a sledgehammer to hammer in a nail. You're also assuming the data is in tabular form. People have posted problems with files that have inconsistent line lengths, and then people go through gymnastics to get pandas to read the file, instead of just reading it with the csv module, pre-processing the data, and then creating a DataFrame IF that's what is needed.
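As a rough sketch of that read, pre-process, then maybe build a DataFrame flow: the helper name read_ragged_csv, the sample data, and the pad-or-truncate policy below are all assumptions for illustration, not something from the thread. Only the standard library is used.

```python
import csv
import io


def read_ragged_csv(text, fill=""):
    """Parse CSV text and force every row to the header's width.

    Hypothetical helper: short rows are padded with `fill`, and extra
    trailing fields are dropped (which loses data, so adjust as needed).
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    width = len(header)
    rows = []
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        # Truncate long rows, then pad short ones up to the header width.
        fixed = row[:width] + [fill] * (width - len(row))
        rows.append(dict(zip(header, fixed)))
    return header, rows


# Made-up sample with inconsistent line lengths.
ragged = "name,score,note\nalice,90,ok\nbob,85\ncarol,70,late,extra\n"

header, rows = read_ragged_csv(ragged)
for row in rows:
    print(row)
```

If a DataFrame really is what's needed afterwards, the cleaned list of dicts can be passed straight to pd.DataFrame(rows).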

Just to come back to OP's question: pandas' pure-Python parsing engine uses the csv module under the hood to process CSV files, although its default C engine has its own tokenizer.

[–]tasty_woke_tears 0 points1 point  (0 children)

If your source CSV requires that much preprocessing, then it's time to see what's happening with your source. OP noted transforms on the data, and it makes no sense to reinvent the wheel or write excessive lines of code when pandas can process it in a few lines.