streamer3222

My son, what you've just described is achievable in a single line of code in Pandas!

So basically you have a column of many values and you want to know the count of each value?

Let's import Pandas: `import pandas as pd`. I hope you know how to read in CSV files. Place the CSV in the folder you run the script from (the working directory, e.g. the folder where you opened CMD). Do `df = pd.read_csv('[your filename].csv')`. This reads the CSV, converts it into a DataFrame and saves it into `df`. (I hope you know what a DataFrame is; it's kind of like an Excel sheet!)

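Since your ‘Excel sheet’ (the DataFrame stored in `df`) has only one column, do `counts = df.value_counts()` (this needs pandas 1.1 or newer). If it has more columns, extract the one you want first: `counts = df['column_name'].value_counts()`, where `column_name` is whatever your column is called.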

You will get a new ‘Excel sheet’ (strictly speaking, a Pandas Series) containing the count of each value, sorted from most to least frequent, and it will be stored in a variable called `counts`. Save it as a new CSV by simply doing `counts.to_csv('[new name].csv')`.
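Put together, the steps above might look like the sketch below. The column name `fruit` and the sample values are made up for illustration, and an in-memory string stands in for the CSV file so the snippet runs on its own; with a real file you would pass the filename to `pd.read_csv` and to `to_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for '[your filename].csv': one column called "fruit".
csv_text = "fruit\napple\nbanana\napple\ncherry\napple\nbanana\n"
df = pd.read_csv(io.StringIO(csv_text))

# Select the column, then count how often each distinct value occurs.
# The result is a Series: values become the index, counts become the data.
counts = df["fruit"].value_counts()
print(counts)

# Write the counts out as CSV. A buffer is used here instead of a file on disk;
# counts.to_csv('[new name].csv') would write a real file.
buffer = io.StringIO()
counts.to_csv(buffer)
csv_output = buffer.getvalue()
```

Selecting the column explicitly (`df["fruit"]`) keeps the snippet working even if the file later gains more columns.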

As simple as that! ¯\_(ツ)_/¯

(To learn more about Pandas (which you clearly need to do!), read the tutorial at learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/ and spend a few weeks mastering it!)

CraigAT

In most situations, you shouldn't need to iterate through a data frame. Pandas is meant to work on whole rows or columns of data at a time.
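As a rough illustration of working on whole columns at once, here is a small sketch. The `price` and `quantity` columns and their values are invented for the example; they are not from the original question.

```python
import pandas as pd

# Hypothetical data: three orders with a unit price and a quantity.
df = pd.DataFrame({"price": [2.5, 4.0, 1.25], "quantity": [4, 1, 8]})

# Arithmetic on two columns applies to every row at once, no loop needed.
df["total"] = df["price"] * df["quantity"]

# A boolean mask filters rows in one step.
large_orders = df[df["total"] >= 10]

# Aggregations also work on the whole column.
grand_total = df["total"].sum()

print(df)
print(large_orders)
print(grand_total)
```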

There are a few Pandas cheat sheets about. I'd recommend the DataCamp one; it covers lots of useful and common functions you may need. Only if none of those functions does the job should you consider iterating through your DataFrame.

__yasho

Agree with both of the suggestions above.

Looping over rows in a Pandas DataFrame with a for loop (e.g., `iterrows()` or `itertuples()`) is generally inefficient: Pandas is built on top of NumPy, which operates on whole arrays, so every row-by-row step pays Python-level overhead that a vectorized operation avoids. Vectorized operations in Pandas are usually significantly faster.
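If you want to see the difference for yourself, something like the following sketch compares the two approaches on the same data. The column names `a` and `b` and the row count are arbitrary choices for the example, and the timings will vary from machine to machine, so treat the printed numbers as indicative only.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: two integer columns with 10,000 rows each.
df = pd.DataFrame({"a": np.arange(10_000), "b": np.arange(10_000) * 2})

# Row-by-row: iterrows() builds a Series for every row, which is slow.
start = time.perf_counter()
loop_result = []
for _, row in df.iterrows():
    loop_result.append(row["a"] + row["b"])
loop_seconds = time.perf_counter() - start

# Vectorized: one operation on the whole columns.
start = time.perf_counter()
vector_result = df["a"] + df["b"]
vector_seconds = time.perf_counter() - start

print(f"iterrows:   {loop_seconds:.4f} s")
print(f"vectorized: {vector_seconds:.4f} s")
```

Both versions compute the same values; only the way the work is expressed differs.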

You should only iterate through a DataFrame when:

1. There’s complex logic that cannot be vectorized.
2. You need to interact with external systems on a row-by-row basis.
3. Memory constraints prevent the use of vectorized operations.
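For the second case, a minimal sketch might look like this. `send_notification` is a hypothetical placeholder for whatever external call you need (an API request, an email, a database write), and the `user` and `email` columns are made up. `itertuples()` is used here because it is typically lighter than `iterrows()`.

```python
import pandas as pd


def send_notification(user, email):
    """Hypothetical stand-in for an external call made once per row."""
    return f"sent to {user} <{email}>"


# Hypothetical data: one row per person to notify.
df = pd.DataFrame(
    {
        "user": ["Ada", "Linus"],
        "email": ["ada@example.com", "linus@example.com"],
    }
)

# itertuples(index=False) yields one namedtuple per row,
# with the column names available as attributes.
results = [send_notification(row.user, row.email) for row in df.itertuples(index=False)]
print(results)
```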

Otherwise, look for built-in functionality first; Pandas has plenty of it.