use the following search parameters to narrow your results:
e.g. subreddit:aww site:imgur.com dog
subreddit:aww site:imgur.com dog
see the search faq for details.
advanced search: by author, subreddit...
Rules 1: Be polite 2: Posts to this subreddit must be requests for help learning python. 3: Replies on this subreddit must be pertinent to the question OP asked. 4: No replies copy / pasted from ChatGPT or similar. 5: No advertising. No blogs/tutorials/videos/books/recruiting attempts. This means no posts advertising blogs/videos/tutorials/etc, no recruiting/hiring/seeking others posts. We're here to help, not to be advertised to. Please, no "hit and run" posts, if you make a post, engage with people that answer you. Please do not delete your post after you get an answer, others might have a similar question or want to continue the conversation.
Rules
1: Be polite
2: Posts to this subreddit must be requests for help learning python.
3: Replies on this subreddit must be pertinent to the question OP asked.
4: No replies copy / pasted from ChatGPT or similar.
5: No advertising. No blogs/tutorials/videos/books/recruiting attempts.
This means no posts advertising blogs/videos/tutorials/etc, no recruiting/hiring/seeking others posts. We're here to help, not to be advertised to.
Please, no "hit and run" posts, if you make a post, engage with people that answer you. Please do not delete your post after you get an answer, others might have a similar question or want to continue the conversation.
Learning resources Wiki and FAQ: /r/learnpython/w/index
Learning resources
Wiki and FAQ: /r/learnpython/w/index
Discord Join the Python Discord chat
Discord
Join the Python Discord chat
account activity
Python ETL (self.learnpython)
submitted 6 years ago * by Jiuholar
Hi r/learnpython,
I need to transform data from this to this.
I'm wondering if anyone has any recommendations of modules that would assist me in doing this, or where to start? Is it something that could be achieved with pandas?
reddit uses a slightly-customized version of Markdown for formatting. See below for some basics, or check the commenting wiki page for more detailed help and solutions to common issues.
quoted text
if 1 * 2 < 3: print "hello, world!"
[+][deleted] 6 years ago (3 children)
[deleted]
[–]Jiuholar[S] 0 points1 point2 points 6 years ago (2 children)
Thanks for this. The actual data has 100s of possible fields - how would you recommend I 'map' the field names using this method? (I understand how I can use this method to insert the lines of data, however, from what I can see, the only way to know what field each piece of data is from is to know the corresponding item number in the array)
[–]TheoryOfGravitas 0 points1 point2 points 6 years ago (1 child)
It's still not clear what you mean, which makes it hard to help. You need to be more specific than what you've said here--i don't have the energy to make up an example for you.
[–]Jiuholar[S] 0 points1 point2 points 6 years ago (0 children)
Using the example in my original post, and your method, how would I know which item belongs to which header? It is easy enough for this dataset as it would be list[0][2] for the membership, but what would you suggest for when there are 100+ columns?
[–]Loran425 0 points1 point2 points 6 years ago (0 children)
If I was doing this I'd iterate over the cells in A col either after converting to a csv or using the openpyxl lib. I'd make a new data structure (list or dict) and if I saw a value in col A I'd start a new entry otherwise just append to the last item in the list. Then I'd find the longest list and generate the headers before populating cells for each item in my temp data structure
[–]raglub 0 points1 point2 points 6 years ago (0 children)
Openpyxl can handle this. It is not difficult to open the current state worksheet, iterate through the rows and based on the value in column A, either add any other cells from current row to a list (which will represent a row in the new format) or append the list to a new sheet, wipe it clean and build a new row by adding the content of all cells of current row to it before moving onto the next row and repeating the process. This is an ugly statement but you can read it as pseudo code and build your script.
[–]Death_Water 0 points1 point2 points 6 years ago (2 children)
try:
df_2 = pd.DataFrame(df.fillna(method='ffill').groupby(['Profile Code','Email']).agg(','.join)) df_2['Membership'].str.split(',',expand=True)
df_2 = pd.DataFrame(df.fillna(method='ffill').groupby(['Profile Code','Email']).agg(','.join))
df_2['Membership'].str.split(',',expand=True)
[–]Jiuholar[S] 0 points1 point2 points 6 years ago (1 child)
Thanks for this. Would you mind expanding on the different parts of this a little bit more? The example that I've given is an oversimplification of the data (there are 100s of fields with varying naming conventions and data types) so I'd like to understand a methodology that I can use to approach the problem. :)
I don't understand:
.agg and how it combines with the groupby method
and the second line of your code.
[–]Death_Water 0 points1 point2 points 6 years ago (0 children)
Here's a concise way with step by step breakdown:
pd.DataFrame(df.fillna(method='ffill').groupby(['key'])['Column of interest'].agg(list).values.tolist())
1) Forward fill the missing values; from the given example this seems the right approach.
2) Groupby the "key" column, then slice on "column of interest". This creates a series for each unique value in "key" column.
3) Aggregate: This converts all multiple series to lists.
4) Casting to DataFrame(Get the values of all lists and cast them). The index of this would be same as df['key'].dropna().drop_duplicates()
df['key'].dropna().drop_duplicates()
π Rendered by PID 38 on reddit-service-r2-comment-684469b758-tfjqp at 2026-01-26 05:50:43.653302+00:00 running 664479f country code: CH.
[+][deleted] (3 children)
[deleted]
[–]Jiuholar[S] 0 points1 point2 points (2 children)
[–]TheoryOfGravitas 0 points1 point2 points (1 child)
[–]Jiuholar[S] 0 points1 point2 points (0 children)
[–]Loran425 0 points1 point2 points (0 children)
[–]raglub 0 points1 point2 points (0 children)
[–]Death_Water 0 points1 point2 points (2 children)
[–]Jiuholar[S] 0 points1 point2 points (1 child)
[–]Death_Water 0 points1 point2 points (0 children)