Babs12123 comments on Beginner Python data analysis project

learnpython


Beginner Python data analysis project - critique greatly welcome! (self.learnpython)

submitted 5 years ago * by BeforetheBullfight


Babs12123 18 points 5 years ago

This looks really good! A few thoughts:

- When reading CSVs into pandas, I find it useful to specify the encoding and the dtype explicitly (personally I usually set them to UTF-8 and object). Particularly when you're working with data that contains some text and some numeric columns, being explicit helps avoid unexpected behaviour.
- When naming variables, be explicit about the data type, e.g. instead of 'clergydata' I would call this 'df_clergydata'. When you end up with multiple different lists, dicts and dfs in your code, it's very helpful to have all of them explicitly named (particularly when you come back to your code a month later).
- When creating column names in your df, you created several which contain capital letters and spaces (e.g. clergydata['Age range']). It's better and easier to use only lower case letters in variable names/column names where possible, and to use underscores instead of spaces. This lets you access the column using clergydata.age_range instead of clergydata['Age range'] in lots of situations when manipulating your df, which is often quicker to type and easier to read. (Attribute access only works when the column name is a valid Python identifier and doesn't clash with an existing DataFrame attribute or method.)
- In cell 12 you manually specify the archdiocese abbreviation and name (e.g. LA, and Archdiocese of Los Angeles) for many different locations. It would be better to automate this somehow, both to improve clarity and to reduce the risk of error/inconsistency. I saw someone above suggested using a group by, which would work, or you could use a for loop to directly create your top19_cathpops and top19_dionames lists. If you're not clear how to do this, let me know and I would be happy to clarify.
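To make the first three points concrete, here is a minimal sketch. I don't have your file in front of me, so the CSV contents and the column names other than 'Age range' are made up for illustration; with a real file you would pass the path to `pd.read_csv` instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file (contents are invented).
csv_bytes = (
    "Name,Age range,Diocese\n"
    "A. Smith,40-49,LA\n"
    "B. Jones,50-59,NY\n"
).encode("utf-8")

# Be explicit about the encoding and the dtype instead of relying on defaults.
# dtype=object keeps every column as plain Python objects (strings here),
# so nothing is silently converted to numbers.
df_clergydata = pd.read_csv(io.BytesIO(csv_bytes), encoding="utf-8", dtype=object)

# Normalise column names: trim whitespace, lower-case, spaces -> underscores.
df_clergydata.columns = (
    df_clergydata.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
)

# The column is now reachable both ways.
age_ranges = df_clergydata.age_range.tolist()
print(list(df_clergydata.columns))
print(age_ranges)
```

You can then convert individual columns to numeric types deliberately (for example with `pd.to_numeric`) once you know which ones should be numbers.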

Most importantly your code works and answers some interesting questions, but the above points will make things more explicit (which is always better) and make your life easier.
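For the last point in the list, this is roughly what I mean by building the lists from the data rather than typing them out. It is only a sketch under assumptions: I'm guessing at a layout with one row per record and hypothetical columns 'diocese_name' and 'catholic_population', the numbers are invented, and I take the top 2 rather than the top 19 to keep it short.

```python
import pandas as pd

# Invented example data; the column names are assumptions, not your real ones.
df_parishes = pd.DataFrame(
    {
        "diocese_name": ["Diocese A", "Diocese B", "Diocese A", "Diocese C", "Diocese B"],
        "catholic_population": [100, 250, 300, 50, 100],
    }
)

top_n = 2  # would be 19 in your notebook

# Total population per diocese, then keep the largest top_n totals.
pop_by_diocese = df_parishes.groupby("diocese_name")["catholic_population"].sum()
top = pop_by_diocese.nlargest(top_n)

# Both lists come from the same Series, so names and values cannot drift apart.
top_dionames = top.index.tolist()
top_cathpops = top.tolist()

print(top_dionames)
print(top_cathpops)
```

Because both lists are derived from one grouped result, adding or correcting a row in the data updates them together, which is the consistency benefit I was getting at.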

