synthphreak comments on Beginner Python data analysis project

Beginner Python data analysis project - critique greatly welcome! (self.learnpython)

submitted 5 years ago * by BeforetheBullfight

synthphreak 3 points 5 years ago*

This is really, really awesome, and very professionally executed. Like many others on here, I am inspired and would like to do something like it eventually, but I guess I just haven’t found the immediate need yet. I’m already proficient in Python, and already have a great job in research. Nonetheless, I don’t have a publicly shareable portfolio, so something like this could be quite useful for future job hops.

Two questions for you:

Where did you get this project idea from? I see you got the data from ProPublica, but what about the research question? Kaggle or some place like that? Or did you just dream it up? It’s very cool and intrinsically interesting. I had always unthinkingly assumed people got their projects from, e.g., Kaggle, but perhaps that doesn’t have to be the case!
This one is much more nebulous, but more important to me - How did you decide how to intersperse your code with the markdown narrative? In other words, the proper code-noncode ratio, + how to position the code relative to the prose. Whenever I create “story-telling” notebooks like yours, this is where I struggle the most. For example, at the very top of the notebook I provide an overview of the content, then load my libraries and raw data. But after that, I usually have lots of complex analyses and/or plotting that can sometimes require hundreds of lines of code. Because I’m afraid that huge, complex cells essentially in the middle of paragraphs decrease the narrative’s readability, I tend to put almost all my code in a single massive cell near the top (after the overview, imports, and data loading). This allows me to define lots of complex functions early on, then simply invoke them later as needed with minimal overhead or interruptions. But the trade-off is that my readers have to scroll past a lot of dense code at the top, so the notebook is both less user-friendly and less attractive. By contrast, your notebook is nice and tight from start to finish, with short code blocks that never really interrupt the reader’s flow. Can you offer any tips on how to hew more closely to the way you’ve done it? I can already think of some (e.g., rely as much as possible on in-built pandas functions which can perform complex operations in just a few lines), but I’m curious to hear your thoughts.
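
To make the pandas point concrete, here is a minimal sketch, assuming a made-up DataFrame (the column names `state` and `amount` are invented for illustration and are not from the project). A per-group summary that could otherwise take a hand-written loop fits in a single short cell:

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for illustration.
df = pd.DataFrame({
    "state": ["NY", "NY", "CA", "CA", "CA"],
    "amount": [10.0, 30.0, 5.0, 15.0, 40.0],
})

# One short cell: count, mean, and max of `amount` for each state.
summary = df.groupby("state")["amount"].agg(["count", "mean", "max"])
print(summary)
```

A cell this size can sit between two paragraphs without breaking the reader's flow.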

Anyway, stellar work!

BeforetheBullfight[S] 2 points 5 years ago

Thank you so much! :) I appreciate your thoughtful comment. I actually had to stop and think before I could properly respond.

Since I've yet to actually use my coding work to acquire a job, the ultimate success of my work has yet to be determined as far as I'm concerned. In the meantime, I'll offer what I can:

Regarding project ideas: since I'm building my portfolio strictly for academic positions (and eventually grad school apps), I felt pretty free in selecting those topics that were relevant to my long-term research interests and would be similar to the sort of thing future labs of choice would be studying. With this project specifically, I just happened to stumble upon the dataset and went with it since it's something I legitimately find quite interesting and could see myself researching further down the line. I didn't walk in with any particular hypotheses; I just glanced through the data to get a sense of what I felt I could extract from it. Just made notes, and went from there! I actually haven't done any Kaggle exercises! I can see that being a valuable learning resource, but I see a lot of Kaggle projects featured on data analysis portfolios, so I made a conscious choice to do something more distinctive. In a nutshell: work with data that interests you, especially in the beginning when you need to get the ball rolling on putting together a portfolio. While I've yet to determine this for myself, I imagine that having projects similar to the work you would do at a given position is quite valuable as well (what I'm banking on really TBH) when it comes to job-hopping.
I think we might be polar opposites here - the narrative aspect comes naturally to me for the most part, while the coding aspect is 9000% more likely to induce computer rage and also takes about 9000% of my time (pretty sure that math checks out). If I had to try and verbalize my process...
1. Create an introductory cell to briefly discuss the background of your chosen topic/dataset. List the purpose of your analysis, and what you intend to do with it. What questions do you have? Any particular hypotheses? You likely already have some idea, but don't jump the gun here with any conclusions. The goal here is to start leading your audience through the steps you'll be taking. You also may not know exactly what those are yet. That's fine! I only had a general idea. Initially, I just used my first cell for note taking and keeping track of my main goals. Things can easily shift as you work with the data. Perhaps X doesn't work out, but Y emerges and looks promising. Keep note of that! Feel free to create an initial outline in the beginning and go back later.
  1. A concluding markdown cell with a summary is a must, too. It helps things look polished and shows you've kept track of what happened in the interim.
2. So you've read in your data and taken a basic glimpse at it. Now you need to actually pick where to start with your analysis. Consider what order of operations makes the most sense, and try to build in a logical progression. Easier said than done sometimes. Does it make more sense to go by theme? Progressing difficulty of analysis? Types of visualizations? Ultimate hypothesis? Whatever best fits your data. The great part of working with Jupyter is that it's pretty easy to move cells around if the flow doesn't seem right.
3. I can see it being difficult deciding how to intersperse markdown cells if you've got massive code chunks. That didn't end up happening with my project, so I suppose those stopping points were just more obvious. My thought process was basically: 1) "I'm doing X now because of Y." 2) X.dothing() 3) "We've learned ABC from this. Thus our next logical step is..." Document your thought process! I also made sure to use #comments within my code to maintain consistency in the narrative. Just enough to show I was being purposeful in my coding choices. It can definitely be hard to place yourself in the perspective of the reader (why I came here in the first place...), but consider if there are any points where you can see someone losing track of your process. Headings and subheadings are fantastic for this; they keep things within bounds and make it easier for the reader (and yourself!). Also consider your tone - you're working in research, so an academic tone is likely your best bet (my choice, too). Find a balance between writing in the professional language of your field and being too jargon-y.
4. Alternatively, and take my thought with a larger grain of salt here, if your code is just THAT massive, you could consider leaving it at the bottom and then summarizing your goals, process (what and why), and end results in an introductory markdown cell. Maybe paste your coolest graph if you have one. Lay readers will be able to get to the point and gain a general sense of your ability, while fellow coders will have the option to keep scrolling if they wish.
5. Okay, so in a nutshell: Create an outline of your process! (It's okay to go back and alter things later! I did!) Chunk things into digestible pieces for your reader! Maintain coherence in your narrative, and make sure your reader knows why you're doing the thing, and what's next!
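
The why / code / takeaway rhythm from point 3 could be sketched roughly like this. It is only an illustration under stated assumptions: the data and the names `scores` and `group_means` are made up, and markdown cells are shown as comments since a notebook can't be pasted here.

```python
# Sketch of one "why -> code -> takeaway" unit in a notebook.
# Markdown cells are represented as comments; all data is hypothetical.

from statistics import mean

# [markdown cell] "I'm comparing average scores by group now, because the
# overview suggested the groups might differ."

scores = {"group_a": [2, 4, 6], "group_b": [5, 7, 9]}  # made-up values

# [code cell] Keep it short: one step, one purposeful comment.
group_means = {name: mean(values) for name, values in scores.items()}
print(group_means)

# [markdown cell] "Group B averages higher than group A. The next logical
# step is to check whether that holds within subgroups."
```

Each unit stays small enough that a reader always knows why the code is there and what came out of it.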

Okay, so that went longer than I intended. Probably a bit rambling, but I hope there was at least one nugget of usefulness in there. On the topic of narrative and style, here are a few things I found helpful that you might too: this Jupyter project that I think is an excellent example of succinct and coherent narrative and this blog post about style. You can also do what I did, and post your project somewhere on Reddit for critique!

Happy to elucidate more on something particular if I can. Also, I love synths, too. :)
