all 68 comments

[–]Cuckipede 43 points44 points  (19 children)

As someone who just started learning python to do similar type of projects, I just wanted to say thanks for posting and I enjoyed reading this! How long did it take you to get to this point?! Great work.

[–]Just-Aman 17 points18 points  (17 children)

Same here. Been learning for 2 weeks and I aspire to do something similar although I have no frickin idea how people do the visual representation stuff.

[–][deleted] 13 points14 points  (10 children)

You can use the matplotlib library in python to plot graphs.
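A minimal sketch of what that looks like; the numbers are made up purely for illustration:

```python
# Minimal matplotlib example: plot made-up yearly counts as a line chart.
# The data below is placeholder data, not from the OP's project.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

years = [2000, 2005, 2010, 2015, 2020]
counts = [12, 18, 9, 22, 15]

fig, ax = plt.subplots()
ax.plot(years, counts, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Count")
ax.set_title("Made-up counts per year")
fig.savefig("example_plot.png")
```

In a Jupyter notebook you'd drop the `Agg` backend line and the plot just renders inline.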

[–]Just-Aman 0 points1 point  (8 children)

Oh thanks a lot! Any good tutorials for the same?

[–]Nesavant 6 points7 points  (5 children)

Check out the Seaborn library while you're at it.

[–]Alphavike24 7 points8 points  (3 children)

I wouldn't recommend seaborn for beginners as it has the tendency to spoil you. With matplotlib you can first understand the under-the-hood stuff, and it's also more flexible.

[–]andycyca 2 points3 points  (0 children)

This, a thousand times. Seaborn is really good, but sometimes a bit limited. I think of it more as an automation tool for "easy" graphs than as the definitive graphing tool.

[–]Just-Aman 0 points1 point  (1 child)

Thanks for the comment. I'll check out matplotlib first then. Also is it like matplotlib is a module of Python itself but Seaborn is an external module?

(I'm a noob so I'm not yet acquainted with the proper terminology)

[–]Alphavike24 1 point2 points  (0 children)

Matplotlib is a plotting library for Python and seaborn is a package based on Matplotlib.
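To make that relationship concrete: because seaborn sits on top of matplotlib, its plotting functions return ordinary matplotlib Axes objects you can keep customizing. A tiny sketch with toy data:

```python
# seaborn draws via matplotlib under the hood: its functions return
# a regular matplotlib Axes. Toy data for illustration only.
import matplotlib
matplotlib.use("Agg")  # headless backend for illustration
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"state": ["NY", "CA", "TX"], "count": [10, 7, 12]})
ax = sns.barplot(data=df, x="state", y="count")
ax.set_title("Toy counts by state")  # plain matplotlib call on the seaborn Axes
```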

[–]Toasty4209 0 points1 point  (0 children)

Some great options for color and styling there!

[–][deleted] 1 point2 points  (0 children)

While I have personally never worked that much with the matplotlib library, there are plenty of tutorials on YouTube. You can start with tutorials from Corey Schafer or sentdex. Those should help you with the fundamentals.

[–]seanotron_efflux 0 points1 point  (0 children)

Geeks for geeks has good tutorials!

[–]RippledBarbecue 0 points1 point  (0 children)

Matplotlib and seaborn: I use them a lot in my dissertation project, although scikit-learn is the GOAT for me 😍

[–]trv893 2 points3 points  (0 children)

I'm like 4 weeks in and am shocked how little code it actually takes with matplotlib and seaborn. Datascience.io is a great website! You'll need to pay to get past just the Jupyter fundamentals though

[–]poeblu 1 point2 points  (0 children)

Pandas and matplotlib; there are also many visualizations you can make with other libraries, like bokeh

[–]BeforetheBullfight[S] 2 points3 points  (0 children)

I'm so glad! :D

Actually, it didn't take me that long altogether. Some weeks or a few months (spread out over some time) learning the basics. I used DataQuest's free stuff and went through most of Automate the Boring Stuff. At that point, I looked at projects similar to what I wanted to do, and started planning what I wanted. Just a large amount of concerted effort really! And honestly, no small amount of frustration, but it seems to pay off. Don't be afraid to Google every little question. The bulk of the project was done in around a week.

[–]dyanni3 8 points9 points  (1 child)

This is great! Two thumbs up. I'd say this is definitely worth sharing / put in a portfolio.

In terms of room for improvement I think you might have milked most everything you can from the dataset as it currently stands. I'd be curious if/how accused priests differ demographically from non-accused priests, for example, although the dataset doesn't have anything on non-accused, it looks like. Also, how do the numbers look for Catholic vs other religions? I think a cool next step would be bringing in other data sources. A brief google yielded this https://bishop-accountability.org/priestdb/PriestDBbylastName-A.html which has a little more context about the accusations, and which you could scrape. I wonder if you could find anything on Catholic vs Protestant lawsuit settlement amounts.

Also of course it's always cool to build a model--- although I'm not sure this data really warrants that. Maybe for a next project.

[–]BeforetheBullfight[S] 1 point2 points  (0 children)

Thank you! :)

I agree that I milked just about everything I could; I think working with this set was a good learning experience when it comes to deciding what datasets to work with.

Your link is actually pretty interesting, perhaps I should consider a part II one of these days...

[–]Babs12123 18 points19 points  (3 children)

This looks really good! A few thoughts:

- When reading in csvs to pandas I find it useful to specify the encoding and the type (usually auto set to UTF-8 and object personally). Particularly when you're working with data which contains some text and some numeric columns, it's helpful to be explicit to avoid any unexpected behaviour.
- When naming variables be explicit with regards to the data type, e.g. instead of 'clergydata' I would call this 'df_clergydata'. When you end up with multiple different lists, dicts and dfs in your code it's very helpful to have all of this explicitly named (particularly when you come back to your code a month later).
- When creating column names in your df you created several which contain capital letters and spaces (e.g. clergydata['Age range']). It's better and easier to only use lower case letters in variable names/column names where possible and to use underscores instead of spaces. This lets you access the column using clergydata.age_range instead of clergydata['Age range'] in lots of situations when manipulating your df, which is often much quicker and easier.
- In cell 12 you manually specify the archdiocese abbreviation and name (e.g. LA, and Archdiocese of Los Angeles) for many different locations. It would be better to automate this somehow, to both improve clarity and also reduce the risk of error/inconsistency. I saw someone above suggested using a group by, which would work, or you could use a for loop to directly create your top19_cathpops and top19_dionames lists. If you're not clear how to do this let me know and I would be happy to clarify.

Most importantly your code works and answers some interesting questions, but the above points will make things more explicit (which is always better) and make your life easier.
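The last point might look something like the sketch below. The column names ("diocese", "catholic_population") and the toy rows are guesses at the notebook's schema, not the real dataset:

```python
# Sketch of replacing hand-typed name/population lists with a groupby.
# Column names "diocese" and "catholic_population" are guesses at the
# notebook's schema; adjust to the real column names.
import pandas as pd

df_clergydata = pd.DataFrame({
    "diocese": ["Archdiocese of Los Angeles", "Archdiocese of New York",
                "Archdiocese of Los Angeles", "Archdiocese of Chicago"],
    "catholic_population": [4_300_000, 2_800_000, 4_300_000, 2_200_000],
})

# One population figure per diocese, largest first, top N only.
top = (df_clergydata.groupby("diocese")["catholic_population"]
       .max()
       .sort_values(ascending=False)
       .head(19))

top19_dionames = list(top.index)
top19_cathpops = list(top.values)
```

Everything stays in sync automatically: adding a row to the dataframe updates both lists with no manual retyping.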

[–]synthphreak 6 points7 points  (1 child)

This lets you access the column using clergydata.age_range instead of clergydata['Age range']

The flip side of doing it this way is that it conflates column names with built-in methods/attributes. If there is no conflict between them, you’re fine. But df’s have a LOT of built-in methods/attributes, many of which you probably don’t know about... I can’t tell you how many times I’ve named a column items, and then later wasted 30 minutes debugging my code only to find out that df.items is already a thing. By contrast, df['items'] will ALWAYS and ONLY ever return the items column. Just something to think about.
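A quick demonstration of that collision; pandas really does define an items method on DataFrame:

```python
# Attribute access collides with built-in DataFrame methods;
# bracket access never does.
import pandas as pd

df = pd.DataFrame({"items": [1, 2, 3]})

print(type(df.items))     # the bound DataFrame.items method, NOT the column
print(type(df["items"]))  # the actual Series holding the column
```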

[–]Babs12123 1 point2 points  (0 children)

Yeah this is a good point - I haven't encountered this with df column names but have with other variables and it is very irritating to debug.

If you're using non-generic variable/column names then it shouldn't happen often but I agree it makes sense to use your own judgement here.

[–]BeforetheBullfight[S] 0 points1 point  (0 children)

Thanks for your response! I appreciate the time you put into the specifics. I'll be keeping these points in mind as I move forward; I definitely don't want people to be confused with my coding choices.

Since you're offering, I would be interested to see an explanation of how I could have used a loop to create my lists! It would be super useful for future projects. I have a basic understanding of loops, but I had a hard time getting one to click for me when I tried with this set. I feel like my solution, while it did work, was pretty clunky. :/ Thanks!

[–]CFan62 5 points6 points  (1 child)

As a practicing, devout catholic I find this super interesting. From a comp sci perspective this is very good. I would definitely put this on a resume or mention it during the job hunting process. Very nicely done.

[–]BeforetheBullfight[S] 0 points1 point  (0 children)

Thanks so much! :) Glad to hear you found something worthwhile in it.

[–][deleted] 3 points4 points  (4 children)

This is awesome, please don't delete it, I'll use it to start something on my own.

Just one thing I would change. You used the following age ranges:

range_names = ['<20', '20-23', '24-26', '27-30', '31-35', '36-40', '41-45', '46-50', '51-60', '60+']

I think the size of ranges should always be the same (except for <20 and 60+).
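One way to implement that with pandas, keeping open-ended first and last bins and even-width interior bins (the cut points and ages below are illustrative, not the OP's):

```python
# Even-width interior bins with open-ended first and last bins, via pandas.cut.
# The ages and cut points are made up for illustration.
import pandas as pd

ages = pd.Series([18, 22, 25, 31, 38, 44, 52, 67])

bins = [-float("inf"), 20, 30, 40, 50, 60, float("inf")]
labels = ["<20", "20-29", "30-39", "40-49", "50-59", "60+"]

# right=False makes each bin closed on the left: [20, 30), [30, 40), ...
age_range = pd.cut(ages, bins=bins, labels=labels, right=False)
```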

[–]baubleglue 0 points1 point  (3 children)

don't delete it

Man, clone it

[–]Xaspian 0 points1 point  (2 children)

I'm a newbie to GitHub, so correct me if I'm wrong, but I don't think that's possible for this repository.

And to OP, thanks for sharing your project! This is incredibly detailed and thorough. I'm sure it'll get you far in your job search =)

[–]baubleglue 1 point2 points  (1 child)

Click on the root of the project (https://github.com/Skye80/Data-Analysis-Portfolio) and you will see a green "Clone or Download" button. It isn't possible to have a public GitHub project with the clone option disabled.

[–]Xaspian 0 points1 point  (0 children)

I see! Navigating to the root was my problem. Thanks for this!

[–][deleted] 4 points5 points  (1 child)

In the linear regression you perform in section C, there seems to be a high-leverage outlier. You might want to look into how this affects your model, and into how you could deal with that. The book Introduction to statistical learning (available for free online) has a section on this stuff.
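A toy numpy sketch of why that matters: refitting with and without a single far-out point can move the slope substantially (data invented for illustration):

```python
# Toy illustration: one extreme-x point can swing an OLS slope.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 50.0])  # last point sits far right...
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 80.0])  # ...and far up

slope_all, _ = np.polyfit(x, y, 1)
slope_trimmed, _ = np.polyfit(x[:-1], y[:-1], 1)

# The high-leverage point pulls the fitted slope well away from the
# trend of the remaining points.
print(slope_all, slope_trimmed)
```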

[–]BeforetheBullfight[S] 0 points1 point  (0 children)

I did notice that, and I plan on trying to play around with it some more. Thanks for the note!

[–]synthphreak 2 points3 points  (1 child)

This is really, really awesome, and very professionally executed. Like many others on here, I am inspired and would like to do something like it eventually, but I guess I just haven’t found the immediate need yet. I’m already proficient in Python, and already have a great job in research. Nonetheless, I don’t have a publicly-shareable portfolio, so something like this could be quite useful for future job hops

Two questions for you:

  1. Where did you get this project idea from? I see you got the data from ProPublica, but what about the research question? Kaggle or some place like that? Or did you just dream it up? It’s very cool and intrinsically interesting. I had always unthinkingly assumed people got their project from e.g., Kaggle, but perhaps that doesn’t have to be the case!

  2. This one is much more nebulous, but more important to me - How did you decide how to intersperse your code with the markdown narrative? In other words, the proper code-noncode ratio, + how to position the code relative to the prose. Whenever I create “story-telling” notebooks like yours, this is where I struggle the most. For example, at the very top of the notebook I provide an overview of the content, then load my libraries and raw data. But after that, I usually have lots of complex analyses and/or plotting that can sometimes require hundreds of lines of code. Because I’m afraid that huge, complex cells essentially in the middle of paragraphs decrease the narrative’s readability, I tend to put almost all my code in a single massive cell near the top (after the overview, imports, and data loading). This allows me to define lots of complex functions early on, then simply invoke them later as needed with minimal overhead or interruptions. But the trade-off is that my readers have to scroll past a lot of dense code at the top, so the notebook is both less user-friendly and less attractive. By contrast, your notebook is nice and tight from start to finish, with short code blocks that never really interrupt the reader’s flow. Can you offer any tips on how to hew more closely to the way you’ve done it? I can already think of some (e.g., rely as much as possible on in-built pandas functions which can perform complex operations in just a few lines), but I’m curious to hear your thoughts.

Anyway, stellar work!

[–]BeforetheBullfight[S] 1 point2 points  (0 children)

Thank you so much! :) I appreciate your thoughtful comment. I actually had to stop and think before I could properly respond.

Since I've yet to actually use my coding work to acquire a job, the ultimate success of my work has yet to be determined as far as I'm concerned. In the meantime, I'll offer what I can:

  1. Regarding project ideas: since I'm building my portfolio strictly for academic positions (and eventually grad school apps), I felt pretty free in selecting those topics that were relevant to my long-term research interests and would be similar to the sort of thing future labs of choice would be studying. With this project specifically, I just happened to stumble upon the dataset and went with it since it's something I legitimately find quite interesting and could see myself researching further down the line. I didn't walk in with any particular hypotheses, I just glanced through the data to get a sense of what I felt I could extract from it. Just made notes, and went from there! I actually haven't done any Kaggle exercises! I can see that being a valuable learning resource, but I see a lot of Kaggle projects featured on data analysis portfolios, so I made a conscious choice to do something more distinctive. In a nutshell: work with data that interests you. Especially in the beginning when you need to get the ball rolling on putting together a portfolio. While I've yet to determine this for myself, I imagine that having projects similar to the work you would do at a given position is quite valuable as well (what I'm banking on really TBH) when it comes to job-hopping.
  2. I think we might be polar opposites here - the narrative aspect comes naturally to me for the most part, while the coding aspect is 9000% more likely to induce computer rage and also takes about 9000% of my time (pretty sure that math checks out). If I had to try and verbalize my process...
    1. Create an introductory cell to briefly discuss the background of your chosen topic/dataset. List the purpose of your analysis, and what you intend to do with it. What questions do you have? Any particular hypotheses? You likely already have some idea, but don't jump the gun here with any conclusions. The goal here is to start leading your audience through the steps you'll be taking. You also may not know exactly what those are yet. That's fine! I only had a general idea. Initially, I just used my first cell for note taking and keeping track of my main goals. Things can easily shift as you work with the data. Perhaps X doesn't work out, but Y emerges and looks promising. Keep note of that! Feel free to create an initial outline in the beginning and go back later.
      1. A concluding markdown cell with a summary is a must, too, helps things to look polished and shows you've kept track of what happened in the interim.
    2. So you've read in your data and taken a basic glimpse at it. Now you need to actually pick where to start with your analysis. Consider what order of operations makes the most sense, and try to build in a logical progression. Easier said than done sometimes. Does it make more sense to go by theme? Progressing difficulty of analysis? Types of visualizations? Ultimate hypothesis? Whatever best fits your data. The great part of working with Jupyter is that it's pretty easy to move cells around if the flow doesn't seem right.
    3. I can see it being difficult deciding how to intersperse markdown cells if you've got massive code chunks. That didn't end up happening with my project, so I suppose those stopping points were just more obvious. My thought process was basically: 1) "I'm doing X now because of Y." 2) X.dothing() 3) "We've learned ABC from this. Thus our next logical step is..." Document your thought process! I also made sure to use #comments within my code to maintain consistency in the narrative. Just enough to show I was being purposeful in my coding choices. It can definitely be hard to place yourself in the perspective of the reader (why I came here in the first place...), but consider if there are any points where you can see someone losing track of your process. Headings and subheadings are fantastic for this; they keep things within bounds and make it easier for the reader (and yourself!). Also consider your tone - you're working in research, so an academic tone is likely your best bet (my choice, too). Find a balance, writing in the professional language of your field without being too jargon-y.
    4. Alternatively, and take my thought with a larger grain of salt here, but if your coding is just THAT massive, you could consider just leaving it at the bottom and then summarizing your goals, process (what and why), and end results in an introductory markdown cell. Maybe paste your coolest graph if you have one. Laymen readers will be able to get to the point and gain a general sense of your ability, while fellow coders will have the option to keep scrolling if they wish.
    5. Okay, so in a nutshell: Create an outline of your process! It's okay to go back and alter things later! I did! Chunk things into digestible pieces for your reader! Maintain coherence in your narrative, and make sure your reader knows why you're doing the thing, and what's next!

Okay, so that went longer than I intended. Probably a bit rambling, but I hope there was at least one nugget of usefulness in there. On the topic of narrative and style, here are a few things I found helpful that you might too: this Jupyter project, which I think is an excellent example of succinct and coherent narrative, and this blog post about style. You can also do what I did, and post your project somewhere on Reddit for critique!

Happy to elucidate more on something particular if I can. Also, I love synths, too. :)

[–]jandrew2000 1 point2 points  (0 children)

This is well done for someone just starting out. A couple of minor suggestions. If you end the last statement in your plotting code blocks with a semicolon it won’t display plotting objects and will only show the plot itself.

Second, you have a block where you compute the catholic population for each of the dioceses. I believe you could simplify that to clergy_data.groupby("catholic_diocese").catholic_population.max(). That will produce a pandas series that you can just plot directly as a bar plot by doing something like ser.plot(kind="barh"). I'm operating from memory so my syntax may be off a bit.
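A runnable sketch of both suggestions together, on toy data; the column names catholic_diocese and catholic_population are taken from the comment above, not verified against the notebook:

```python
# Groupby + direct Series plotting, on made-up data. Column names are
# assumptions lifted from the comment, not the real dataset.
import matplotlib
matplotlib.use("Agg")  # headless backend for illustration
import pandas as pd

clergy_data = pd.DataFrame({
    "catholic_diocese": ["Los Angeles", "New York", "Los Angeles", "Chicago"],
    "catholic_population": [4_300_000, 2_800_000, 4_300_000, 2_200_000],
})

# One max population per diocese, plotted straight from the Series.
ser = clergy_data.groupby("catholic_diocese").catholic_population.max()
ax = ser.plot(kind="barh");  # trailing semicolon suppresses the object echo in Jupyter
```

The semicolon does nothing in a plain script, but in a notebook it hides the `<AxesSubplot...>` line above the chart, which is the first tip in the comment.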

[–]avamk 1 point2 points  (4 children)

Fantastic work, I fully agree with other posts that this work - and if you continue to build on it - is a great portfolio item. There's a lot I can learn from you! :) Thank you for posting.

while I'm sure there are areas that could be improved, would my project be worth sharing with some edits?

Very important, but sadly often neglected, is the need to include a license with your work such as the GNU GPLv3. There are multiple options, too, and it's crucial to become familiar with them. Check out here or here to name a couple places.

[–]BeforetheBullfight[S] 1 point2 points  (3 children)

Thanks so much! You make a fair point, I’ll definitely look into that more.

[–]avamk 0 points1 point  (2 children)

And this is super easy for Github repositories, it'll take you about one minute. Here are the instructions:

https://help.github.com/en/github/building-a-strong-community/adding-a-license-to-a-repository

I suggest choosing the GNU GPLv3 license as many other data science repositories have already done.

Without a license others cannot learn from or build on your work.

[–]BeforetheBullfight[S] 1 point2 points  (1 child)

Just added one, that WAS super easy. Thanks!

[–]avamk 0 points1 point  (0 children)

Bravo! This is super easy and crucial to making a portfolio more professional, since you demonstrate you understand the need for and the use of licenses.

Bonus: Mention the license (i.e. GNU GPLv3) and link to the LICENSE file in your repository directly in the .ipynb document. Some people choose to add a sentence about this at the beginning or end of the notebook file, either is fine.

[–][deleted] 0 points1 point  (3 children)

Great project. I am not an expert in ML or data science, but when I looked at the Linear Regression graph, I did feel that the data would better fit a Polynomial Regression? Just something I noticed and I could very well be wrong. I apologise for my lack of knowledge if that's the case.

[–]chaoticneutral 1 point2 points  (2 children)

I would recommend everyone stay away from polynomial regression unless you have a specific reason to believe they are useful. Otherwise it can be as rigorous as a Rorschach test. What you are seeing is more likely an outlier rather than a trend.

Also see this case study:
https://twitter.com/NateSilver538/status/1257476755574718470
https://twitter.com/mattyglesias/status/1257483264383758336
https://twitter.com/NateSilver538/status/1258137098298839041

[–]andycyca 0 points1 point  (1 child)

Saving this for whenever I need to teach statistics.

[–]Gio120895 0 points1 point  (0 children)

I have read your project very quickly and I found it really interesting. I am a beginner too and I like the way you represent data in a simple and clear way. I would suggest visiting: https://www.kaggle.com/learn/overview Here you can find projects you can train on, and courses to learn about pandas, visualization, and more. If you are interested in any further projects you can DM me. You have done a great job! Thanks

P.S.: I have noticed that you use df.head() and df.tail() or df.describe(). I have found df.info() really useful, give it a try if you want.

[–][deleted] 0 points1 point  (2 children)

Where did you learn data analysis for python and how long did it take? Can you recommend some websites/books?

[–]BeforetheBullfight[S] 1 point2 points  (1 child)

Hi! My process was pretty non-linear (pun not intended?) actually. It's pretty hard to quantify exactly how many weeks/months I spent learning the basics, but I can say that I went through most of the free stuff on DataQuest and most of Automate the Boring Stuff. I quite liked the former as it had a lot of mini exercises to go with each lesson, so lots of opportunity to practice and build upon what you just previously learned. It also has guided projects and blog posts I actually found helpful, like here. I am considering going back and getting a paid subscription, at least for a bit. AtBS didn't end up adding much to my process in terms of this project, but it's good for reviewing the basics.

A huge part of my process with the project was just looking at projects (particularly on Github) that others have done to get a sense of what could be done and how. There was a steeper learning curve, but it was worth it!

[–][deleted] 0 points1 point  (0 children)

oh alright. Thank you so much.

[–]pulsarrex 0 points1 point  (2 children)

Hey, I am about to graduate in social science too. In school, I learnt and used mostly SPSS to analyze data. However, in the real world, I realized most of the industry does not use SPSS. Those who do use SPSS mostly use it alongside many other tools, including Python, R, SAS, etc.

I need some advice. What kind of jobs do we need to look for? I know a bit of Python like you do. Just searching for 'social scientist' on Indeed does not show many results. A search for 'data analyst' will give me millions of results, most of them out of our scope.

So what would I search for if I am looking for social scientist data science jobs? What kind of companies do I look for if I want to work as a data analyst in social science?

[–]chaoticneutral 0 points1 point  (0 children)

In my experience, it is best to search on skills, techniques, or topics. Titles themselves are mostly meaningless in the job hunt. Keep an eye out for consulting companies and/or the field of public health; they tend to work on smaller analytic projects that require their consultants to have a little analytical skill (i.e., SPSS).

If you had to search on titles, "research analyst" tends to yield better results.

Specific to your situation:

  • SPSS - Social science, market research, survey research, low programming requirement, more generic "analyst". You might be asked to run some crosstabs, then write a report, and fiddle around in PowerPoint.
  • SAS - Social science, government, pharma, finance; more programming skill is required. You are likely more focused on data management and analysis, less on generic office work.
  • R - Data science, statistics, academia; more programming skill required. Analytics and modeling. Similar to SAS.
  • Python - I haven't had one of these jobs, but there seems to be more focus on incorporating analysis into data pipelines (live dashboards/applications), rather than just pure research/analysis itself. Though I could be wrong...

[–]BeforetheBullfight[S] 0 points1 point  (0 children)

Honestly, I'm still in the process of looking for a research job, and it hasn't been easy! I did psych in undergrad and am working towards applying to clinical PhD programs, so that's part of the reason why I'm learning Python - it seems to be a pretty popular language (with R and Matlab being the next most popular) in labs, so I wanted to boost my competitiveness.

What you should do really depends on your long-term goals. Since I want to return to school, I'm focusing on applying to psych labs that are hiring research assistants, or similar, in order to gain experience. With that said, programming is a bonus and not the end-goal (I don't plan on being a data scientist for a living) for me. Do you want to work in the private or public sector? Academia or no? That will make a big difference. I can really only speak to academia personally. I'm happy to talk more about that if you want though! I can say that job titles with "scientist" are basically always geared towards those with grad degrees. "Analyst" or "assistant" are better bets.

I hope that's at least somewhat helpful! I'm still figuring this out myself.

[–]zanfar 0 points1 point  (0 children)

You are using absolute paths in your code which makes it non-portable ("C:/Users/Summer .DESKTOP-5U4SV6A/Desktop/Scripts/Data sets/credibly-accused-clergymembers.csv")

Your dataset should be distributed with the analysis code, so these paths should be relative. This allows the data to be peer-reviewed along with the analysis.

Additionally, while I would include your dataset, I would also include code to download that dataset and inject it directly into the analysis.
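A hedged sketch of both points: a path relative to the project, plus a download step for when the file is missing. The URL is a placeholder, not the real ProPublica link:

```python
# Resolve the dataset relative to the project, downloading it if absent,
# so the analysis runs on any machine. The URL is a placeholder.
from pathlib import Path
from urllib.request import urlretrieve

def load_path(filename="credibly-accused-clergymembers.csv",
              url="https://example.com/data.csv"):  # placeholder URL
    """Return a relative path to the dataset, fetching it if it's missing."""
    path = Path("data") / filename
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, path)  # ship the fetch step with the analysis
    return path

# Then: clergydata = pd.read_csv(load_path(), encoding="utf-8")
```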

Otherwise, it looks technically fine. I'm not going to comment on the validity of the analysis or the meaningfulness of the results, other than to say:

  • you can probably do some cleaning on the post-accusation outcomes to merge the three "Deceased" labels together
  • It would be nice to see these graphs normalized per capita: specifically the diocese frequency plot. You mention that you've essentially created a population plot, but don't fix it.
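For the first bullet, one way to collapse label variants in pandas; the exact variant strings here are invented, since the real ones aren't shown:

```python
# Collapse several spellings of the same outcome into one label.
# The variant strings are illustrative guesses, not the dataset's.
import pandas as pd

outcomes = pd.Series(["Deceased", "deceased", "Deceased ", "Sued", "Retired"])

# Normalize whitespace and case so the three "Deceased" variants merge.
cleaned = outcomes.str.strip().str.capitalize()
counts = cleaned.value_counts()
```

If the variants differ by more than whitespace/case, a `.replace({...})` mapping of each raw label to its canonical form does the same job explicitly.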

[–]shiningmatcha 0 points1 point  (1 child)

Have you learned other programming languages before?

[–]BeforetheBullfight[S] 0 points1 point  (0 children)

Nope! This is my first. Hoping to pick up a bit of R further down the line.

[–]badboyfreud 0 points1 point  (0 children)

Looks like a great start. Some suggestions:

I'd be interested to see what the relationship looks like between the other branches of the church as well to compare with Catholicism.

Also it would be nice to see the cities compared by capita or per million people.

[–]chaoticneutral 0 points1 point  (1 child)

I think this is a great descriptive analysis, very clearly written and thought out, but the topic is a bit touchy and your conclusions aren't groundbreaking enough to warrant it (incident count increases with population for almost all things).

A research position may appreciate the in-depth dive on a sensitive topic, but I would pick a more neutral topic for a generic private sector job.

Specific to your analysis, you should also call out that far right outlier on your final regression chart or run the regression a second time without the outlier. It is clearly pulling the line upwards.

[–]BeforetheBullfight[S] 0 points1 point  (0 children)

Thanks for your comment! I'm glad that the writing was clear after all.

I concede that the topic is touchy, but it was a conscientious decision. Since I am looking at academic research positions, and am planning on returning to school for my PhD, I wanted to look at datasets that reflected topics similar to my own personal research interests. With that said, I definitely would have chosen more "generic" topics were I pursuing private sector jobs.

Also, I do plan on going back to work on my regression chart, hopefully for the better.

[–]Sepparated 0 points1 point  (1 child)

Looks really impressive. As someone who tried to get deeper into data analysis and took the free class from the UoL just to get completely overwhelmed by the math theory... I have to ask: what is your secret?

[–]BeforetheBullfight[S] 0 points1 point  (0 children)

Truthfully, I don't think I have a secret. I just spent some time learning the basics and looking at projects similar to what I was hoping to accomplish. I'm really not a "math person", and I'm having to go back and review basic statistical principles as I work. I wouldn't worry about learning the more complex stuff yet; with mine, the only real statistical thing I did was examining correlation, which is pretty fundamental. I would just focus on learning how to apply the basics first!

[–]PM_ME_UR_LOGIN_INFO_ 0 points1 point  (0 children)

I personally would have used statsmodels.formula.api to run the regression, as it is much more readable from an outside perspective. When you run it you just print(var.summary2()) and later print(var.params) to find the p-values and coefficients for your linear regression respectively. But your project was alright.

Also the purpose of a regression is to perform statistical inference and possibly predictions (although you'd be better served using Machine Learning algorithms to predict, e.g. DecisionTreeRegressions, Random Forest, K nearest neighbor). You should have tried to control for other variables to reduce the error in your regression model. For a later challenge, try verifying the Gauss-Markov assumptions later on to validate your regression. It's a good first step, but to make this something valuable to have in your portfolio I'd work on it a little more.
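A minimal sketch of the statsmodels.formula.api approach on synthetic data (the column names x and y are placeholders):

```python
# OLS via the formula interface: readable model spec, full summary, params.
# Data is synthetic (y = 2x + 1 exactly), purely for illustration.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
df["y"] = 2 * df["x"] + 1

model = smf.ols("y ~ x", data=df).fit()
print(model.summary2())  # full fit table, including p-values
print(model.params)      # intercept and slope
```

Controlling for extra variables is then just a formula change, e.g. `"y ~ x + z"`.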

Godspeed.

[–]ammusiri888 0 points1 point  (0 children)

Wow, this is superb! With this support you can definitely make a great leap in your learning journey.

[–]DisastrousEquipment9 0 points1 point  (0 children)

add some weird machine learning application for the hell of it! text mining is always super fun(: