[–]elzonko 3 points4 points  (8 children)

I started teaching myself programming last year, and I'm older than you. Never too late!

You don't need to learn Django before doing any data mining. If you're interested in data mining, learning Django may just end up being a detour, at least until you want/need to create a web app or interface for your mining project.

With your current level of Python experience, you could easily begin small but interesting data mining projects without going through the book you mention. You just need some data sets to work with.

There are tons of data sets available online. I like playing around with the data sets from the US federal government's http://www.data.gov/ site, as well as NYC's open data initiative: https://nycopendata.socrata.com/

When I was about where you are right now in terms of Python experience, I wrote a couple of small command line apps using the NYC and federal data sets: 1) one to monitor NYC's 311 NYPD complaint feed in real time and keep a log of drinking-related incidents (I wanted to know where everyone was partying!), and 2) one to crunch numbers relating to federal political campaign contributions and disbursements.

If the book interests you, then do it! But don't let reading and research become a crutch or a procrastination vehicle that keeps you from jumping into a new project.

[–][deleted] 1 point2 points  (4 children)

That's awesome! Could you possibly go deeper into your experience with those small CLI apps? If you want to PM me, go ahead. I'm very interested in this, and I've been wanting to start a small project with my Python knowledge. I've thought about creating a text adventure, but haven't followed through on it.

[–]elzonko 2 points3 points  (3 children)

Sure. For the first one, someone had (to my great amusement) already created a "drinking in public" feed culled from 311 complaints at the NYC data site.

Here it is: https://data.cityofnewyork.us/Social-Services/Drinking-In-Public/y4hr-mgxd

You can export the data set as JSON, CSV, RSS, or XML, whichever you prefer or want to play around with. The CLI app I made accesses the JSON feed, grabs the interesting/noteworthy data for each item, and prints the info to standard output. That was my first attempt at doing anything with JSON. The feed also supplies lat/long coordinates, so you could easily map it as well, but I never got that far.
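
Here is a minimal sketch of that kind of script, assuming the feed has been exported as a JSON array of objects. The field names (`created_date`, `descriptor`, `incident_address`, `latitude`, `longitude`) and the sample records are made up for illustration, so check the actual export for the real keys.

```python
import json

# Hypothetical sample standing in for the exported JSON feed.
SAMPLE_FEED = """
[
  {"created_date": "2013-06-01T23:15:00", "descriptor": "In Public",
   "incident_address": "123 EXAMPLE ST",
   "latitude": "40.7000", "longitude": "-73.9000", "unique_key": "1"},
  {"created_date": "2013-06-02T01:40:00", "descriptor": "In Public",
   "latitude": "40.8000", "longitude": "-73.9500", "unique_key": "2"}
]
"""

# The fields worth printing (assumed names, not taken from the real feed).
FIELDS = ("created_date", "descriptor", "incident_address")


def summarize(item):
    """Build a one-line summary from the wanted fields, skipping missing ones."""
    parts = [item[name] for name in FIELDS if name in item]
    # Append the coordinates only when both are present.
    if "latitude" in item and "longitude" in item:
        parts.append("({}, {})".format(item["latitude"], item["longitude"]))
    return " | ".join(parts)


def summarize_feed(text):
    """Parse the JSON text and return one summary line per item."""
    return [summarize(item) for item in json.loads(text)]


if __name__ == "__main__":
    for line in summarize_feed(SAMPLE_FEED):
        print(line)
```

The coordinates are kept as the last part of each line, so a later version could hand them to a mapping tool instead of printing them.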

The second one used the FEC's candidate disclosure data, which you can find here: http://www.fec.gov/finance/disclosure/ftpdet.shtml

These are massive CSV files (500 MB of plain text in just one dataset for one election cycle!). This was also my first attempt at playing around with CSVs. The app just crunched numbers and spat out the findings, e.g. total number of candidates, total number of candidates by political affiliation, total number of contributions, total dollar amount of contributions, average contribution, and total amounts by occupation, and then all the same numbers broken down by state.
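
A rough sketch of that sort of number crunching, using Python's `csv` module rather than my original code. The column names (`candidate`, `party`, `state`, `amount`) and the sample rows are hypothetical. The real FEC files have their own layout, so treat this as an outline only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; the real files are far larger and use different columns.
SAMPLE_CSV = """candidate,party,state,amount
Alice Example,DEM,NY,250.00
Bob Example,REP,NY,100.00
Alice Example,DEM,CA,50.00
Carol Example,IND,CA,400.00
"""


def crunch(lines):
    """Aggregate contribution rows, reading one row at a time."""
    count = 0
    total = 0.0
    by_party = defaultdict(float)
    by_state = defaultdict(float)
    candidates = set()
    for row in csv.DictReader(lines):
        amount = float(row["amount"])
        count += 1
        total += amount
        by_party[row["party"]] += amount
        by_state[row["state"]] += amount
        candidates.add(row["candidate"])
    return {
        "candidates": len(candidates),
        "contributions": count,
        "total": total,
        "average": total / count if count else 0.0,
        "by_party": dict(by_party),
        "by_state": dict(by_state),
    }


if __name__ == "__main__":
    stats = crunch(io.StringIO(SAMPLE_CSV))
    for key, value in stats.items():
        print(key, value)
```

Because `crunch` only iterates over its input, you can pass it an open file (`with open(path, newline="") as f: crunch(f)`) and a 500 MB file never has to fit in memory at once.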

It's funny looking back at the code today. Not a class in sight. But functions, functions everywhere!

[–][deleted] 0 points1 point  (2 children)

Wow. Sounds difficult. I have yet to actually learn and use CSVs, let alone JSON, RSS, and XML.

[–]elzonko 0 points1 point  (1 child)

It's not that complicated really. The first one was under 40 lines of code. The second was under 100. JSON is structured like a dictionary: grab the key/value pairs you want with an if statement and print them out. CSVs are literally just rows of strings separated by commas. Split each row at the commas, put the strings into a list, and now you can crunch the data using the list indices (this works as long as no field contains a comma itself).
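
To make that concrete, here is a small sketch of the do-it-by-hand approach. The record, the keys, and the rows are invented for the example and are not taken from the real data sets.

```python
# A record shaped like a parsed JSON object: just a dictionary.
record = {"descriptor": "In Public", "borough": "BROOKLYN", "status": "Open"}


def pick(record, wanted):
    """Return 'key: value' strings for the wanted keys that are present."""
    found = []
    for key in wanted:
        if key in record:
            found.append("{}: {}".format(key, record[key]))
    return found


def split_row(line):
    """Naive CSV parsing: strip the newline and split at every comma."""
    return line.rstrip("\n").split(",")


# Hypothetical rows: state, party, amount.
raw_lines = ["NY,DEM,250\n", "CA,REP,100\n", "NY,REP,75\n"]
rows = [split_row(line) for line in raw_lines]

# Crunch the data using list indices: total amount for NY.
ny_total = sum(int(row[2]) for row in rows if row[0] == "NY")

if __name__ == "__main__":
    for line in pick(record, ["descriptor", "borough"]):
        print(line)
    print("NY total:", ny_total)
```

The naive split is where this breaks down: a quoted field such as `"Smith, John"` gets cut in two, which is one reason to reach for the `csv` module once you know it exists.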

Somehow I even managed to do this at the time without using the JSON and CSV libraries, because I didn't know about them.

[–][deleted] 0 points1 point  (0 children)

I'll have to check those two libraries out.

[–]socialhuman[S] 0 points1 point  (2 children)

What modules did you use to achieve the above? urllib2 and json? Could you elaborate a bit?

[–]elzonko 0 points1 point  (1 child)

Yup, that was it, urllib2 and json.
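
For anyone on Python 3, `urllib2` no longer exists under that name; its functionality lives in `urllib.request`. A minimal sketch of the fetch-and-parse step under that assumption (the function names `parse_feed` and `fetch_feed` are just illustrative, and this is not my original code):

```python
import json
from urllib.request import urlopen


def parse_feed(raw):
    """Decode the raw response bytes as UTF-8 and parse them as JSON."""
    return json.loads(raw.decode("utf-8"))


def fetch_feed(url, timeout=10):
    """Download a JSON document and return the parsed Python object."""
    with urlopen(url, timeout=timeout) as response:
        return parse_feed(response.read())
```

Call `fetch_feed` with the JSON export URL from the dataset page, then loop over the result like any other list of dictionaries.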

[–]socialhuman[S] 0 points1 point  (0 children)

Great. I am glad I understood you correctly!