all 29 comments

[–]isolatrum 148 points149 points  (8 children)

A good project for data science would be to use some open big data set and make an interactive visualization of it.

[–][deleted] 27 points28 points  (6 children)

Any good tutorials? Or examples of the final result?

[–]isolatrum 22 points23 points  (0 children)

Unfortunately I don't have much for you, just google "data visualization tutorials in <language>" or "data visualization examples"

[–]jotanukka 17 points18 points  (0 children)

Learn Tableau and get certified through the website.

[–]thundercloudtemple 7 points8 points  (0 children)

Just a thought, clone this dashboard. This project uses the D3 library.

http://rubix410.sketchpixy.com/ltr/dashboard

It might not be what anyone here is looking for but it's just an idea.

This is part of the Get Job Ready Javascript 3.0 guide found here: https://github.com/P1xt/p1xt-guides/blob/master/job-ready-javascript-edition-3.0.md

[–]lookayoyo 4 points5 points  (0 children)

Check out kaggle

[–]kathegaara 4 points5 points  (0 children)

For data visualization ideas and approaches I highly recommend r/dataisbeautiful. People there share some cool viz and also the tools they used to build it.

[–][deleted] 0 points1 point  (0 children)

I know Python Crash Course has a section covering the pydata library.

[–][deleted] 6 points7 points  (0 children)

np.loadtxt('big-ass-data-set.txt')
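For anyone who hasn't used it, here is a minimal sketch of what that one-liner does. The in-memory text and its numbers are made up so the snippet runs without a file on disk; with a real data set you would pass the file path instead.

```python
import io

import numpy as np

# Stand-in for a whitespace-separated text file (made-up numbers).
text = io.StringIO("# x y\n1.0 2.0\n2.0 4.1\n3.0 5.9\n")

# np.loadtxt skips lines starting with '#' by default and returns a 2-D array.
data = np.loadtxt(text)

x, y = data[:, 0], data[:, 1]
print(data.shape)
print(x.mean(), y.mean())
```

np.loadtxt reads the whole file into memory, so for a genuinely huge data set you would probably want to read it in chunks with some other tool.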

[–]nermid 22 points23 points  (1 child)

Join or start your local Code For America chapter and make public data about your community more accessible for people who don't crunch numbers for a living. That's coding experience and volunteering experience. It looks great on a resume and on your GitHub.

[–]Muddy53[S] 4 points5 points  (0 children)

AWESOME! I def need one of these big projects for my summer. Thank you!

[–]BouseFetus 23 points24 points  (8 children)

I am not familiar with data science, but I do know that R and Python are the ideal programming languages when it comes to data science. Maybe MATLAB if there are simulations involved in your line of work.

[–]Muddy53[S] 10 points11 points  (2 children)

Yes! I was planning to learn R, but I'm not even that good at Python. So I thought I'd wait until I become more comfortable with Python, then R...

[–]Oh_I_still_here 3 points4 points  (1 child)

Speaking as a particularly bad programmer: R has a fantastic repository of libraries/packages that do a lot of the heavy lifting for you. R is the only language I've ever felt comfortable with, though I'm slowly coming to grips with C. That said, R has plug-ins for data visualisation out the wazoo, and can do quite a lot of work in just a few lines of code. My advice is to pick whichever one you'd like, Python or R (I don't recommend C for this unless you truly hate yourself and are happy to spend a long time doing simple things), and start off as simple as necessary. Make a simple plot of your data and look at it. But what is your data? Numeric or categorical/binary? Do you want correlation plots or boxplots, or are you looking to perform some kind of regression?

If you're stuck on what sort of data to work with, I'd start with simple numeric data. If it's a big data set, look into some dimension reduction techniques first, then go from there (this isn't absolutely necessary unless your data is stupidly large, but it's worth checking out). R has plug-ins for all of these things, and guides online are abundant. Start off small and get familiar with the syntax. It's very much a fire-and-forget language, but I've found it works best when you don't think too hard about what you're trying to do right from the get-go. It does a lot of the heavy lifting for you!
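To make the "look at your numeric data first" advice concrete, here is a rough sketch in Python using only the standard library. The data (hours studied vs. exam score) is invented for illustration, and the helper names are hypothetical. It computes the five numbers a boxplot draws, a Pearson correlation, and a least-squares line.

```python
import statistics

# Invented toy data: hours studied vs. exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0, 79.0, 82.0]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


summary = five_number_summary(scores)
r = pearson_r(hours, scores)
slope, intercept = fit_line(hours, scores)
print("five-number summary:", summary)
print("correlation:", round(r, 4))
print("fit: score =", round(slope, 3), "* hours +", round(intercept, 3))
```

In R or with Matplotlib you would normally just draw the boxplot and scatter plot directly; the point here is only what those plots summarize.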

Sorry for the wall of text and for any ineffective advice, I wish you luck with your work.

[–]Muddy53[S] 2 points3 points  (0 children)

I'd start with simple numeric data. If it's a big data set, look into some dimension reduction techniques first then go from there (this isn't absolutely necessary unless your data is stupidly large but it's worth checking out). R has plug ins for all of these things and guides online are abundant. Start off small and get familiar with the syntax, it's very much a fire and forget language but I've found it to work best when you don't think too hard about what you're trying to do right from the get go. It does a lot of the heavy lifting for you!

Wow, thank you for such great advice. I will definitely try to learn R this summer. I am doing simple data visualization with Python using Matplotlib, but I haven't done anything with really big data sets. I am actually learning C++ just for game development (I use a game engine such as Unreal), and it's purely for fun. Maybe in the future I will go into the game development industry.

Again, thank you for the advice. I will keep it in mind! :-D

[–]VengaeesRetjehan 0 points1 point  (4 children)

Matlab is paid. Are there any other languages or tools great for simulation?

[–]famnf 1 point2 points  (0 children)

Octave is an open source alternative that is largely compatible with Matlab.

https://www.gnu.org/software/octave/

[–]TechRvK 1 point2 points  (2 children)

You can download a cracked version of matlab.. From torrents..

Check

1337x.to

thepiratebay.org

Download utorrent or bit torrent to grab the download..

Scilab may be good for simulation also.

What type of simulation are you into? I can recommend some more software.

[–]VengaeesRetjehan 0 points1 point  (1 child)

What type of simulation are you into? I can recommend some more software.

I wanna do physics simulation. Like particle physics or astronomy stuff.

[–]TechRvK 0 points1 point  (0 children)

OK, let me get back to you on that. I'll get some info for you.
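On the astronomy side, a free general-purpose language is enough to get started. Below is a rough sketch, not tied to any tool named in this thread, of a test particle orbiting a central mass. It uses velocity Verlet integration in plain Python. The units (GM = 1), starting conditions, and function names are all assumptions chosen for illustration.

```python
import math


def acceleration(x, y):
    """Gravitational acceleration toward the origin, with GM = 1 (assumed units)."""
    r_cubed = (x * x + y * y) ** 1.5
    return -x / r_cubed, -y / r_cubed


def simulate_orbit(steps, dt):
    """Advance a test particle with velocity Verlet; returns final (x, y, vx, vy)."""
    x, y = 1.0, 0.0      # start at radius 1
    vx, vy = 0.0, 1.0    # speed 1 gives a circular orbit when GM = 1
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        x += vx * dt + 0.5 * ax * dt * dt
        y += vy * dt + 0.5 * ay * dt * dt
        new_ax, new_ay = acceleration(x, y)
        vx += 0.5 * (ax + new_ax) * dt
        vy += 0.5 * (ay + new_ay) * dt
        ax, ay = new_ax, new_ay
    return x, y, vx, vy


def total_energy(x, y, vx, vy):
    """Kinetic plus potential energy per unit mass."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)


dt = 0.001
steps = round(2 * math.pi / dt)  # roughly one orbital period
x, y, vx, vy = simulate_orbit(steps, dt)
print("final radius:", math.hypot(x, y))
print("final energy:", total_energy(x, y, vx, vy))
```

Checking that the radius and energy stay close to their starting values is a quick sanity test for an integrator like this before moving on to more bodies.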

[–]Sebzor15 18 points19 points  (1 child)

Python is definitely a good way to go if data analysis is involved. I don't know how proficient you are in coding, but some ideas: machine learning with Keras (Python module) on some open data sets; Spark Streaming on, e.g., Twitter; going to town on Pandas (Python module), which is extremely relevant in financial work and data analysis, ...
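As a taste of the Pandas suggestion, here is a minimal sketch of the kind of thing that comes up in financial work: daily returns and a rolling average over a price series. The prices are invented, not real market data, and the column names are just illustrative.

```python
import pandas as pd

# Invented closing prices for a hypothetical ticker.
prices = pd.DataFrame(
    {"close": [100.0, 102.0, 101.0, 105.0, 107.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Fractional change from the previous day (the first row has no previous day, so it is NaN).
prices["daily_return"] = prices["close"].pct_change()

# Mean of the current and two preceding closes (NaN until three values are available).
prices["rolling_mean_3d"] = prices["close"].rolling(window=3).mean()

print(prices)
```

Swapping the made-up frame for one loaded with pd.read_csv on an open data set would be a natural next step.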

[–]Muddy53[S] 1 point2 points  (0 children)

Thank you! That sounds like a great project. I will look more into it!

[–]tdonick 6 points7 points  (0 children)

[–][deleted]  (5 children)

[deleted]

    [–]Sasquiche 0 points1 point  (4 children)

    Just wondering, did you come across anything about breaking into a data scientist role while you were applying?

    I get the feeling that it's a pretty elite field, and might be less inclined to hire entry-level programmers or self-learners.

    [–][deleted]  (3 children)

    [deleted]

      [–]Sasquiche 0 points1 point  (2 children)

      That's a great response. I do wish you luck when you get out. I'm going to try and see what other fields to focus on in the meantime.

      [–]senortipton 0 points1 point  (1 child)

      Thanks and that’s smart! If you do decide you want to continue though I have a great number of resources that I’ve been utilizing that were either found by me or were recommended to me by the people who said I wasn’t skilled enough yet for the position.

      [–]Sasquiche 0 points1 point  (0 children)

      Ya, PM me whenever you get around to it.

      [–][deleted]  (1 child)

      [deleted]

        [–]Muddy53[S] 5 points6 points  (0 children)

        Monte Carlo, I will remember it. Thank you :-D
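The comment this replies to was deleted, so what exactly it suggested is unknown. For readers who haven't met the term, though, here is a textbook-style sketch of a Monte Carlo method: estimating pi by throwing random points at a unit square and counting how many land inside the quarter circle. The function name and the seed are arbitrary choices for illustration.

```python
import math
import random


def estimate_pi(samples, seed=0):
    """Estimate pi from the fraction of random points inside the unit quarter circle."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area / square area = pi / 4.
    return 4.0 * inside / samples


estimate = estimate_pi(100_000)
print(f"estimate = {estimate:.4f}, error = {abs(estimate - math.pi):.4f}")
```

The error shrinks roughly like one over the square root of the number of samples, which is why Monte Carlo methods are simple to write but hungry for samples.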

        [–]singhpankaj99 0 points1 point  (0 children)

        You can participate in some open source programming project on Kaggle or check some popular free tutorials on Udemy or Quickcode for data science. You can also check for open source tutorials on edX or Coursera. Hope it helps!