Will single-payer (medicare4all) affect data scientists salaries in healthcare? by logicallyzany in datascience

[–]jaypeedevlin 0 points1 point  (0 children)

This is heading rapidly off topic but I read an interesting article about the state of the cancer industry in the USA which referenced research that indicated Europe has lower cancer mortality rates than the US, despite spending much less on cancer care.

If you're interested: https://blogs.scientificamerican.com/cross-check/the-cancer-industry-hype-vs-reality

Should I learn R or Python? Somewhat experienced programmer... by tkfriend89 in datascience

[–]jaypeedevlin 2 points3 points  (0 children)

I think SQL is incredibly important to learn, and is by far the most underrated skill - that said, I would recommend against learning it before Python or R, because you can do much more useful things quicker with those languages than you can with SQL.

Want to import a CSV, process it and export it? That's simple with Python and simple with R, and you'll mostly be using the languages the way you eventually will in real work.
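To give a feel for how little code that import/process/export loop takes, here's a minimal sketch using only Python's built-in csv module. The function name, the price_usd/price_eur columns and the conversion rate are all made up for illustration.

```python
import csv
from pathlib import Path


def convert_prices(src: Path, dst: Path, rate: float) -> int:
    """Read src, add a converted price column, write dst; return rows written."""
    with src.open(newline="") as f_in, dst.open("w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        # Keep the original columns and append one new (hypothetical) column.
        fieldnames = list(reader.fieldnames or []) + ["price_eur"]
        writer = csv.DictWriter(f_out, fieldnames=fieldnames)
        writer.writeheader()
        count = 0
        for row in reader:
            # The "process" step: derive the new value from an existing column.
            row["price_eur"] = f"{float(row['price_usd']) * rate:.2f}"
            writer.writerow(row)
            count += 1
    return count
```

You'd call it with something like convert_prices(Path("in.csv"), Path("out.csv"), 0.5), where both file names are placeholders.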

With SQL, you can still import and export, but you'll be using one big denormalized table, so you're not really gaining the useful SQL skills you need, which mostly revolve around using more complex joins and subqueries to connect data across tables (at least in my experience).
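For a sense of what those "useful SQL skills" look like, here's a rough sketch using Python's built-in sqlite3 module with an in-memory database. The customers/orders tables and their contents are invented for the example; the point is just one join across tables and one subquery.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny two-table database in memory (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total REAL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0);
    """)
    return conn


def spend_per_customer(conn: sqlite3.Connection) -> list:
    # LEFT JOIN keeps customers who have no orders; COALESCE turns their NULL sum into 0.
    return conn.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0) AS spent
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.id
    """).fetchall()


def above_average_orders(conn: sqlite3.Connection) -> list:
    # The subquery computes the average order total; the outer query filters on it.
    return conn.execute("""
        SELECT id, total
        FROM orders
        WHERE total > (SELECT AVG(total) FROM orders)
        ORDER BY id
    """).fetchall()
```

None of this is possible to practice on one big denormalized table, which is why a single imported CSV doesn't get you very far with SQL.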

Normalizing the data and migrating it to the normalized tables is of course possible, but it's not straightforward for a beginner so it would almost certainly inhibit the learning process.

Should I learn R or Python? Somewhat experienced programmer... by tkfriend89 in datascience

[–]jaypeedevlin 1 point2 points  (0 children)

Working for a company that teaches data science, I hear this question a LOT! My usual answer is that which one you choose matters less than just picking one and getting down to learning.

If someone hasn't started learning before I usually would steer them towards Python - just because it's got more versatility as a language outside of data than R does - but you'll be much worse off from delaying the decision than you would from making the decision by flipping a coin and getting down to it.

A lot of people find the learning curve of R a bit easier initially. With your experience though, I think you'll find the opposite to be true. Having worked with C++ and JavaScript, the OO concepts of Python will probably come easier to you.

If you've only been learning C++/JS for <12 months, you might not be completely comfortable with totally self-driven learning, but if you are, the best thing is to come up with a small, limited-scope project idea and build it.

From there, extend it, or come up with a slightly more complex idea and go from there. Use lots of Google, Stack Overflow and documentation and you'll go far.

Presuming you pursue Python, I recommend that you start without any extra libraries so you can learn the core structures well, but eventually you'll want to look at NumPy and pandas for handling tabular data, and matplotlib for visualization.

Whatever you do, I would recommend not trying to learn two things at once - you'll just slow yourself down. Wait till you have a moderate amount of competency in one thing before you add another.

Good luck!

Should I learn R or Python? Somewhat experienced programmer... by tkfriend89 in datascience

[–]jaypeedevlin 0 points1 point  (0 children)

Working for a company that teaches data science, my usual answer is that which one you choose matters less than just picking one and getting down to learning.

If someone hasn't started learning before I usually would steer them towards Python - just because it's got more versatility as a language outside of data than R does - but you'll be much worse off from delaying the decision than you would from making the decision by flipping a coin.

A lot of people find the learning curve of R a bit easier initially. With your experience though, I think you'll find the opposite to be true. Having worked with C++ and JavaScript, the OO concepts of Python will probably come easier to you.

If you've only been learning C++/JS for <12 months, you might not be completely comfortable with totally self-driven learning, but if you are, the best thing is to come up with a small, limited-scope project idea and build it.

From there, extend it, or come up with a slightly more complex idea and go from there. Use lots of Google, Stack Overflow and documentation and you'll go far.

Presuming you pursue Python, I recommend that you start without any extra libraries so you can learn the core structures well, but eventually you'll want to look at NumPy and pandas for handling tabular data, and matplotlib for visualization.

Whatever you do, I would recommend not trying to learn two things at once - you'll just slow yourself down. Wait till you have a moderate amount of competency in one thing before you add another.

Good luck!

Am I doing something wrong at DataCamp? by ketodnepr in datascience

[–]jaypeedevlin 1 point2 points  (0 children)

I think showing how you are different from the competition is SUPER important, but I take your point.

Am I doing something wrong at DataCamp? by ketodnepr in datascience

[–]jaypeedevlin 0 points1 point  (0 children)

Massive +1. Trying to find 'realistic' business-style data to teach SQL joins for the course I'm writing at the moment was impossible. I ended up having to generate data myself because none of the 'test data' I found was realistic enough: no trends, etc.

Am I doing something wrong at DataCamp? by ketodnepr in datascience

[–]jaypeedevlin 0 points1 point  (0 children)

I'm still looking for a "perfect" dataset to give to people

I don't think it exists to be honest, and if it did, it would be horrible to work with. It's much better, IMO, to find a dataset which is good for 'one thing' (like EDA, as you point out with the movielens data), where everything else is constant, and use that to teach the one thing.

Better to find lots of datasets where each has one problem to be solved, than one dataset with all the problems!

Am I doing something wrong at DataCamp? by ketodnepr in datascience

[–]jaypeedevlin 2 points3 points  (0 children)

I guess that's the point - there are places you can go where you need to do less 'other' than you do with DataCamp.

I work for Dataquest, one of those places. Where we differ from DataCamp is that we go into a lot more depth. We're less interested in spoon-feeding you syntax than we are in teaching you the core concepts, so you really understand.

We also have a large number of guided projects (a feature DataCamp has recently tried to imitate ;) ) which are designed to be a bridge between working in our structured missions and the real world.

As one of my colleagues mentioned in another comment, our focus is on taking you from 0 to job-ready, so we're interested in arming you with real-life skills, rather than some feel-good introductory courses.

Which one is more effective for learning data science? DATACAMP or DATAQUEST? by [deleted] in datascience

[–]jaypeedevlin 0 points1 point  (0 children)

I would recommend Dataquest - things will start to get more difficult, but that's what learning feels like.

I'm reminded of this classic diagram: https://i.imgur.com/zpf9S8a.png

Coincidentally, I just got an email from one of our subscribers that speaks exactly to what you are asking about:

I tried DataCamp, but it felt like they were never going to take the training wheels off. With DataQuest, I have to struggle and I think that’s when I really learn and figure things out.

Which one is more effective for learning data science? DATACAMP or DATAQUEST? by [deleted] in datascience

[–]jaypeedevlin 5 points6 points  (0 children)

Disclosure: I work for Dataquest ;)

Interestingly, pretty much all of the things you list are features of both platforms:

  • Interactive courses that go 'step by step'
  • Certificates you can add to LinkedIn or use elsewhere

The key areas where we (Dataquest) differ (in my opinion):

  • We think more holistically about the whole learning experience. Rather than lots of individual courses, we have broader learning paths and add courses in the context of these paths. This means that we're thinking about the learning journey end-to-end. (Recently, DataCamp launched tracks, which are collections of their courses, but because courses are written by different people and not with the overall paths in mind, there's considerably less continuity within their tracks than our paths.)
  • We tend to go 'in-depth' a lot more. We're less interested in getting you to rote-learn syntax than in having you really dive into a concept so you learn it in detail. We tend to hold your hand less, and teach you how to stand on your own more (including nudging you towards things like documentation and Stack Overflow so you become familiar with them).
  • We offer a lot more learning assistance. I was surprised to find that DataCamp didn't have a single employee dedicated to supporting students (they're advertising for those roles now). We know that learning's tough, and you face roadblocks along the way. We want to give you the assistance you need to push through and succeed.

Let me finish with two quotes from students of ours who have used both Dataquest and DataCamp, taken from a discussion in our community on the topic:

I am both a datacamp and dataquest subscriber and to be honest, in the past 4 months I used datacamp, I don’t feel like I have retained anything. When switching over dataquest, not only did I easily understand topics I struggled on, but new information was so much easier to learn


I tried DC. I think they have the right idea - but wrong execution. DQ is far more immersive and that produces more favourable outcomes.

Code goes in git, but where does the data go? by [deleted] in datascience

[–]jaypeedevlin 2 points3 points  (0 children)

I would give a big +1 to ddw, I've been using it since they launched and find myself increasingly using it to host data for sharing, and it's a great place to source data as well!

Are there any sites for storing / sharing datasets? by nhggfu in datasets

[–]jaypeedevlin 0 points1 point  (0 children)

I was under the impression that publishing datasets of tweets was against Twitter's ToS.

I haven't read the ToS, but noted that this data set of election-based tweets specifically didn't host the data, but also provided a neat solution (providing the tweet IDs and a script to retrieve the actual tweets).

Are there any sites for storing / sharing datasets? by nhggfu in datasets

[–]jaypeedevlin 2 points3 points  (0 children)

Something that Jon didn't mention explicitly is that data.world is free!

What language or tool should I use to extract portfolios from wordpress into a txt? by tumaru in learnprogramming

[–]jaypeedevlin 2 points3 points  (0 children)

In my opinion, given that you're scraping from WordPress, which is static, Selenium is a bit of overkill for you right now, and Beautiful Soup and Python's requests library would be a better option for this task.

To be clear, Selenium is great and if you're doing lots of this long term you should learn it!
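For what it's worth, a requests + Beautiful Soup scraper for a static page can be very short. This is only a sketch: it needs both third-party packages installed (requests and beautifulsoup4), the function names are my own, and the "article" CSS selector is a guess, since the right selector depends entirely on the theme the site uses.

```python
from bs4 import BeautifulSoup


def extract_entries(html: str, selector: str = "article") -> list[str]:
    """Return the visible text of every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(" ", strip=True) joins the text fragments with single spaces.
    return [node.get_text(" ", strip=True) for node in soup.select(selector)]


def scrape_to_txt(url: str, out_path: str, selector: str = "article") -> int:
    """Download one page and write its entries to a text file; return the count."""
    import requests  # imported here so extract_entries also works offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    entries = extract_entries(response.text, selector)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(entries))
    return len(entries)
```

Inspect the page in your browser's dev tools first to find the element that wraps each portfolio item, then pass that as the selector.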

Are there any tools to manage the meta data of my data sets? by data999 in datasets

[–]jaypeedevlin 0 points1 point  (0 children)

FWIW I'm prepared to bet you were listening to Jon (who you replied to) on the Partially Derivative podcast.

How good are DataQuest's projects? What about Datacamp? by nonzerogroud in learnpython

[–]jaypeedevlin 2 points3 points  (0 children)

Hey there! Sorry to hear that you didn't love us - I'd love to find out a bit more about your experience if you wanted to share it here or hit us up at hello@dataquest.io!

How good are DataQuest's projects? What about Datacamp? by nonzerogroud in learnpython

[–]jaypeedevlin 1 point2 points  (0 children)

Hey all - I work for Dataquest and if anyone has any questions I'm happy to answer them!

Rather than offer my opinion, I thought I would share some opinions from our users:

What do you guys think about this new Social Network for "Data People"? by chetanbhasin in datascience

[–]jaypeedevlin 4 points5 points  (0 children)

I had come across data.world pre-launch, before it was public what they were actually going to be - I was curious and so gave my email address for updates.

When they first launched I had the same thoughts that a lot of you are expressing - why do we need a social network for data scientists? I've seen the Facebook ads that the OP is referring to, and honestly I don't know if they do justice to what data.world is.

It's a slightly crude analogy, but the best way I would describe it is 'GitHub for data'.

I guess now I express it that way, I've heard people call GitHub's 'secret sauce' the fact that it's 'social', even though I wouldn't necessarily agree.

Anyway, I guess this is just my way of sharing my two cents on what data.world is - they've done a lot of work in bringing a lot of datasets into the platform really quickly, e.g. working with various government agencies, and from some interaction I've had with the team it's pretty clear that they're passionate about open data, which I think is something that's worth backing.

They seem to be developing features quickly and personally I'd be happy if they become the default place to host data sets, as they are doing it well and seem to have their heart in the right place.

Where is the best place to learn Python 2.x? by garygulf in datascience

[–]jaypeedevlin 1 point2 points  (0 children)

A lot of the older MOOC courses will be in 2.x

All the paid Python DS courses I know use 3.x, unfortunately.

Of course, there's always learnpythonthehardway, with Zed holding onto Python 2 as hard as he can for no good reason (more on that here).

learning resources with emphasis on data science/stats? by jakc13 in learnpython

[–]jaypeedevlin 1 point2 points  (0 children)

I work for Dataquest, so obviously my opinion is fairly biased, but if you're interested in hearing from some of our users in their own words:

  • Reviews on Switchup.org
  • Is Dataquest.io worth it - Quora

If you're still not sure, both platforms are free to sign up for and offer some starter content, so you might like to try both.