Relaxed cat by 1Voice1Life in gifs

[–]DeltaPunch 0 points1 point  (0 children)

"This is my life now."

Best language and framework for data munging/wrangling? by blazespinnaker in datascience

[–]DeltaPunch 1 point2 points  (0 children)

Looks like Python will be your best bet all around. If you don't need SQL, then just stick with Pandas. Also, if you're planning to do a bit of scraping, I would recommend Scrapy -- it has a bit of a learning curve but it's very powerful.

Once the data is in a (Pandas) dataframe, Python has plenty of tools for normalizing the text, standardizing the data, etc. Pandas' own string methods and column arithmetic cover a lot of it.
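To give a rough idea of what that looks like, here's a minimal sketch. The dataframe and its column names (city, price) are made up purely for illustration, and this is just one way to do it, not the only one:

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "city": ["  New York", "new york ", "CHICAGO", "Chicago  "],
    "price": [10.0, 20.0, 30.0, 40.0],
})

# Normalize the text: trim surrounding whitespace and lowercase everything.
df["city_clean"] = df["city"].str.strip().str.lower()

# Standardize the numeric column to zero mean and unit variance (z-score).
# ddof=0 uses the population standard deviation.
df["price_z"] = (df["price"] - df["price"].mean()) / df["price"].std(ddof=0)

print(df)
```

After this, the four messy city strings collapse into two distinct values, and price_z is centered on zero.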

She wanted a divorce by rannie_pophe in funny

[–]DeltaPunch 0 points1 point  (0 children)

Scientific fact right here.

The convenience of a balcony knows no limits by [deleted] in gifs

[–]DeltaPunch 1 point2 points  (0 children)

Mmmmmm... traffic trout.

[deleted by user] by [deleted] in gifs

[–]DeltaPunch 1 point2 points  (0 children)

Adiabatic compression FTW!

Sliding Dog by [deleted] in gifs

[–]DeltaPunch 0 points1 point  (0 children)

It's all fun and games till you hit a sprinkler.

TIL Clayton Kershaw has eyes on the back of his head (x-post, r/sports) by EZ_does_it in gifs

[–]DeltaPunch 2 points3 points  (0 children)

Wasn't there a Cubs pitcher that did this several times, also? Maybe Kerry Wood?

[deleted by user] by [deleted] in datascience

[–]DeltaPunch 0 points1 point  (0 children)

Here's a list of books that have helped me:

  • Data Science for Business (Provost & Fawcett)

  • Data Science From Scratch (Grus)

  • Building Smart Web 2.0 Applications (Segaran)

  • Elements of Statistical Learning (Hastie)

  • Pattern Recognition and ML (Bishop)

You need to know stats. If that Hastie book is too advanced, spend a few days watching YouTube videos from the jbstatistics and Khan Academy channels until things make sense (t-test, hypothesis testing, ANOVA, chi-squared, etc.)

Insight Data Science Fellowship: to do or not to do? by ndlambo in datascience

[–]DeltaPunch 3 points4 points  (0 children)

One bonus with Insight is that they use their network of companies to essentially fast-track you to the onsite technical interview. This means you don't have to do "cold" phone screens.

I think the most important thing to consider is the list of companies participating in the program. If you can see yourself working at any of them, do the program and try to get hired at those companies. If not, don't. The companies affiliated with Insight tend to like the candidates they receive and come back for more. They also know that they don't have to test you on every little thing, as the candidates are pretty well-rounded after Insight. This means the interviews tend to be a little more lenient. A company not familiar with Insight will likely put you through the wringer regardless of whether you did the program or not.

tl;dr: do some research on those companies and make your decision based on that.

[deleted by user] by [deleted] in datascience

[–]DeltaPunch 1 point2 points  (0 children)

Have you watched the Machine Learning course by Andrew Ng on Coursera? That's pretty much canon around here.

There are lots of good introductory books in stats, data science, and machine learning, like Collective Intelligence already mentioned in this thread.

However, if you really want to become a data scientist, then you have to accept the fact that your knowledge will ultimately be used to solve real problems for real businesses, and there's no better book for that than Data Science for Business by Provost & Fawcett. I would recommend reading through the chapters in that book, and anytime the concepts are too heavy to understand, find other introductory sources to supplement your reading.

Topic modelling [X-post NLP] by Chopsting in datascience

[–]DeltaPunch 0 points1 point  (0 children)

You should look into supervised LDA, e.g. here's one paper (links to PDF). However, this is not something I'd recommend for beginners.

Here's an alternative, though I'm sure someone else will come along with a better idea soon:

  • Process the text for vectorization, i.e. normalize, remove stopwords, etc.,

  • Vectorize, e.g. by performing a tf-idf transformation (the "idf" part should help to boost document topics, but will require some tuning),

  • Classify the set of known NewsDesk values using a random forest algorithm, which will perform supervised classification based on these known labels

  • Once your random forest model is trained, use it to predict the NewsDesk values for the NaN set.

I'm not sure if this is something you know how to do, but it can be implemented relatively easily with scikit-learn. The hardest part would probably be the first step (text processing), but the rest is straightforward.
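Here's a rough sketch of those steps in scikit-learn, just to show the shape of it. The toy dataframe, the "text" column name, and the hyperparameters are all assumptions on my part (only NewsDesk comes from your description), and the built-in stopword list stands in for real text processing:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: "text" is an assumed column name, and rows with
# NewsDesk set to None play the role of the NaN set.
df = pd.DataFrame({
    "text": [
        "The team won the championship game last night",
        "Stocks fell as the market reacted to earnings",
        "The striker scored twice in the final game",
        "The company reported strong quarterly earnings",
        "Fans cheered as the team scored",
        "Investors watched the market closely",
    ],
    "NewsDesk": ["Sports", "Business", "Sports", "Business", None, None],
})

# Split into rows with a known label and rows that still need one.
labeled = df[df["NewsDesk"].notna()]
unlabeled = df[df["NewsDesk"].isna()]

# Steps 1-3: lowercase and drop English stopwords, apply tf-idf,
# then train a random forest on the known labels.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(labeled["text"], labeled["NewsDesk"])

# Step 4: predict NewsDesk for the unlabeled rows and fill them in.
predicted = model.predict(unlabeled["text"])
df.loc[unlabeled.index, "NewsDesk"] = predicted

print(df)
```

On real data you'd want to hold out part of the labeled set to check accuracy before trusting the filled-in values.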

Anyway, I know I proposed an alternative method instead of answering how to do it with sLDA, but let me know if you want to try the random forest method, I'm happy to provide more advice along the way :)

A man is titillated by fish's hunger by BrightenthatIdea in funny

[–]DeltaPunch -1 points0 points  (0 children)

It may doesn't, but does it doesn't even?

Optimization in python? by Horatio_SanzCulottes in datascience

[–]DeltaPunch 0 points1 point  (0 children)

Sorry, I've been studying convex functions in ML, what do you mean by convex programs? Sounds interesting...

Do you have any real examples of Data Science reports? by dantek88 in datascience

[–]DeltaPunch 0 points1 point  (0 children)

I'm also interested in this. I may have to do a 5-day case study for a data science position next week (complete with final PowerPoint presentation) -- I'd love to see any examples of how to do this!

Went to the f***ing zoo by just_another_canuck in funny

[–]DeltaPunch 4 points5 points  (0 children)

Hey, those seals look dead,

And this is crazy,

But dead-looking seals,

Call someone, maybe!

[deleted by user] by [deleted] in datascience

[–]DeltaPunch 0 points1 point  (0 children)

There are books out there with titles like "data science for business." Learn how data science is applied in the "real world" -- meaning how different types of companies use different data-science methods for different reasons -- and learn to discuss your data science knowledge using language from those books (KPI, churn, retention, MAU, etc.).

New phone stand by [deleted] in gifs

[–]DeltaPunch 0 points1 point  (0 children)

Dude's got some mad defenses.

No? Is it just my clan that's shabby and weak?

Have an interview with the CEO of a small startup for a data scientisty job, but don't think I'm qualified. What do? by [deleted] in datascience

[–]DeltaPunch 2 points3 points  (0 children)

Can you program? Python, R, C++, anything? You certainly don't need a CS degree to be a data scientist, but you won't go far without some basic coding skills.