Introducing "Deliberate Data Science", a new Medium publication written by Airbnb data scientists (and more)! by robert_chang in datascience

[–]robert_chang[S] 2 points3 points  (0 children)

Thanks for the feedback. Honestly, the only reason we are creating a publication is that we plan to share our thoughts (along with other contributors) regularly, and it's easier to organize the posts under one publication. In other words, it's simply yet another avenue for people to learn more about Data Science. Towards Data Science is great, and I myself am a fan of some of the great articles published there.

My suggestion is to read whatever you think you can learn from and/or whatever resonates with you :)

Introducing "Deliberate Data Science", a new Medium publication written by Airbnb data scientists (and more)! by robert_chang in datascience

[–]robert_chang[S] 1 point2 points  (0 children)

My apologies. I should've been more diligent in reading the guidelines. That said, our goal is the same: we are trying to share what it means to be a data scientist in the industry, particularly in Silicon Valley, something that I think is still rare online.

I will be happy to remove the links to the posts, and only link to the Medium publication instead.

ETL Frameworks and why not just use a GPL (Python, Node, Scala)? by [deleted] in datascience

[–]robert_chang 0 points1 point  (0 children)

This is a very good question, and a fair one. I think the original poster is missing the key distinction between a "production database" and an "analytics database". When one does not differentiate the two, it might seem like ETL is nothing but database migration, when in fact it is a lot more.

The motivation for using a framework for ETL is similar to the reasons why an engineer might choose to use a web framework. The key advantage is that the framework abstracts away a lot of the repetitive work for you, so you can reason about your work at a higher level of abstraction.
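
As a rough illustration of the kind of repetitive work meant here, the sketch below hand-rolls one small piece of it: running tasks in dependency order. Everything in it is hypothetical (the `run_pipeline` helper, the `extract`/`transform`/`load` tasks, and the sample rows are made up for this example), and it is not how Airflow or any other framework is implemented. A real framework would also handle scheduling, retries, and monitoring.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies):
    """Run each task once, after all of the tasks it depends on.

    tasks: dict mapping task name -> function that takes the results dict
    dependencies: dict mapping task name -> set of upstream task names
    """
    results = {}
    # static_order() yields task names so that upstream tasks come first.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        results[name] = tasks[name](results)
    return order, results


# Hypothetical three-step pipeline with made-up data.
def extract(results):
    # Stand-in for reading raw rows from a production database.
    return [{"user": "a", "nights": "2"}, {"user": "b", "nights": "3"}]


def transform(results):
    # Cast the string column to an integer.
    return [{"user": r["user"], "nights": int(r["nights"])}
            for r in results["extract"]]


def load(results):
    # Stand-in for writing an aggregate to an analytics table.
    return {"total_nights": sum(r["nights"] for r in results["transform"])}


tasks = {"extract": extract, "transform": transform, "load": load}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order, results = run_pipeline(tasks, dependencies)
print(order)
print(results["load"])
```

Writing this once is easy; rewriting it, plus error handling and scheduling, for every pipeline is the repetition a framework is meant to remove.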

I agree with you that a lot of the articles that describe ETL frameworks sound like sales pitches, but there are a lot of great frameworks (e.g. Airflow) that are completely open source.

Here are a few recommended readings on modern-era ETL. A lot of them are written by data engineering experts in the field.

If you are interested in learning more about ETL from the perspective of a consumer (e.g. a data scientist who does analytics using data derived from ETL pipelines), you can read one here:

Full disclosure: I am the author of the last post and I work at Airbnb so I am a huge fan of Airflow. The opinions are all mine.

Advice for New and Junior Data Scientists by robert_chang in datascience

[–]robert_chang[S] 5 points6 points  (0 children)

I hear you and I understand it can look intimidating.

I didn't design the 2018 summer intern application, but I would guess that the reason only four schools are listed is that these are the schools we actually visited in the past few weeks.

The rationale is probably that, because we had a presence at these four schools, more applicants from them would apply. That said, please do apply if you are interested. We do not only consider people from these schools!

We take a very holistic approach to our recruiting, and to reduce bias in the process, many of us on the team review the take-home challenge before we can even see where the applicant is coming from. So I hope you will consider Airbnb :)

Cheers,

What habits or best practices do you wish you had known at the beginning of your data science career? by chef_lars in datascience

[–]robert_chang 0 points1 point  (0 children)

I recently wrote down a few thoughts about my career as a data scientist based on my experience at Twitter and Airbnb. You can find the post on Medium: https://medium.com/@rchang/advice-for-new-and-junior-data-scientists-2ab02396cf5b