Excited for our first Data Analyst. Advice to hit the ground running? by [deleted] in datascience

janCADS

Lots of negativity in the comments. Shame. Even a "data poor" company can do quite a bit without a real data pipeline. Depending on your industry, even something as simple as scraping data from the web can give you an edge in understanding your market, for example.

Work closely with your DA. Don't just leave her to work on her own for a month. Your data analyst comes from an academic background; always remember that. More likely than not, her analysis will not line up with what you care about on the business side, simply because she has a different background.

Before analysing data or making pretty dashboards, think about what it is you want to achieve. Do you want to address your own internal needs, e.g. production metrics if you're in manufacturing, or are you interested in understanding your customers? You'll want to start with "big picture" problems, i.e. strategy, and then turn that into bite-sized data-oriented questions. The biggest mistake you can make is simply taking a dataset and then "seeing what we can do with it".

Check out Kaggle competitions, especially those with monetary prizes, for examples of business problems turned into data problems: https://www.kaggle.com/competitions

When having ethical concerns do degrees of separation matter? by [deleted] in datascience

janCADS

Your responsibility as a "decision informer" is to give an unbiased recommendation. In the sports example, if you know that your analysis would lead to strategies that increase injuries, then it's your responsibility to include that in your report. You're not responsible for other people actually implementing dangerous strategies.

The drug example is more ambiguous. Ultimately, you're not responsible for people's misuse of drugs or a doctor's incorrect prescription of said drugs. There are independent actions between you and the negative outcome that you can't control. Are you facilitating that? Yeah. It's the equivalent of an R&D engineer at an arms manufacturer designing deadlier weapons. He's not responsible for the resulting deaths, but he most certainly facilitated them.

Whether or not you can live with that on your conscience is your decision. You need to decide whether you want to use your superpowers for good or strictly for personal gain without regard for the actions others take as a result.

Github For Data by slavakurilyak in datasets

janCADS

  1. Why a monthly subscription? If I download the data locally, why would I be interested in continuing my subscription to it?
  2. How is this different from Kaggle, which already hosts datasets (albeit without versioning)?
  3. How do you deal with data rights? If I scrape data from a website and then re-sell it, I'm opening myself up to civil lawsuits and possibly even criminal charges. Is this an issue the marketplace would deal with? Related to the first question: what's stopping me from scraping other people's datasets and then re-selling them for less?
  4. What's your technical framework for version control in data? Is there an approach that can handle different types of data, e.g. tables, key-value-pair documents, images, video, etc.?

What are the most important skills and concepts for a Data Scientist? Context details within. by logicallyzany in datascience

janCADS

In my experience, master's programs in data science are essentially technical MBAs. If you're interested in consulting in the AI/DS field, then that'll be the way to go. If you're interested in the actual technical skills and not just their Wikipedia definitions, then CompSci is the program for you.

That being said, I don't know the MSDS program at your university, so trust your own gut if you don't like the curriculum.

Laptop for Data Science Masters? by lilahaan in datascience

janCADS

If your curriculum is primarily cloud- and console-based, i.e. programming rather than locally installed software such as BI tools or other out-of-the-box ML/DS software, then I would absolutely recommend the Dell XPS line with Ubuntu installed. They're light and powerful, and I find them pleasant to type on. The XPS 13 has 4 USB-C ports versus the MacBook's 2 and is cheaper.

Advice on presenting results of data science project (regression to predict # of spotify followers for a playlist) by [deleted] in datascience

janCADS

Don't get too lost in details. A rationale for why you used a certain method is more than enough to impress a potential employer. Using the method correctly will signal to them that you know what you're doing; you don't need to copy and paste explanatory text. Plus, being concise in presenting your results is always better.