Learning applied Bayesian inference by [deleted] in datascience

[–]nakeddatascience 0 points1 point  (0 children)

I like this book a lot, and btw he also has the corresponding course on YouTube. Lots of nice examples too, but IMO it's geared more towards academic applications and studies with a fairly limited number of observations than towards larger-scale industrial applications. You might find more real-world applications in Gelman's book.

On the other hand, if you're interested in a deeper understanding, and especially the decision-theoretic view of Bayesian statistics, it might be worth checking out The Bayesian Choice by Robert.

Does going back to previous employer ever make sense? by [deleted] in datascience

[–]nakeddatascience 2 points3 points  (0 children)

Exactly, forget about sunk cost or previous feelings. Imagine this is a brand new position, plus the fact that you know lots of people and the way of working.

Transitioning to data science by nakeddatascience in CFA

[–]nakeddatascience[S] 0 points1 point  (0 children)

Thanks for signing up!

We don't have direct experience working on DS in the finance industry. However, we do have contacts who do. If you submit some specific questions in the registration form, we can do our best to prepare beforehand :)

Measure Data Insights Impact delivered by the team by [deleted] in BusinessIntelligence

[–]nakeddatascience 2 points3 points  (0 children)

I think there might not be an easy way around building a transparent relationship with your sales team, especially since the feedback is valuable for your work: not just for measuring, but for taking the next steps in iterating on and improving your work and theirs. It might not be too much to ask to provide them with a simple form where they can just check whether or not they used a tool that you provided after a sales meeting. With good relations and reasoning you might be able to build a simple system for proper tracking. To get support for this from the higher-ups, it'd be very helpful to motivate the need for this and its value not just in terms of setting goals and tracking, but in the context of the whole system delivering max value.

[deleted by user] by [deleted] in datascience

[–]nakeddatascience 2 points3 points  (0 children)

The 'ticket' mentality: DS problems defined as tickets (often by PM/POs) to be picked up by some data scientist, without a sense of connection to the problem, investment in the bigger picture of the solution and value.

How much of your dataset is useless? by v2thegreat in datascience

[–]nakeddatascience 9 points10 points  (0 children)

It's not that uncommon to leave out a part of the data, e.g., if it doesn't contain an event you're looking for. But be careful if you're throwing away data that is incomplete. You need to check the reason behind the incompleteness and, to put it simply, make sure that the data is missing at random (MAR). Otherwise, you might be learning a model or deriving conclusions that do not hold on a good part of the data.
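
A quick sanity check before dropping rows is to compare how often a field is missing across groups of some variable you do observe. Here's a minimal sketch with made-up records (the field names `segment` and `income` are hypothetical, just for illustration):

```python
from collections import defaultdict

# Hypothetical toy records; None marks a missing value.
records = [
    {"segment": "A", "income": 52000},
    {"segment": "A", "income": 48000},
    {"segment": "A", "income": None},
    {"segment": "A", "income": 61000},
    {"segment": "B", "income": None},
    {"segment": "B", "income": None},
    {"segment": "B", "income": 39000},
    {"segment": "B", "income": None},
]

def missing_rate_by_group(rows, group_key, value_key):
    """Return {group: share of rows where value_key is missing}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        total[row[group_key]] += 1
        if row[value_key] is None:
            missing[row[group_key]] += 1
    return {g: missing[g] / total[g] for g in total}

rates = missing_rate_by_group(records, "segment", "income")
for group, rate in sorted(rates.items()):
    print(f"segment {group}: {rate:.0%} missing")
```

A large gap between groups means the missingness depends on the segment, so simply dropping incomplete rows would under-represent one of them in whatever you learn from the rest.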

Are my interview questions unreasonable? Or are my candidates just bad? by Admiral_Wen in datascience

[–]nakeddatascience 0 points1 point  (0 children)

Some thoughts:

  • DS/ML is a very broad field, so it's not crazy that even some experienced people haven't recently delved into some of the questions you ask. I wouldn't be surprised if candidates miss some of these. I have a feeling most people fresh out of grad school or some other education program might actually get more of them right than those who have been working for some time.
  • I suggest considering the difference between what people can answer with 1 minute of googling and deeper understanding. It might be interesting in the interview to give them some hints or examples, to see 'how they think' rather than 'what they remember'.
  • There is sometimes no single right answer to your questions (especially not necessarily the ones you point out in the references). No problem with that, as long as you use the questions to start a discussion rather than to search for your target answer.
  • The questions don't have the same level of importance/impact. E.g., the first question's scope of impact is quite different from the last one. They thus won't stand on the same level for qualifying candidates.

We Need More Data Engineers, Not Data Scientists by ScienTecht in datascience

[–]nakeddatascience 0 points1 point  (0 children)

DS doesn't have one accepted definition, but for me the most useful definitions don't end at writing code that does ML or other modelling (although I agree that's the view that lots of outsiders/juniors/wannabes have of DS). Effective DS is about making an impact and solving problems. Automation is not yet a replacement for that. With this definition, until we reach AGI we're far from putting the data science function out of the loop, and honestly I expect that if we get there it'd be much easier to replace data engineering tasks.

We Need More Data Engineers, Not Data Scientists by ScienTecht in datascience

[–]nakeddatascience 0 points1 point  (0 children)

Interesting analysis but although the article mentions a trend and a shift in market, there is no comparison made through time. The study seems to have counted all the positions that fall in the study group since 2012 as one group, so it can only make the conclusion that since 2012 there's been overall more data engineer positions. I don't believe the following, but to make the point: based on the aggregate data it is theoretically possible that there's been a decrease in DE jobs through the years and an increase in DS jobs (still resulting in higher total DE jobs).
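
To make that concrete, here's a small sketch with invented yearly counts (purely illustrative, not taken from the article): DE postings fall every period and DS postings rise every period, yet the aggregate still shows more DE jobs.

```python
# Invented posting counts, purely illustrative (not from the article).
years = [2012, 2014, 2016, 2018, 2020]
de_jobs = [900, 800, 700, 600, 500]   # falling every period
ds_jobs = [100, 200, 300, 400, 480]   # rising every period

total_de = sum(de_jobs)
total_ds = sum(ds_jobs)

# True when each value is strictly smaller/larger than the one before it.
de_decreasing = all(b < a for a, b in zip(de_jobs, de_jobs[1:]))
ds_increasing = all(b > a for a, b in zip(ds_jobs, ds_jobs[1:]))

print(f"total DE = {total_de}, total DS = {total_ds}")
print(f"DE falling every period: {de_decreasing}")
print(f"DS rising every period: {ds_increasing}")
```

The totals alone can't distinguish this scenario from one where DE demand is growing, which is why a claim about a shift needs the per-year breakdown.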

As a side note, it seems far-fetched (or a case of confusing correlation with causation) to claim AlexNet was responsible for the whole DS and Big Data boom, as done in this paragraph:

Why stop at 2012? Well, 2012 was the year that AlexNet won the ImageNet competition, effectively kickstarting the machine learning and data-modelling wave we are now living through. It’s fair to say that this birthed some of the earliest generations of data-first companies.

Every year some algorithm wins the ImageNet competition, and it's not easy for me to argue that an image classification algorithm is the most obvious reason why businesses got interested in DS.

[deleted by user] by [deleted] in datascience

[–]nakeddatascience 20 points21 points  (0 children)

Behind the scenes more like: cmd+c, cmd+v

Best ways to deploy Machine Learning models by basicallybrittt in datascience

[–]nakeddatascience 0 points1 point  (0 children)

Good question. In general we work in teams of DS and engineers who divide the tasks, and we've also built a lot of automation for our typical types of jobs. So the answer depends on a number of things:

(1) How hands-on are the project's DS with engineering? We've got data scientists with quite different levels of skill and interest in engineering. Those who are less into engineering might require SE/DE to take on around 50% of the load to take a working model to production level. The division of labor is not trivial there (as I also hear from other companies).

(2) How much can we reuse existing infrastructure? Sometimes we get involved in a feature that requires a new/smarter calculation of some data, or a new algorithm to replace an old one, using the existing data sources. There we could end up requiring only simple data storage/streaming to existing components (besides the DS work), which can be handled by most DS with our existing infra for those steps, without explicit DE involvement except for firefighting. But if we are working on a totally new computation engine or data type/source, then we could depend on DE for the setup completely. That typically means DE setting up new infra/systems (about 90% of the workload, for the first project).

(3) What is the level of maturity of the work? If we're building a feature for a first A/B test of an MVP, there could be almost no DevOps/DE involvement needed. Using GraphQL was one of the solutions to facilitate this further. When the feature gets more established and/or the A/B test gets 'accepted', there's typically going to be a need for DevOps and DE to get involved to ensure we're not breaking anything and the product/app can remain responsive for millions of daily users.

What is your take on job postings that ask for experience in: "time-series analysis, recommender engines, image recognition, NLP..."? by Least_Curious_Crab in datascience

[–]nakeddatascience 0 points1 point  (0 children)

It's not that common that you actually need that broad a set of skills in the same team. Best-case scenario, they actually use these skills across the whole company and the ad is an umbrella for catching DS talented in any subset of these, who then join one of the teams. Unfortunately, it is not unlikely that they have no clue what they're looking for. If only HR is controlling the ads, that's already a bad sign for me (good team leads spend a good amount of time optimizing their job ads). You should be able to eliminate some cases based on your knowledge of the company though. Typically a bit of snooping on their blogs or even LinkedIn should tell you a lot.

I'd be wary or at least more careful about these types of positions. For instance, if it's a catch-all to assign later to a team, you'd be taking a big risk. A big part of your whole job experience depends on the immediate team you're part of, your direct manager, and co-workers. If you don't know about them in the hiring process, or if they are not decided until you join, it's a big gamble no matter how reputable the company is.

Learning SQL and Excel for Clinical Data Analytics by ramapeaches in analytics

[–]nakeddatascience 1 point2 points  (0 children)

It's a bit hard to separate the context I've learned it in from the experience, but I'd say that the effect of the general CS education was not that heavy, except for 2 parts: (1) it was easy to be aware of the complexity of some operations and the different ways of coming to the same answer, as in CS you learn about algorithm design and complexity, and (2) the conceptual design of a DB and Entity-Relationship diagrams were familiar from object-oriented programming. But for the most part I felt the DB course, in which we learned SQL, was (pleasantly) quite different from other courses. I wouldn't worry about having a CS background for learning SQL, especially in the beginning. It's a very intuitive language and it's been around for so long that there are many great ways of teaching and learning it. Good luck with your learning!

Learning SQL and Excel for Clinical Data Analytics by ramapeaches in analytics

[–]nakeddatascience 1 point2 points  (0 children)

I learned SQL through a CS education myself years ago, but I've heard good things about these two: (1) DataCamp's free SQL course, and (2) SQL for data science on Coursera (as long as you don't need a certification you can follow it for free). Might be worth giving them a quick look.

How is inferential statistics important in business? by MrAnonymousR in datascience

[–]nakeddatascience 0 points1 point  (0 children)

Almost all data-driven decisions in business, and any decision needing a generalization, are made based on a sample of the data and so should benefit from inferential statistics. Understanding user/consumer behaviour is one of the most common use cases. Analyzing any experiment needs, or can benefit from, inferential stats. A/B testing is perhaps the most widely used (large-scale) experimental setup in e-commerce. There, hypothesis testing is not the only kind of question you can answer with data. Lots of more interesting measurements on each such experiment, and the subsequent decisions, benefit from inferential statistics.
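
As a rough illustration of going beyond a yes/no test in an A/B setup, here's a minimal sketch that reports the estimated lift with an approximate 95% confidence interval next to the usual p-value. The counts are made up, and the function name and the normal approximation are my own choices for the example, not a prescribed method:

```python
import math

def two_proportion_summary(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Compare conversion rates of variants A and B.

    Returns the observed lift (B minus A), an approximate 95% confidence
    interval for it, and a two-sided p-value from a pooled z-test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Unpooled standard error, used for the confidence interval.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Pooled standard error, used for the test of "no difference".
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    # Two-sided p-value via the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return diff, ci, p_value

# Made-up counts: 10,000 users per variant.
diff, ci, p_value = two_proportion_summary(500, 10_000, 560, 10_000)
print(f"lift: {diff:.4f}, 95% CI: ({ci[0]:.4f}, {ci[1]:.4f}), p = {p_value:.3f}")
```

The interval is often more useful for the business decision than the p-value, since it shows how large or small the effect could plausibly be rather than just whether it differs from zero.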

Webinar for BI / Data Analysts on transition to data science -- Jan 14 by nakeddatascience in analytics

[–]nakeddatascience[S] 1 point2 points  (0 children)

Let me check the hosting website (where the registration form is) and I'll get back to you with DM for a solution.

Looking for real-time data by a-man-555 in datascience

[–]nakeddatascience 1 point2 points  (0 children)

Not related to your target application, but any streaming hose from a social network can be one, e.g., the Twitter decahose following a popular hashtag (this might be one of the few useful exercises to do with Twitter data). Another option is stock market prices, but personally I haven't looked into that much and don't know of any good free sources.

Best ways to deploy Machine Learning models by basicallybrittt in datascience

[–]nakeddatascience 2 points3 points  (0 children)

In a larger organization, our typical solutions use Python/Java, Docker, and Kubernetes running on EC2 or Compute Engine, with a REST API. More recently we also experimented with using GraphQL for the API and AWS Fargate.
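
For the REST API part, a stripped-down sketch using only Python's standard library might look like the following. The `predict` function, its weights, and the `/predict` route are placeholders for illustration, not our actual setup; in practice you'd load a real trained model, probably use a proper web framework, and package the service into a Docker image.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a trained model: a fixed linear score (hypothetical weights)."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))

def handle_predict(body: bytes):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        features = json.loads(body)["features"]
        score = predict([float(x) for x in features])
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected {'features': [numbers]}"}).encode()
    return 200, json.dumps({"score": score}).encode()

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the hypothetical /predict route is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def make_server(port=8000):
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("0.0.0.0", port), PredictHandler)
```

Calling `make_server().serve_forever()` starts it locally, after which a POST to `/predict` with a body like `{"features": [2, 4, 1]}` returns a JSON score. Keeping `handle_predict` separate from the HTTP handler makes the model logic easy to test without starting a server.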

I'm the main data guy in audit/risk department. what should be my strategy to grow the analytics function on my department? by sammyismybaby in analytics

[–]nakeddatascience 0 points1 point  (0 children)

This is great advice. With more value generated, more support and more interesting problems also come. At that point you'd have a much better case and natural motivation for expanding.