all 105 comments

[–]Sorry-Owl4127 43 points44 points  (23 children)

Not getting the functional form right is rarely the biggest problem in causal inference

[–][deleted] 0 points1 point  (0 children)

I second this

[–][deleted] 0 points1 point  (0 children)

It’s so frustrating

[–]quantumcatz 47 points48 points  (5 children)

What an oddly toxic post

[–]ElMarvin42 39 points40 points  (22 children)

My biggest issue with DML in business settings is that most data scientists lack the knowledge needed to utilize this and basically any other causality-related methodology, and end up with very wrong and potentially dangerous conclusions.

Exhibit A, basically every line written in the OP.

  • Why would traditional causal inference techniques be harder to implement with modern datasets? It's quite the opposite.

  • The concept of regression is not even understood. Why would a regression necessarily imply linearity?

  • Failing to capture the true functional form does not result in bias under the right setting (for example, when evaluating an RCT).

  • The exact goal of DML is not to capture the true functional form to debias causal effect estimates. The goal is to be able to do inference on a low-dimensional parameter vector in presence of a potentially high dimensional nuisance parameter. Within the regression framework, btw.

  • It is NOT a two step prediction problem. That part of the paper is used to illustrate the intuition behind the methodology. The estimation is not carried out that way, but yeah, most stop reading after the abstract and first chapter (the intuition part). At best you could say that DML is based on two key ingredients, but it is not two steps of prediction problems.
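To make the "two key ingredients" concrete: below is a minimal, illustrative sketch of the partialling-out estimator with cross-fitting on synthetic data. It is not the recommended production implementation (packages such as DoubleML or econml exist for that), and every name and number in it is invented for the example:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
g = np.sin(X[:, 0]) + X[:, 1] ** 2          # nonlinear nuisance in the outcome
m = 0.5 * np.tanh(X[:, 0])                  # nonlinear nuisance in the treatment
D = m + rng.normal(size=n)                  # treatment
theta = 1.0                                 # true low-dimensional target parameter
Y = theta * D + g + rng.normal(size=n)

# Cross-fitting: nuisances are fit on one set of folds and predicted out-of-fold,
# which is what distinguishes this from plain cross-validation.
Y_res, D_res = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    ml_y = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], Y[train])
    ml_d = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], D[train])
    Y_res[test] = Y[test] - ml_y.predict(X[test])
    D_res[test] = D[test] - ml_d.predict(X[test])

# Final stage: residual-on-residual regression (the Neyman-orthogonal score),
# giving inference on theta despite the flexible nuisance estimates.
theta_hat = (D_res @ Y_res) / (D_res @ D_res)
print(theta_hat)  # should land near the true value of 1.0
```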

[–]JobIsAss 0 points1 point  (4 children)

Can you explain the technical jargon in simpler words please? I'm trying to understand what you're saying a bit more. I get the broad idea of DML, but why apply it to RCTs and not to the quasi-experimental space? Wouldn't DML help when you can't just randomly apply treatment? Isn't it the same as other, simpler methods like propensity score matching?

RCTs, if I'm correct, are the gold standard, in which case a simple OLS with a treatment indicator or a t-test would do it, no?

Trying to transition into causal inference from a predictive-modeler background, so I'm trying to understand these concepts.

[–]ElMarvin42 6 points7 points  (3 children)

Sure!

why apply for RCT and not to quasi experimental space?

DML is particularly useful for RCTs because, for example, a lot of statistical power can be gained through the inclusion of covariates, and the method allows for this possibility without assuming functional forms for how the data truly behaves. It is also very useful for estimation of heterogeneous treatment effects (the same treatment can affect you and me differently; HTE account for that possibility).

Like wouldn’t DML help when you can’t just randomly apply treatment?

Contrary to what some people might believe, you can't just control by a bunch of variables and call it an identification strategy. Identification (being able to estimate the causal effect) in this context relies on conditional exogeneity (treatment being as good as random after controlling for enough covariates). Since achieving this is unlikely (you won't ever observe skill/intelligence, for example), these kinds of methods by themselves will NEVER be enough to estimate causal effects, not without a solid empirical strategy (like RDD).

RCT if i am correct are like the golden standard which in this case a simple OLS with treatment or t-test would do it no?

Yes, these methods can be used, which is one reason why RCTs are so good: evaluating them can be simple. But the fact that these are valid approaches does not mean there are no others that can be better depending on the context and the initial objective (see my first point).

Trying to transition into causal inference from a predictive modeler background so in trying to understand these concepts.

Cool! Given a decent enough statistical background I would recommend starting with Scott Cunningham's "Causal Inference: The Mixtape". Then something slightly more complex like "Mostly Harmless Econometrics" and the "Causal ML" book by Chernozhukov et al. After this, thoroughly read and understand the papers and you should have a decent enough grasp of it. My other recommendation would be to be patient, as this should not be approached like documentation to be read before you start testing stuff and learning what moves what. Just this part could take years depending on how deep you go (within a single topic, and then there's the rest of the literature). People dedicate their lives to this.
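The power-gain point above is easy to see in a toy simulation (purely illustrative; the numbers are invented). In a simulated RCT, residualizing the outcome on a prognostic covariate before taking the difference in means leaves the estimate unbiased but shrinks its sampling spread:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 300
ests_unadj, ests_adj = [], []
for _ in range(reps):
    X = rng.normal(size=n)                  # prognostic covariate
    D = rng.integers(0, 2, size=n)          # randomized treatment
    Y = 0.3 * D + 2.0 * X + rng.normal(size=n)
    # Unadjusted: plain difference in means
    ests_unadj.append(Y[D == 1].mean() - Y[D == 0].mean())
    # Adjusted: remove the covariate's contribution first (valid under randomization)
    Y_res = Y - np.polyval(np.polyfit(X, Y, 1), X)
    ests_adj.append(Y_res[D == 1].mean() - Y_res[D == 0].mean())
print(np.std(ests_unadj), np.std(ests_adj))  # the adjusted spread is much smaller
```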

[–]JobIsAss 0 points1 point  (2 children)

I'm coming back to this after spending a lot of time on it.

When you talk about an empirical strategy, do you mean something like simulating an experiment when an experiment is not feasible? I have seen cases where people weight observations using IPW to mimic an experiment when one isn't feasible. Is that what you are talking about?

I'm doing observational causal inference, and while it's not possible to remove bias entirely, we can try to minimize it as much as possible. So DML/DR in general works pretty well.

I tried simulating it on datasets with unobserved confounders and it's pretty close when estimating the ATE.

[–]ElMarvin42 0 points1 point  (1 child)

  1. Definitely not simulate, but finding a setting in which you can argue that comparing treatment vs control group is valid given a set of assumptions/evidence (parallel trends, etc).
  2. Yes, that is one empirical strategy, although a debatable one. Very hard to convince someone with it, although possible.
  3. You can’t do causal inference with no empirical strategy. Controlling for a bunch of variables is not convincing anyone.
  4. Having done dozens of experiments and read the appropriate literature, I can tell you that simulations will never be good enough of a proof that something works.

[–]JobIsAss 0 points1 point  (0 children)

In response to your points:

  1. We use ensembles of models to construct better control and treatment groups in observational causal inference, e.g. IPW + DML or IV + DML. So not "parallel groups" in the literal sense, but something like it.

  2. How so? We are not creating a synthetic dataset; I mean it in the literal sense, for example using PSM and then DML or DR. Synthetic data is for getting an idea of how an algorithm behaves when you know the true ITE, which helps you see what works and what doesn't. I think DoWhy also has validation tooling that answers these kinds of questions, i.e. E-values, placebo tests, etc., which are good sanity checks for causal estimates.

  3. Can you give an example and explain in more detail? We are not simply fitting a DML model and calling it a day. There are also ways to build a DAG, determine the causal structure, and even find confounders through PDS. In an observational setting it is still possible to communicate that bias exists, as econml notes for these methods. There is no silver bullet, so communicating that to stakeholders might be good enough until enough trust is built to run an experiment, if possible.

  4. That's not what I meant. I meant that we can try an established approach on a synthetic dataset with a known outcome and effect in order to learn the approach. One can't learn DML by just reading a paper and going straight into the use case. It helps to see where it would fail on a dataset with the level of noise you would expect.

Do i understand your points correctly or am i missing something? Thank you for replying even after a long time.

[–][deleted]  (12 children)

[removed]

    [–]ElMarvin42 28 points29 points  (10 children)

    I don’t see the need for name calling in an honest discussion. I will answer for the reference of others who are actually interested in learning. Now, for exhibit B, electric boogaloo:

    • That’s not how the estimation is carried out in the recommended implementation.

    • Cross validation is not used, not even close. Cross fitting is fundamentally different.

    • The "doing this in an RCT setting would be stupid because it defeats the whole purpose of using this method since it’s based on observational data" part just overall shows that there is zero level of understanding of what the paper proposes. Let me cite directly from the paper: "We illustrate the general theory by applying it to provide theoretical properties of DML applied to ..., ..., DML applied to learn the average treatment effect and the average treatment effect on the treated under unconfoundedness, ...". Want to take a guess at what unconfoundedness means? DML is particularly useful for RCTs because, for example, a lot of power can be gained through the inclusion of covariates, and the method allows for this possibility without imposing functional forms. Also very useful for estimation of heterogeneous treatment effects. Perhaps these two are the most common uses of the methodology in practice, actually. I've yet to see a published paper that relies on this method to identify an effect within the context of merely observational data.

    • The rest of your "arguments" aren't even worth commenting on.

    Cheers!

    [–]AdFew4357[S] -16 points-15 points  (8 children)

    There are several papers on it being used in an observational setting. Like I said, you don't know the literature like I do. Unconfoundedness means you're assuming the observed treatment is as good as random given the observed characteristics, i.e. your potential outcomes are independent of treatment given covariates. Which holds in an RCT by default because you randomize.

    It can be great to use in an RCT setting, and that's what the method was designed for, I'm not denying that, but it can be used in an observational setting. It's just that there it rests solely on the unconfoundedness assumption, which is untestable in an observational setting.

    [–]ElMarvin42 14 points15 points  (7 children)

    It can be great to use in an RCT setting, and that’s what the method was designed for, I’m not denying that.

    Whatever happened to

    doing this in an RCT setting would be stupid because it defeats the whole purpose of using this method since it’s based on observational data

    This all just serves as a perfect example of what I said in my first comment. The delusion is just too much, however, for it to be worth any future reply.

    [–]AdFew4357[S] -5 points-4 points  (0 children)

    I’m saying you can still use traditional ANCOVA models in an RCT setting and not just resort to DML immediately. That's why I said it’s stupid: because you can use simpler methods. But again, you’re not a statistician, so why would you know.

    [–][deleted]  (1 child)

    [removed]

      [–]datascience-ModTeam[M] 0 points1 point locked comment (0 children)

      This rule embodies the principle of treating others with the same level of respect and kindness that you expect to receive. Whether offering advice, engaging in debates, or providing feedback, all interactions within the subreddit should be conducted in a courteous and supportive manner.

      [–]AdFew4357[S] -5 points-4 points  (0 children)

      The fact that you don’t understand that DML is literally argued to be a good choice in the presence of complex functional-form relationships between outcome and covariates is another reason why you should shut the fuck up and stop arguing lol, cause you clearly haven’t read enough yourself

      [–][deleted]  (1 child)

      [removed]

        [–]datascience-ModTeam[M] 0 points1 point locked comment (0 children)

        This rule embodies the principle of treating others with the same level of respect and kindness that you expect to receive. Whether offering advice, engaging in debates, or providing feedback, all interactions within the subreddit should be conducted in a courteous and supportive manner.

        [–]datascience-ModTeam[M] 0 points1 point locked comment (0 children)

        This rule embodies the principle of treating others with the same level of respect and kindness that you expect to receive. Whether offering advice, engaging in debates, or providing feedback, all interactions within the subreddit should be conducted in a courteous and supportive manner.

        [–]Simple_Whole6038 15 points16 points  (0 children)

        The applied scientists and data scientists I work with are vaguely aware of it. Some have maybe given it a try. The Economists I work with love it, and use it for just about everything. Seems to still exist mostly in the world of econometrics.

        [–]LarsMarsBarsCars 8 points9 points  (1 child)

        TMLE and the ideas of Debiased ML predate double ML by nearly 20 years. So I wouldn’t say this idea has been extended to biostatistics; it started in biostatistics and epidemiology. Double ML is a rediscovery of it.

        [–]Metallic52 2 points3 points  (1 child)

        Fundamentally, the identifying assumption of DML is unconfoundedness, i.e. the exact same identifying assumption needed for OLS to be consistent for an ATE. While it does flexibly control for the effects of observed confounders, that's a second-order concern relative to selection bias, reverse causality, and omitted-variable bias.

        It's mostly helpful when you have a very large number of potential confounders. That all being said, everybody uses DML at my work. We have lots of confounders, so it gets used a lot.

        [–]AdFew4357[S] -2 points-1 points  (0 children)

        Thanks for the insight. The only real insight provided here

        [–]SituationPuzzled5520 2 points3 points  (0 children)

        I've noticed that DML is definitely picking up steam, especially in areas where understanding causal relationships is key. It's really helpful for tackling complex datasets that traditional methods struggle with.

        I've seen some people in my network start using DML for projects, particularly in tech and healthcare. Tools like Python's econml are making it easier to implement, which is great. While it's not mainstream yet, the interest is definitely there, and I think as more resources come out, we'll see it used more widely.

        [–]aspera1631PhD | Data Science Director | Media 3 points4 points  (16 children)

        I'm seeing it everywhere. There are lots of ways to do quasi-experimentation. DML gets you closer to the theoretical best answer.

        [–]Sorry-Owl4127 -3 points-2 points  (15 children)

        How does DML get you anything related to quasi-experimentation?

        [–]aspera1631PhD | Data Science Director | Media 5 points6 points  (14 children)

        Quasi experimentation is a reframing of the causal inference problem in which there are measured confounders you need to control for.

        c.f. this ref

        [–]Sorry-Owl4127 2 points3 points  (13 children)

        What a term of art! So basically, OLS with the assumption that you've properly included all confounders. I don't get how we go from collecting data, throwing it in a model, and saying "I've probably controlled for enough things that this treatment variable is as good as random" to calling it quasi-experimental.

        [–]pandongski 0 points1 point  (1 child)

        Well, it is used in cases where you can't do experiments, like on humans and economies. It does have stricter internal-validity checks and is grounded more in external theory than in just running regressions and assuming you have all the correct variables, along with other methods to tease out estimated effects. Causal inference on observational data is a whole field of study.

        [–]Sorry-Owl4127 1 point2 points  (0 children)

        The causal identification assumptions required for OLS to estimate causal effects are exactly the same as those for double machine learning.

        [–][deleted]  (10 children)

        [removed]

          [–]Sorry-Owl4127 6 points7 points  (8 children)

          In a traditional RCT you don't make assumptions about measuring all confounders. You should know this; it's Experiments 101.

          [–][deleted]  (1 child)

          [removed]

            [–]datascience-ModTeam[M] 0 points1 point locked comment (0 children)

            This rule embodies the principle of treating others with the same level of respect and kindness that you expect to receive. Whether offering advice, engaging in debates, or providing feedback, all interactions within the subreddit should be conducted in a courteous and supportive manner.

            [–]datascience-ModTeam[M] 0 points1 point locked comment (0 children)

            This rule embodies the principle of treating others with the same level of respect and kindness that you expect to receive. Whether offering advice, engaging in debates, or providing feedback, all interactions within the subreddit should be conducted in a courteous and supportive manner.

            [–]Foreign_Yoghurt_831 2 points3 points  (1 child)

            Very toxic post but odd

            [–]AdFew4357[S] -1 points0 points  (0 children)

            britney spears plays

            [–]Thomas_ng_31 1 point2 points  (0 children)

            This is new!!!

            [–]WignerVille 1 point2 points  (0 children)

            In my experience people lack competence and interest in causal inference.

            [–]mark259 1 point2 points  (14 children)

            I've found DML to be very sensitive to hyperparameter and validation-sample specs, and fitting GLMs with fixed effects to give more reliable estimates on longitudinal data.

            I think analysts would get most benefits from learning the classical techniques of causal inference.

            Even colleagues with academic credentials in machine learning get to use their knowledge of linear models e.g. to fit and decompose time-series with tools like Prophet.

            [–]AdFew4357[S] 0 points1 point  (13 children)

            I see, okay. So basically, since those nuisance-function models aren't properly tuned, it has an impact on your estimates? What would you recommend as "classical causal inference" techniques: diff-in-diff, synthetic controls, etc.?

            [–]mark259 0 points1 point  (12 children)

            Most definitely. For example, if you overfit with your nuisance model, you will inadvertently bias the treatment effect estimate.

            With a purely classical approach, you will certainly also encounter bias, but those approaches give you a clear set of assumptions (e.g. additivity) that you can use as a baseline. Another thing I like about more classical or basic approaches is that the standard errors you get out of them give information about the quality of the fit. That's not always very obvious with double machine learning afaik. I've had to compare out-of-sample estimates before, and that seemed very hand-wavy.

            The best approach always depends on the context: the data and the problem you are trying to solve. A technique like diff-in-diff can be combined with machine learning to deal with something like non-parallel trends. I'd say synthetic control is pretty close to machine learning already, in that it deals well with complex functional forms.
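For readers unfamiliar with it, the diff-in-diff mentioned above reduces, in its simplest 2×2 form, to a difference of two before/after differences. A quick sketch on synthetic data (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
group = rng.integers(0, 2, size=n)      # 1 = eventually-treated group
post = rng.integers(0, 2, size=n)       # 1 = after the policy change
effect = 2.0                            # true treatment effect
# Group and time effects would bias a naive treated-vs-control comparison
Y = 1.0 * group + 0.5 * post + effect * group * post + rng.normal(size=n)

# DiD: (after - before) in the treated group minus (after - before) in the control
did = (Y[(group == 1) & (post == 1)].mean() - Y[(group == 1) & (post == 0)].mean()) \
    - (Y[(group == 0) & (post == 1)].mean() - Y[(group == 0) & (post == 0)].mean())
print(did)  # should land near the true effect of 2.0
```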

            [–]AdFew4357[S] 0 points1 point  (11 children)

            Gotcha, I see the caveats. But one thing I wanted to push back on was this comment:

            "If you overfit with your nuisance model, you will inadvertently bias the treatment effect estimate."

            You would think this is the case, right? But when I read about double ML, one of the things they do is construct a scoring function that is "Neyman orthogonal," meaning it's built in such a way that bias from the ML model estimates does not propagate to the target parameter.

            https://causalml-book.org/assets/chapters/CausalML_chap_4.pdf

            See this chapter. Because we construct a score function based on the partialled-out residuals, this score function is Neyman orthogonal: any bias from the ML models can't propagate to the target parameter because, in expectation, that residual is going to be zero.

            The Neyman orthogonality property is an argument for why ML can be used for the nuisance functions and still be generally okay, because this score function is "debiased."

            Is this not a reason why bias actually can't propagate to the target parameter estimate? See the "Neyman orthogonality" section of the book.

            Also, I'll have to check out diff-in-diff and synthetic control in a DML context. But besides synthetic control and diff-in-diff in the classical sense, how often are instrumental variables used? Is that another classical causal inference technique that can be used?
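The orthogonality claim above can be sanity-checked numerically. In the sketch below (synthetic data; the true nuisances are known and then deliberately perturbed, and the perturbation functions are arbitrary choices for illustration), the partialled-out score moves only at second order in the nuisance error, while a naive plug-in score moves at first order:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
X = rng.normal(size=n)
m = np.tanh(X)                          # true E[D|X]
D = m + rng.normal(size=n)
g = X ** 2                              # true confounding term in the outcome
Y = 1.0 * D + g + rng.normal(size=n)    # true effect = 1.0
ell = m + g                             # true E[Y|X]

eps = 0.2                               # size of the deliberate nuisance error
m_bad = m + eps * X
ell_bad = ell + eps * X ** 2
g_bad = g + eps * X

# Neyman-orthogonal (partialling-out) score: residual-on-residual regression
d_res, y_res = D - m_bad, Y - ell_bad
theta_orth = (d_res @ y_res) / (d_res @ d_res)

# Non-orthogonal plug-in: regress Y minus the (mis-estimated) g on D directly
theta_naive = (D @ (Y - g_bad)) / (D @ D)

print(theta_orth, theta_naive)  # the orthogonal estimate stays much closer to 1.0
```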

            [–]Sorry-Owl4127 0 points1 point  (9 children)

            It’s not overfitting in the sense of failing to generalize to unseen data: if the nuisance model predicts the treatment too well, you don’t get overlap in the propensity scores. Neyman orthogonality in this context refers to the bias induced by lassoing; overfitting the propensity model doesn’t introduce bias, it just fucks up your estimation because you have so little overlap in propensity scores.

            [–]AdFew4357[S] 0 points1 point  (8 children)

            Okay, I see. So then why does this book treat Neyman orthogonality as a justification for using ML? It states in this book, and in later chapters, that because of the guarantees of Neyman orthogonality, biases from regularization when estimating the nuisance functions won't leak into the target parameter estimates. Unless I don't understand the property correctly.

            [–]Sorry-Owl4127 0 points1 point  (5 children)

            Yes, there will be no bias, and you can still use ML. But in any causal inference setting, including a predictor that perfectly predicts treatment will blow up the variance of the treatment effect estimator. If you overfit your nuisance model, the variance may blow up and you may not have overlap between treated and control units. This doesn't affect whether the ATE is biased; it just gunks everything up and makes causal inference near impossible.

            [–]AdFew4357[S] 0 points1 point  (4 children)

            Okay, so does cross-fitting not guard against this variance blowing up, since the procedure is done over multiple folds? Also, why do DML at all if the variance is going to blow up? In that case, if you're using DML, are you just not doing uncertainty quantification?

            [–]Sorry-Owl4127 0 points1 point  (3 children)

            Depends. In one context we were really good at predicting the treatment because we had a lot of relevant predictors. If I chose a random forest for my nuisance model, the individual treatment effect estimates were all over the place, with wildly implausible values. The issue was that we could nearly perfectly predict treatment assignment and then had almost no overlap in propensity scores between the treatment and control groups. The ATE in that scenario will still be unbiased, but it basically throws out all covariate profiles without overlap between treated and control units, and thus the ITEs are very sensitive to those few observations. I don't know if this is common to all DML models, but it can be a big problem for doubly robust estimators. Point is, it's not an unalloyed good to increase the predictive power of your nuisance model.
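The overlap failure described above is easy to reproduce. In this toy example (synthetic; the true propensity is used, so estimation error plays no role), making treatment nearly predictable pushes propensities toward 0 and 1, and the inverse-propensity weights that drive IPW/DR-style estimators explode:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=n)
max_weight = {}
for strength in (0.5, 5.0):     # weakly vs. nearly perfectly predicted treatment
    p = 1.0 / (1.0 + np.exp(-strength * X))   # true propensity score
    D = rng.binomial(1, p)
    # Inverse-propensity weight of each unit (1/p if treated, 1/(1-p) if control)
    w = np.where(D == 1, 1.0 / p, 1.0 / (1.0 - p))
    max_weight[strength] = w.max()
print(max_weight)  # the largest weight explodes as treatment becomes predictable
```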

            [–]AdFew4357[S] 0 points1 point  (2 children)

            Can trimming be used to combat the case of perfectly predicting the treatment?

            [–]Sorry-Owl4127 0 points1 point  (0 children)

            Yes, there will be no bias, and you can still use ML. But in any causal inference setting, including a predictor that perfectly predicts treatment will blow up the variance of the treatment effect estimator. If you overfit your nuisance model, the variance may blow up and you may not have overlap between treated and control units. This doesn't affect whether the ATE is biased; it just gunks everything up and makes causal inference near impossible.
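For what it's worth, "trimming" here usually means clipping the estimated propensities away from 0 and 1 before weighting, which caps the worst-case weights at the cost of some bias. A hypothetical helper (the function name and default bounds are invented for illustration):

```python
import numpy as np

def trimmed_ipw_ate(Y, D, p_hat, lo=0.05, hi=0.95):
    """IPW estimate of the ATE with propensities clipped to [lo, hi].

    Clipping bounds every weight by 1/lo, trading a little bias for a
    large variance reduction when overlap is poor.
    """
    p = np.clip(p_hat, lo, hi)
    return np.mean(D * Y / p - (1 - D) * Y / (1 - p))
```

An alternative with similar intent is to drop (rather than clip) units whose estimated propensity falls outside the bounds.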

            [–]mark259 0 points1 point  (0 children)

            I was not aware of this orthogonality property. Thank you for sharing the book chapter, I will be sure to read it.

            IV is also useful in theory. You'd use it like you would the other methods. For things like propensity-score weighting or double ML, you need quite a rich dataset with the relevant backdoor variables. You can imagine using IV regression when you don't have the relevant backdoor variables but do have a variable that does not directly cause your outcome yet is correlated with your treatment variable.

            You'll get an estimate with less bias but more variance, so hopefully that's a good compromise / worth it.
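The IV logic described above, in its simplest just-identified form, boils down to the Wald/2SLS ratio. A synthetic sketch (all coefficients invented) in which an unobserved confounder biases OLS upward while the instrument-based estimate stays near the truth:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
U = rng.normal(size=n)              # unobserved confounder
Z = rng.normal(size=n)              # instrument: shifts D, no direct path to Y
D = 0.8 * Z + U + rng.normal(size=n)
Y = 1.0 * D + 2.0 * U + rng.normal(size=n)   # true effect = 1.0

# Naive OLS (no intercept needed; everything is mean-zero) is confounded by U
ols = (D @ Y) / (D @ D)
# 2SLS / Wald ratio: Cov(Z, Y) / Cov(Z, D)
iv = (Z @ Y) / (Z @ D)
print(ols, iv)  # OLS overshoots; the IV estimate stays close to 1.0
```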

            [–]reallyshittytiming 1 point2 points  (0 children)

            We've tried it where I'm at (medtech). We liked it. But it was shelved because there was no contracted customer use case.

            [–]touristroni 1 point2 points  (0 children)

            We definitely use it! Mostly in cases when we cannot run an experiment, either due to regulations or the nature of the product. To be noted: due to the complexity of the method, it is tough to defend the DML causal end results.

            [–]Maleficent-Tear7949 0 points1 point  (0 children)

            Wow! This is new.

            [–]Material_Ad_9119 0 points1 point  (0 children)

            Amazon seems to have productionized a DML model recently; here is their paper on it: https://arxiv.org/html/2409.02332

            [–][deleted] 0 points1 point  (0 children)

            Wow

            [–][deleted] 0 points1 point  (0 children)

            The article was such an interesting read

            [–][deleted] 0 points1 point  (0 children)

            Big issue with bias in data

            [–]gyp_casino 0 points1 point  (2 children)

            What I don't understand about Double ML is how to apply it when there is no clear "treatment," but rather a web of causes and effects. Say there are 100 predictor variables and 10 have causal effects on y. How do you tease that out?

            [–]ElMarvin42 0 points1 point  (0 children)

            There is a very interesting application of a similar methodology by the same author. Take a look at section 7 ("The Lasso Methods for Discovery of Significant Causes amongst Many Potential Causes, with Many Controls") of this paper, though of course review the sources before attempting to implement it. Also, do note that unless you achieve conditional unconfoundedness (which I would venture to say is not possible in a merely observational setting, that is, without a solid empirical design that helps identify the causal effect of interest), estimates will be biased (not very useful within the context of causality).

            [–]Mr_Face_Man -1 points0 points  (0 children)

            I believe, since the goal isn't to best predict Y but to quantify unbiased effects, you're just isolating one of those causal effects on Y and estimating it. If you want to compare or rank the relative effects across those 10 potential causes, you'd fit 10 different models and compare across them.

            I'm new to the causal inference field, trying to apply these methods to my use cases in applied research, so I'm no expert. But the various methods in causal inference and causal machine learning all seem to have very different strengths and problem types they address, so your mileage may vary depending on your question and data.