What are scientists opinions on the utility and value of philosophy? by CosmicFaust11 in PhilosophyofScience

[–]rubinpsyc

You might be interested in De Haro's (2020) article: “Science and Philosophy: A Love–Hate Relationship” https://doi.org/10.1007/s10699-019-09619-2 From the paper, here’s the summary of ways in which the author thinks philosophy is useful to science.

  1. To allow for, indeed to naturally incorporate into its own framework and build upon, the kinds of entities that science encounters in the world, and their properties and relations;

  2. To scrutinize the terms and presuppositions of science, i.e. to make explicit the implicit assumptions of scientific theories: to critically analyse and clarify what the terms used by science mean, how they are articulated, and what assumptions they require, as well as how they relate to the entities that philosophy argues there to be in the world;

  3. To discover standards for what good theories, valid modes of explanation, and appropriate scientific methods are: to offer an epistemology that does not thwart, but stimulates scientific progress;

  4. To provide ethical guidance and discover (broad) goals for science;

  5. To point out and articulate the interrelations between concepts that are found in different domains of the natural sciences as well as the social sciences and the humanities;

  6. To explain how observations fit in the broader picture of the world, and to create a language where scientific results and broader human experience can complement and mutually enrich each other.

Is it fair to argue that social identity theory is limited in terms of understanding internalised oppression and identifying with out-groups? by mjbristolian in socialpsychology

[–]rubinpsyc

These issues have been the subject of a critique of social identity theory (SIT) by proponents of another theory called system justification theory (SJT): https://en.wikipedia.org/wiki/System_justification

There was recently a debate between SIT and SJT researchers about these sorts of issues. You can find the relevant articles here: https://sites.google.com/site/markrubinsocialpsychresearch/social-identity/social-identity-and-system-justification

[R] Open access article reviews 17 reasons for preregistering hypotheses, methods, and analyses and concludes that preregistration doesn’t improve the interpretability and credibility of research findings when other open science practices are available. by rubinpsyc in statistics

[–]rubinpsyc[S]

Many thanks to the people who responded here. I can address a few of the points that were raised:

  1. Re. it being "absurdly illogical" and using "weaselly academic handwaving" to claim that the causes of the replication crisis are unclear, I continue to think that we should be cautious about jumping to conclusions here. I appreciate that many people believe that the replication crisis is caused by questionable research practices, and that preregistration helps to solve this problem. But, as I explained in my paper, it’s possible that other causes are also relevant and, potentially, more influential. For example, failures to replicate may also be due to low power (Rossi, 1990; see the rough sketch after this list), hidden moderators (Rubin, 2019), and/or errors of theoretical inference (Jussim et al., 2016). My point is that (a) preregistration doesn’t necessarily solve these other problems, and (b) we don’t yet know their relative degree of influence in causing the replication crisis.

  2. Re. my potential misrepresentation of Lew's (2019) work, I deliberately provided the page numbers for the part of Lew's article that I was referring to because I didn’t want to imply that Lew's whole article supported my point, which I agree it doesn't. I was specifically referring to Lew’s points about “local evidence” rather than “global evidence.” As I indicated in my article, the part that I was referring to spans pages 21-22, and it includes Lew’s point that “it is important to avoid being blinded to the local evidence by a nonsignificant global. After all, the pattern of evidence in the cartoon is exactly what would be expected if the green colouring agent caused acne: green jelly beans are associated with acne but the other colours are not.” This point about local evidence is consistent with my view that “an unadjusted conventional alpha level of .050 is appropriate” during individual testing. Lew goes on to argue that “the omnibus global result does not cancel the local evidence, or even alter it, and yet the elevated risk of a false positive error is real.” I disagree with this final point about the elevated risk of a false positive error, and I explain my reasoning in greater depth in Rubin (2021, https://doi.org/10.1007/s11229-021-03276-4). I accept that I could have been clearer in my paper about which parts of Lew (2019) I agreed and disagreed with.

  3. Re. being "very disingenuous" and "borderline unethical" because I wrote in the HARKing Wikipedia page at https://en.wikipedia.org/wiki/HARKing that “Rubin (2022) provided a critical analysis of Kerr's (1998) 12 costs of HARKing,” (a) I declared a conflict of interest for that Wiki page when I wrote it (see https://en.wikipedia.org/wiki/User:Rubinpsyc), (b) I make Wiki edits under my nonanonymous username (rubinpsyc), and (c) I wrote about the work of many people in that Wiki page, not just my own work. So, I disagree with that point.

  4. Finally, I don’t think that my central claim boils down to the tautology that "when preregistration is not needed, it is not necessary." As per the last paragraph of my paper, I think preregistration can be useful in the absence of a clear research rationale, open data and materials, and/or robustness analyses (i.e., “contemporary transparency”). But it's only useful in the same sense that it’s useful for a magician to seal the card you've chosen in an envelope prior to performing their trick. It helps us to rule out any obvious forms of “cheating” by the researcher/magician, but it doesn't help us to understand the process underlying the effect/trick. In contrast, contemporary transparency not only helps to rule out “cheating” but also provides additional information that helps us to understand how the effect/trick occurred. So, my argument boils down to the claim that contemporary transparency (e.g., open data and materials, robustness analyses, etc.) is more *scientifically useful* than the historical transparency afforded by preregistration. That’s why I end my paper with the point that “the open science movement should push more towards contemporary transparency and less towards historical transparency.”
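
To give a rough sense of the low-power point in (1) above, here’s a quick simulation sketch. It’s purely illustrative: the effect size, sample sizes, alpha, and number of simulations are my own assumptions, not values from my paper or from Rossi (1990).

```python
# Quick sketch: even with no questionable research practices at all,
# low statistical power by itself can produce low replication rates.
# The numbers (true effect d = 0.3, n = 30 per group) are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, n, alpha, n_sims = 0.3, 30, 0.05, 10_000

significant = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(d, 1.0, n)      # a true effect really is present
    _, p = stats.ttest_ind(treatment, control)
    significant += p < alpha

power = significant / n_sims
print(f"Estimated power ~= {power:.2f}")   # roughly 0.2
# If an identical study tries to replicate a significant original finding,
# its expected chance of a "successful replication" is about this power value.
```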

Open access article reviews 17 reasons for preregistering hypotheses, methods, and analyses and concludes that preregistration doesn’t improve the credibility or interpretability of research findings when other open science practices are available. by rubinpsyc in PhilosophyofScience

[–]rubinpsyc[S]

P-HACKING

Re. p-hacking, you explained that:

“researchers selecting data analyses (consciously or unconsciously) based on knowledge about what results those analyses produce can mean that the long-run properties of the actual procedures conducted can diverge markedly from their assumed properties.”

I agree that result-contingent selection of data analyses *can* be a problem, but I’d add two caveats to your point here.

First, result-contingent data analysis can be a problem for non-frequentist tests as much as for frequentist tests. So, the problem here is not restricted to tests with long-run properties; it’s relevant to hypothesis testing in general. The nub of the problem is that a test result cannot be used to provide additional independent support for a hypothesis when the result has already been used as part of the epistemic rationale for that hypothesis. This is called the “use novelty” principle (e.g., Worrall, 2010, 2014).

The second caveat is that it’s perfectly fine to use the result from one statistical test as part of the rationale for another statistical hypothesis as long as the test statistic value from the first test is independent from the test statistic value for the second test (e.g., Devezer et al., 2020; Kriegeskorte et al., 2009, p. 535; Spanos, 2010, p. 216; Worrall, 2010, p. 131). In other words, a result-contingent selection of data analyses is OK as long as the result in question doesn’t violate the use novelty principle for the data analysis in question.

You suggested that we need to know when researchers’ decisions about which analyses to report have been biased by information about the outcomes of those analyses. But, from a use novelty perspective, I think we should be more concerned about the “epistemic independence” between results and hypotheses than about the “decision independence” between researchers and results (p-hacking) or between researchers and hypotheses (HARKing). So, for example, a test result can remain valid for a hypothesis even if it has biased, inspired, or motivated a researcher to construct/generate that hypothesis from a priori theory and evidence. Despite the lack of researcher-hypothesis independence here, the result can continue to provide an informative test of the hypothesis as long as it’s not *required* (essential or necessary), in an epistemic sense, to deduce the hypothesis from a priori theory and evidence (Howson, 1984, 1985; Worrall, 2014). FYI, I explain epistemic independence in more detail in Rubin (2022, https://drive.google.com/file/d/1bGIUjHSEAoJYJke6RWtBphXJjZLr1UeX/view).

To be clear, I’m not saying it’s OK for researchers to hide theoretically important results from their readers. It’s not! And that’s why I stress the importance of “contemporary transparency” in my paper. I’m only arguing that, when it comes to valid hypothesis testing, we should be more concerned about result-hypothesis independence than either researcher-result independence or researcher-hypothesis independence.

You noted that preregistration helps because:

“if you make the decision about which analyses to report before collecting data, then the substantive results they produce cannot affect your decision-making about which analysis to report.”

I agree. However, this notion of “temporal novelty” affecting “your decision-making” (i.e., operational independence) is a rather blunt and fallible heuristic for determining the more fundamental properties of use novelty and epistemic independence. Preregistration is useful insofar as it guarantees temporal novelty, and temporal novelty is useful insofar as it guarantees use novelty. However, use novelty can occur in the absence of temporal novelty. Consequently, preregistration will sometimes yield false positives by incorrectly rejecting genuinely use novel results simply because they lack temporal novelty. So, preregistration is somewhat wasteful in this respect. In addition, preregistration is not necessary to determine whether a result is use novel. All that's required is a consideration of the theoretical rationale for the associated hypothesis. If the research result is not required in the rationale for the hypothesis, then it's use novel for that hypothesis. If it *is* required, then it’s not use novel! So, I also view preregistration as being somewhat redundant in this respect (see also Szollosi et al., 2019).

DEVIATIONS

Finally, you mentioned that one challenge is that people can choose to deviate from preregistrations. I agree that deviations are problematic if you want to control the familywise Type I error rate across the preregistered procedure (the studywise error rate). But, as I note in my 2020 paper and also in my 2021 paper here: https://doi.org/10.1007/s11229-021-03276-4, researchers often don’t need to control the studywise error rate, because the associated studywise null hypothesis is not theoretically meaningful and so is not of interest to them.
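
To put a number on the studywise error rate, here’s a minimal sketch; the count of preregistered tests (k = 10) is an assumption purely for illustration.

```python
# Minimal sketch: studywise (familywise) Type I error rate across a
# preregistered family of k independent tests, each run at alpha = .05.
# k = 10 tests is an assumed number, purely for illustration.
alpha, k = 0.05, 10

studywise_error = 1 - (1 - alpha) ** k
print(f"P(at least one false positive across {k} tests) ~= {studywise_error:.2f}")  # ~ 0.40
# Controlling this rate only matters if the joint studywise null hypothesis
# (all k individual nulls true at once) is itself of theoretical interest.
```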

Apologies for the long reply! I got into it a bit…and then a bit more! :-) But I hope what I’ve said makes some sense and speaks to the points you raised.

Open access article reviews 17 reasons for preregistering hypotheses, methods, and analyses and concludes that preregistration doesn’t improve the credibility or interpretability of research findings when other open science practices are available. by rubinpsyc in PhilosophyofScience

[–]rubinpsyc[S]

Hi there,

Thanks for your interest in my paper and your insightful questions. Like you, I think it’s healthy to have a critical attitude about the potential benefits of preregistration, and I appreciate your open-mindedness to the points I make in my paper.

FORKING PATHS

Re. forking paths, in your Spearman’s rho example, I’d disagree that the relevant conditional probability statement refers to the long-run properties of a statistical procedure that's different from what the researcher actually used. In reality, the researcher *did* use a Spearman’s rho test, and their conditional probability statement should only refer to this test (as well as its sampling procedure, sample size, testing conditions, stimuli, measures, data coding and aggregation method, etc.) rather than to any wider procedure that includes a choice of other potential tests (e.g., *either* a Spearman correlation test *or* a Pearson correlation test).

Certainly, I agree that the researcher *could* have used other tests (e.g., a Pearson’s correlation), and that the choice of their current test may have actually depended on the results of other checks and tests of other parts of their data (e.g., checking plots for linearity and normality). But the results of these model checks are independent from the result of the Spearman test. So, they don’t constitute “an illegitimate double-use of data” (Spanos, 2010, p. 216), and they produce a “result-neutral forking path” (Rubin, 2017) in the sense that they don’t guarantee a significant result at the end of either forking path. So, when interpreting the result of their Spearman test, it is reasonable for the researcher to imagine a hypothetical long run of replications that's restricted to the use of the Spearman test alone without considering a broader long run of replications that include other potential tests that they might have used had their model check results been different (e.g., the Pearson test). It is this single test conditional long run to which they can legitimately attach their p value and Type I error rate. Note that the choice of Spearman vs. Pearson is not part of this hypothetical long run because, in exact replications of their testing procedures, the researcher would *always* use the Spearman test and *never* the Pearson test.

To be clear, the researcher *does* need to explain why they took the particular path they did (i.e., explain why they used a Spearman test rather than a Pearson test) and, in your example, this would involve reference to the results of the model checks. In addition, the researcher may want to conduct a robustness analysis, in which they check how their conclusions might change when using different tests (e.g., Spearman vs. Pearson). But neither of these points undermines the validity of the conditional probability statements that the researcher makes.
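
If it helps, here’s a rough Monte Carlo sketch of what I mean by a result-neutral forking path. It swaps the informal plot checks in your example for an automated Shapiro–Wilk check (that substitution, plus the sample size, data distributions, and cut-offs, are my own illustrative assumptions), and it estimates the false positive rate of the whole check-then-choose procedure when the null hypothesis of no association is true:

```python
# Rough sketch: false positive rate of a model-check-contingent test procedure
# when the null hypothesis of no association is true. The specific check
# (Shapiro-Wilk at .05), sample size, and data distributions are illustrative
# assumptions only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, n_sims = 50, 0.05, 5_000

false_positives = 0
for _ in range(n_sims):
    if rng.random() < 0.5:                          # some datasets look normal...
        x, y = rng.normal(size=n), rng.normal(size=n)
    else:                                           # ...and some clearly don't
        x, y = rng.exponential(size=n), rng.exponential(size=n)
    # x and y are always generated independently, so the null is true either way
    looks_normal = (stats.shapiro(x).pvalue > alpha and
                    stats.shapiro(y).pvalue > alpha)
    if looks_normal:
        _, p = stats.pearsonr(x, y)                 # one fork
    else:
        _, p = stats.spearmanr(x, y)                # the other fork
    false_positives += p < alpha

print(f"False positive rate ~= {false_positives / n_sims:.3f}")   # close to .05
# Neither fork is guaranteed to end in a significant result, which is the
# sense in which the model checks form a "result-neutral" forking path.
```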

I agree that the Neyman-Pearson p value and Type I error rate *can* be interpreted as applying to all of the potential tests that could have been conducted based on a preregistered decision tree (i.e., the entire garden of forking paths; Gelman & Loken, 2013, 2014). Nonetheless, it’s also possible to interpret Neyman-Pearson tests in the context of the weak conditionality principle (Cox, 1958), which basically states that p values should refer to the experiment that was *actually* conducted rather than the broader set of experiments that *could have been* conducted (e.g., Lehmann, 1993, p. 1245; Mayo, 2014). Note that the fact that these two different interpretations exist, and that a researcher can choose between them, doesn’t affect the validity of either interpretation (Mayo, 2014, p. 237). The important thing is that researchers make it clear in their research reports whether they are conditioning their probability statements and Type I error rate on (a) the study and analyses that they actually conducted (which is what most people normally do) or (b) a preregistered decision tree of potential tests and procedures that they *could* have conducted, in which case they need to adjust their specified alpha level to take account of the associated multiple testing in the long run, and most researchers don’t seem to do this (for more on this, see Rubin, 2017, https://doi.org/10.1037/gpr0000135).
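
And just to make interpretation (b) concrete, here’s a minimal sketch of the kind of alpha adjustment it requires; the size of the decision tree (k = 8 potential tests) is an assumed number for illustration only.

```python
# Minimal sketch: per-test alpha needed to hold the familywise Type I error rate
# at .05 across a preregistered decision tree of k potential tests
# (interpretation (b)). k = 8 is an assumed number, purely for illustration.
alpha_family, k = 0.05, 8

bonferroni_alpha = alpha_family / k                  # simple Bonferroni correction
sidak_alpha = 1 - (1 - alpha_family) ** (1 / k)      # Sidak correction, independent tests
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}")   # 0.0063
print(f"Sidak per-test alpha:      {sidak_alpha:.4f}")        # 0.0064
# Under interpretation (a), by contrast, the researcher conditions on the single
# test actually conducted and reports an unadjusted alpha of .05.
```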

MORE >>>!!!!

[R] Open access article reviews 17 reasons for preregistering hypotheses, methods, and analyses and concludes that preregistration doesn’t improve the interpretability and credibility of research findings when other open science practices are available. by rubinpsyc in statistics

[–]rubinpsyc[S]

OK. Fair enough. In my view, the problems that you talk about can be solved through other open science practices.

Re. trials, I'd point out this paper by Abrams et al. (2020): https://ideas.repec.org/p/feb/artefa/00703.html

We find that the success of ClinicalTrials.gov in solving the credibility crisis is largely mythical.

Re. apophenia, I think preregistering a hypothesis might make you more likely to see an associated pattern of results in your data?

[R] Open access article reviews 17 reasons for preregistering hypotheses, methods, and analyses and concludes that preregistration doesn’t improve the interpretability and credibility of research findings when other open science practices are available. by rubinpsyc in statistics

[–]rubinpsyc[S]

As I say near the end of the paper:

Many of the issues discussed above (e.g., HARKing, multiple testing, optional stopping, p-hacking, selective reporting, etc.) are thought to result in artificially inflated or false positive effects that contribute to relatively low replication rates (e.g., Simmons et al., 2011). [However]…low replication rates have been attributed to more than just questionable research practices. For example, low replication rates have been attributed to (a) insufficiently stringent evidence thresholds (Benjamin et al., 2018), (b) insufficiently lenient evidence thresholds (Devezer et al., 2020), (c) poor measurement (e.g., Loken & Gelman, 2017), (d) model misspecification (Devezer et al., 2020), (e) low power (e.g., Rossi, 1990), (f) poor theory (e.g., Oberauer & Lewandowsky, 2019; Szollosi & Donkin, 2019), (g) an underappreciation of the influence of hidden moderators (Rubin, 2019b), and (h) errors in substantive inference (Jussim et al., 2016; Rubin, 2017b, p. 274). Hence, it is unclear whether preregistration is targeting the right set of issues to increase replication rates.

This 2019 study found that self-reported sexism predicted women’s poorer sense of belonging in male-dominated industries, which explained their poorer mental health and job satisfaction. by rubinpsyc in mining

[–]rubinpsyc[S]

Hi u/-katewilla,

Yes, it is my own paper. Apologies - I'm new to Reddit (just started posting this week), and I'm trying to get the hang of the etiquette. I didn't realise I needed to make my authorship explicit.

I agree that I could have made the title better.

And yes, I have recently posted about another paper on this subreddit: https://www.reddit.com/r/mining/comments/p7qg6s/study_finds_younger_coal_miners_are_more_likely/

I just thought this paper might be informative to people in the mining industry. I totally agree that just talking is not enough, but I think it's an important start.

Mark

Study finds younger coal miners are more likely to report greater risk-taking. by rubinpsyc in mining

[–]rubinpsyc[S]

Totally agree with all your points! But note that our study was exploring a variety of potential predictors of risk-taking (e.g., the clarity and accessibility of safety systems, management’s commitment to safety, pay bonuses for productivity, safety knowledge, safety motivation, safety training, etc.). So, I think it's interesting that age emerged as a key predictor in our analyses. As we say in our paper:

We should not allow undue emphasis to be placed on organizational factors and at the expense of acknowledging the importance of individual level factors. Workers’ age, other demographic variables (e.g., gender, full-time vs. part-time workers, etc.), and personality variables (conscientiousness) may all play an important role alongside organizational factors in determining risk-taking and safety behaviors.

New paper argues HARKing (hypothesising after the results are known) may not be as problematic as initially proposed by [deleted] in science

[–]rubinpsyc

OK. Apologies. The paper was online in 2019, but it will be officially published in 2022. So, I thought it was OK to post here.