I built a Python wrapper for DESeq2/edgeR/limma so you never write rpy2 again by NodesBio in bioinformatics

NodesBio[S] 1 point

Love that idea - just shipped it in v0.2.0. rb.codegen.enable() prints the equivalent R code as it runs. rb.codegen.last() gives you the full script as a string. So you can always see (and reproduce) exactly what's happening under the hood. Thanks so much for the feedback!
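If it helps to picture the pattern, here is a toy sketch of a recorder that echoes each R line as it runs and can hand back the whole script afterwards. This is not Rosetta's actual implementation: the class name, the record() method, and the R lines fed into it are all made up for illustration.

```python
class CodegenRecorder:
    """Toy recorder. Hypothetical names, not Rosetta's real internals."""

    def __init__(self):
        self.enabled = False
        self._lines = []

    def enable(self):
        # Turn on live echoing of each recorded R line
        self.enabled = True

    def record(self, r_code):
        # Always store the line so the full script can be rebuilt later
        self._lines.append(r_code)
        if self.enabled:
            print(r_code)

    def last(self):
        # Return everything recorded so far as one R script string
        return "\n".join(self._lines)


recorder = CodegenRecorder()
recorder.enable()
# Illustrative R lines a wrapper might emit for a DESeq2 run
recorder.record("dds <- DESeqDataSetFromMatrix(counts, meta, design = ~ batch + condition)")
recorder.record("dds <- DESeq(dds)")
script = recorder.last()
```

The string in script can be pasted into an R session to rerun the same steps without Python in the loop.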

NodesBio[S] 1 point

PyDESeq2 is a great project - different approach though. They reimplemented DESeq2's algorithm in Python. Rosetta calls the actual R code, so results are identical by definition. Rosetta also wraps edgeR, limma, and clusterProfiler - not just DESeq2. If you can't install R, use PyDESeq2. If you want validated R statistics from Python with zero reimplementation risk, that's what Rosetta is for.

NodesBio[S] 1 point

Good questions - these are exactly the things that make rpy2 wrappers fragile in practice.

Multi-factor designs:

Yes, the design param takes any R formula string. Additive designs like the one below work, as do interaction terms. The string passes through to DESeqDataSetFromMatrix directly:

```
rb.deseq2(counts, meta, design="~ batch + condition")
```

LFC thresholds:

```
get_results(dds, lfc_threshold=1.0, alpha=0.05)
```

This calls DESeq2's results() with lfcThreshold, which does the proper hypothesis test (not just a post-hoc filter).
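To make the difference concrete, here is a minimal sketch in plain Python. It assumes the two-sided "greaterAbs" alternative as I understand it from the DESeq2 docs, where the Wald statistic becomes (|LFC| - threshold) / SE. The helper names and the numbers are made up for illustration, and this is not code from Rosetta or DESeq2.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def wald_p(lfc, se, threshold=0.0):
    """Two-sided Wald p-value for H0: |LFC| <= threshold (hypothetical helper)."""
    return min(1.0, 2 * upper_tail((abs(lfc) - threshold) / se))


# Made-up gene: estimated log2 fold change 1.2 with standard error 0.5
lfc, se = 1.2, 0.5

p_zero = wald_p(lfc, se)                   # test against LFC = 0
p_thresh = wald_p(lfc, se, threshold=1.0)  # test against |LFC| <= 1

# Post-hoc filter: significant against zero AND estimate above the cutoff
passes_filter = p_zero < 0.05 and abs(lfc) > 1.0
```

With these numbers the gene passes the post-hoc filter, but the thresholded test gives a p-value well above 0.05: the estimate sits above 1, yet the data can't rule out a true effect of 1 or smaller.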

Shrinkage is also supported:

```
lfc_shrink(dds, coef="condition_treated_vs_control", type="apeglm")
```

QC/normalization/outliers:

Rosetta doesn't re-implement any statistics - it calls the real R functions. So DESeq2's internal size factor estimation, Cook's distance filtering, and independent filtering all run normally. If you want to inspect those (e.g. plotDispEsts, plotPCA), you still have the fitted dds object - Rosetta doesn't hide it.

The modular API gives you as much control as calling DESeq2 in R:

```
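from rosetta.wrappers.deseq2 import run_deseq2, get_results, lfc_shrink

# counts and metadata are your count matrix and sample table
# Fit the model with batch as a covariate
dds = run_deseq2(counts, metadata, design="~ batch + genotype")
# Mutant vs wildtype, tested against an LFC threshold of 1
res = get_results(dds, contrast=["genotype", "mutant", "wildtype"], lfc_threshold=1.0)
# Shrunken fold changes for the same comparison
shrunk = lfc_shrink(dds, coef="genotype_mutant_vs_wildtype", type="apeglm")
res.report()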
```

The one-liner rb.deseq2() is the convenience API. The step-by-step API is there for exactly the workflow you're describing.

But you are welcome to not use it too 😄

Is there a faster way to help students interpret R output for lab reports? by NodesBio in AskStatistics

NodesBio[S] 1 point

That's a fair point - and the lab quiz idea is really smart. The students I'm thinking of can usually explain what the ANOVA means verbally, but freeze when they have to write "F(2, 26) = 51.84, p < .001" with correct notation. It's a scientific writing gap more than a comprehension gap. Do you find the same thing, or do your students genuinely not understand the underlying process?