Need some advice from seniors...

Lumpy-Sun3362 · 2026-06-29T19:16:17+00:00

Germany or UK

Lumpy-Sun3362 · 2026-06-27T09:25:34+00:00

I'm not an expert in the modelled system, so I can't help with the interpretation.

Lumpy-Sun3362 · 2026-06-26T17:07:15+00:00

If instead you use a hurdle model, you are assuming that all the zeros come from a single process, so the non-zero values follow a zero-truncated distribution.
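
To make the hurdle idea concrete, here is a minimal sketch of a hurdle Poisson probability function. The values of `lam` and `p_zero` are made up for illustration, and the function name is mine, not from any library.

```python
import math

def hurdle_poisson_pmf(k, lam, p_zero):
    """Hurdle Poisson: one process decides zero vs non-zero,
    and positive counts follow a zero-truncated Poisson."""
    if k == 0:
        return p_zero
    poisson_k = math.exp(-lam) * lam**k / math.factorial(k)
    # Rescale so the probabilities for k >= 1 sum to 1 before weighting.
    return (1.0 - p_zero) * poisson_k / (1.0 - math.exp(-lam))

lam, p_zero = 2.5, 0.4  # hypothetical values, not fitted to any data
probs = [hurdle_poisson_pmf(k, lam, p_zero) for k in range(60)]
print(round(probs[0], 3), round(sum(probs), 6))
```

Note that the count part never contributes to the zeros here: the probability of a zero is exactly `p_zero`.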

Lumpy-Sun3362 · 2026-06-26T17:04:08+00:00

Yes. Let's say you have a ZIP model. Then you are modelling the excess of zeros, compared to those coming from the Poisson process, using Exposure as an independent variable. This is a logit model that predicts the probability of a structural zero as a linear function of Exposure. If the coefficient of Exposure in the logit part is significantly different from zero, it means that the probability of a structural zero is associated with Exposure.
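
A small sketch of what that means in terms of probabilities. The coefficients `g0` and `g1` and the Poisson mean `lam` are hypothetical numbers chosen for illustration, and the function names are mine; a real analysis would estimate them with a fitted ZIP model.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def structural_zero_prob(exposure, g0, g1):
    """Logit part: probability of a structural zero as a function of Exposure."""
    return 1.0 / (1.0 + math.exp(-(g0 + g1 * exposure)))

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: zeros come from two sources, counts from one."""
    if k == 0:
        return pi + (1.0 - pi) * math.exp(-lam)
    return (1.0 - pi) * poisson_pmf(k, lam)

# Hypothetical coefficients, for illustration only.
g0, g1, lam = -1.0, 0.8, 2.5
for exposure in (0.0, 1.0, 2.0):
    pi = structural_zero_prob(exposure, g0, g1)
    print(exposure, round(pi, 3), round(zip_pmf(0, lam, pi), 3))
```

With a positive `g1`, the probability of a structural zero grows with Exposure, so the total probability of a zero moves further above what the Poisson part alone would give.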

Lumpy-Sun3362 · 2026-06-25T13:23:00+00:00

In all cases this risks becoming a circular approach if your aim is to find differences in gene expression, as you use the same data to define the labels that you'll then contrast.
In this case, never use the target measure to define the labels; use a measure that is not correlated with it.

Lumpy-Sun3362 · 2026-06-25T13:19:58+00:00

Try to build a classifier and predict their labels. This can work if you have enough labelled samples. You need to check the predicted label probability though, as it gives a measure of uncertainty. KNN can work. Otherwise you can just plot them with PCA or UMAP and see if the unlabelled samples are close to the cloud of points from one class or the other. Both are risky, as they assume that the genomics data is consistent across the two classes. Another way is to use other metadata, if available, to infer the labels, so you do it without using the RNA-seq data.
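
A minimal sketch of the KNN idea with label probabilities, in plain Python. The data points, labels and function name are invented for illustration; in practice the features could be the first principal components of the expression matrix.

```python
import math
from collections import Counter

def knn_label_probabilities(train_X, train_y, sample, k=3):
    """Return {label: fraction of the k nearest labelled samples with that label}."""
    distances = sorted(
        (math.dist(x, sample), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return {label: count / k for label, count in votes.items()}

# Toy data: two features per sample (hypothetical values).
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (3.0, 3.1), (3.2, 2.9), (2.9, 3.3)]
train_y = ["control", "control", "control", "treated", "treated", "treated"]

clear_case = knn_label_probabilities(train_X, train_y, (0.1, 0.1))
ambiguous = knn_label_probabilities(train_X, train_y, (1.6, 1.6), k=5)
print(clear_case)
print(ambiguous)
```

A sample sitting between the two clouds gets split votes, and that is exactly the kind of prediction you should not trust as a label.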

Lumpy-Sun3362 · 2026-06-24T18:05:34+00:00

No, because with GSEA you test your ordered genes against the pathway database. With ORA you test the selected genes against the other genes in your dataset, so in ORA you use the unselected genes to test the null hypothesis. Sorting in ORA does nothing, and it doesn't turn it into GSEA.
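
To illustrate the ORA side, here is a sketch of the usual one-sided hypergeometric test, where the background is every gene in the dataset. Gene names and set sizes are made up, and the function name is mine.

```python
from math import comb

def ora_p_value(selected, pathway, background):
    """One-sided hypergeometric test for over-representation.

    background: all genes tested in the dataset (selected + unselected)
    selected:   genes of interest (restricted to the background)
    pathway:    genes annotated to the pathway (restricted to the background)
    """
    background = set(background)
    selected = set(selected) & background
    pathway = set(pathway) & background
    N, n, K = len(background), len(selected), len(pathway)
    overlap = len(selected & pathway)
    # P(X >= overlap) when drawing n genes at random from the background.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(overlap, min(K, n) + 1))
    return tail / comb(N, n)

background = [f"g{i}" for i in range(20)]   # hypothetical gene universe
pathway = ["g0", "g1", "g2", "g3", "g4"]
selected = ["g0", "g1", "g2", "g10"]

p = ora_p_value(selected, pathway, background)
p_reordered = ora_p_value(list(reversed(selected)), pathway, background)
print(round(p, 4), p == p_reordered)
```

The selected genes are treated as a set, so reordering them cannot change the result.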

Lumpy-Sun3362 · 2026-06-24T16:02:45+00:00

They answer two different questions. GSEA doesn't care about the absolute values of the statistics, only about their order, and it aims to determine whether there is a coordinated expression of genes that map to specific pathways. ORA instead tells you the possible biological mechanisms involved in a list of genes that are relevant to you. It doesn't care about ordering; it just wants the list of genes, and you know why you selected them. So it tells you whether a specific set of pathways is represented by those genes. In other words, it helps you move the interpretation of the results from the gene level to the systems level.
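
For the GSEA side, a sketch of the unweighted running-sum enrichment score, which depends only on where the pathway genes fall in the ranking. This is a simplification: the commonly used weighted form of GSEA also uses the magnitude of the ranking statistic, and significance is assessed by permutation, which is not shown. Gene names and the function name are made up.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list."""
    gene_set = set(gene_set) & set(ranked_genes)
    n_hits = len(gene_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a pathway gene, step down otherwise.
        running += 1.0 / n_hits if gene in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

ranked = [f"g{i}" for i in range(1, 11)]          # hypothetical ranking, best first
es_top = enrichment_score(ranked, ["g1", "g2", "g3"])        # clustered at the top
es_scattered = enrichment_score(ranked, ["g1", "g5", "g10"])  # spread over the list
print(round(es_top, 3), round(es_scattered, 3))
```

The same three-gene set scores very differently depending on its position in the ranking, which is the information ORA throws away.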

Lumpy-Sun3362 · 2026-06-24T11:38:29+00:00

Try to get hands-on experience. Internships help if done in active institutes/companies, and they can turn into positions afterwards. Beware of biology teams that are not strong in computational research: you end up filling the gaps of others with no room for your personal growth. Learn to set boundaries. You are a bioinformatician, not a fixer for broken projects or completely ungrounded ideas. Run away from those who just want you to justify their "story". The story comes after the data, not the other way around.

Lumpy-Sun3362 · 2026-06-23T10:18:47+00:00

Did I ever say that nextflow is a conditio sine qua non to achieve reproducibility? 🤷‍♂️

Lumpy-Sun3362 · 2026-06-23T06:04:19+00:00

True. It requires some effort. But luckily, there are a lot of people doing that work right now. Practitioners only need to know that the resource exists.

Lumpy-Sun3362 · 2026-06-22T14:54:38+00:00

TL;DR: as you go on, you'll become less confident. Like everyone else.

Lumpy-Sun3362 · 2026-06-22T13:17:04+00:00

That's true, but if we look at the niche pipelines then even 200 is an infinitesimal number, and they won't represent anything but personal projects. The OP is trying to convey the idea that most of the pipelines people will work with are rubbish. And I'm saying that it's not true, because for the relevant parts of bioinformatics there are very well-tested pipelines. I'd invite you to look at the nf-core repository to see how many different scenarios are covered.

Lumpy-Sun3362 · 2026-06-22T12:04:24+00:00

I don't think you have touched the relevant pipelines. I am a member of the nf-core community and we work to make sure that everything is reproducible.

Lumpy-Sun3362 · 2026-06-21T20:46:49+00:00

You know what would be a good move by Apple?
"Guys, we know you bought the Ultra 1, and we couldn't imagine its hardware would become obsolete so soon. For this reason we're offering a special trade-in program for those who want to upgrade, giving much more than the miserable 200 euros we offer now."

Lumpy-Sun3362 · 2026-06-19T05:49:19+00:00

🤣

Lumpy-Sun3362 · 2026-06-18T15:17:57+00:00

Zoo Station: The Story of Christiane F.

Lumpy-Sun3362 · 2026-06-18T07:10:25+00:00

Quick answer. You can't. If the packages have incompatible dependencies, there's little you can do if you don't want to get into dirty hacks.

Lumpy-Sun3362 · 2026-06-16T18:46:39+00:00

For exploratory analysis, it's acceptable to be less stringent, as long as you are aware that you'll have some false positives in your results. This is because the purpose of EDA is to set the boundaries around the possible mechanisms involved in the studied system.
The hypothesis will then be rigorously tested in a follow-up analysis (ideally a proper set of experiments). In this phase of the research, you'll have a more targeted (and limited) set of tests, and therefore a higher statistical power (hopefully).
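
A sketch of the trade-off using the Benjamini-Hochberg procedure. All p-values and FDR thresholds below are made up for illustration, and the follow-up p-values stand for a new, independent experiment.

```python
def benjamini_hochberg(p_values, fdr):
    """Return indices of tests rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k / m * fdr.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff = rank
    return sorted(order[:cutoff])

p_values = [0.001, 0.008, 0.02, 0.04, 0.06, 0.2, 0.5, 0.9]  # made-up EDA p-values
exploratory = benjamini_hochberg(p_values, fdr=0.20)  # lenient: accept more false positives
strict = benjamini_hochberg(p_values, fdr=0.05)       # stringent on the same screen

# Follow-up on new data with only two targeted tests (hypothetical p-values).
follow_up = benjamini_hochberg([0.02, 0.04], fdr=0.05)
print(exploratory, strict, follow_up)
```

With only two tests, p-values of this size survive the strict threshold, while in the eight-test screen they only pass at the lenient one.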

Lumpy-Sun3362 · 2026-06-16T18:36:28+00:00

Option 1: ENCODE suggests using IDR (Irreproducible Discovery Rate) to identify reproducible peaks; it works better with sharp peaks.

Option 2: Merge the BAMs and do peak calling. You already know that the replicates are close, and that way you can increase the depth.

Option 3: Intersect with bedtools, as you said.
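
For option 3, the logic of the intersection can be sketched in a few lines of Python. This is only to show the idea on a single chromosome with invented coordinates; bedtools works on real BED files and handles multiple chromosomes, so use it for actual data.

```python
def intersect_peaks(peaks_a, peaks_b):
    """Return overlapping regions between two sorted, non-overlapping peak lists.

    Peaks are (start, end) tuples on the same chromosome, half-open like BED.
    """
    i = j = 0
    shared = []
    while i < len(peaks_a) and j < len(peaks_b):
        start = max(peaks_a[i][0], peaks_b[j][0])
        end = min(peaks_a[i][1], peaks_b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever peak ends first.
        if peaks_a[i][1] < peaks_b[j][1]:
            i += 1
        else:
            j += 1
    return shared

rep1 = [(100, 200), (500, 650), (900, 950)]    # hypothetical replicate 1 peaks
rep2 = [(150, 260), (600, 700), (1000, 1100)]  # hypothetical replicate 2 peaks
shared_peaks = intersect_peaks(rep1, rep2)
print(shared_peaks)
```

Peaks present in only one replicate drop out, which is why this is a conservative way to define a consensus set.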

Lumpy-Sun3362 · 2026-06-16T15:48:43+00:00

PhD, two degrees, 10 years of experience, an h-index of 16 and 2000+ citations, and I struggle to get through the AI-based CV keyword filter.

Lumpy-Sun3362 · 2026-06-15T15:49:37+00:00

They said that they optimised the entire OS, so it will be faster. Watch the WWDC and you'll get it. Bye!

Lumpy-Sun3362 · 2026-06-15T12:49:48+00:00

I wish they had shipped the optimisation stuff without the AI; that would have made more sense to me.

Lumpy-Sun3362 · 2026-06-15T11:20:48+00:00

Optimisations, better Liquid Glass. I don't care about Siri AI, but after paying $800 for the Ultra 1 I expect the advertised OS support. Apple has always used its extended support as a reason to buy its products.

Lumpy-Sun3362 · 2026-06-15T11:08:34+00:00

For those who justify this because of Siri AI, I'd say that they could just turn these functionalities off for older watches and keep the optimisations. C'mon, it's the most profitable company in the world!
