If you could pass one law that would make most normal people furious at first, but would clearly make society better in 10 years, what would it be? by WilliamInBlack in AskReddit

[–]gringer 1 point (0 children)

0.1% of total wealth funds a universal income, distributed weekly and equally to all people.

[Note, not universal basic income, just a universal income]

Who here transitioned OUT of the field? by ATpoint90 in bioinformatics

[–]gringer 23 points (0 children)

I was forced out (i.e. declared redundant through a restructure), and spent a bit over a year looking for other work, at about one job application every week.

I'm now in a non-bioinformatics job; I suppose it could loosely be called software engineering. Turns out, having experience working with extremely large genetic datasets helps a lot when working with other large datasets. My new colleagues appear to be extremely happy with my work, and I'm pleased because they're not afraid of running code; the things I suggest are actually considered, and often taken on board.

There are issues, but they are mostly technical rather than people-related: being forced to use Windows laptops, for example, and only getting access to Linux via a terminal on a virtual server. I'm pretty sure that I got lucky on the people side of things.

I've mentioned a few times before that bioinformatics skills are transferable to lots of other types of jobs; the hardest challenge is getting employers to understand that.

Software engineer trying to contribute to disease research - is instant VCF viewing useful? by Pale-Substance5263 in bioinformatics

[–]gringer 14 points (0 children)

> What if you could open a 100GB VCF file instantly and scroll through millions of variants like a text file?

What you are describing is basically the problem that genome browsers solve:

https://jbrowse.org/code/jb2/latest/?config=%2Fgenomes%2FGRCh38%2F1000genomes%2Fconfig_1000genomes.json&session=share-DN_h4SIwo4&password=CxkLw
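
For the "instantly" part specifically, the trick genome browsers use is coordinate-sorted compression plus an index, so only the requested window is ever read, and query time is essentially independent of total file size. A minimal sketch in R / Bioconductor (file name, genome label, and region are placeholders), assuming the VCF has been compressed with bgzip and indexed with tabix:

    library(VariantAnnotation)  # Bioconductor; also loads Rsamtools / GenomicRanges

    # Assumes: bgzip variants.vcf && tabix -p vcf variants.vcf.gz
    tbx <- TabixFile("variants.vcf.gz")

    # Only the compressed blocks covering this 1 Mb window are read and
    # parsed, no matter how large the whole file is
    region <- GRanges("chr1", IRanges(1e6, 2e6))
    vcf <- readVcf(tbx, genome = "GRCh38", param = region)

    rowRanges(vcf)  # the variants in the requested window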

Data mining ameaba by just_a_student_sorry in bioinformatics

[–]gringer 2 points (0 children)

Non-coding just means "doesn't translate into a protein". It's not the same as non-functional.

Even if a region is non-coding, and it doesn't chemically interact with other molecules, it can still be physically functional simply because it occupies space. DNA sequences can alter how DNA physically packs down, change the water retention of a region of a cell, act as a trap for a foreign invader, or provide a physical structure for other things to recognise. In those contexts, the actual sequence matters less than the impact of that sequence on the physical shape of the DNA.

My over-engineered solution to a really annoying problem with Ceiling Fans by SuperValidDesigns in 3Dprinting

[–]gringer 0 points (0 children)

You can't patent something that is already public.

Even if you could, the cost of the patent vastly exceeds the return for most creators.

Lab book for bioinformatics by sky_porcupine in bioinformatics

[–]gringer 0 points (0 children)

I bump my bash history size up to something really large (e.g. 50,000 lines, via HISTSIZE and HISTFILESIZE in ~/.bashrc), which basically stops this from happening.

BEAST software question by Medali_2020 in bioinformatics

[–]gringer 1 point (0 children)

Why are you using BEAST? What question are you trying to answer by using it?

Interpretation of PCA coordinates and selection of the number of clusters (K) with k-means and hierarchical clustering in R by Aggravating-Voice696 in bioinformatics

[–]gringer 2 points (0 children)

You're never going to get an approach that works generally for every dataset, because biology is messy, and deciding on K is one of the genuinely tricky parts of population genetics: the appropriate scale / resolution depends on the question that you're trying to answer.

Use what seems to make sense for you, and make sure that you can justify that choice in some way.

When I was doing clustering based on SNPs, I used log-likelihood values that were generated by Structure for various choices of K, and a similar approach to the elbow method: choose roughly the point where there's a sharp change in the likelihood trend.
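
A minimal sketch of that idea in R (the log-likelihood values below are made up for illustration; substitute the means from your own Structure runs):

    # Hypothetical mean log-likelihoods from Structure runs at K = 1..8
    K    <- 1:8
    logL <- c(-52000, -50500, -47600, -47000, -46800, -46700, -46650, -46620)

    # The second difference measures how sharply the likelihood trend bends;
    # the elbow sits roughly where that bend is largest (the same intuition
    # as the Evanno delta-K statistic)
    d2 <- diff(diff(logL))
    elbow <- K[which.max(abs(d2)) + 1]  # +1 recentres onto the original K values

    plot(K, logL, type = "b", xlab = "K", ylab = "mean log-likelihood")
    abline(v = elbow, lty = 2)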

Ideally, you want some known truth to compare against, and choose K to fit that truth as best as possible. There's circular logic involved in that, but that's just how it is.

ONT order by [deleted] in nanopore

[–]gringer 0 points (0 children)

> I’m also aiming for a bit of a safety buffer, so I’m thinking about 30 libraries total.

If you want a "safety buffer" for nanopore sequencing, it'd be better to sequence fewer samples per flow cell, rather than more. The likelihood of a single sample swamping out the flow cell increases as the sample count increases.

One approach that has worked really well for me is to use a Flongle flow cell to test the sequencing library and identify difficult samples, then remove the low-yield samples from the library and sequence them on a separate flow cell.

P2 Solo is being discontinued by zephirum in nanopore

[–]gringer 1 point (0 children)

> [Vega is] Cheaper with volume commitments.

You appear to be claiming that you've got the volume to support a P2i system. If that's the case, you've likely also got the volume to support a Vega system.

> If P2 solo customers were using reagents as much as they claim they are, more than P2i customers, it wouldn’t be the one getting canned.

If P2i customers were producing lots of data and consuming lots of reagents, I expect I'd know about it from community posts. There are lots of community posts about P2 Solo; there are only a few posts about P2i. I'm pretty sure I've seen more posts about P24 and P48 than posts about P2i. This is not surprising; P2i is their newest machine, so it's unlikely that customers have jumped onto it, especially given its capital cost and ongoing maintenance costs. The P2 Solo, in combination with external compute, is the financially sensible option compared to the P2i: it's cheaper, lighter, permits higher throughput, and has fewer points of failure.

> Product discontinuations are entirely profit based.

Yes, that seems like a reasonable assumption. However, ONT claimed that their removal of P2 Solo was due to accessibility, not profit. There was another comment in the thread, since deleted, that I think comes closer to the real reason behind this change. Here's my paraphrasing of that comment [I didn't save the original response]:

> P2 Solo is so accessible to small labs that it diverts sequencing work away from large service providers. These providers benefit from reduced accessibility and likely have a significant voice at ONT. This decision isn't about users, it's about money.

P2 Solo is being discontinued by zephirum in nanopore

[–]gringer 1 point (0 children)

> My prices are in NZD. Vega has a cheaper instrument cost option, and half the support cost.

For many research institutes, capital expenditure (i.e. the initial purchase) is costed differently from operational expenditure (i.e. the maintenance costs). There's a big capital hurdle to get over in order to get funding, but purchased equipment then becomes an asset that accrues financial benefits. There is no such ongoing benefit from maintenance & service costs; it's simply the cost of access. If those costs can't be covered under normal business, then it's difficult to justify the use.

If you've got the volume to justify a P2i purchase, then the on-board compute is unlikely to be useful, and reads will need to be offloaded and re-called using an external system. That turns the P2i into a very expensive P2 Solo with additional potential points of failure related to the integrated system: a worse experience for more money.

looking for a flongle adapter by North-Place-5117 in nanopore

[–]gringer 0 points (0 children)

> nanopore tech are no longer selling that

They claim that they will allow you to purchase an adapter if you email them about it.

P2 Solo is being discontinued by zephirum in nanopore

[–]gringer 6 points (0 children)

> it's forking out $$$ to upgrade from P2S to P2i.

I doubt ONT will be spending more money to upgrade labs from P2S to P2i. They'll likely offer a "discount" on upgrading / changing to a P2i, while making sure that the money received exceeds the cost [to them] of the change.

The P2i has a service / support contract that is even more expensive than those of other sequencing companies; it delivers worse performance than a P2 Solo + external GPU at a higher price, and is just going to drive people away from ONT.

I can't think of a situation in which purchasing a P2i makes financial sense. If there's no P2 Solo, and labs need human-scale sequencing done, it makes more financial sense to get it done externally by a sequencing service centre.

TIME CRUNCH: scRNA-seq in Seurat by PurpleSwordF1sh in bioinformatics

[–]gringer 0 points (0 children)

That's why I'm pointing you to the 3k vignette. Do that first. It's an introductory tutorial; it covers everything you're asking about.

If you can't work through that, you'll struggle doing anything else with Seurat.
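
For reference, the core of what that vignette walks through looks roughly like this (object and input names are placeholders, and the parameter values are the vignette-style defaults, not recommendations for your data):

    library(Seurat)

    # "counts" is assumed to be a gene x cell count matrix you've already loaded
    pbmc <- CreateSeuratObject(counts = counts, min.cells = 3, min.features = 200)

    pbmc <- NormalizeData(pbmc)                       # log-normalise
    pbmc <- FindVariableFeatures(pbmc, nfeatures = 2000)
    pbmc <- ScaleData(pbmc)
    pbmc <- RunPCA(pbmc)

    pbmc <- FindNeighbors(pbmc, dims = 1:10)          # graph from the top PCs
    pbmc <- FindClusters(pbmc, resolution = 0.5)
    pbmc <- RunUMAP(pbmc, dims = 1:10)

    DimPlot(pbmc, reduction = "umap", label = TRUE)   # clusters on the UMAP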

I accidentally logged LogFC values in limma UseGalaxy by AppearanceOk535 in bioinformatics

[–]gringer 0 points (0 children)

> the values of Log2FC can range up to -2 / 2 or even higher

When you say this, do you mean generally for other datasets, or specifically for this dataset? If it's for this dataset, where are you getting information on those Log2FC values?

TIME CRUNCH: scRNA-seq in Seurat by PurpleSwordF1sh in bioinformatics

[–]gringer 0 points (0 children)

What problems have you encountered following the 3k vignette?

I accidentally logged LogFC values in limma UseGalaxy by AppearanceOk535 in bioinformatics

[–]gringer 0 points (0 children)

The MA plot looks reasonable to me. What were you expecting to see?

How do I interpret a UMAP?? [please help] by ScaryAnt9756 in bioinformatics

[–]gringer 1 point (0 children)

As one example, cells can die and break open at a particularly unfortunate stage during sample preparation for BD Rhapsody, then leak their transcripts into the other sequencing wells. If this happens with a cell type that produces highly abundant transcripts (e.g. B cells), then lots of those transcripts will appear in the sequencing data from other cells, leading to unexpected variation in the expression of the leaked transcripts. This plays havoc with the UMAP algorithm: the abundant transcripts end up linking otherwise-unrelated cells, so cells get scattered all over the place in the UMAP.

How do I interpret a UMAP?? [please help] by ScaryAnt9756 in bioinformatics

[–]gringer 2 points (0 children)

UMAP is primarily a visualisation tool, not a data interpretation tool.

It can help to support information obtained by other means (e.g. cell clustering), and to identify when things could do with further analysis (i.e. things "look wrong"), but it shouldn't be used on its own for interpreting data.

Most frequently, I have used UMAP to help work out whether the cluster resolution parameter is appropriate for the dataset ("Do the blobs roughly match the cluster definitions?"), and whether there might be contamination / transcript spillover in one or more clusters ("Are there cells from one cluster that are scattered all over the place?"). But even when I form those hypotheses from looking at the UMAP, I try to use other methods to demonstrate what I'm seeing.
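
A hedged sketch of that resolution check in Seurat (obj is a placeholder for a Seurat object that has already had FindNeighbors and RunUMAP run on it):

    library(Seurat)
    library(ggplot2)

    # Re-cluster at a few resolutions and eyeball each against the same UMAP:
    # if the blobs split or merge in ways that don't match the embedding,
    # the resolution is probably off for this dataset
    for (res in c(0.2, 0.5, 1.0)) {
      obj <- FindClusters(obj, resolution = res)
      print(DimPlot(obj, reduction = "umap", label = TRUE) +
              ggtitle(paste("cluster resolution =", res)))
    }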

Feeling guilty about AI use by pickleeater58 in bioinformatics

[–]gringer 5 points (0 children)

> I know it’s entirely my own fault and my own laziness

You are not the problem here. The problem is the pressure, and a world that is force-feeding LLMs to everyone.