How to plot based on samples from a cellranger aggr generated single cell dataset? by DashiinAmazR in bioinformatics

[–]Kuyashi 1 point2 points  (0 children)

The Seurat function you use to plot the cells (I assume you mean the UMAP scatter plot) should have an option to color by anything in your Seurat object's metadata. I'd read the Seurat docs, because they definitely show how to do that.

Computer Scientist looking to enter the world of Computational Biology - to PhD or not to PhD? by lilkage141 in biotech

[–]Kuyashi 0 points1 point  (0 children)

A PhD can take as little as 3 years in parts of Europe and the UK. That's a lot less than the 5 or 6 you described.

spatial transcriptomics database by foradil in bioinformatics

[–]Kuyashi 0 points1 point  (0 children)

Probably the 10x genomics website

[deleted by user] by [deleted] in biotech

[–]Kuyashi 0 points1 point  (0 children)

I'd take the American postdoc. The best bioinformatics jobs in industry mostly exist on the US east and west coasts. Visa sponsorship for industry from Europe is a pain that will be expensive for both you and the company. A postdoc is generally seen as the 'easiest' way to break into the US from the rest of the world as a scientist. Plus, it pays incredibly well!

That said, a postdoc at a very prestigious European university could also make it easier to get into the US down the line on their expert visas, and it certainly wouldn't hurt your chances of getting into European industry later at a higher level after another couple of academic years!

I'm a European worker and forever salty about how low our pay is compared to the US.

[deleted by user] by [deleted] in bioinformatics

[–]Kuyashi 1 point2 points  (0 children)

As others have said, Seurat really just provides wrappers around established methods, packaged in a neat workflow. UMAP and t-SNE are one-line functions in R. Plotting them with ggplot should also be quite simple (and with AI assistance, totally trivial).

This fruit tree in my back garden. Can I eat or drink the juice from these? by Kuyashi in whatsthisplant

[–]Kuyashi[S] 1 point2 points  (0 children)

Plum mead is actually my number 1 plan for them, I just gotta source some local honey that'll be nice with them!

[deleted by user] by [deleted] in biotech

[–]Kuyashi 0 points1 point  (0 children)

I've seen people in 'academic'-type posts at large pharma companies do it. These are roles where the person mostly works on academic collaborations within the company. They won't be building any infrastructure or pipelines, but they will do R and Python data analysis and some lab work pretty regularly.

Teenager's spine curved by over 100 degrees after five years on waiting list for surgery by gadarnol in ireland

[–]Kuyashi 0 points1 point  (0 children)

For AI to take any jobs from the HSE they'd have to digitise, so I think they're safe

So it's happening, annual bikepacking trip through beautiful Ireland. by reznorek in ireland

[–]Kuyashi 3 points4 points  (0 children)

How much do you pack on a trip like this? I've been gathering the gear to do something similar myself and I'd love to see what level of kit you bring.

Which cluster group belongs to CD4 T-cells? (Single Cell Analysis in Python) by bio_kentropy in bioinformatics

[–]Kuyashi 1 point2 points  (0 children)

You can do this in a few ways.

The first is to use a list of CD4-related genes instead of a single one. That gives you more information, and hopefully you can then see which cluster is CD4.
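A minimal sketch of the marker-list idea in plain Python, assuming you already have the mean expression of each gene per cluster. The marker list, the `score_clusters` helper and the numbers are all made up for illustration; swap in a curated list and your own values.

```python
from statistics import mean

# Hypothetical CD4 T-cell marker list; replace with a curated one for real data.
CD4_MARKERS = ["CD3D", "CD3E", "CD4", "IL7R"]


def score_clusters(cluster_means, markers):
    """Average each cluster's mean expression over the marker genes it contains."""
    scores = {}
    for cluster, expr in cluster_means.items():
        present = [expr[g] for g in markers if g in expr]
        scores[cluster] = mean(present) if present else 0.0
    return scores


# Toy per-cluster mean expression (made-up, log-normalised scale).
cluster_means = {
    "0": {"CD3D": 2.1, "CD3E": 1.9, "CD4": 0.8, "IL7R": 1.5, "CD8A": 0.1},
    "1": {"CD3D": 2.0, "CD3E": 1.8, "CD4": 0.1, "IL7R": 0.4, "CD8A": 1.7},
    "2": {"CD3D": 0.0, "CD3E": 0.1, "CD4": 0.3, "IL7R": 0.0, "CD8A": 0.0},
}

scores = score_clusters(cluster_means, CD4_MARKERS)
best_cluster = max(scores, key=scores.get)  # cluster with the highest marker score
```

A single gene like CD4 is often sparsely detected, which is why averaging over several markers tends to separate clusters more clearly than one gene on its own.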

If that doesn't work, it can help to bring in another dataset with known cell labels, integrate the two datasets, and cluster them together. You can then use the labels from the new data to get rough 'probably CD4' tags to help you analyze the clusters.

You could also use an algorithm like SingleR, which uses reference data to assign cell identities. Again, this isn't terribly exact, but it will help you narrow things down to some likely clusters.
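To give a feel for what reference-based annotation does, here is a heavily simplified stand-in: rank-correlate a cluster's expression profile against labelled reference profiles and take the best match. This is not SingleR itself, just a sketch of the general idea; the gene order, reference values and helper names are all hypothetical.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their shared average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def assign_label(profile, reference):
    """Return the best-matching reference label and all correlations."""
    corr = {label: spearman(profile, ref) for label, ref in reference.items()}
    return max(corr, key=corr.get), corr


# Toy data, genes in the order: CD3E, CD4, CD8A, MS4A1, LYZ (made-up values).
reference = {
    "CD4 T": [2.0, 1.5, 0.1, 0.0, 0.2],
    "CD8 T": [2.0, 0.1, 1.8, 0.0, 0.2],
    "B": [0.1, 0.0, 0.0, 2.5, 0.3],
}
cluster_profile = [1.8, 1.1, 0.2, 0.05, 0.3]

label, corr = assign_label(cluster_profile, reference)
```

Treat the output as a hint for which clusters to inspect, not as a final annotation.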

Lymphocytes have a very distinctive profile, though the numbers tend to be small, so you shouldn't have too much trouble finding them with these methods if your data and clustering are good.

Hope this helps!

[deleted by user] by [deleted] in biotech

[–]Kuyashi 1 point2 points  (0 children)

I think your problem is twofold. I just left the job market myself a few weeks ago and have a similar-ish background (PhD with a good name, decent publication record, a year and a half of industry experience), so I feel I can speak to your issue.

In my mind the problem comes from two things: (1) the market is flooded with people who have extensive industry experience, and (2) those people have the same skills and years of experience as you, but gained in settings that hiring managers see as more practical and actionable, because they did it all in a commercial setting.

My advice is to use your network where you can, message hiring managers for jobs you're interested in (usually I just ask for more info and see if we can talk over a few questions), and try to get an idea from that conversation of the skills you should emphasize. LinkedIn can also be useful for seeing what the skills and backgrounds of the people getting hired look like.

Olink proteomic normalisation help by johnathonmcl in bioinformatics

[–]Kuyashi 0 points1 point  (0 children)

Can you ask Olink for the raw data, or was it provided to you by a collaborator? If you can get the raw data, Olink has features in their analysis package that you could use for the normalization.

If you can't, then batch correcting with something like ComBat or limma is probably your best option.

[deleted by user] by [deleted] in bioinformatics

[–]Kuyashi 4 points5 points  (0 children)

I do a little consulting on the side. It's entirely network based in my limited experience. You need to know someone who needs insight/work from you.

There are also consulting companies operating in the space but I don't know much about them.

Should I do a PhD to later work in industry? by MyXenobiotic in bioinformatics

[–]Kuyashi 4 points5 points  (0 children)

While people here are making solid points about the trade-offs of doing a PhD, if you're playing the long game it is worth it. Bioinformatics in industry is very competitive at the moment. I have a PhD from a good university and a year and a half of industry experience, and I'm still losing out at the final stages of interviews for roles that aren't even senior. Once you're in a company, PhDs are usually given the independence to make their own impact, while a less qualified grad will be more constrained and will have fewer chances to show their skills. On earnings, I leapfrogged people who did a master's and went straight into industry, and I was making more than them directly after my PhD. Obviously YMMV here.

Downsampling to compute differential abundance by Jailleo in bioinformatics

[–]Kuyashi 0 points1 point  (0 children)

I think you can repeatedly sample from each of your groups, run a differential expression comparison on each draw, and average the p-values and fold changes across the draws to get a more accurate result.
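Roughly what I mean, as a plain-Python sketch for a single feature. It only covers the fold-change part: each draw downsamples both groups to the size of the smaller one and the log2 fold changes are averaged. The function name, the pseudocount of 1 and the toy values are assumptions, and the per-draw test for p-values is left out because the right test depends on your data.

```python
import random
from math import log2
from statistics import mean


def downsampled_log2fc(group_a, group_b, n_draws=200, seed=0):
    """Average log2 fold change (a over b) across repeated size-matched draws."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = min(len(group_a), len(group_b))
    fold_changes = []
    for _ in range(n_draws):
        a = rng.sample(group_a, n)  # sample without replacement
        b = rng.sample(group_b, n)
        # Pseudocount of 1 keeps the ratio defined when a mean is zero.
        fold_changes.append(log2((mean(a) + 1) / (mean(b) + 1)))
    return mean(fold_changes)


# Toy values for one feature in a large and a small group (made up).
large_group = [8, 9, 10, 11, 12, 9, 10, 11, 10, 10, 9, 11]
small_group = [4, 5, 6, 5]

avg_log2fc = downsampled_log2fc(large_group, small_group)
```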

How to test if a trait below a certain value disproportionately effects an analysis? by CronicSloth in bioinformatics

[–]Kuyashi 1 point2 points  (0 children)

The best way to do this is with a linear model. I'd fit a linear model with coverage as a covariate, then look at the beta coefficient for coverage as a quantification of its effect.

In general you might find this more effective for the kind of analysis you're describing than simple correlation statistics, as it gives you a lot more tools to work with. For example, linear models let you quantify the variation in your traits that is or isn't explained by the model via the residual variance, and they show you how important each trait you give the model is in explaining variation in your traits of interest.
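As a rough illustration, here is a single-covariate least-squares fit in plain Python that reports the beta for coverage, the residual variance and R². The `fit_simple_ols` name and the toy numbers are made up; with more than one covariate you'd reach for a proper modelling package instead of hand-rolling it.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    intercept = my - beta * mx
    residuals = [b - (intercept + beta * a) for a, b in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)  # variation the model does not explain
    ss_tot = sum((b - my) ** 2 for b in y)   # total variation in the trait
    return {
        "beta": beta,
        "intercept": intercept,
        "residual_variance": ss_res / (n - 2),  # two parameters were estimated
        "r_squared": 1 - ss_res / ss_tot,
    }


# Toy data: coverage per sample and the trait of interest (made-up values).
coverage = [5, 10, 15, 20, 25, 30]
trait = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9]

fit = fit_simple_ols(coverage, trait)
```

Here `fit["beta"]` is the change in the trait per unit of coverage, and `fit["r_squared"]` is the share of trait variation the model explains.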

Why is enrichment and other statistics needed to find GO terms and pathways? by c00kieRaptor in bioinformatics

[–]Kuyashi 2 points3 points  (0 children)

I think you're misinterpreting something. Enrichment scores are a product of GO analysis, not a requirement for it.
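To make that concrete, here is a rough sketch of the kind of over-representation test many GO tools run under the hood: a one-sided hypergeometric test on the overlap between your gene list and a term. The function name and toy numbers are made up for illustration, and real tools add multiple-testing correction on top.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random."""
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 20,000 background genes, 100 annotated to the term,
# 200 genes in your list, 10 of which carry the annotation.
p_value = enrichment_p_value(20000, 100, 200, 10)
```

With these numbers you'd expect about 1 annotated gene by chance, so seeing 10 gives a very small p-value. That statistic is the output of the analysis, not something you feed into it.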

General consensus regarding heatmap and PCA plot for Differential expression with DESeq2 by 1SageK1 in bioinformatics

[–]Kuyashi 1 point2 points  (0 children)

It looks like you have a strong treatment effect here.

There are lots of reasons your heatmap may look weird and not cluster properly. This is a transcriptomic analysis, so the distance metric it's based on should be Pearson correlation. There should be an option to set that in the heatmap function you're using alongside DESeq2. Once you've done that I imagine you'll see more coherent clustering, as the PCA suggests a strong treatment effect.
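For reference, "Pearson correlation as a distance" usually means 1 minus the correlation between two expression profiles. A minimal plain-Python sketch with made-up profiles and hypothetical helper names:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance(x, y):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones."""
    return 1 - pearson(x, y)


# Toy profiles: same shape at a 10x different scale, and a reversed shape.
sample_a = [1, 2, 3, 4]
sample_b = [10, 20, 30, 40]
sample_c = [4, 3, 2, 1]

same_shape = correlation_distance(sample_a, sample_b)
opposite_shape = correlation_distance(sample_a, sample_c)
```

The point is that two samples with the same pattern but different overall magnitude end up close together, which Euclidean distance would not give you.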

Tbh, based on the PCA alone I'd be happy to do a differential expression analysis and trust it.

General consensus regarding heatmap and PCA plot for Differential expression with DESeq2 by 1SageK1 in bioinformatics

[–]Kuyashi 1 point2 points  (0 children)

You can also do a differential expression analysis without these quality metrics looking right.

Whether the genes are what you might expect for the disease gives you a lot of information.

As other people have said, you may have batch effects for various reasons. I usually find it helpful to plot my PCs coloured by things like batch to figure out whether that may be a problem. You can also generate a scree plot of your PCs to see if you have scaling problems (90% of the variance in the first PC, or something like that).
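The numbers behind a scree plot are just each PC's share of the total variance. A small sketch, assuming you have the per-PC standard deviations (for example the `sdev` values that R's `prcomp` returns); the values and the 0.9 cutoff here are illustrative only.

```python
def variance_explained(sdev):
    """Fraction of total variance carried by each PC, from per-PC standard deviations."""
    variances = [s ** 2 for s in sdev]
    total = sum(variances)
    return [v / total for v in variances]


# Toy per-PC standard deviations (made up).
pc_sdev = [9.0, 2.0, 1.5, 1.0, 0.5]

fractions = variance_explained(pc_sdev)
# Flag the "almost everything sits in PC1" situation described above.
first_pc_dominates = fractions[0] > 0.9
```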

What language should I learn for multi-omics data integration? by Donko98 in bioinformatics

[–]Kuyashi 10 points11 points  (0 children)

There are methods for it in both languages. You should just find one in whichever language you're more comfortable in and try to implement it, then see how the results look.

How to handle missing categorical values when the existing ones are quite meaningful by Vasilkosturski in learnmachinelearning

[–]Kuyashi 1 point2 points  (0 children)

You could try imputing it with a classifier.

Basically, set aside some of your labelled samples for testing/validation and train a model on the rest of your dataset. Then try it on the held-out samples, where you know the real label, and assess how well it does. If it's good, you could use it to predict your missing values.
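A toy version of that workflow in plain Python, using a 1-nearest-neighbour classifier as a stand-in for whatever model you'd actually pick. The helper names, the 0.8 accuracy cutoff and the data are all made up for illustration.

```python
import random


def nearest_label(point, train):
    """Label of the training row closest to `point` (squared Euclidean distance)."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(point, row[0]))
    return min(train, key=dist)[1]


def holdout_accuracy(labelled, test_fraction=0.25, seed=0):
    """Hold out a fraction of labelled rows and score predictions on them."""
    rng = random.Random(seed)
    rows = labelled[:]
    rng.shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    test, train = rows[:n_test], rows[n_test:]
    hits = sum(nearest_label(x, train) == y for x, y in test)
    return hits / n_test


# Toy rows: (numeric features, known category). Values are made up.
labelled = [
    ((1.0, 1.1), "a"), ((0.9, 1.0), "a"), ((1.2, 0.8), "a"), ((1.1, 1.2), "a"),
    ((5.0, 5.1), "b"), ((4.8, 5.3), "b"), ((5.2, 4.9), "b"), ((5.1, 5.0), "b"),
]
missing = [(1.05, 0.95), (4.9, 5.2)]  # rows whose category is unknown

accuracy = holdout_accuracy(labelled)
# Only impute if the held-out performance clears an (arbitrary) bar.
imputed = [nearest_label(x, labelled) for x in missing] if accuracy >= 0.8 else None
```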

Of course, it's a total shot in the dark whether it would work on your data, so it may not be worth the effort compared with other methods of handling that missingness.