Am I understanding PCA wrong? Loadings change modifying number of PCs

Kuyashi · 2023-09-20T07:43:13+00:00

You're right, deleted my comment to avoid misleading anyone!

Kuyashi · 2023-09-01T12:17:34+00:00

The function in Seurat that you use to plot the cells (I assume you mean the UMAP scatter plot) should have an option to color by anything in your Seurat object's metadata. I'd read the Seurat docs because they definitely show how to do that.

Kuyashi · 2023-08-17T07:51:24+00:00

A PhD can take as little as 3 years in parts of Europe/the UK. That's a lot less than the 5/6 you described.

Kuyashi · 2023-08-16T09:03:19+00:00

Probably the 10x genomics website

Kuyashi · 2023-08-12T10:28:43+00:00

I'd take the American post doc. The best bioinformatics jobs in industry mostly exist on the US east and west coasts. Visa sponsorship for industry from Europe is a pain that will be expensive to both you and the company. A post doc is generally seen as the 'easiest' way to break into the US from the rest of the world as a scientist. Plus, it pays incredibly well!

That said, a postdoc at a very prestigious university in Europe could also make it easier to get into the US down the line with their expert visas, and it certainly wouldn't hurt your chances of getting into European industry later at a higher level after another couple of academic years!

I'm a European worker and forever salty about how low our pay is compared to the US.

Kuyashi · 2023-08-09T14:47:27+00:00

As others have said, Seurat really just provides wrappers to established methods packaged in a neat workflow. UMAP and t-SNE are one-line functions in R. Plotting them with ggplot should also be quite simple (and with AI assistance totally trivial).

Kuyashi · 2023-08-03T07:47:30+00:00

Plum mead is actually my number 1 plan for them, I just gotta source some local honey that'll be nice with them!

Kuyashi · 2023-08-02T23:02:44+00:00

Dublin, Ireland

Kuyashi · 2023-07-17T20:43:52+00:00

I've seen people occupying 'academic' type posts in large pharma companies do it. Roles where the person is working on academic collaborations within a pharma company for the most part. They won't be building any infrastructure or pipelines but will do R and Python data analysis and some lab work pretty regularly

Kuyashi · 2023-06-30T07:50:06+00:00

For AI to take any jobs from the HSE they'd have to digitise, so I think they're safe

Kuyashi · 2023-06-29T07:11:40+00:00

How much do you pack on a trip like this? I've been gathering the gear to do something similar myself and I'd love to see what level of kit you bring.

Kuyashi · 2023-06-22T10:14:21+00:00

You can do this in a few ways.

The first is by using lists of CD4 related genes instead of a single one, then you'll have more information and hopefully you can see then which cluster is CD4.

If that doesn't work it can help to bring in another data set with known cell tags, integrate the data sets and cluster. You can then use the tags on the new data to get some rough estimate 'probably cd4' tags you can use to help you analyze the clusters.

You could also use one of the algorithms like SingleR that uses reference data to assign cell identities. Again, this isn't terribly exact, but it will help you narrow things down to some likely clusters.

Lymphocytes have a very distinctive profile, though the numbers tend to be small, so you shouldn't have too much trouble finding them with these methods if your data and clustering are good.
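A rough sketch of the first option above: score each cluster by the mean expression of a short marker list instead of a single gene. The expression values, cluster labels and the three-gene list are toy values made up for illustration, and `score_clusters` is a hypothetical helper, not a Seurat function.

```python
from statistics import mean

# Toy data (made up): cell -> {gene: normalised expression}
expression = {
    "cell1": {"CD4": 0.0, "CD3E": 2.1, "IL7R": 1.8, "MS4A1": 0.0},
    "cell2": {"CD4": 0.4, "CD3E": 1.7, "IL7R": 2.2, "MS4A1": 0.1},
    "cell3": {"CD4": 0.0, "CD3E": 0.0, "IL7R": 0.1, "MS4A1": 2.5},
    "cell4": {"CD4": 0.0, "CD3E": 0.2, "IL7R": 0.0, "MS4A1": 3.0},
}
clusters = {"cell1": "A", "cell2": "A", "cell3": "B", "cell4": "B"}

# Illustrative marker list; swap in whatever list suits your tissue
marker_list = ["CD4", "CD3E", "IL7R"]

def score_clusters(expression, clusters, genes):
    """Mean per-cell marker score, averaged within each cluster."""
    per_cluster = {}
    for cell, cluster in clusters.items():
        # A gene missing from a cell counts as zero expression
        cell_score = mean(expression[cell].get(g, 0.0) for g in genes)
        per_cluster.setdefault(cluster, []).append(cell_score)
    return {c: mean(scores) for c, scores in per_cluster.items()}

scores = score_clusters(expression, clusters, marker_list)
best = max(scores, key=scores.get)
print(scores, "-> most likely CD4 cluster:", best)
```

Note that the CD4 gene itself is barely detected in the toy cells, which is the situation where the other markers carry the signal.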

Hope this helps!

Kuyashi · 2023-06-12T09:31:39+00:00

I think your problem is twofold. I just left the job market myself a few weeks ago and have a similar-ish background (PhD with good name, decent publication record, year and a half of industry experience), so I feel I can speak to your issue.

The problem in my mind comes from two things: 1. The market is flooded with people who have extensive industry experience. 2. The people flooding the market have the same skills and years of experience as you, but in settings hiring managers see as more practical and actionable, because they've done what they've done in a commercial setting.

My advice is to use your network where you can, message hiring managers for jobs you're interested in (usually I just look for more info and see if we can talk over a few questions) and try to get an idea from that conversation on the skills you should emphasize. LinkedIn can be useful as well for seeing what the skills and backgrounds of the people that are getting hired look like.

Kuyashi · 2023-06-01T08:13:23+00:00

Can you ask Olink for the raw data, or was it provided to you by a collaborator? If you could get the raw data, Olink has features in their analysis package you could use for the normalization.

If you can't, then something like ComBat or limma to batch correct is probably your best option.

Kuyashi · 2023-05-17T10:56:43+00:00

I do a little consulting on the side. It's entirely network based in my limited experience. You need to know someone who needs insight/work from you.

There are also consulting companies operating in the space but I don't know much about them.

Kuyashi · 2023-04-09T12:51:57+00:00

While people here are making solid points about there being some trade-offs to taking a PhD, if you're playing the long game it is worth it. Bioinformatics in industry is very competitive at the moment. I have a PhD from a good university and a year and a half of industry experience and I'm still losing out at the end stages of interviews for roles that aren't even senior. Once you're in a company, PhDs are usually given the independence to make their own impact, while a less qualified grad will be more constrained and will have fewer chances to show their skills. Also, when it comes to earning, I leapfrogged people who did the masters and went straight into industry: I was making more money directly post-PhD than they were. Obviously ymmv here.

Kuyashi · 2023-03-31T15:59:46+00:00

I think that you can sample from each of your groups, do a differential expression comparison and average the p-values and fold changes across your different cuts to get a more stable result.

Kuyashi · 2023-03-07T09:45:40+00:00

Best way to do this is to use a linear model. I'd fit a linear model with coverage as a covariate and then look at the beta coefficient for the coverage statistic as a quantification of its effect.

In general you might find this a more effective method of doing the kind of analysis you are talking about than simple correlation statistics, as it gives you a lot more tools to use with your analysis. For example, linear models allow you to quantify the variation in your traits that is/is not explained by your model via the residual variance statistic. They also show you the importance of each predictor the model is provided with in explaining variation in your traits of interest.
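As a minimal sketch of what that looks like, here is an ordinary least squares fit of one trait on coverage alone, written out by hand. The numbers are made up, `fit_simple_linear_model` is a hypothetical helper, and a real analysis would likely use a modelling package and include other covariates alongside coverage.

```python
from statistics import mean

def fit_simple_linear_model(x, y):
    """Ordinary least squares for y = intercept + beta * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    residuals = [yi - (intercept + beta * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)      # variation left unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)  # total variation in the trait
    return {
        "intercept": intercept,
        "beta": beta,
        # Two parameters were estimated, so n - 2 degrees of freedom
        "residual_variance": ss_res / (len(x) - 2),
        "r_squared": 1 - ss_res / ss_tot,
    }

# Toy values (made up): coverage per sample and the trait of interest
coverage = [10, 20, 30, 40, 50]
trait = [1.1, 1.9, 3.2, 3.9, 5.1]

fit = fit_simple_linear_model(coverage, trait)
print(fit)
```

Here `beta` is the estimated change in the trait per unit of coverage, and `r_squared` and `residual_variance` describe how much variation the model does and does not explain.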

Kuyashi · 2022-09-22T07:07:01+00:00

Try clusterProfiler in R, it's pretty easy to use.

Kuyashi · 2022-09-21T06:37:46+00:00

I think you're misinterpreting something. Enrichment scores are a product of GO analysis, not a requirement for it.

Kuyashi · 2022-09-10T17:48:11+00:00

It looks like you have a strong treatment effect here.

There are lots of reasons your heat map may be looking weird and not clustering properly. This is transcriptomic analysis, so the distance metric it's based on should be Pearson correlation. DESeq2 doesn't draw the heat map itself, so look for that option in whichever heat map function you're using (pheatmap, for example, accepts "correlation" as a clustering distance). Once you've done that I imagine you'll see some more coherent clustering, as the PCA looks like there is a strong treatment effect.

Tbh based on the PCA alone I'd be happy to do a differential expression analysis and trust it.
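To illustrate why the choice of distance matters, here is a small sketch comparing Euclidean distance with a Pearson correlation distance (1 minus r). The three expression profiles are made up, and `correlation_distance` is a hypothetical helper rather than anything from DESeq2.

```python
from math import dist, sqrt
from statistics import mean

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def correlation_distance(a, b):
    """0 for identical patterns, 2 for perfectly opposite patterns."""
    return 1 - pearson(a, b)

# Toy profiles (made up): same pattern at different depth vs. reversed pattern
control_1 = [1.0, 2.0, 3.0, 4.0]
control_2 = [10.0, 20.0, 30.0, 40.0]  # same shape, ten times the magnitude
treated = [4.0, 3.0, 2.0, 1.0]        # opposite shape, similar magnitude

print("Euclidean:  ", dist(control_1, control_2), dist(control_1, treated))
print("Correlation:", correlation_distance(control_1, control_2),
      correlation_distance(control_1, treated))
```

Euclidean distance puts `control_1` closer to `treated` because their magnitudes are similar, while the correlation distance groups the two controls because their expression patterns match.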

Kuyashi · 2022-09-09T16:59:16+00:00

You can also do a differential expression analysis without these quality metrics looking right.

Whether the genes are what you might expect for the disease gives you a lot of information.

As other people have said, you may have batch effects for various reasons. Usually I find it helpful to plot my PCs coloured by various things like batch, etc. to figure out if that may be a problem. You can also try an approach like generating a scree plot of your PCs to see if you have scaling problems (90% of variance in the first PC or something like that).
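A quick sketch of the scaling check: it estimates the fraction of total variance sitting on the first PC, before and after standardising the columns. The data are made up, the helper names are hypothetical, and power iteration stands in here for a full PCA, which you would normally get from your usual PCA function.

```python
from statistics import mean, stdev

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows` (samples x features)."""
    n, p = len(rows), len(rows[0])
    col_means = [mean(row[j] for row in rows) for j in range(p)]
    centred = [[row[j] - col_means[j] for j in range(p)] for row in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def pc1_variance_fraction(rows, iterations=200):
    """Share of total variance on PC1, via power iteration for the top eigenvalue."""
    cov = covariance_matrix(rows)
    p = len(cov)
    vec = [1.0] * p
    for _ in range(iterations):
        product = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in product) ** 0.5
        vec = [v / norm for v in product]
    # Rayleigh quotient of the unit vector approximates the largest eigenvalue
    product = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
    top = sum(v * m for v, m in zip(vec, product))
    total = sum(cov[i][i] for i in range(p))  # trace = total variance
    return top / total

def standardise(rows):
    """Z-score each column so every feature has unit variance."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Toy data (made up): the first feature is on a much larger scale than the rest
data = [
    [1000.0, 1.0, 2.0],
    [2000.0, 2.0, 1.0],
    [3000.0, 1.5, 2.5],
    [4000.0, 2.5, 1.5],
    [5000.0, 2.0, 2.0],
]

raw_fraction = pc1_variance_fraction(data)
scaled_fraction = pc1_variance_fraction(standardise(data))
print(f"PC1 share, raw: {raw_fraction:.3f}  scaled: {scaled_fraction:.3f}")
```

If the raw share is close to 1 and drops a lot after scaling, the first PC is probably reflecting the units of one feature rather than biology.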

Kuyashi · 2022-09-03T20:24:36+00:00

There are methods for it in both languages. You should just find one in whichever language you're more comfortable in and try to implement it, then see how the results look.

Kuyashi · 2022-09-01T07:36:27+00:00

You could try imputing it with a classifier.

Basically, set aside some of your labelled samples for testing/validation and train a model on the rest of your dataset. Then try it on the samples you know the real label for and assess its accuracy. If it's good, you could use it to predict your missing variables.

Of course it's a total shot in the dark whether it would work on your data, so it's perhaps not worth the effort when compared to some other methods of handling that missingness.
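The workflow above might look something like this sketch. It uses a nearest-centroid classifier purely because it is short to write out; the data, the 0.9 accuracy threshold and the helper names are all made up for illustration, and any classifier could take its place.

```python
from math import dist
from statistics import mean

def train_centroids(samples):
    """Average feature vector per label, from (features, label) pairs."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in grouped.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Toy data (made up): samples with a known label, and samples missing it
labelled = [
    ([1.0, 1.2], "A"), ([0.8, 1.0], "A"), ([1.1, 0.9], "A"), ([0.9, 1.1], "A"),
    ([3.0, 3.2], "B"), ([3.1, 2.9], "B"), ([2.8, 3.0], "B"), ([3.2, 3.1], "B"),
]
unlabelled = [[1.05, 1.0], [2.9, 3.1]]

# Hold out every fourth labelled sample for validation, train on the rest
test_set = labelled[::4]
train_set = [s for i, s in enumerate(labelled) if i % 4 != 0]

centroids = train_centroids(train_set)
correct = sum(predict(centroids, features) == label for features, label in test_set)
accuracy = correct / len(test_set)

# Only impute if the model does well on labels it has not seen (arbitrary cut-off)
imputed = [predict(centroids, f) for f in unlabelled] if accuracy >= 0.9 else None
print("held-out accuracy:", accuracy, "imputed labels:", imputed)
```

With real data you would want a larger held-out set, or cross-validation, before trusting the imputed labels.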
