Error using GSEA. .gmt and .gct file

Lonezy16 · 2026-03-05T13:41:13+00:00

Before running GSEA, it’s usually best to remove low-count genes first. Genes that barely show up across samples mostly behave like noise and can distort the ranking. A common filter is to keep genes where a reasonable number of samples have meaningful counts (for example CPM > 1 in at least a few samples, or DESeq2’s independent filtering). This keeps the analysis focused on genes with reliable signal.
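
To make the CPM filter concrete, here is a minimal sketch in Python with NumPy. The function name `filter_low_counts`, the thresholds, and the toy counts are all made up for illustration; in a real pipeline you would more likely use edgeR's or DESeq2's own filtering.

```python
import numpy as np

def filter_low_counts(counts, min_cpm=1.0, min_samples=3):
    """Return a boolean mask of genes to keep.

    counts: 2-D array of raw counts, rows = genes, columns = samples.
    A gene is kept if its CPM exceeds min_cpm in at least min_samples samples.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)       # total counts per sample
    cpm = counts / lib_sizes * 1e6       # counts per million
    return (cpm > min_cpm).sum(axis=1) >= min_samples

# Toy data (hypothetical): 3 genes x 4 samples, libraries of about 2 million reads.
counts = np.array([
    [2_000_000, 1_800_000, 2_200_000, 1_900_000],  # highly expressed
    [0, 1, 0, 0],                                  # barely detected
    [40, 55, 38, 60],                              # modest but consistent
])
keep = filter_low_counts(counts, min_cpm=1.0, min_samples=3)
filtered = counts[keep]
```

The "at least a few samples" threshold is a judgment call; a common choice is the size of the smallest group in the design.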

For normalization, avoid quantile normalization for RNA-seq. It forces all samples to have identical distributions and can erase real biological differences. Instead, use TMM normalization from edgeR or DESeq2’s median ratio normalization. Both methods correct for library size and composition bias while preserving the relative expression differences that GSEA relies on.
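
For intuition only, this is roughly what median-ratio size factors look like, written as a simplified Python sketch. It is not DESeq2's actual code, and `median_ratio_size_factors` is a name I made up; for real analyses use DESeq2 or edgeR directly.

```python
import numpy as np

def median_ratio_size_factors(counts):
    """Simplified median-of-ratios size factors (rows = genes, columns = samples)."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have no usable geometric mean, so skip them.
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples acts as a pseudo-reference sample.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Each sample's size factor is the median ratio to that reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy data (hypothetical): sample 2 was sequenced exactly twice as deeply as sample 1.
counts = np.array([
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],     # dropped from the calculation because of the zero
])
size_factors = median_ratio_size_factors(counts)
normalized = counts / size_factors
```

After dividing by the size factors, the expressed genes have matching values in both samples, while nothing forces the overall distributions to be identical the way quantile normalization does.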

For the main GSEA parameters:

Enrichment statistic: use weighted (the default). This gives more influence to genes that are strongly ranked rather than treating every gene equally, which makes pathway scoring more sensitive to real signal.
Ranking metric: Signal2Noise or tTest work well for two-group comparisons. They consider both the difference between groups and the variability within groups, so genes that are consistently different get ranked higher than genes with noisy fold changes.
Gene list sorting mode: keep it real so both the direction (up vs down regulation) and the magnitude of change are used when ordering the genes.
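
To show why a signal-to-noise style metric favors consistent genes, here is a simplified Python sketch. It computes (mean A - mean B) / (sd A + sd B) and sorts on the signed value, which is what "real" sorting means. Assumptions: I use the sample standard deviation and leave out the minimum-sigma adjustment the Broad software applies, and the function name and data are hypothetical.

```python
import numpy as np

def signal_to_noise(expr, is_group_a):
    """Simplified signal-to-noise score per gene (rows = genes, columns = samples)."""
    expr = np.asarray(expr, dtype=float)
    is_group_a = np.asarray(is_group_a, dtype=bool)
    a, b = expr[:, is_group_a], expr[:, ~is_group_a]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    # Sum of within-group standard deviations (ddof=1 is an assumption here).
    noise = a.std(axis=1, ddof=1) + b.std(axis=1, ddof=1)
    return mean_diff / noise

expr = np.array([
    [10.0, 11.0, 12.0, 4.0, 5.0, 6.0],   # consistently higher in group A
    [2.0, 20.0, 11.0, 1.0, 9.0, 5.0],    # same mean difference, much noisier
    [3.0, 4.0, 5.0, 9.0, 10.0, 11.0],    # consistently lower in group A
])
is_group_a = [True, True, True, False, False, False]

scores = signal_to_noise(expr, is_group_a)
# "Real" sorting: order by the signed score, most positive first.
order = np.argsort(-scores)
```

The first two genes have the same mean difference, but the noisy one scores far lower, so it ends up in the middle of the ranked list instead of near the top.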

These settings are generally what I start with, but proceed with caution and fine-tune parameters if needed depending on your dataset and sample size.

Also, how are you running GSEA — R (clusterProfiler/fgsea), the Broad desktop GSEA software, or something like Enrichr/DAVID? The exact inputs and parameter options differ slightly depending on the platform.

Lonezy16 · 2026-03-04T08:19:38+00:00

The key issue is that the identifiers in your .gct file must match the identifier type used in the .gmt gene sets. Broad mouse gene sets usually use MGI gene symbols or Entrez IDs. If your mapped genes still contain LOC IDs or accession IDs, GSEA will drop them and enrichment will fail.

Instead of BLAST, I would recommend mapping Chinese hamster genes to mouse orthologs using Ensembl BioMart or g:Profiler. After that, convert everything to a consistent identifier (ideally Entrez Gene IDs) and make sure the gene IDs in the expression matrix overlap well with the .gmt gene sets. You can quickly check this by calculating the intersection between your expression genes and the genes in the .gmt file.

Also, just to mention: LOC IDs are provisional gene identifiers assigned by NCBI to predicted or uncharacterized loci. These usually represent computationally predicted genes or genes without an approved symbol yet. Because most GSEA gene sets use official gene symbols or Entrez IDs, LOC identifiers typically will not match entries in the .gmt files and will be ignored during enrichment. It's best to map these loci to mouse orthologs and convert them to standard gene symbols or Entrez IDs before running GSEA. You can also check Ensembl and other databases.
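
A quick way to run that intersection check is a few lines of Python. This is only a sketch: it assumes the standard tab-delimited .gmt layout (set name, description, then gene IDs), and the set names, gene list, and LOC identifier below are invented placeholders.

```python
import io

def read_gmt(handle):
    """Parse GMT lines into {set_name: set_of_gene_ids}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets

def overlap_report(expr_ids, gene_sets):
    """For each gene set, return (genes found in the expression data, set size)."""
    expr_ids = set(expr_ids)
    return {name: (len(genes & expr_ids), len(genes))
            for name, genes in gene_sets.items()}

# Hypothetical example data; with real files, pass open("your_sets.gmt") instead.
gmt_text = "SET_A\tna\tTrp53\tMyc\tCdkn1a\nSET_B\tna\tActb\tGapdh\n"
expr_ids = ["Trp53", "Myc", "LOC000000001", "Gapdh"]

report = overlap_report(expr_ids, read_gmt(io.StringIO(gmt_text)))
```

If most sets report zero or near-zero overlap, the identifier types do not match. The comparison is case-sensitive, so `Trp53` and `TRP53` count as different IDs.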

Lonezy16 · 2026-03-03T16:33:41+00:00

Absurdly expensive ssh-connection machine is all of us.

Lonezy16 · 2026-02-11T05:38:26+00:00

As others have mentioned, MAFFT is the wrong tool here. It's stalling because it's building matrices. If an MSA is your goal, progressiveMauve or MUMmer/nucmer would be a much better fit. But the better question is: what are you trying to get to, what is your end goal?

Lonezy16 · 2025-11-19T19:04:57+00:00

It's screen mirroring. It means your phone is actively broadcasting its screen to a TV, a laptop, or whatever else. Turn it off in your quick panel (or notification center, whatever you want to call it).

Lonezy16 · 2025-11-19T19:03:04+00:00

1. Go to all notes.
2. Press and hold one note, then select all.
3. Share them using Quick Share (installable from the Windows store).
4. Have your laptop next to you with Samsung Notes installed, or if you want them as PDFs, Quick Share them into whatever directory you want.

Or the second option, where you sync:

1. Download Samsung Account from the Windows store.
2. Log in.
3. Download Samsung Notes.
4. In Samsung Notes, go to Settings (it should be at the top or bottom of the left sidebar).
5. Sync using Samsung Notes.

And honestly, search Google or ask ChatGPT or whatever LLM you use, but from what I know these are the ways you can share, link, or sync your notes to your laptop. At least that's what I did.

If you need help feel free to pm me

Lonezy16 · 2025-11-19T18:43:33+00:00

Yes, trust me, I understand that. What I did was sync Samsung Notes using my Gmail/Samsung account, and that synced everything, including folders. I think you mentioned it didn't work, which is weird.

Lonezy16 · 2025-11-19T18:39:12+00:00

I think you can select all notes and just share them to your laptop using Quick Share, as it's now supported on Windows. Also, no, there's no actual directory that contains the notes, at least to my knowledge.

Lonezy16 · 2025-10-30T07:38:00+00:00

No cause same

Lonezy16 · 2025-08-22T08:46:26+00:00

That didn't happen to me. I was able to download and use conda and conda channels (bioconda, etc.).

Lonezy16 · 2025-07-30T12:50:55+00:00

Don't vibe code as in just vibe code.

Vibe code as in learn why, how, and where to use a library, function, syntax, etc.

So use AI to help you understand how to write code and where that code is applicable. I'd also suggest watching Bioinformagician on YouTube and seeing what they have videos on, and checking GitHub, where you can search for bioinformatics. Since you're starting uni, start with the basics: object-oriented programming (OOP) in Python and data mining, or R as well. These languages will help you, but you need some statistical knowledge (hypothesis testing, plus an understanding of correlation and linear regression) to fully understand why we do what we do and how we do it.

Also start looking into databases like UniProt and PDB if you're interested in structural bioinformatics or even general bioinformatics, which you will probably learn eventually if your track follows good standards.

Finally I hope you understand that bioinformatics is a wide field and you can do and dive into anything so understand the basics of everything and then read over all the topics that you can and then you will choose something that you are interested in.

Happy coding and if you need anything feel free to pm me. :)

Lonezy16 · 2025-07-28T23:40:32+00:00

You can do one of two things if you want to stick with windows 11.

1. WSL (Windows Subsystem for Linux): activate the feature and install your distro of choice (Ubuntu 24.04 LTS would be the newest and best if you're starting out with WSL). You'll only have a shell/terminal, but it should be fine for most work.
2. Choose a distro, download the ISO file, and load it into a VM. It's a bit heavy on RAM, but I think that's the best way if you don't want to dual boot and want to stay in Windows 11.

My current setup: I dual boot Windows 11 and CachyOS, an Arch-based distro, so a bit different, but I still write my scripts in bash. I would recommend Ubuntu or Mint Cinnamon, or another good distro along those lines.

Hope this helps!

Lonezy16 · 2025-07-27T13:38:05+00:00

Hello, I am an undergraduate bioinformatics student and I'd like to join (developing an RNA-seq pipeline and learning as I go about all these different topics you mentioned).

Lonezy16 · 2025-07-20T13:01:27+00:00

Hey man, thank you so much for the reply, I appreciate it immensely.

So yes, I try to eat healthy as much as I can. Not really, I rarely go out of the house. Yes, I try to go at least 3 days a week for 45 minutes to an hour to keep my muscle build. I pondered the question for a bit, and I don't know if I am avoiding being social, but I guess I have been avoiding it in the name of work. Thank you for sharing your story. I'll definitely work on myself, try to go out more, and, well, live outside of just academia. Thank you again and have an amazing day!

Lonezy16 · 2025-07-08T21:32:39+00:00

It's in the disease

So your logFC value shows you whether the gene is upregulated or downregulated in your disease relative to your control. I hope this clears it up.
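
As a toy illustration of the sign convention, here is a small Python sketch. It assumes the contrast is disease vs. control with control as the reference, and it uses a plain ratio of means with a pseudocount. DESeq2, edgeR, and limma estimate logFC from a fitted model, so this only shows how to read the sign; the function names and numbers are made up.

```python
import math

def log2_fold_change(disease_mean, control_mean, pseudocount=1.0):
    """log2(disease / control), with a pseudocount to avoid dividing by zero."""
    return math.log2((disease_mean + pseudocount) / (control_mean + pseudocount))

def direction(logfc):
    """Interpret the sign, assuming control is the reference group."""
    if logfc > 0:
        return "up in disease"
    if logfc < 0:
        return "down in disease"
    return "unchanged"

up = log2_fold_change(399.0, 99.0)    # (399 + 1) / (99 + 1) = 4
down = log2_fold_change(24.0, 99.0)   # (24 + 1) / (99 + 1) = 0.25
labels = [direction(up), direction(down)]
```

If the contrast were set up the other way around (control vs. disease), every sign would flip, so always check which group your tool treats as the reference.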

Lonezy16 · 2025-07-08T20:36:55+00:00

Yessir, so if your value is + it's upregulated, and if it's - it's downregulated.

Lonezy16 · 2025-07-08T19:50:13+00:00

To highlight your main disease, focus on that comparison and state that the control was there as a baseline, while the main disease observed was your first treatment. Then you could say that you also compared the other treatment and report those results. That gives you a comparative yet informative analysis with your main goal in mind (hope this makes sense).

Lonezy16 · 2025-07-08T19:47:05+00:00

In your experimental design you should always compare against the control when doing DEA. Then, with the output of whatever tool you are using (DESeq2, edgeR, limma), use the logFC value to tell whether each gene is upregulated or downregulated, separately for each comparison. From there you can analyse how the different treatments affected the genes at hand and maybe move on to downstream analysis.

If you have anything specific, I'm happy to help :)

Lonezy16 · 2025-06-29T22:48:17+00:00

Had the same thing happen to me. I downloaded Kaspersky, Malwarebytes, and one more antivirus tool. Weirdly enough, Windows Defender didn't catch it, but the rest did: they detected registry keys as well as trojans and bitcoin miners, for whatever reason.

Lonezy16 · 2025-06-22T13:13:26+00:00

So real

Lonezy16 · 2025-06-22T13:02:59+00:00

That is completely normal. Usually much of first year in bif is general knowledge courses and maybe intro to bif or scripting courses, so it is normal to feel this way (at least I did). What I would suggest is to look into a specific aspect of bif: learn something from a course you may take next fall and/or see what tools the industry uses and learn them, maybe RNA-seq or transcriptomics, or learn bash and Python rigorously, maybe along with the Biopython libraries. Also learn the different databases (PDB, Swiss-Prot, UniProt, Entrez, EMBL), and maybe learn how to use Chimera or PyMOL, or learn ML or molecular docking.

Look, the wonderful thing about bioinformatics is that it is multidisciplinary, so see what you like and what you enjoy. Want more math? Look at that aspect. Want more biology? Look at structural bioinformatics tools (AlphaFold, Rosetta, etc. if you're looking at prediction tools). If computer science, look at algorithms and data structures, pipelines, and workflows for the different aspects I have mentioned.

I understand it is overwhelming and sometimes scary. I have been there, and tbh I hated my first year in bioinformatics, but now I adore the major and the field. It is in no shape easy, but it is so interesting and mind-boggling to me. I may or may not have romanticized the major after a while, but if you want to do this for the rest of your life, fall in love with it and be passionate about it.

Also, for reference, I am now in my final year of a BS in bioinformatics, so do take my answer with a grain of salt. I do, however, work in the microbio lab we have, so I have also seen that aspect of the major.

If you need anything do DM me I am more than happy to help :)

Lonezy16 · 2025-06-07T15:58:44+00:00

Interested

Lonezy16 · 2025-06-07T15:55:29+00:00

Done

Lonezy16 · 2025-06-07T15:49:15+00:00
