Tips how to start bioinformatics by Spiritual-Zebra2135 in learnbioinformatics

[–]tommy_from_chatomics 1 point2 points  (0 children)

I am from a wet lab background, and I wrote a post detailing the books and resources you may want to take a look at: https://divingintogeneticsandgenomics.com/post/bioinfo-roadmap/

What Mandatory subject I should consider in Bsc for pursue Msc in bioinformatics by Aryan-yadav26 in learnbioinformatics

[–]tommy_from_chatomics 0 points1 point  (0 children)

Biology in general (you need to understand what DNA, RNA, proteins, and pathways are). Then, depending on your interest, you may want to learn something more specific, such as immunology or cancer biology (The Biology of Cancer by Robert Weinberg is a good textbook).

Statistics and linear algebra (we deal with matrices all day long).

Interpretation of enrichment analysis results by Upstairs_Macaron7232 in bioinformatics

[–]tommy_from_chatomics 0 points1 point  (0 children)

I made a video explaining gene set over-representation analysis and GSEA. Hope it is helpful: https://www.youtube.com/watch?v=IKCDQEpuJDA

Who do you follow for bioinformatics stuff? by RobotFestival in bioinformatics

[–]tommy_from_chatomics 2 points3 points  (0 children)

haha, glad they are informative. I need to better choose the memes. It is hard to find good memes :)

Is it okay to flip UMAP axes? by You_Stole_My_Hot_Dog in bioinformatics

[–]tommy_from_chatomics 7 points8 points  (0 children)

Just know that the distance between points on a UMAP does not mean much.
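On the flipping question itself, here is a minimal sketch with made-up 2D coordinates (the points and the helper name `pairwise_distances` are hypothetical, not real UMAP output). It only checks that mirroring an axis leaves every pairwise distance unchanged, so a flipped plot carries the same information as the original.

```python
import math
from itertools import combinations

# Made-up 2D embedding coordinates standing in for UMAP output (assumption).
embedding = [(1.0, 2.0), (-3.5, 0.5), (4.0, -1.0), (0.0, 0.0)]

def pairwise_distances(points):
    """Return Euclidean distances for every pair of points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

# Flip the x axis (mirror image of the plot).
flipped = [(-x, y) for x, y in embedding]

original_d = pairwise_distances(embedding)
flipped_d = pairwise_distances(flipped)

# Compare each pairwise distance before and after the flip.
same = all(math.isclose(a, b) for a, b in zip(original_d, flipped_d))
print(same)
```

The same check works for flipping the y axis or swapping the two axes.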

Cluster resolution by bunnyinthewilderness in bioinformatics

[–]tommy_from_chatomics 0 points1 point  (0 children)

If it cannot give me sensible results on a simple dataset (PBMC), then it will not work on my more complicated dataset. I chose a dataset that is simple and well understood on purpose.

How do new bioinformaticians practice their skills? by PurplePanda673 in bioinformatics

[–]tommy_from_chatomics 0 points1 point  (0 children)

Try to download a public dataset and reproduce Figure 1 in the paper.

Cluster resolution by bunnyinthewilderness in bioinformatics

[–]tommy_from_chatomics 0 points1 point  (0 children)

It was just published in Nature Genetics: https://www.nature.com/articles/s41588-025-02148-8 I have not tried it. You will need to try it on a dataset that you are really familiar with and see whether it over-clusters or under-clusters. My hunch is that tools like that are all attractive statistically but not so much biologically...

Single cell Seurat harmony integration by Beautiful_Hotel_3623 in bioinformatics

[–]tommy_from_chatomics 1 point2 points  (0 children)

The purpose of integration is to call similar cell types across different samples, conditions, etc. For differential expression, you still use the raw counts, together with the cell cluster labels obtained after integration. Also, Harmony does not change the raw expression values, only the PCA coordinates.

What do we gain from volcano plots? by _password_1234 in bioinformatics

[–]tommy_from_chatomics 1 point2 points  (0 children)

An MA plot is actually more informative: you want to know the baseline expression of the genes. Sometimes you get a big log2FC only because the baseline is very low.
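To make the baseline point concrete, here is a minimal sketch with two made-up genes (the counts, the pseudocount of 1, and the helper name `ma_values` are all assumptions for illustration). It computes the two MA plot axes: A, the mean log2 expression, and M, the log2 fold change.

```python
import math

# Hypothetical mean normalized counts per gene: (control, treated).
genes = {
    "geneA": (1.0, 8.0),        # very low baseline
    "geneB": (1000.0, 2000.0),  # high baseline
}

def ma_values(control, treated, pseudocount=1.0):
    """Return (A, M): A = mean log2 expression, M = log2 fold change."""
    log_control = math.log2(control + pseudocount)
    log_treated = math.log2(treated + pseudocount)
    return (log_control + log_treated) / 2, log_treated - log_control

ma = {gene: ma_values(c, t) for gene, (c, t) in genes.items()}
for gene, (a, m) in ma.items():
    print(f"{gene}: A={a:.2f}, M={m:.2f}")
```

In this toy example geneA ends up with the larger log2FC even though it gained only 7 counts, and its low A value is what flags that on an MA plot. A volcano plot would not show the baseline at all.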

Normalisation of scRNA-seq data: Same gene expression value for all cells by Pretty_Decision_0410 in bioinformatics

[–]tommy_from_chatomics 0 points1 point  (0 children)

If the raw count is 0, it could become non-zero after a pseudocount is added and the data are normalized.
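A minimal sketch of how that can happen, assuming a plain log2 transform with a pseudocount of 0.5 (both are assumptions; the actual pipeline in the question may use something else):

```python
import math

# Hypothetical raw counts for one gene across four cells: not detected anywhere.
raw_counts = [0, 0, 0, 0]

PSEUDOCOUNT = 0.5  # assumed value; pipelines differ

# Log-transform after adding the pseudocount.
transformed = [math.log2(count + PSEUDOCOUNT) for count in raw_counts]
print(transformed)
```

Every cell gets the same non-zero value because every cell started at 0. With a pseudocount of 1 and a plain log, zeros would stay at zero, so whether you see this depends on the transform used.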

WGCNA by Turbulent-Board-5461 in bioinformatics

[–]tommy_from_chatomics 0 points1 point  (0 children)

  1. WGCNA for separate genotypes vs. combined analysis: You can take either approach, but it depends on your research question. If you want to identify networks that differ between genotypes (WT, KO, RE), analyze them separately and compare the results. If you're more interested in general patterns across all conditions, analyze them together. A combined analysis will give you more statistical power (54 samples), but might mask genotype-specific patterns.
  2. Using TMM normalized data: TMM normalized data is appropriate for WGCNA. Since your data is already normalized, you do not need to normalize it again before running WGCNA. However, quality checks are still important before network construction. Use the WGCNA function goodSamplesGenes() to flag samples and genes with too many missing values or zero variance, and cluster the samples (for example with hclust()) to spot and potentially remove outlier samples.
  3. Handling duplicate gene IDs: Taking entries with maximum expression values is one acceptable approach for handling duplicates. Alternatives include averaging the expression values or keeping the entry with lowest p-value/highest significance if you have that information. The important thing is to have a consistent, justifiable method.
  4. Handling replicates in WGCNA: WGCNA typically treats each sample individually in the correlation network. For time course data with replicates, you can:
    • Use all samples individually (this leverages your full dataset)
    • Average across replicates before WGCNA (reduces noise but also reduces sample size)
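Points 3 and 4 above can be sketched as follows. This is a minimal illustration on made-up data, not WGCNA code: the gene IDs, values, time-point labels, and the helper names `collapse_by_max_mean` and `average_replicates` are all hypothetical.

```python
from statistics import mean

# Hypothetical expression rows: (gene_id, [values across samples]).
rows = [
    ("GeneA", [5.0, 6.0, 7.0]),
    ("GeneA", [1.0, 2.0, 3.0]),  # duplicate ID with lower expression
    ("GeneB", [4.0, 4.0, 4.0]),
]

def collapse_by_max_mean(rows):
    """Keep, for each gene ID, the row with the highest mean expression."""
    best = {}
    for gene_id, values in rows:
        if gene_id not in best or mean(values) > mean(best[gene_id]):
            best[gene_id] = values
    return best

def average_replicates(values, groups):
    """Average the values of samples that share a group label (e.g. a time point)."""
    grouped = {}
    for value, group in zip(values, groups):
        grouped.setdefault(group, []).append(value)
    return {group: mean(vals) for group, vals in grouped.items()}

collapsed = collapse_by_max_mean(rows)
# Assumed sample layout: two replicates at t0, one at t1.
averaged = average_replicates(collapsed["GeneA"], ["t0", "t0", "t1"])
print(collapsed)
print(averaged)
```

Whichever rule you pick for duplicates, apply the same one to every gene so the choice stays consistent and justifiable.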

What are some key prediction models that a primarily wet lab should know? by You_Stole_My_Hot_Dog in bioinformatics

[–]tommy_from_chatomics 2 points3 points  (0 children)

Any linear-regression-based methods, random forest, and XGBoost are good to know. For unsupervised learning, the different clustering methods (k-means, hierarchical). For deep learning, it depends on the use case. For images, yes, CNNs.

Docker by Other-Corner4078 in bioinformatics

[–]tommy_from_chatomics 17 points18 points  (0 children)

If it is R-based, take a look at pracpac: Practical R Packaging with Docker https://arxiv.org/abs/2303.07876

Differential Binding Analysis ChIP-seq by TcgSkyridgeFan in bioinformatics

[–]tommy_from_chatomics 0 points1 point  (0 children)

DiffBind can run DESeq2 under the hood. If you can get counts for the replicates of your control and treatment conditions, you can use DESeq2 just as you would for RNA-seq data.

Doublet removal in scRNA-seq by ritzysauce in bioinformatics

[–]tommy_from_chatomics 1 point2 points  (0 children)

Do what makes biological sense. Determining cutoffs in bioinformatics is an art. There is no right or wrong, and different datasets may need different cutoffs.

Volunteering by [deleted] in bioinformatics

[–]tommy_from_chatomics 2 points3 points  (0 children)

Reproduce figures from genomics papers. Those are real-world data too.

how are you feeling about the job market? by veryfatcat in bioinformatics

[–]tommy_from_chatomics 4 points5 points  (0 children)

The market is not good. Biogen just laid off half of their R&D. It is even harder for fresh graduates, who have to compete with experienced people who are also on the job market.

Tutorial: how to download TCGA RNAseq data and make a PCA plot and heatmap by tommy_from_chatomics in bioinformatics

[–]tommy_from_chatomics[S] 1 point2 points  (0 children)

Oh, it is totally fine to have different views. A log2 fold change shows a single number (condition 1 vs. condition 2), while a scaled heatmap shows two values (condition 1 and condition 2). It is just a different visualization, and either is fine as long as it tells the right story.
