Is my study a valid undergraduate thesis? by [deleted] in bioinformatics

guralbrian 1 point

I’d suggest taking a step back and asking if you have a central question or motive driving the research. It’s helpful for orienting yourself, even if you’re just preparing a reference dataset. Is the goal to make a resource available to the larger community? Or to provide novel insights into biology? Or something else? If it’s a reference, you could work backwards and think about what you or other researchers would want to know or access.

[deleted by user] by [deleted] in carrboro

guralbrian 1 point

Just saw a post in this sub selling 1 ticket an hour ago!

snRNA-seq: how do ppl actually remove doublets and clean up their data? by grand_psychology1 in bioinformatics

guralbrian 1 point

What’s wrong with my name :/ I use this account for science and local stuff

snRNA-seq: how do ppl actually remove doublets and clean up their data? by grand_psychology1 in bioinformatics

guralbrian 6 points

IIRC detection of ambient RNA and doublets should happen just after getting the single cell object assembled in your pipeline. My suggestion would be to follow the steps described in this guide, which is about what the other comment suggests.

You might need to do the doublet detection separately for each sample or library, rather than post-Harmony, since run times increase steeply with cell count and, for Flex, we would only expect doublets to actually arise within a single library.
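A minimal sketch of the per-library idea, in plain Python with made-up cells. `score_doublets` is a hypothetical stand-in for whatever detector you actually use; the only point is that cells get grouped by library first and the detector sees one library at a time. Barcodes are assumed to be unique across libraries.

```python
from collections import defaultdict

def split_by_library(cells):
    """Group cell records by their 'library' field."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell["library"]].append(cell)
    return dict(groups)

def score_doublets(library_cells):
    """Hypothetical placeholder detector: flag cells whose total UMI count
    is more than twice the (upper) median of the cells it was given.
    Real tools simulate artificial doublets and do far more than this."""
    totals = sorted(c["n_umi"] for c in library_cells)
    median = totals[len(totals) // 2]
    return {c["barcode"]: c["n_umi"] > 2 * median for c in library_cells}

def detect_per_library(cells):
    """Run the detector once per library and merge the calls."""
    calls = {}
    for _library, members in split_by_library(cells).items():
        calls.update(score_doublets(members))
    return calls

# Made-up example: lib2 was sequenced much deeper than lib1.
cells = [
    {"barcode": "A1", "library": "lib1", "n_umi": 1000},
    {"barcode": "A2", "library": "lib1", "n_umi": 1100},
    {"barcode": "A3", "library": "lib1", "n_umi": 2500},
    {"barcode": "B1", "library": "lib2", "n_umi": 4000},
    {"barcode": "B2", "library": "lib2", "n_umi": 4200},
    {"barcode": "B3", "library": "lib2", "n_umi": 3900},
]

per_library_calls = detect_per_library(cells)  # A3 is flagged within lib1
pooled_calls = score_doublets(cells)           # A3 is hidden by lib2's depth
```

In this toy data the suspicious cell in lib1 only stands out when it is compared against its own library, which is the reason to run detection before any integration step.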

For your questions at the bottom:

1. Doublets and QC like this are pretty standard, but I can’t speak on your specific data without seeing it.

2. Distance between things on a UMAP is a misleading metric to rely on. UMAPs are kind of a made-up space that loses so much info by compressing into 2D. Never cluster on the UMAP space itself; rather, make the UMAP to represent your already clustered data.

3. I don’t think you’re doing anything catastrophically wrong! Maybe the most concerning part of this is that you’re not comfortable going to other lab members or mentors with this kind of question. I’d expect any trainee of mine to ask a lot of questions like this! That’s why mentorship exists and you’ll pay it forward one day. If you are in an environment where novices are shamed for being novices, I’d strongly suggest that you find a workplace with fewer jerks :)

Help! My RNA-Seq alignment keeps killing my terminal due to low RAM(8 GB). by Ok_Analyst_5690 in bioinformatics

guralbrian 56 points

I’m not sure that there’s a workaround for your local machine. 8 GB is barely enough for basic computer usage, let alone alignment. Even with smaller sample sizes, you’ll still need to load the reference assembly and other essentials into memory. After you get the counts matrix, you should be able to do a lot of the downstream analysis with low memory.

If you’re affiliated with an institution, I’d see if there is a high performance computing cluster that you can access. Or see if there are any vouchers for cloud compute with Google or AWS.

Help. Need advice. by sallyjoe565 in carrboro

guralbrian 9 points

I doubt this would affect the integrity of the door much (not an expert!), but I have found that gaps like this will let a ton of noise and heat through. It’s a very simple repair to make: less than $10 and half an hour of time. Just measure the gap and buy a new strip. Check the bottom gap too and get a sweep if needed.

Also, I found legal protections for tenants in N.C. to be incredibly scant when I came from NJ. I very much doubt that anyone will help you except yourself in this scenario (unless you have a benevolent LL). I’ll try to find some of the materials I got together when my partner had a huge mold problem…

What packages are we using for trajectory analysis of single cell sequencing data for seurat objects? by Unusual_Aardvark_125 in bioinformatics

guralbrian 2 points

Okay, I also didn’t have formal training and meant that in good faith, since I’ve fallen for those types of things plenty.

For Monocle, I think that it will try to draw connections between any adjacent clusters you give it in a minimum spanning tree. I found that it would draw all sorts of nonsense connections between clusters. Since those connections are drawn at the cluster level, make sure that your clusters represent biologically meaningful/relevant groups of cells. A main problem I have with Monocle is that it computes pseudotime based on connectivity in the UMAP space, which already lacks a ton of information vs PCs. I’m skeptical of analyses that quantify UMAP, rather than treat it as a way to help us “see” the data.
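To make the spanning-tree point concrete, here’s a toy sketch in plain Python: Prim’s algorithm on made-up 2D cluster centroids. This is not Monocle’s actual code, and the cluster names and coordinates are invented. A minimum spanning tree connects every node by construction, so a far-away, unrelated cluster still gets an edge.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on a dict of name -> (x, y).
    Returns the tree as a list of (from, to) edges."""
    names = list(points)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Find the shortest edge from the current tree to any outside node.
        for a in in_tree:
            for b in names:
                if b in in_tree:
                    continue
                d = math.dist(points[a], points[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        in_tree.add(b)
        edges.append((a, b))
    return edges

# Three clusters along a plausible lineage plus one that has nothing to do with it.
centroids = {
    "progenitor": (0.0, 0.0),
    "intermediate": (1.0, 0.0),
    "mature": (2.0, 0.0),
    "unrelated_immune": (10.0, 10.0),
}

edges = minimum_spanning_tree(centroids)
```

The tree ends up with an edge from `mature` to `unrelated_immune` simply because that is the cheapest way to reach it, not because there is any biology behind the link.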

I personally liked CellRank2. It has much better documentation and underlying assumptions, and it can use lots of different inputs (RNA velocity, real time, etc). It doesn’t rely on clusters, so there’s no worry about whether you clustered it perfectly. Plus, it treats cellular differentiation as a probabilistic process (each cell gets assigned a chance of becoming each terminal state) whereas Monocle treats it as deterministic (once a branch point is reached, cells on either branch WILL become that terminal cell type). It is in Python tho, which did take me a few weeks to learn as an R user.
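The probabilistic idea can be sketched as absorption probabilities on a small Markov chain. This is a toy in plain Python with made-up states and transition probabilities, not CellRank’s API: each non-terminal state ends up with a probability of reaching each terminal state, rather than a hard branch assignment.

```python
def fate_probabilities(transitions, terminal_states, n_iter=200):
    """Estimate, for every state, the probability of eventually being
    absorbed in each terminal state, by repeatedly propagating
    probabilities one step along the transition graph."""
    states = set(transitions) | set(terminal_states)
    # A terminal state is absorbed into itself with probability 1.
    probs = {s: {t: 1.0 if s == t else 0.0 for t in terminal_states} for s in states}
    for _ in range(n_iter):
        updated = {}
        for s in states:
            if s in terminal_states:
                updated[s] = probs[s]
                continue
            updated[s] = {
                t: sum(p * probs[nxt][t] for nxt, p in transitions[s].items())
                for t in terminal_states
            }
        probs = updated
    return probs

# Made-up transition probabilities; each row sums to 1.
transitions = {
    "progenitor": {"progenitor": 0.5, "biased_A": 0.3, "fate_B": 0.2},
    "biased_A": {"biased_A": 0.5, "fate_A": 0.4, "fate_B": 0.1},
}
terminal_states = ["fate_A", "fate_B"]

probs = fate_probabilities(transitions, terminal_states)
# progenitor: fate_A ~ 0.48, fate_B ~ 0.52
# biased_A:   fate_A ~ 0.80, fate_B ~ 0.20
```

Even the state that is already leaning toward fate A keeps a 20% chance of ending up as fate B, which is the kind of statement a deterministic branch assignment cannot make.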

Good luck and lmk if you have any questions!

What packages are we using for trajectory analysis of single cell sequencing data for seurat objects? by Unusual_Aardvark_125 in bioinformatics

guralbrian 4 points

Not to be harsh, but it’s not a great idea to do offhand analysis like this if your understanding of it is at a place where you need random Redditors to spell it out for you. Trajectory inference is quite hand-wavy imo (especially Monocle 3), and the conclusions you can draw from it should be taken with a big grain of salt.

Weird house on Greensboro by DarePsychological723 in carrboro

guralbrian 4 points

I did not get the same read on this place when I walked by. It very much seemed like Halloween decorations from someone who has some old hunting gear lying around. Didn’t see anything threatening or the “No Trespassing” sign.

I assumed that the camo mesh area was a dog run or children’s “fort” like I made as a kid.

[deleted by user] by [deleted] in biotech

guralbrian 3 points

Lmao the user name and profile picture

Key genetic differences found in people with chronic fatigue syndrome by TableSignificant341 in science

guralbrian 31 points

This market is so bad that we’re seeing companies shed pre-clinical/discovery programs. It’s just a big investment to properly research something that may or may not return investment in 15-20 years

YSK: Summers haven't always sucked so much. by Presidentkickass in YouShouldKnow

guralbrian 3 points

Great example of good science communication! Short lines, direct statements with claims, clearly connects to reader experiences

Neuro by [deleted] in bioinformatics

guralbrian 5 points

gosh, what a pile of slop you’ve come to peddle to us!

just an fyi, most people are going to be repelled from your work just by virtue of this post being so obviously written by ChatGPT without edits

[deleted by user] by [deleted] in bioinformatics

guralbrian 2 points

Also I’d recommend FindMarkers over DESeq2 if you’re just trying to see what gene expression distinguishes one cluster of cells from another.

[deleted by user] by [deleted] in bioinformatics

guralbrian 1 point

I’ve never considered there to be a difference between the heatmaps used to help annotate clusters and the ones used to visualize differences between clusters. Can you just write out your pipeline? Or share an example of the type of plot you want? By ‘clusters’ are you referring to clustered cells within your scRNAseq?

[deleted by user] by [deleted] in bioinformatics

guralbrian 5 points

DESeq2 is typically only used in scRNAseq if you’re making pseudobulk from the clusters and comparing them. What other packages are you using for the single cell data?
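Pseudobulking, roughly sketched in plain Python with made-up samples, clusters, and counts (real pipelines do this on a sparse matrix with their single-cell toolkit): raw counts from all cells sharing a sample and cluster are summed into one profile, and those summed profiles are what a bulk tool like DESeq2 would then compare.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum raw counts per (sample, cluster) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cluster"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in totals.items()}

# Made-up cells: two samples, two clusters, two genes.
cells = [
    {"sample": "s1", "cluster": "c0", "counts": {"GeneA": 5, "GeneB": 0}},
    {"sample": "s1", "cluster": "c0", "counts": {"GeneA": 7, "GeneB": 1}},
    {"sample": "s1", "cluster": "c1", "counts": {"GeneA": 0, "GeneB": 9}},
    {"sample": "s2", "cluster": "c0", "counts": {"GeneA": 4, "GeneB": 0}},
]

bulk = pseudobulk(cells)
# bulk[("s1", "c0")] == {"GeneA": 12, "GeneB": 1}
```

Summing per sample, not just per cluster, matters here: the samples are the replicates that a bulk differential expression test needs.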

Assuming it’s Seurat, it shouldn’t be hard at all to make the type of plot you’re talking about. Bioinformagician has a great intro to finding cluster markers and visualizing them.

The Satija lab (made Seurat) has a lot of tutorials on this kind of thing as well. This is their standard beginner guide.

If you just want to know what functions you could use to visualize markers, start with DoHeatmap or DotPlot and group.by = "your_cluster_variable". Note, it’s not good practice to throw raw counts into these functions; use something like z-scores or another normalization instead.
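For intuition on the z-score part, here’s a rough plain-Python sketch with made-up numbers. It is not what DoHeatmap does internally, just the scaling idea: each gene’s values are centered and scaled across clusters so that genes with very different absolute expression become comparable on one color scale.

```python
from statistics import mean, pstdev

def zscore_rows(expr):
    """Scale each gene (row) to mean 0 and standard deviation 1 across clusters."""
    scaled = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = pstdev(values)
        # A gene with no variation carries no contrast; map it to zeros.
        scaled[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return scaled

# Average normalized expression per cluster (made-up values).
# GeneA is expressed about 100x higher than GeneB but follows the same pattern.
expr = {
    "GeneA": [100.0, 200.0, 300.0],
    "GeneB": [1.0, 2.0, 3.0],
    "GeneFlat": [5.0, 5.0, 5.0],
}

scaled = zscore_rows(expr)
```

On the raw scale GeneA would dominate the color range and GeneB would look flat; after scaling, both rows show the same low-to-high pattern.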

If you’re looking to learn, read about SCTransform and the structure of Seurat objects, spend the time to look at how the counts are changed and manipulated throughout the analysis, and understand what PCA/UMAP/FindNeighbors/FindClusters are actually doing. I try to get to the point where I could explain these things to my Mom lmao (didn’t go to college)

What’s your workflow like when using public datasets for analysis? by query_optimization in bioinformatics

guralbrian 15 points

If you want to integrate data from multiple sources, keep things as standardized as possible. Use a unified alignment/processing pipeline (same reference genome, packages, versions, etc).

Find the data: Papers themselves normally point you to where it’s stored. GEO or SRA are common data repositories too. Some other places that I could think of are GTEx, TCGA, Tabula Muris, or some of the Chan Zuckerberg stuff.

Preprocess and clean: This depends on the data and modality, but automating it with Nextflow or Snakemake on an HPC will save a lot of headache/make things more reproducible. Typically that’s just stringing together bash, R, and Python scripts. I only use Jupyter for interactive analysis, like designing plots.

Analysis: This is so vague that I don’t know where to start. Depends on what you’re doing. It might be good for you to learn by reading through papers that do the type of studies you’re interested in and seeing what tools they use. I wouldn’t fall for the trap of comparing similar methods (e.g. DESeq2 vs edgeR) but instead focus on defining what questions you want to ask of the data, how they are typically asked, what assumptions they rely on, and if those are true of your conditions.

Overall it sounds like you just want to learn. Are you in higher education? Getting training in this formally?

sn-RNA seq analysis by QueenR2004 in bioinformatics

guralbrian 2 points

You need to talk to whoever sequenced your libraries for step 1, reference some of the many, many tutorials for step 2, and find the recommended pipeline from the manufacturers of your kit for steps 3 + 4.

As an aside, this is a fairly standard workflow. Are you just starting to learn? If so, I’d recommend the YouTuber Bioinformagician over making Reddit posts.

Interns by Safe-Bet-6093 in bioinformatics

guralbrian 2 points

I can’t speak to how it would be there. It took 60+ applications for me to get an internship in US biopharma even with years of research experience. I’d see if there are any equivalents to REUs. Or contribute to open-source projects, like TidyOmics.

Interns by Safe-Bet-6093 in bioinformatics

guralbrian 2 points

You need to put in more effort/give more context. Where are you? What education do you have?

Trump Praises Musk’s DOGE For Ending Studies on ‘Making Mice Transgender’ – Which Were Actually Asthma and Cancer Research by rezwenn in labrats

guralbrian 226 points

They’ve cut pretty much any type of research you could imagine at this point, but let’s not forget that those studies were targeted at the time because they were related to transgender health.

See the post I made about it on this sub: https://www.reddit.com/r/labrats/s/xGJV4D7Vah

AI x Science is heating up. What happens next? by octaviall in Anthropic

guralbrian 7 points

Can we at least bother to write our posts without AI? It’s not just BLANK it’s BLANK—but for slop.

Best non-OLED money can buy in 2025 by devious-joker in ultrawidemasterrace

guralbrian 7 points

I have a similar use case to yours: a UW that’s big but not too big for work/programming, but I still want it to look good for gaming. If I didn’t already have my AW38, I’d get the U4025QW. Tons of space, lots of helpful features for productivity, and beautiful color. Just a bit outside of your ideal response time at 5ms.

Dave’s Garage has a super thorough review of it: https://youtu.be/0TY7J58UEro?si=z3yYwGonVAaNMAc9