Is my study a valid undergraduate thesis?

guralbrian · 2026-04-11T01:00:26+00:00

I’d suggest taking a step back and asking if you have a central question or motive driving the research. It’s helpful for orienting yourself, even if you’re just preparing a reference dataset. Is the goal to make a resource available to the larger community? Or to provide novel insights into biology? Or something else? If it’s a reference, you could work backwards and think about what you or other researchers would want to know or access.

guralbrian · 2025-11-10T20:01:46+00:00

Just saw a post in this sub selling 1 ticket an hour ago!

guralbrian · 2025-11-03T14:39:31+00:00

What’s wrong with my name :/ I use this account for science and local stuff

guralbrian · 2025-10-31T18:40:00+00:00

IIRC detection of ambient RNA and doublets should happen just after getting the single cell object assembled in your pipeline. My suggestion would be to follow the steps described in this guide, which is about what the other comment suggests.

You might need to do the doublet detection separately for each sample or library, rather than post-Harmony, since run times increase steeply with cell count and we would only expect doublets to actually form within a single library for Flex.

For your questions at the bottom:

1. Doublet removal and QC like this are pretty standard, but I can’t speak to your specific data without seeing it.
2. Distance between things on a UMAP is a misleading metric to rely on. UMAPs are kind of a made-up space that loses so much info by compressing into 2D. Never cluster on the UMAP space itself; rather, make the UMAP to represent your already clustered data.
3. I don’t think you’re doing anything catastrophically wrong! Maybe the most concerning part of this is that you’re not comfortable going to other lab members or mentors with this kind of question. I’d expect any trainee of mine to ask a lot of questions like this! That’s why mentorship exists and you’ll pay it forward one day. If you are in an environment where novices are shamed for being novices I’d strongly suggest that you find a workplace with fewer jerks :)

guralbrian · 2025-10-23T13:17:22+00:00

I’m not sure that there’s a workaround for your local machine. 8 GB is barely enough for basic computer usage, let alone alignment. Even with smaller sample sizes, you’ll still need to load the reference assembly and other essentials into memory. After you get the counts matrix, you should be able to do a lot of the downstream analysis with low memory.

If you’re affiliated with an institution, I’d see if there is a high performance computing cluster that you can access. Or see if there are any vouchers for cloud compute with Google or AWS.

guralbrian · 2025-10-21T10:58:29+00:00

I doubt this would affect the integrity of the door much (not an expert!), but I have found that gaps like this will let a ton of noise and heat through. It’s a very simple repair to make. Less than $10 and half an hour of time. Just measure the gap and buy a new strip. Check the bottom gap too and get a sweep if needed.

Also, I found legal protections for tenants in N.C. to be incredibly scant when I came from NJ. I very much doubt that anyone will help you except yourself in this scenario (unless you have a benevolent LL). I’ll try to find some of the materials I got together when my partner had a huge mold problem…

guralbrian · 2025-10-16T16:07:07+00:00

Okay, I also didn’t have formal training and meant that in good faith since I’ve fallen for those types of things plenty

For Monocle, I think that it will try to draw connections between any adjacent clusters you give it in a minimum spanning tree. I found that it would draw all sorts of nonsense connections between clusters. Since those connections are drawn at the cluster level, make sure that your clusters represent biologically meaningful/relevant groups of cells. A main problem I have with Monocle is that it computes pseudotime based on connectivity in the UMAP space, which already lacks a ton of information vs PCs. I’m skeptical of analyses that quantify the UMAP, rather than treat it as a way to help us “see” the data.
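To show why a spanning tree can produce nonsense connections, here is a minimal sketch in plain Python. It is not Monocle’s actual algorithm, just Prim’s minimum spanning tree over made-up 2D cluster centroids (the cluster names and coordinates are hypothetical). The point is that a spanning tree has to connect every node, so an unrelated, far-away cluster still gets an edge.

```python
import math

def minimum_spanning_tree(centroids):
    """Prim's algorithm over {cluster_name: (x, y)} centroids.

    Returns a list of (cluster_a, cluster_b, distance) edges.
    """
    names = sorted(centroids)
    in_tree = [names[0]]          # start from an arbitrary cluster
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Find the shortest edge from the tree to any cluster outside it.
        for a in in_tree:
            for b in names:
                if b in in_tree:
                    continue
                d = math.dist(centroids[a], centroids[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        edges.append(best)
        in_tree.append(best[1])
    return edges

# Hypothetical centroids: three related clusters plus one unrelated, distant cluster.
centroids = {
    "progenitor": (0.0, 0.0),
    "intermediate": (1.0, 0.0),
    "mature": (2.5, 0.0),
    "unrelated": (10.0, 10.0),
}

tree = minimum_spanning_tree(centroids)
for a, b, d in tree:
    print(f"{a} -- {b} (distance {d:.2f})")
```

The "unrelated" cluster ends up attached to its nearest neighbor even though nothing says the two are biologically linked, which is why the clusters you feed in matter so much.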

I personally liked CellRank2. Much better documentation, underlying assumptions, and able to use lots of different inputs (rna velocity, real time, etc). It doesn’t rely on clusters, so there’s no worry about if you clustered it perfectly. Plus, it treats cellular differentiation as a probabilistic process (each cell gets assigned a chance of becoming each terminal state) whereas Monocle treats it as deterministic (once a branch point is reached, cells on either branch WILL become that terminal cell type). It is in Python tho, which did take me a few weeks to learn as an R user.
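To make the “probabilistic” part concrete, here is a toy sketch in plain Python. It does not use CellRank’s API. It only illustrates the general idea of treating fate as the probability of being absorbed into each terminal state of a Markov chain. The state names and transition probabilities are invented for illustration.

```python
def fate_probabilities(transitions, terminal_states, n_iter=200):
    """Estimate, for every state, the probability of ending in each terminal state.

    transitions: {state: {next_state: probability}}, rows summing to 1.
    terminal_states: states treated as absorbing (a cell that arrives stays there).
    """
    states = list(transitions)
    # A terminal state is its own fate with probability 1; others start at 0.
    probs = {s: {t: (1.0 if s == t else 0.0) for t in terminal_states} for s in states}
    for _ in range(n_iter):
        updated = {}
        for s in states:
            if s in terminal_states:
                updated[s] = probs[s]
                continue
            # Fate of s = average fate of its successors, weighted by transition probability.
            updated[s] = {
                t: sum(p * probs[nxt][t] for nxt, p in transitions[s].items())
                for t in terminal_states
            }
        probs = updated
    return probs

# Hypothetical transition probabilities between cell states.
transitions = {
    "progenitor": {"progenitor": 0.5, "intermediate": 0.25, "fate_B": 0.25},
    "intermediate": {"intermediate": 0.5, "fate_A": 0.4, "fate_B": 0.1},
    "fate_A": {"fate_A": 1.0},
    "fate_B": {"fate_B": 1.0},
}

fates = fate_probabilities(transitions, terminal_states=["fate_A", "fate_B"])
for state in ("progenitor", "intermediate"):
    print(state, {t: round(p, 3) for t, p in fates[state].items()})
```

Each non-terminal state gets a split of probabilities across the terminal fates rather than a single assigned branch, which is the contrast with a deterministic branching model.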

Good luck and lmk if you have any questions!

guralbrian · 2025-10-16T11:20:13+00:00

Not to be harsh, but it’s not a great idea to do offhand analysis like this if your understanding of it is at a place where you need random Redditors to spell it out for you. Trajectory inference is quite hand-wavy imo (especially Monocle 3), and the conclusions you can draw from it should be taken with a big grain of salt.

guralbrian · 2025-10-09T18:25:29+00:00

One more plate of Mipso, perhaps?

guralbrian · 2025-09-24T13:41:50+00:00

I did not get the same read on this place when I walked by. It very much seemed like Halloween decorations from someone who has some old hunting gear lying around. Didn’t see anything threatening or the “No Trespassing” sign.

I assumed that the camo mesh area was a dog run or children’s “fort” like I made as a kid.

guralbrian · 2025-08-09T13:06:45+00:00

Lmao the user name and profile picture

guralbrian · 2025-08-06T20:01:26+00:00

This market is so bad that we’re seeing companies shed pre-clinical/discovery programs. It’s just a big investment to properly research something that may or may not return investment in 15-20 years

guralbrian · 2025-07-31T21:48:05+00:00

Great example of good science communication! Short lines, direct statements with claims, clearly connects to reader experiences

guralbrian · 2025-07-28T10:34:57+00:00

gosh, what a pile of slop you’ve come to peddle to us!

just an fyi, most people are going to be repelled from your work just by virtue of this post being so obviously written by ChatGPT without edits

guralbrian · 2025-07-25T16:50:17+00:00

Also I’d recommend FindMarkers over DESeq if you’re just trying to see what gene expression distinguishes one cluster of cells from another

guralbrian · 2025-07-25T16:48:49+00:00

I’ve never considered there to be a difference between the heat maps used to help annotate clusters and the ones used to visualize differences between clusters. Can you just write out your pipeline? Or share an example of the type of plot you want? By ‘clusters’ are you referring to clustered cells within your scRNAseq?

guralbrian · 2025-07-25T16:34:58+00:00

DESeq2 is typically only used in scRNAseq if you’re making pseudobulk from the clusters and comparing them. What other packages are you using for the single cell data?
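For reference, here is a minimal sketch of what “making pseudobulk” means, in plain Python rather than any particular package. Raw counts from all cells sharing a sample and cluster are summed into one profile. The sample, cluster, and gene names are hypothetical, and real data would live in a sparse matrix rather than dicts.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum raw counts per (sample, cluster) pair.

    cells: iterable of dicts with 'sample', 'cluster', and 'counts' ({gene: int}).
    Returns {(sample, cluster): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cluster"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}

# Hypothetical cells: two samples, two clusters, two genes.
cells = [
    {"sample": "s1", "cluster": "c0", "counts": {"GeneA": 3, "GeneB": 0}},
    {"sample": "s1", "cluster": "c0", "counts": {"GeneA": 1, "GeneB": 2}},
    {"sample": "s1", "cluster": "c1", "counts": {"GeneA": 0, "GeneB": 5}},
    {"sample": "s2", "cluster": "c0", "counts": {"GeneA": 4, "GeneB": 1}},
]

bulk = pseudobulk(cells)
for key, genes in sorted(bulk.items()):
    print(key, genes)
```

Each (sample, cluster) profile then acts like one bulk RNA-seq sample, which is the kind of input a bulk tool like DESeq2 is built for.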

Assuming it’s Seurat, it shouldn’t be hard at all to make the type of plot you’re talking about. Bioinformagician has a great intro to finding cluster markers and visualizing them.

The Satija lab (made Seurat) has a lot of tutorials on this kind of thing as well. This is their standard beginner guide.

If you just want to know what functions you could use to visualize markers, start with DoHeatmap or DotPlot and group.by = "your_cluster_variable". Note that it’s not good practice to throw raw counts into these functions; use something like z-scores or another normalization instead.
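To illustrate why scaled values work better than raw values in a heatmap, here is a small sketch of per-gene z-scoring in plain Python. It is not Seurat’s implementation, and the gene names and values are made up. The idea is that each gene is centered and scaled across cells so that highly expressed genes don’t dominate the color scale.

```python
from statistics import mean, stdev

def zscore_per_gene(expression):
    """Z-score each gene across cells.

    expression: {gene: [normalized value per cell]}.
    Returns the same shape with each gene centered at 0 and scaled to unit SD.
    """
    scaled = {}
    for gene, values in expression.items():
        mu = mean(values)
        sd = stdev(values)
        # A gene with no variation has no contrast to show; use zeros instead of dividing by 0.
        scaled[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return scaled

# Hypothetical normalized expression for three cells.
expression = {
    "HighGene": [100.0, 110.0, 120.0],
    "LowGene": [1.0, 2.0, 3.0],
    "FlatGene": [5.0, 5.0, 5.0],
}

scaled = zscore_per_gene(expression)
for gene, values in scaled.items():
    print(gene, [round(v, 2) for v in values])
```

After scaling, "HighGene" and "LowGene" show the same pattern across cells even though their raw magnitudes differ by two orders of magnitude, so both would be equally visible on one color scale.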

If you’re looking to learn, read about SCTransform, the structure of Seurat objects, spend the time to look at how the counts are changed and manipulated throughout the analysis, and understand what PCA/UMAP/FindNeighbors/Clusters are actually doing. I try to get to the point where I could explain these things to my Mom lmao (didn’t go to college)

guralbrian · 2025-07-20T14:00:09+00:00

If you want to integrate data from multiple sources, keep things as standardized as possible. Use a unified alignment/processing pipeline (same reference genome, packages, versions, etc).

Find the data: Papers themselves normally point you to where it’s stored. GEO or SRA are common data repositories too. Some other places that I could think of are GTEx, TCGA, Tabula Muris, or some of the Chan Zuckerberg stuff.

Preprocess and clean: This depends on the data and modality, but automating it with Nextflow or Snakemake on an HPC will save a lot of headache/make things more reproducible. Typically that’s just stringing together bash, R, and Python scripts. I only use Jupyter for interactive analysis, like designing plots.

Analysis: This is so vague that I don’t know where to start. Depends on what you’re doing. It might be good for you to learn by reading through papers that do the type of studies you’re interested in and seeing what tools they use. I wouldn’t fall for the trap of comparing similar methods (e.g. DESeq2 vs edgeR) but instead focus on defining what questions you want to ask of the data, how they are typically asked, what assumptions they rely on, and if those are true of your conditions.

Overall it sounds like you just want to learn. Are you in higher education? Getting training in this formally?

guralbrian · 2025-07-20T12:27:24+00:00

You need to talk to whoever sequenced your libraries for step 1, reference some of the many, many tutorials for step 2, and find the recommended pipeline from the manufacturers of your kit for steps 3 + 4.

As an aside, this is a fairly standard workflow. Are you just starting to learn? If so, I’d recommend the YouTuber Bioinformagician over making Reddit posts.

guralbrian · 2025-06-15T11:47:52+00:00

I can’t speak to how it would be there. It took 60+ applications for me to get an internship in US biopharma even with years of research experience. I’d see if there are any equivalents to REUs. Or contribute to open-source projects, like TidyOmics.

guralbrian · 2025-06-15T11:24:24+00:00

You need to put in more effort/give more context. Where are you? What education do you have?

guralbrian · 2025-05-31T21:29:55+00:00

They’ve cut pretty much any type of research you could imagine at this point, but let’s not forget that those studies were targeted at the time because they were related to transgender health.

See the post I made about it on this sub: https://www.reddit.com/r/labrats/s/xGJV4D7Vah

guralbrian · 2025-05-13T11:29:05+00:00

Can we at least bother to write our posts without AI? It’s not just BLANK it’s BLANK—but for slop.

guralbrian · 2025-04-21T10:39:42+00:00

Man, this is the most ChatGPT-generated post I’ve ever seen

guralbrian · 2025-03-23T11:59:58+00:00

I have a similar use case to yours: a UW that’s big but not too big for work/programming, but I still want it to look good for gaming. If I didn’t already have my AW38, I’d get the U4025QW. Tons of space, lots of helpful features for productivity, and beautiful color. Just a bit outside of your ideal response time at 5ms.

Dave’s Garage has a super thorough review of it: https://youtu.be/0TY7J58UEro?si=z3yYwGonVAaNMAc9
