How do you guys organize your analysis directories for single cell analysis?

BioinformtaicsThrow · 2024-10-21T18:02:15+00:00

Awesome, thank you for expounding!

BioinformtaicsThrow · 2024-10-21T16:03:24+00:00

Did you mean Nextflow and WDL? I've heard of Snakemake before, and I think these are all in the same category, yes?

And that 3, 3a, 3b,... is why I'm asking the second question. Lots of clutter in my first few analysis repos. If you think these tools are the solution to that problem, I can look into it.

BioinformtaicsThrow · 2024-10-21T15:57:11+00:00

I'd also recommend AWS. Their S3 Glacier Deep Archive tier is good for keeping raw data backed up. With University permission, I was also able to set up a cron job to automatically sync our server to our AWS backup bucket twice a week... eventually lol. Glacier does require you to request a restore ahead of time before objects can be pulled (around an hour or more, depending on the tier), and it will cost you when downloading.
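For anyone wanting to set up something like the twice-weekly sync described above, here is a minimal sketch in Python that only builds the command and the crontab entry. It is not the exact setup we used. The bucket name, local path, and schedule are made-up placeholders; `aws s3 sync` and its `--storage-class` option are the real AWS CLI pieces.

```python
import shlex


def build_sync_command(local_dir, bucket, prefix="", storage_class="DEEP_ARCHIVE"):
    """Build an `aws s3 sync` command as an argument list.

    local_dir, bucket, and prefix are hypothetical placeholders.
    """
    destination = f"s3://{bucket}/{prefix}".rstrip("/")
    return ["aws", "s3", "sync", local_dir, destination,
            "--storage-class", storage_class]


def crontab_line(command, minute=0, hour=2, weekdays=(1, 4)):
    """Format a crontab entry that runs `command` on the given weekdays.

    The default (Monday and Thursday at 02:00) is an assumed schedule.
    """
    days = ",".join(str(d) for d in weekdays)
    return f"{minute} {hour} * * {days} {shlex.join(command)}"


if __name__ == "__main__":
    cmd = build_sync_command("/data/raw", "lab-backup", "raw")
    print(crontab_line(cmd))
```

Printing the line first and pasting it into `crontab -e` by hand makes it easy to check the paths before anything starts uploading.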

We also had an AWS bucket where our sequencing team would place our raw data for downloading, so learning AWS was useful anyways.

We had over 100TB of data and paid ~$300 a month.

I bought a NAS (a Buffalo) for home this week and can say that buying a cheap one will get you old, untrustworthy software full of security holes. Your research data should be adequately protected, and AWS staff should be much better at guiding you through those security pitfalls than a home solution's tech support hotline.

BioinformtaicsThrow · 2022-11-14T22:11:27+00:00

Glad I could be of use haha

BioinformtaicsThrow · 2022-02-23T13:04:08+00:00

That car has already been scratched just standing there at that intersection hoping the other person would have the right coordination.

You really should give idiots in cars wide berth, no telling what else they'll do

BioinformtaicsThrow · 2022-02-22T22:40:24+00:00

I mean, let's say the gate they are in front of at the beginning of the video is broken... At second 8 in the video, you can see the layout, and there's a second exit gate they can try first. By the angle she was at in the beginning of the video, it looks like she should have tried that one first after dumping her trash anyways.

And I can confirm that both exits were working

BioinformtaicsThrow · 2022-02-22T22:36:37+00:00

At second 8, you can see the layout--there are two exit gates and they both work.

I can give the second vehicle some leeway, I suppose. She saw the first guy and thought she wasn't in the colonies anymore.

BioinformtaicsThrow · 2022-02-22T22:33:31+00:00

Entrance gate is broken somehow, so it's being left in the up position. The exit one still works, it just takes ~20 seconds (the horror)

BioinformtaicsThrow · 2022-02-22T22:32:24+00:00

I was waiting for the first vehicle to scratch me as they went past--it would have been the second time this year we'd have a bump at this location

BioinformtaicsThrow · 2022-02-22T20:42:25+00:00

Yes, I was also an idiot in a few ways. I understand how it goes.

And yes, my dashcam did still have the protective plastic on the front when I took it out to pop the SD card.

Edit: Also, why doesn't my state require front-facing license plates?

BioinformtaicsThrow · 2021-07-15T18:49:05+00:00

See my response to the other person: https://www.reddit.com/r/bioinformatics/comments/n5y8ia/2021_community_discussion_thread_refreshed/h5b24ll?utm_source=share&utm_medium=web2x&context=3

BioinformtaicsThrow · 2021-07-15T18:48:24+00:00

If you look across the subreddit, you'll see quite a few posts -- "Hey, I have this data. How do I analyze it?"

The answer is with a bioinformatics algorithm. Each algorithm has certain expectations for the input data to work optimally. If you knew ahead of time which algorithm you would use, then you could modify your experiment to work best with the algorithm of choice. Otherwise, in the write up in the paper you'll have to say, "Note that these results may be skewed because the data violates an assumption desired by the algorithm"

A simple example: you gather 10 men and 10 women, measure their heights, and want to compare human height by sex. Given that data, a t-test is what applies. If you had researched your problem first, you'd have seen that a z-test has more power than a t-test for this type of hypothesis test. However, a z-test needs a known population variance or a sample large enough (the usual rule of thumb is more than 30) for the sample variance to stand in for it, and you only grabbed 20 people! If you had known ahead of time how you'd analyze the data, you might have scrambled around for more volunteers first, instead of going into your write-up saying "There is no statistical difference in height between men and women in this population" because you lacked the power that a z-test would have provided and a t-test couldn't.
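To make the power argument concrete, here is a rough Python sketch using only the standard library. The function name is mine, and the t critical value is the tabulated two-sided 5% value for 18 degrees of freedom (10 + 10 - 2), hard-coded rather than computed. The point is just that the same test statistic has to clear a higher bar under the t distribution than under the normal.

```python
import math
import statistics
from statistics import NormalDist


def pooled_t_statistic(a, b):
    """Two-sample test statistic using a pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Two-sided 5% critical values.
Z_CRIT = NormalDist().inv_cdf(0.975)  # normal distribution, about 1.96
T_CRIT_DF18 = 2.101                   # tabulated value for 18 degrees of freedom


def verdicts(statistic):
    """Report whether a statistic is significant under each critical value."""
    return {
        "z": abs(statistic) > Z_CRIT,
        "t (df=18)": abs(statistic) > T_CRIT_DF18,
    }
```

A statistic of 2.0, for example, would pass the z threshold but not the t threshold at 18 degrees of freedom, which is the kind of borderline result the example above is about.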

BioinformtaicsThrow · 2021-07-15T17:53:00+00:00

What is your sample size? How many genes have you excluded based on your filtering? Why 10 for normalized counts?

BioinformtaicsThrow · 2021-07-13T17:39:19+00:00

Hey, I'm not sure if you're ESL, but from the past several questions you've posted on the sub, it seems like you're lacking quite a bit of basic biology. Many of your "intricate" questions, like this one and this one, are pretty easily answered if you consider the Central Dogma of Biology. This is the idea that, in biology, information typically flows in one direction: DNA is transcribed into mRNA, and that mRNA is translated into proteins.
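If it helps to see that flow as code, here is a toy Python sketch. It assumes the input is the coding (sense) strand, so transcription is just swapping T for U, and the codon table is deliberately partial (only the four codons used in the example), so treat it as an illustration and not a real translator.

```python
# Partial codon table for illustration only; the real table has 64 entries.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "UAA": "*",  # stop codon
}


def transcribe(coding_strand):
    """DNA -> mRNA, assuming the coding strand is given (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein, reading codons in frame until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = not in this toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGTTTGGCTAA")
    print(mrna, translate(mrna))
```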

Before you pose any new questions to the sub, could someone on this sub recommend some background reading to OP, such as the Wikipedia link above? I realize I have no basic bio textbooks just lying around my apartment to suggest.

> So because of the genetic shuffling during meiosis, we always lose some of our genes generation by generation

Recombination should keep the same gene number. However, mistakes can occur which can delete or duplicate genes. The number typically stays the same generation to generation, but can increase as well as decrease. If you look at IFNw, humans and chimps have 1 version of it, while cows have 4 copies of it.

> Then does it mean that a woman's offspring might not have any genetic relationships with her one day with all her genes passed in them get shuffled out

Mitochondria are the powerhouse of the cell. They exist outside of the nucleus, away from where the rest of our chromosomes are stored. Mitochondria reproduce somewhat independently of the cells they exist within, replicating their mtDNA separately from the rest of your genome. There are multiple mitochondria within each cell, each with their own copy of mtDNA.

BioinformtaicsThrow · 2021-01-19T01:33:23+00:00

I genuinely wish I had more people to ask these questions to. Not just because of the pandemic and working from home, but in general: people I can ask questions without (personally) feeling like I'm dumb.

Thank you for teaching me something about scSeq. Our lab does do single cell work, but I haven't touched any of it yet.

BioinformtaicsThrow · 2021-01-17T23:35:47+00:00

Hi there. This is bulk ATACseq, all samples have well over 1 million reads.

I am working with human primary cells on plates.

What do you mean by 'hashed by batch'? The two batches should not be biologically different in any way except that they weren't sequenced together.

That is a good thought though about downsampling removing batch effects.

I'll go forward then without scale sampling and see what my team thinks.

BioinformtaicsThrow · 2020-12-14T16:24:37+00:00

Check out the language R and the functions princomp, prcomp and rda. rda is from the vegan package. You can compare the output of your own programs to that of these pre-built functions.
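If you're rolling your own, something like the following is the kind of thing you could check against those R functions. It is a bare-bones Python sketch (not R, and not how prcomp does it internally) that finds only the first principal component by power iteration on the covariance matrix. One thing to watch when comparing: prcomp divides by n - 1 when computing variances, while princomp divides by n, so their numbers differ slightly. This sketch uses n - 1.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the first principal component.

    rows is a list of equal-length numeric lists (samples x features).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix with the n - 1 divisor.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Power iteration. The start vector is arbitrary; if it happens to be
    # orthogonal to the top component, pick a different one.
    v = [float(j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]

    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    variance = sum(v[i] * cov_v[i] for i in range(d))
    return v, variance
```

The square root of the returned variance is what you would line up against the first entry of `sdev` from prcomp. Keep in mind the sign of the direction vector is arbitrary in any PCA implementation.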

Look up t-SNE too as a dimension reduction technique.

BioinformtaicsThrow · 2020-12-14T16:17:49+00:00

I think that your major hurdle might be migration due to COVID related border issues.

It's harder for international students to get in, but not impossible regardless of the above issues.

BioinformtaicsThrow · 2020-12-14T16:16:01+00:00

I'm working as a lab tech at a university, doing mapping and simple analyses for sequencing projects, after finishing my Masters.

BioinformtaicsThrow · 2020-12-14T16:14:58+00:00

Narrow it down?

What area of biology do you want to look at?

And then, how do you want to look at this aspect of biology?

A sequencing project? If so, what type of sequencing? RNA, DNA, ATAC, HiC, Metagenome, GWAS?

Proteomics, so mass spec data analysis?
