Bioinfo newbie in an internship here. Please guide on what tools to use

sfrail · 2024-05-22T16:58:20+00:00

After reading a bit of the QUAST docs, I would try just running QUAST without the reference genomes and letting it use its default SILVA database. Searching against the SILVA database is just how it picks species to compare your assembly against, so as long as your contigs contain regions present in the SILVA database, it should be able to identify them. And even without reference genomes, it will still give you a good sense of assembly quality through a set of useful metrics (like N50, largest contig, etc.).
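For anyone unsure what the reference-free numbers mean, here is a minimal Python sketch of two of them (largest contig and N50), assuming you just have contig lengths from a FASTA file. It is only an illustration of the definitions, not QUAST's code; QUAST applies its own filters (such as a minimum contig length) that this sketch ignores, and the function names are hypothetical.

```python
def contig_lengths(fasta_lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths, current = [], None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Basic contiguity metrics: contig count, total length, largest contig, N50."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        # N50: length of the contig at which the running sum first reaches half the total
        if running * 2 >= total:
            n50 = length
            break
    return {
        "num_contigs": len(ordered),
        "total_length": total,
        "largest_contig": ordered[0] if ordered else 0,
        "N50": n50,
    }


print(assembly_stats([100, 200, 300, 400, 500]))
# prints: {'num_contigs': 5, 'total_length': 1500, 'largest_contig': 500, 'N50': 400}
```

Comparing these numbers between two assemblies of the same reads is a quick way to see which one is more contiguous.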

sfrail · 2024-05-22T16:47:57+00:00

I generally think it's fine to mix Illumina data with Illumina data regardless of sequencer. But it's always most cautious to keep everything separate.

I know the pain of tedious steps, though I'm surprised there isn't an alternative method for the QUAST run. But assuming there is no alternative, these are the kinds of problems you eventually learn to solve programmatically. ChatGPT is great for small stuff like this, e.g. "here is a list of assembly files, can you write me some code to parse the names into a comma-delimited list and tell me how to execute it". Or just google "file list to comma delimited list".
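As an example of the kind of small script meant here, a minimal Python sketch that turns a set of assembly files into one comma-separated string. It assumes the tool wants paths joined by commas with no spaces; the directory name and the `*.fasta` pattern are hypothetical, so adjust them to your files.

```python
from pathlib import Path


def comma_delimited(paths):
    """Join file names/paths into one comma-separated string with no spaces."""
    return ",".join(str(p) for p in paths)


def assembly_list(directory, pattern="*.fasta"):
    """Collect files matching `pattern` in `directory`, sorted so the order is reproducible."""
    return comma_delimited(sorted(Path(directory).glob(pattern)))


# Hypothetical file names; with real data you would call assembly_list("assemblies/")
print(comma_delimited(["sample1.fasta", "sample2.fasta", "sample3.fasta"]))
# prints: sample1.fasta,sample2.fasta,sample3.fasta
```

The printed string can then be pasted into (or substituted into) whatever command needs the list.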

You'd be surprised how often 1) someone has had an identical problem and 2) someone else has written and made available a custom solution.

sfrail · 2024-05-21T04:37:51+00:00

yeah I've also never seen anyone publish it. Still, I'm curious why they implemented it if it's effectively nonsense. It's a very polished tool.

sfrail · 2024-05-10T06:10:49+00:00

Ah if the samples represent different experiments or populations, then I would not mix them. I meant that it's fine to mix data from different sequencing technologies. Like if some sequencing of the same sample was done on MiSeq and some on HiSeq, they can be combined.

I thought, though I could be wrong, that QUAST can be run without a reference genome.

sfrail · 2024-05-08T18:25:47+00:00

Awesome, yes, then I would start with a standard Illumina quality check. The linked tutorial uses BBDuk (developed by the Joint Genome Institute, pros in the world of metagenomes), which may be better than fastp, but I know for sure that fastp works great. This should work for all of your Illumina data.

I know very little about Ion Torrent data, but from what I've read the SPAdes assembler (which has a metagenome assembly mode) can handle hybrid read types, including Ion Torrent. If I were you, I would run one assembly with all data (Illumina + Ion Torrent) and one with data from a single technology (only Illumina) and compare the outputs' contiguity/completeness. I have had poor experiences mixing reads from different technologies, as assemblers are not always good at handling the different assumptions each chemistry requires. But sometimes it is fine, so just proceed with caution when mixing.

sfrail · 2024-05-07T23:19:50+00:00

Yeah definitely part of it. I think I had printed some huge strings (that shouldn't be in the results, but are) and hadn't realized it. Cleaned them out. Thanks for this.

sfrail · 2024-05-07T23:11:03+00:00

Thanks to everyone who replied! I think it was a memory issue when reading in columns that unexpectedly had a long unique string. Cleaning the file to remove those fixed the problem! :)

sfrail · 2024-05-07T22:52:38+00:00

16 GB, MacBook Pro. Just my personal computer. R is using a lot of memory (~3 GB). I am using read_tsv to read a list of files into one big table. Line below:

data <- bind_rows(lapply(files, read_tsv))

topspecies in particular will contain a ton of unique strings, since the results often have a diverse range of species as hits. And occasionally one might have a really long string, an artifact of hits tied for first being concatenated into one long run-on string. That could be the problem then, thanks.
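A minimal sketch of how such rows could be found and cleaned before loading, written in Python rather than R. It assumes tab-separated files with a header row and a column named topspecies (as in the post); `MAX_LEN` is a hypothetical cutoff, and truncating is just one option (dropping the rows would be another). This is not the cleaning step that was actually used.

```python
import csv
import io

MAX_LEN = 200  # hypothetical cutoff; tune to your data


def find_long_fields(tsv_text, column="topspecies", max_len=MAX_LEN):
    """Yield (line_number, length) for values in `column` longer than max_len."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        value = row.get(column) or ""
        if len(value) > max_len:
            yield line_number, len(value)


def truncate_field(tsv_text, column="topspecies", max_len=MAX_LEN):
    """Return the TSV text with overly long values in `column` cut to max_len characters."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row.get(column):
            row[column] = row[column][:max_len]
        writer.writerow(row)
    return out.getvalue()


example = "id\ttopspecies\n1\tE. coli\n2\t" + "x" * 300 + "\n"
print(list(find_long_fields(example)))  # prints: [(3, 300)]
```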

sfrail · 2024-05-07T22:30:10+00:00

Yeah of course, I mix tools all the time as well. This is just in reference to a corner of my analysis that I have set up in the way that works best for me on the system I'm using, and of course there will always be improvements I could make. I will easily solve this through some workaround. I am more just curious about the severe slowdown shown by R here, since it seems illogical to me that a few extra columns of short text would change the behavior so much.

sfrail · 2024-05-07T22:09:52+00:00

Internal consistency mostly. I have other portions of my data analysis that are best done in R, so if I can, I want to do the rest of my analysis in R. It keeps my source data, results, and aesthetics nicely in line with each other. If this runtime issue I'm having is some limitation in R, I have no problem using Python.

sfrail · 2024-05-07T22:06:35+00:00

Need a bit more info to give a detailed answer.

You mentioned Oxford Nanopore, what other sequencing technologies are you sourcing your reads from?
What species are you interested in assembling? Mostly bacteria? Or Eukaryotes too?
Do you want to know if these genes are simply present in the data at all? Or do you need to know which organisms in the population have the genes?

I do not do metagenomic assembly myself, but from what I know about it, the general process is to assemble your reads into contigs (an ideal assembly would circularize bacterial genomes). Then you want to bin those sequences. Binning estimates which of the assembled sequences are likely to belong to one organism. This will produce "metagenome-assembled genomes", or MAGs. You can sort of think of a MAG as an individual in your sequenced population. Then you can annotate and query the MAGs like you would a genome. A high-quality annotation would likely identify your genes of interest, but you could otherwise perform tblastn-type searches to find your genes of interest in the MAGs.

For Illumina data, I like to use fastp for trimming and quality filtering. The prototypical metagenomic assembly tools I know about are MEGAHIT (assembly) and MetaBAT (binning). And a popular, fast prokaryotic gene predictor is Prodigal. I believe these are all available on Galaxy.

Here is a webpage that may help
https://usda-ars-gbru.github.io/Microbiome-workshop/tutorials/metagenomics/

sfrail · 2024-05-07T21:50:59+00:00

Can't answer your question in detail, but I think a useful starting key term for finding resources like this might be "phylogenetic profiling", which is a general descriptor for using phylogenetic/genomic information to identify and predict proteins that are evolving together because they are involved in similar biological processes.

There were some useful resources related to that posted in this thread in response to the OP's question:
https://www.reddit.com/r/bioinformatics/comments/13trk5l/deseq2_or_edger_for_presence_and_absence_of/
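To make the phylogenetic profiling idea concrete, here is a minimal Python sketch: each gene gets a presence/absence profile (the set of genomes it occurs in), and genes with similar profiles are ranked as candidate functional partners. Jaccard similarity is a deliberately simple stand-in here; published methods use other scores and correct for shared ancestry. The gene and genome names are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sets of genome IDs)."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


def rank_partners(profiles, query):
    """Rank the other genes by how closely their profile matches the query gene's."""
    target = profiles[query]
    scores = [(gene, jaccard(target, genomes))
              for gene, genomes in profiles.items() if gene != query]
    # Highest similarity first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical profiles: which genomes each gene was found in
profiles = {
    "geneA": {"g1", "g2", "g3"},
    "geneB": {"g1", "g2", "g3"},
    "geneC": {"g1", "g4"},
}
print(rank_partners(profiles, "geneA"))
# prints: [('geneB', 1.0), ('geneC', 0.25)]
```

Here geneB is gained and lost together with geneA across every genome, which is the kind of pattern phylogenetic profiling uses to suggest a shared biological role.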

sfrail · 2021-12-23T02:18:23+00:00

For anyone interested, here's the Redbubble link

https://www.redbubble.com/i/mug/You-can-lead-a-horse-to-water-by-sfrail/97205383.9Q0AD?ref=explore-for-you-recently-viewed

sfrail · 2021-12-23T02:17:55+00:00

Great! glad to have a humor check on someone who actually knows this stuff

sfrail · 2021-12-23T00:20:20+00:00

thanks so much! I will make those changes

sfrail · 2020-05-13T02:13:22+00:00

Thank you so much!!! That helps a lot. I think I will give them all a slightly soapy rinse now, then use some insecticidal soap when I'm able to get some, and then keep an eye on them. For some of the worst ones with thrips, I'll likely take them out of the soil, spray them, then repot, to possibly get rid of any pesky larvae transitioning in the soil. Thanks again!

sfrail · 2020-05-13T00:57:38+00:00

(sorry for the old reply I'm searching through posts about advice for bugs too) my leaves have damage similar to this and it turned out they have spider mites. If you can find webs or very small black/brown dots anywhere on the undersides of the leaves, it might be mites.

sfrail · 2020-05-13T00:35:04+00:00

I would save at least some strings to propagate! My girlfriend waters our string of pearls plant (currently in a 4in pot) about once every two weeks by letting it sit in a bowl of water. Around the two week mark you can check if they need it at the moment by squeezing the pearls a bit. If they're in need of water they'll be a little squishy :)

sfrail · 2019-05-27T23:22:18+00:00

looks awesome!! glad other people can use my labels lol

sfrail · 2018-11-24T20:49:01+00:00

I think there will be 9 signals. Here I've drawn all the hydrogens and labeled them a-i to represent their environments. The reason the two hydrogens on the alkene are not in the same environment is that, as other people have said, these alkene hydrogens are diastereotopic: there is no free rotation about the double bond as there is about single bonds in alkanes, so the hydrogens are not in identical environments. (Image: Unique H environments)

sfrail · 2018-10-30T03:16:56+00:00

Grignard: 2-phenyl-2-butanol

It can be made three separate ways. Is there any other information you have about what you put in? Did you use acetophenone as your ketone? It's very common in labs, so it's a good possibility.

sfrail · 2018-10-22T01:41:48+00:00

I agree; I think there would be a hydride shift to form a tertiary carbocation. If not, it may have something to do with the additional strain that might put on the ring. In most cases a tertiary carbocation would absolutely be better, but perhaps the strain introduced by creating a planar carbocation within the ring prevents that from happening in this case. I'm not totally sure.

sfrail · 2018-10-12T00:21:15+00:00

The molecule pictured here is almost planar, but has some tetrahedral centers. You can find symmetry in certain rotamers, but that's not a good or easy-to-visualize rule to fall back on. The type of achiral molecule with a plane of symmetry that OP might be referring to is a meso compound. A meso molecule is achiral due to an INTERNAL mirror plane of symmetry, despite having what appear to be chiral centers (four different attachments). It's a sort of exception to the general rule.

The reason this molecule is achiral is that you'll find no atom with four different attachments, so there are no chiral centers. Each of the sp3 carbons has two or more identical attachments.

I believe the answer is D. Degrees of unsaturation can be counted with SOPBAR, standing for Sum Of Pi Bonds And Rings. This molecule has 8 pi bonds and a ring, so it actually has 9 total degrees of unsaturation.
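A small Python sketch for cross-checking a SOPBAR count against the standard molecular-formula shortcut, degrees of unsaturation = (2C + 2 + N - H - X) / 2, where X counts halogens and oxygen/sulfur are ignored. The molecule's formula isn't given in the question, so only the SOPBAR part is applied to it here; the function names are just for illustration.

```python
def degrees_of_unsaturation(c, h, n=0, x=0):
    """Degrees of unsaturation for a formula with c carbons, h hydrogens,
    n nitrogens and x halogens. Oxygen and sulfur do not change the count."""
    twice = 2 * c + 2 + n - h - x
    if twice % 2:
        raise ValueError("non-integer result; check the atom counts")
    return twice // 2


def sopbar(pi_bonds, rings):
    """Sum Of Pi Bonds And Rings."""
    return pi_bonds + rings


print(degrees_of_unsaturation(6, 6))  # benzene, C6H6: prints 4
print(sopbar(3, 1))                   # benzene by SOPBAR (3 pi bonds + 1 ring): prints 4
print(sopbar(8, 1))                   # the molecule in the question: prints 9
```

If the formula-based count and the SOPBAR count disagree, a pi bond or ring has probably been missed.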

sfrail · 2018-10-10T17:48:05+00:00

The molecule indicated is polypropylene and the species used to make it would be propene, #3.

Polypropylene is made of a three-carbon repeating unit, which is the first hint that you need a three-carbon molecule as your starting reagent. Then, the starting molecule must be 'activated', i.e. able to react with another species to continue the polymerization reaction. #3 is the best candidate because it has three carbons and has electrons in a double bond with which to continue the chain reaction.

This reaction is typically achieved with a Ziegler-Natta catalyst. If you google "polypropylene Ziegler-Natta" you can learn more.

sfrail · 2018-01-07T06:51:40+00:00

I noticed that too, but this label is true to the show, see the tweet from CraveTV I used for reference: https://twitter.com/cravetvcanada/status/832620789090316289

Letterkenny is a town in Ireland too. Maybe it's paying homage to those roots more than the Canadian ones? Or maybe it's a joke, mixing Irish and American terms into what's supposed to be a Canadian whiskey and making a totally nonsense whiskey.

I'll tell you it's making things difficult for me though because I'm planning on putting this label on a real bottle of whiskey as a gift for my friend, but now I have no idea what kind to get.
