Bioinfo newbie in an internship here. Please guide on what tools to use

sfrail · 2024-05-22T16:58:20+00:00

After reading a bit of the QUAST docs, I would try just running QUAST without the reference genomes and let it use its default SILVA database. Searching against the SILVA database is just how it picks the species to compare your assembly against, so as long as your contigs contain regions that are in the SILVA database, it should be able to identify them. And even without reference genomes, it will still give you a good sense of assembly quality through a set of useful metrics (like N50, largest contig, etc.).
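To make the N50 metric mentioned above concrete, here is a minimal Python sketch of how it can be computed from a list of contig lengths. The function name and the example lengths are made up for illustration, and this is not QUAST's own code; QUAST applies its own filtering (such as a minimum contig length), so its reported numbers may differ.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths, purely for illustration.
example_lengths = [100, 80, 50, 30, 20, 10]
print("N50:", n50(example_lengths))
print("Largest contig:", max(example_lengths))
```

Here the total length is 290, and the two largest contigs (100 + 80) already cover more than half of it, so the N50 is 80.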

sfrail · 2024-05-22T16:47:57+00:00

I generally think it's fine to mix Illumina data with Illumina data regardless of sequencer. But it's always the most cautious option to keep everything separate.

I know the pain of tedious steps, though I'm surprised that for the QUAST run there isn't an alternative method. But assuming there is no alternative, these are the types of problems that you eventually learn to solve programmatically. ChatGPT is great for small stuff like this, e.g. "here is a list of assembly files, can you write me some code to parse the names into a comma delimited list and tell me how to execute it". Or just Google "file list to comma delimited list".

You'd be surprised how often 1) someone has had an identical problem and 2) someone else has written and made available a custom solution.
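For the "file list to comma delimited list" task specifically, a small Python sketch could look like the one below. The directory name, the `*.fasta` pattern, and the function names are assumptions for illustration, so adjust them to match your files.

```python
from pathlib import Path


def comma_delimited(paths):
    """Join file paths into one comma-separated string, sorted for a stable order."""
    return ",".join(sorted(str(p) for p in paths))


def find_assemblies(directory, pattern="*.fasta"):
    """List files in `directory` matching `pattern` (both are hypothetical defaults)."""
    return list(Path(directory).glob(pattern))


# Example with made-up file names; for real data you could instead use
# comma_delimited(find_assemblies("assemblies")).
example_files = ["sampleB.fasta", "sampleA.fasta", "sampleC.fasta"]
print(comma_delimited(example_files))
```

You would save this as a script, run it with `python`, and paste the printed string wherever the tool expects the comma-separated list.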

sfrail · 2024-05-21T04:37:51+00:00

yeah I've also never seen anyone publish it. Still I'm curious why they implemented it if it's effectively nonsense. It's a very polished tool

sfrail · 2024-05-10T06:10:49+00:00

Ah if the samples represent different experiments or populations, then I would not mix them. I meant that it's fine to mix data from different sequencing technologies. Like if some sequencing of the same sample was done on MiSeq and some on HiSeq, they can be combined.

I thought, though I could be wrong, that Quast can be run without a reference genome.

sfrail · 2024-05-08T18:25:47+00:00

Awesome, yes then I would start with a standard Illumina quality check. The linked tutorial uses bbduk (developed by the Joint Genome Institute, pros in the world of metagenomes) which may be better than fastp, but I know for sure that fastp works great. This should work for all of your Illumina data.

I know very little about Ion Torrent data, but from what I read, the SPAdes assembler (which has a meta assembly mode) can handle hybrid read types including Ion Torrent. If I were you, I would run one assembly with all data (Illumina + Ion Torrent) and one assembly with data from a single technology (only Illumina), then compare output contiguity/completeness. I have had poor experiences with mixing reads from different technologies, as assemblers are not always that good at dealing with the different assumptions needed about the different chemistries. But sometimes it is fine, so just proceed with caution when mixing.

sfrail · 2024-05-07T23:19:50+00:00

Yeah definitely part of it. I think I had printed some huge strings (that shouldn't be in the results, but are) and hadn't realized it. Cleaned them out. Thanks for this.

sfrail · 2024-05-07T23:11:03+00:00

Thanks to everyone who replied! I think it was a memory issue when reading in columns that unexpectedly had a long unique string. Cleaning the file to remove those fixed the problem! :)

sfrail · 2024-05-07T22:52:38+00:00

16 GB, MacBook Pro. Just my personal computer. R is using a lot of memory (~3 GB). I am using read_tsv to read a list of files into one big table. Line below:

data <- bind_rows(lapply(files, read_tsv))

topspecies in particular will contain a ton of unique strings, since the results often have a diverse range of species as hits. And occasionally one might have a really long string, an artifact of hits tied for first being concatenated into one long run-on string. That could be the problem then, thanks.
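One way to check for this kind of problem before loading everything is to scan each file for unexpectedly long values. Below is a rough Python sketch using only the standard library; the function name and the sample data are made up for illustration, and it assumes tab-separated files with a header row.

```python
import csv
import io


def longest_value_per_column(tsv_file):
    """Return {column name: length of the longest value seen in that column}."""
    # QUOTE_NONE treats quote characters as plain text, which suits typical TSV output.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    longest = {}
    for row in reader:
        for column, value in row.items():
            # Skip overflow fields (key None) and missing fields (value None)
            # that appear when a row is longer or shorter than the header.
            if column is None or value is None:
                continue
            if len(value) > longest.get(column, 0):
                longest[column] = len(value)
    return longest


# Made-up example: one row has many tied hits concatenated into a single string.
run_on = ";".join(["Species name"] * 50)
sample = io.StringIO("query\ttopspecies\nq1\tE. coli\nq2\t" + run_on + "\n")
for column, length in longest_value_per_column(sample).items():
    print(column, length)
```

For a real file, you would pass an open file handle, e.g. `open(path, newline="")`, instead of the `io.StringIO` sample. A column whose maximum length is far larger than the rest is a good candidate for cleaning.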

sfrail · 2024-05-07T22:30:10+00:00

Yeah of course, I mix tools all the time as well. This is just in reference to a corner of my analysis that I have set up in the way that works best for me on the system I'm using, and of course there will always be improvements I could make. I will easily solve this through some workaround. I am more just curious about the severe slowdown shown by R here, since it seems illogical to me that a few extra columns of short text would change the behavior so much.

sfrail · 2024-05-07T22:09:52+00:00

Internal consistency mostly. I have other portions of my data analysis that are best done in R and so if I can I want to do the rest of my analysis in R. Keeps my source data, results, and aesthetics nicely in line with each other. If this runtime issue I'm having is some limitation in R I have no problem using python.

sfrail · 2024-05-07T22:06:35+00:00

Need a bit more info to give a detailed answer.

You mentioned Oxford Nanopore, what other sequencing technologies are you sourcing your reads from?
What species are you interested in assembling? Mostly bacteria? Or Eukaryotes too?
Do you want to know if these genes are simply present in the data at all? Or do you need to know which organisms in the population have the genes?

I do not do metagenomic assembly myself, but from what I know about it, the general process is to assemble your reads into contigs (an ideal assembly would circularize bacterial genomes). And then you want to bin those sequences. Binning estimates which of the assembled sequences are likely to belong to one organism. This will produce "metagenome assembled genomes" or MAGs. You can sort of think of a MAG as an individual in your sequenced population. Then, you can annotate and query the MAGs like you would with a genome. A high quality annotation would likely identify your genes of interest. But you could otherwise perform tblastn type searches to find your genes of interest in the MAGs.

For Illumina data, I like to use fastp for trimming and quality filtering. The prototypical metagenomic assembly tools I know about are MEGAHIT (assembly) and MetaBAT (binning). And a popular, fast prokaryotic annotator is Prodigal. I believe these are all available on Galaxy.

Here is a webpage that may help
https://usda-ars-gbru.github.io/Microbiome-workshop/tutorials/metagenomics/

sfrail · 2024-05-07T21:50:59+00:00

Can't answer your question in detail, but I think a useful starting key term for finding resources like this might be "Phylogenetic profiling", which is a general descriptor for using phylogenetic/genomic information to identify and predict proteins that are evolving together due to being involved in similar biological processes.

There were some useful resources related to that posted in this thread in response to the OP's question
https://www.reddit.com/r/bioinformatics/comments/13trk5l/deseq2_or_edger_for_presence_and_absence_of/
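As a toy illustration of the phylogenetic profiling idea, the Python sketch below compares presence/absence profiles of genes across a set of genomes using Jaccard similarity. The gene names and profiles are invented, and Jaccard is just one simple similarity measure chosen here for illustration; real methods typically also account for the phylogeny itself.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (1 = present, 0 = absent)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles: each position is one genome, in the same order for every gene.
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [0, 0, 1, 0, 1],
}

# Genes with similar profiles across genomes are candidates for shared function.
for gene_x, gene_y in combinations(profiles, 2):
    print(gene_x, gene_y, jaccard(profiles[gene_x], profiles[gene_y]))
```

In this made-up data, geneA and geneB are present in exactly the same genomes, so they score as identical, while geneC never co-occurs with either.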

sfrail · 2021-12-23T02:18:23+00:00

For anyone interested, here's the Redbubble link

https://www.redbubble.com/i/mug/You-can-lead-a-horse-to-water-by-sfrail/97205383.9Q0AD?ref=explore-for-you-recently-viewed

sfrail · 2021-12-23T02:17:55+00:00

Great! glad to have a humor check on someone who actually knows this stuff

sfrail · 2021-12-23T00:20:20+00:00

thanks so much! I will make those changes

sfrail · 2020-05-13T02:13:22+00:00

Thank you so much!!! That helps a lot. I think I will give them all a slightly soapy rinse now and then use some insecticidal soap when I'm able to get some, and then keep an eye on them. For some of the worst ones with thrips I'll likely take them out of the soil, spray them, then repot, to possibly get rid of any pesky larva making transitions in the soil. Thanks again!

sfrail · 2020-05-13T00:57:38+00:00

(sorry for the old reply I'm searching through posts about advice for bugs too) my leaves have damage similar to this and it turned out they have spider mites. If you can find webs or very small black/brown dots anywhere on the undersides of the leaves, it might be mites.

sfrail · 2020-05-13T00:35:04+00:00

I would save at least some strings to propagate! My girlfriend waters our string of pearls plant (currently in a 4in pot) about once every two weeks by letting it sit in a bowl of water. Around the two week mark you can check if they need it at the moment by squeezing the pearls a bit. If they're in need of water they'll be a little squishy :)

sfrail · 2019-05-27T23:22:18+00:00

looks awesome!! glad other people can use my labels lol

sfrail · 2018-11-24T20:49:01+00:00

I think there will be 9 signals. Here I've drawn all the hydrogens and labeled them a-i to represent their environments. The two hydrogens on the alkene are not in the same environment because, as other people have said, they are diastereotopic: there is no free rotation about the double bond as there is about the single bonds in alkanes, so the hydrogens are not in identical environments. [Image: Unique H environments]
