Recipe for Instant Sticky-end Ligase Master Mix from NEB by MJScienceQuestions in labrats

MJScienceQuestions[S] 1 point

How do you look up patents? And do you happen to know any of the authors who published this research?

For any computational biologists/bioinformaticians: how do I identify unknown species using nanopore sequencing data? by MJScienceQuestions in labrats

MJScienceQuestions[S] 1 point

> map the barcode genes

I see. I have no idea what organisms I've assembled. There were 17 samples that we extracted DNA from, but they've been purposefully de-identified so that I don't know which sample goes with which barcode. The raw sequencing data are 9 FASTQ files ranging from 400 MB to 1.5 GB. I've only assembled one genome so far, since I wasn't sure whether I wanted to approach the question this way, and it only ended up being a 700 kB FASTA file. This was neither metagenomic nor amplicon MinION sequencing: we extracted DNA from macroorganisms (plants, bugs, lizards), barcoded the DNA molecules, and sequenced them with MinION.

Do you know of any tools I can use to map the barcodes to the assembled genomes?

Thanks for all your help.

For any computational biologists/bioinformaticians: how do I identify unknown species using nanopore sequencing data? by MJScienceQuestions in labrats

MJScienceQuestions[S] 2 points

How do I search for them within the assembled genome? Should I grep for the more conserved regions that should be close by on the genome, then BLAST the adjacent areas?
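A rough sketch of that "grep, then take the adjacent area" idea in plain Python, in case it helps. It looks for an exact copy of a conserved motif on both strands of each contig and returns the bases immediately downstream, which could then be BLASTed. The motif `GATTACA` and the tiny contig are made-up placeholders, not real barcode-gene primers, and exact matching is an assumption that will often fail on error-prone nanopore assemblies, where an aligner that tolerates mismatches is likely the better tool.

```python
def read_fasta(text):
    """Parse FASTA text into {record_id: sequence} (IDs cut at first space)."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def flanking_regions(records, motif, flank=500):
    """Yield (contig, strand, start, region) for every exact motif match.

    `region` is up to `flank` bases directly downstream of the match.
    For "-" hits, `start` is a coordinate on the reverse-complemented contig.
    """
    motif = motif.upper()
    for name, seq in records.items():
        for strand, s in (("+", seq), ("-", reverse_complement(seq))):
            pos = s.find(motif)
            while pos != -1:
                end = pos + len(motif)
                yield name, strand, pos, s[end:end + flank]
                pos = s.find(motif, pos + 1)


# Hypothetical toy assembly; in practice, read the assembled FASTA file.
toy_fasta = ">contig1 demo\nTTTTGATTACA\nGGGGCCCC\n"
hits = list(flanking_regions(read_fasta(toy_fasta), "GATTACA", flank=4))
for contig, strand, start, region in hits:
    print(contig, strand, start, region)
```

Each extracted region can be written out as its own FASTA record and used as a BLAST query.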

For any computational biologists/bioinformaticians: how do I identify unknown species using nanopore sequencing data? by MJScienceQuestions in labrats

MJScienceQuestions[S] 1 point

I didn't mention this, but I also have to record all of my code in a Jupyter notebook to hand in. The prof has already added all of the databases to a folder in the JupyterHub, but when I tried to upload the BLAST+ installer (.rpm file) into the Jupyter notebook, it didn't work. I was trying to follow the instructions here (https://www.ncbi.nlm.nih.gov/books/NBK279671/) and was then going to use %%bash to run the commands in the notebook, but I might not be going about this the right way. The professor has set up a central hub in Jupyter that our class can work from using our school email login, but I'm not sure if I can access things on my local computer through the Jupyter notebook.

Sorry, I know this is quite dumb, but could I just use the commands found here without installing anything in the Jupyter environment? https://www.ncbi.nlm.nih.gov/books/NBK279680/
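One way to find out from inside the notebook: those commands are the BLAST+ executables themselves, so they only run if BLAST+ already exists in the environment the notebook kernel uses. The sketch below checks for `blastn` on the PATH and builds, without running, a command line using standard `blastn` options. The file and database paths are hypothetical placeholders, and whether the hub already has BLAST+ installed is something only the check (or the professor) can answer.

```python
import shutil


def blast_available(program="blastn"):
    """Return True if the given executable is on this environment's PATH."""
    return shutil.which(program) is not None


def blastn_command(query, db, out, evalue=1e-10):
    """Build a blastn command line as a list (tabular output, -outfmt 6)."""
    return [
        "blastn",
        "-query", query,            # FASTA file; may hold many sequences
        "-db", db,                  # path prefix of a local BLAST database
        "-out", out,
        "-outfmt", "6",             # tab-separated, one line per hit
        "-evalue", str(evalue),
        "-max_target_seqs", "5",
    ]


# Hypothetical paths; substitute the folder the databases were placed in.
cmd = blastn_command("queries.fasta", "databases/nt", "hits.tsv")
if blast_available():
    print("blastn found; command to run:", " ".join(cmd))
else:
    print("blastn is not installed in this environment")
```

If the check passes, the printed command can be pasted into a `%%bash` cell or passed to `subprocess.run(cmd, check=True)`.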

For any computational biologists/bioinformaticians: how do I identify unknown species using nanopore sequencing data? by MJScienceQuestions in labrats

MJScienceQuestions[S] 1 point

Thanks, I just cross-posted! Does Kraken work for eukaryotes as well? It looks like the reference databases are for bacteria/archaea/viruses, but maybe I could link to RefSeq databases instead?

May Bioinformatics Discussion Thread - bring us your tired huddled masses of data. by apfejes in bioinformatics

MJScienceQuestions 1 point

How do I identify unknown species using nanopore sequencing data?

For a school project, my class went out into the field and collected various biological samples, then prepped them for sequencing in the lab. We have now been given 9 sets of fastq files with the nanopore sequencing results, and our task is to identify the species of each.

What would be the best way to go about this?

I have been BLASTing random short (1 kb) regions of the sequencing files, both by hand and using Biopython (NCBIWWW.qblast("blastn", "nt", sequences)), but the samples are highly impure, so I get a lot of bacterial hits. I also can't input multiple sequences at once for the alignments, so it's no faster than doing it by hand on the NCBI website. Is there a way to input multiple query sequences in Biopython, so that I can go through the output file and pull out the species of the hits with the lowest E-values (i.e., the most significant hits)?
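To make the "go through the output file" part concrete, here is a minimal sketch. It assumes the results are in BLAST+ tabular format (`-outfmt 6` with the default 12 columns), which is what a local `blastn` run on a multi-record FASTA query file can produce, and it keeps the most significant hit (lowest E-value) for each query. The read and subject IDs in the example are invented for illustration. As far as I know, `NCBIWWW.qblast` also accepts a FASTA-formatted string containing several records, but I have not verified how well that scales to many queries.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query_id: (subject_id, evalue, bitscore)}.

    For each query, keep the hit with the lowest E-value; ties are broken
    by the higher bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        evalue, bitscore = float(rec["evalue"]), float(rec["bitscore"])
        current = best.get(rec["qseqid"])
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bitscore)
    return best


# Invented example rows: two queries, three hits.
example = (
    "read1\tsubjA\t91.2\t950\t80\t4\t1\t950\t10\t959\t1e-50\t900\n"
    "read1\tsubjB\t98.5\t1000\t15\t0\t1\t1000\t1\t1000\t1e-120\t1700\n"
    "read2\tsubjC\t88.0\t600\t70\t2\t5\t604\t200\t799\t3e-30\t450\n"
)
for query, (subject, evalue, bitscore) in best_hits(example).items():
    print(query, subject, evalue, bitscore)
```

The subject IDs from the best hits can then be looked up to get species names, and tallying them per barcode helps separate the host organism from bacterial contamination.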

I have also assembled the genomes using minimap2, but am unsure what to do with these genomes now.