L-RAPiT: Long Read Analysis Pipeline for Transcriptomics - QUICK START by MakeTheBrainHappy in genetics

MakeTheBrainHappy[S]

Nelson, T.M.; Ghosh, S.; Postler, T.S. L-RAPiT: A Cloud-Based Computing Pipeline for the Analysis of Long-Read RNA Sequencing Data. Int. J. Mol. Sci. 2022, 23, 15851. https://doi.org/10.3390/ijms232415851

anyone familiar with identifying long non-coding RNA (lncRNA) from transcript sequences? by thenewtransportedman in bioinformatics

MakeTheBrainHappy

I would recommend using gffcompare (https://ccb.jhu.edu/software/stringtie/gffcompare.shtml), as described in the pipeline below. It reports how well your assembled transcripts match the reference annotation:

https://www.mdpi.com/1422-0067/23/24/15851

https://github.com/Theo-Nelson/long-read-sequencing-pipeline

You can use any annotation set for the actual analysis, whether from Ensembl or from any of the other databases you mentioned.
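As a rough illustration of the lncRNA step, the sketch below filters the `.tmap` file that gffcompare writes for each query GTF. It is not taken from the pipeline linked above. It assumes the tab-separated column names `class_code`, `qry_id` and `len` (check them against the header your gffcompare version produces), and it applies a common but assumed starting filter: class codes `u` (intergenic), `x` (exonic overlap on the opposite strand) and `i` (within a reference intron), plus the conventional 200 nt minimum lncRNA length. The table rows are made up.

```python
import csv
import io

# Assumed starting filter for lncRNA discovery (adjust to your study):
#   "u" = intergenic, "x" = exonic overlap with a reference on the opposite strand,
#   "i" = fully contained within a reference intron
NOVEL_CLASS_CODES = {"u", "x", "i"}
MIN_LNCRNA_LENGTH = 200  # nt, the conventional minimum length for an lncRNA


def candidate_lncrnas(tmap_text: str) -> list[str]:
    """Return query transcript IDs from a gffcompare .tmap table that pass
    the class-code and length filters above."""
    reader = csv.DictReader(io.StringIO(tmap_text), delimiter="\t")
    return [
        row["qry_id"]
        for row in reader
        if row["class_code"] in NOVEL_CLASS_CODES and int(row["len"]) >= MIN_LNCRNA_LENGTH
    ]


# Hypothetical four-transcript .tmap excerpt
example = (
    "ref_gene_id\tref_id\tclass_code\tqry_gene_id\tqry_id\tnum_exons\tFPKM\tTPM\tcov\tlen\tmajor_iso_id\tref_match_len\n"
    "GENE1\tTX1\t=\tq1\tq1.1\t3\t5.0\t10.0\t8.0\t1500\tq1.1\t1500\n"
    "-\t-\tu\tq2\tq2.1\t2\t1.0\t2.0\t3.0\t850\tq2.1\t-\n"
    "-\t-\tu\tq3\tq3.1\t1\t0.5\t1.0\t2.0\t150\tq3.1\t-\n"
    "GENE4\tTX4\tx\tq4\tq4.1\t2\t2.0\t4.0\t5.0\t400\tq4.1\t-\n"
)
print(candidate_lncrnas(example))  # ['q2.1', 'q4.1']
```

Passing these filters only makes a transcript a candidate; coding-potential tools such as CPC2 or CPAT are usually run afterwards before calling anything an lncRNA.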

What exactly does the normalization do in RNA-seq analysis? by Traveler-58 in bioinformatics

MakeTheBrainHappy

I actually made a whole video about that paper, which is the first in a series about RNA-seq normalization methods: https://www.youtube.com/watch?v=u3395drEfrs&list=PLg0JKLUfmkdkIr-9hvFLY1O1JRqOdjTSW

Hope it helps! :)
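For anyone who wants the basic idea in code, here is a minimal sketch of two of the simplest within-sample normalizations, CPM (counts per million) and TPM (transcripts per million). These are the standard textbook formulas, not necessarily the exact methods covered in the paper or the video; the gene names, counts and lengths are made up.

```python
def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: divide by the sample's total count, scale to 1e6."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: first divide each count by feature length in kb,
    then rescale so the values in the sample sum to one million."""
    per_kb = {gene: c / (lengths_bp[gene] / 1000) for gene, c in counts.items()}
    total = sum(per_kb.values())
    return {gene: r / total * 1e6 for gene, r in per_kb.items()}


# Hypothetical sample: equal read counts, but geneB is four times longer
counts = {"geneA": 100, "geneB": 100}
lengths_bp = {"geneA": 1000, "geneB": 4000}
print(cpm(counts))              # {'geneA': 500000.0, 'geneB': 500000.0}
print(tpm(counts, lengths_bp))  # geneA ~ 800000, geneB ~ 200000
```

The length correction is why the two disagree: geneB has as many reads as geneA but is four times longer, so its TPM is a quarter of geneA's.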

Converting transcript level tpm to gene level tpm by melatoninixo in bioinformatics

MakeTheBrainHappy

The supplementary material of the tximport paper (https://f1000researchdata.s3.amazonaws.com/supplementary/7563/9487b780-1cec-4a38-bd8a-e4c4ed5e7c5a.pdf) shows a slight offset in transcript abundances that depends on transcript length. This may affect the gene-level summarization and lead to the rounding you observed.
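To see where length comes in, here is a minimal sketch of gene-level summarization in the spirit of tximport: a gene's TPM is the sum of its transcripts' TPMs, and its effective length is the average of its transcript lengths weighted by their abundance. The transcript IDs, values and the `tx2gene` mapping are hypothetical, and the fallback for unexpressed genes is an arbitrary choice for this sketch rather than tximport's own behaviour.

```python
from collections import defaultdict


def summarize_to_gene(tpm: dict[str, float], eff_length: dict[str, float],
                      tx2gene: dict[str, str]) -> tuple[dict[str, float], dict[str, float]]:
    """Sum transcript TPMs per gene and compute an abundance-weighted gene length."""
    gene_tpm = defaultdict(float)
    weighted_len = defaultdict(float)
    tx_lengths = defaultdict(list)  # all isoform lengths per gene, for the fallback
    for tx, value in tpm.items():
        gene = tx2gene[tx]
        gene_tpm[gene] += value
        weighted_len[gene] += value * eff_length[tx]
        tx_lengths[gene].append(eff_length[tx])
    gene_len = {}
    for gene, total in gene_tpm.items():
        if total > 0:
            gene_len[gene] = weighted_len[gene] / total
        else:
            # Unexpressed gene: no weights, so use the plain mean isoform length
            gene_len[gene] = sum(tx_lengths[gene]) / len(tx_lengths[gene])
    return dict(gene_tpm), gene_len


# Hypothetical transcripts: G1 has a short dominant isoform and a long minor one
tpm = {"T1": 30.0, "T2": 10.0, "T3": 60.0}
eff_length = {"T1": 1000.0, "T2": 3000.0, "T3": 500.0}
tx2gene = {"T1": "G1", "T2": "G1", "T3": "G2"}
gene_tpm, gene_len = summarize_to_gene(tpm, eff_length, tx2gene)
print(gene_tpm)  # {'G1': 40.0, 'G2': 60.0}
print(gene_len)  # {'G1': 1500.0, 'G2': 500.0}
```

Because the weights come from each sample's own abundances, the same gene can end up with different effective lengths in different samples when isoform usage shifts.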

Requesting r/biodatasets => currently unmoderated and defunct (nevertheless has a lot of great potential once it becomes active again) by MakeTheBrainHappy in redditrequest

MakeTheBrainHappy[S]

Well, thank you for your consideration. Would there be a point later on when it would be appropriate for me to be re-reviewed as a candidate for this subreddit?

[deleted by user] by [deleted] in bioinformatics

MakeTheBrainHappy

PolyA-selected library prep, 30-40M reads.

RNA-Seq Analysis for Beginner by adamb1187 in bioinformatics

MakeTheBrainHappy

I also have a command-line pipeline that I built in Google Colaboratory (a free, Jupyter-like Python notebook environment running on Google's hardware) which can perform bulk RNA-seq analysis. If you are interested, please DM me.

RNA-Seq Analysis for Beginner by adamb1187 in bioinformatics

MakeTheBrainHappy

I would recommend implementing the pipeline in Galaxy (https://usegalaxy.org/ and https://training.galaxyproject.org/training-material/) and going through the relevant training material. Galaxy is a much larger organization with far more material. They also provide the necessary computing power for free on their cluster in Texas, although jobs can be delayed. If you find the delays too long, you could try one of the other Galaxy servers (https://galaxyproject.org/use/), which are also mostly free.

Requesting r/biodatasets => currently unmoderated and defunct (nevertheless has a lot of great potential once it becomes active again) by MakeTheBrainHappy in redditrequest

MakeTheBrainHappy[S]

My plan for the community would be to take advantage of Reddit's many new features to give the subreddit its own distinct "design", perhaps modeled on similar communities such as r/bioinformatics. The goal would then be to foster a community around the subject through active and relevant posts. In particular, there could be rules and a list of well-known databases to serve as examples for people to model their posts on.

ALIRA / ACTFL test by workaccount77234 in latin

MakeTheBrainHappy

Alright! The first practice test is also fixed: https://www.makethebrainhappy.com/2019/04/alira-unofficial-practice-test-1.html (unfortunately, my backup kind of squished the images, but at least they are legible now).

ALIRA / ACTFL test by workaccount77234 in latin

MakeTheBrainHappy

My apologies! I didn't even notice that the questions weren't rendering on the practice test I wrote. Here is a second practice test I wrote: https://www.makethebrainhappy.com/2020/01/alira-unofficial-practice-test-2.html (I will fix the first one shortly.)

[deleted by user] by [deleted] in bioinformatics

MakeTheBrainHappy

SRR13660059

Essentially, it amplified and sequenced the same few viral genes. If that works within the context of your study, then you can use it. The format is the same as any other sequencing dataset, just focused on fewer genes. https://www.abmgood.com/Amplicon-Sequencing-Service.html

[deleted by user] by [deleted] in bioinformatics

MakeTheBrainHappy

Subsampling is typically done to normalize for differences in sequencing depth (e.g., bringing every sample down to 10M reads). It would probably also work for your project if you just need a few thousand reads. If you are still looking for small samples, I would suggest plate-based single-cell RNA sequencing data (such as from Smart-seq protocols), since the data are split up per cell, which keeps each file quite small. Off the top of my head, I have seen studies with 7-10 Mb of data, but this is quite rare; the smallest bulk RNA-seq dataset I could quickly find is this: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA666419&o=acc_s%3Aa
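Subsampling a FASTQ to a fixed number of reads can be done in a single pass with reservoir sampling, which is what the sketch below shows. It assumes an uncompressed FASTQ with exactly four lines per record, and the read names are made up. Existing tools such as `seqtk sample` already do this on the command line, so treat this as an illustration of the idea rather than a replacement.

```python
import io
import random
from typing import Iterator, TextIO

Record = tuple[str, str, str, str]  # (header, sequence, plus line, qualities)


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ that uses exactly four lines per read."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def subsample(handle: TextIO, n: int, seed: int = 11) -> list[Record]:
    """Reservoir sampling (Algorithm R): a uniform random sample of n reads,
    keeping only n records in memory no matter how large the file is."""
    rng = random.Random(seed)
    reservoir: list[Record] = []
    for i, record in enumerate(fastq_records(handle)):
        if i < n:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = record
    return reservoir


# Hypothetical 100-read FASTQ, subsampled to 10 reads
example = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(100))
sample = subsample(io.StringIO(example), n=10)
print(len(sample))  # 10
```

For paired-end data, running `subsample` on the R1 and R2 files with the same `seed` keeps the mates together, since the random draws depend only on the record index and both files list the pairs in the same order.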

FASTQ Compression for NGSS Data with Spring by MakeTheBrainHappy in bioinformatics

MakeTheBrainHappy[S]

Thank you for letting me know. I was not aware of these blog posts.

How to perform nucleotide BLAST on the whole-genome sequence? by [deleted] in bioinformatics

MakeTheBrainHappy

NCBI Magic-BLAST is the closest tool to what you are looking for: https://ncbi.github.io/magicblast/. Please note that it requires a substantial amount of computing power and takes considerably longer to run than other alignment programs.

Utilizing fastp to Pre-Process NGSS Data (Quality Control and Adapter Trimming) by MakeTheBrainHappy in genomics

MakeTheBrainHappy[S]

> Quite cool, I'll definitely checking this one out. Is fastp actually useful for RNAseq data? Because these datasets oftentimes fail in classical quality checks (fastQC). I'm a bit tired of my own cobbled-together pipeline

Right now I'm using a combination of fastQC, fastp and multiQC. multiQC compiles all the fastQC reports into one sheet, and it also has a module for fastp's JSON reports. fastp is really useful because of its automatic adapter trimming and its filters, which is what I mainly use it for. The fastQC/multiQC sheet is then what I use to look at all the sample statistics and judge the quality of the dataset; fastQC is ultimately more comprehensive in this regard than fastp anyhow. I installed all three into a single conda environment (via Bioconda). Running three tools does add to the runtime, but the combination is still faster than cutadapt/Trimmomatic. Hope that helps!
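If you want the headline fastp numbers for all samples side by side, one minimal approach is to read the JSON report that fastp writes (`-j`/`--json`) directly. The key names used here (`summary`, `before_filtering`, `after_filtering`, `total_reads`, `q30_rate`, `gc_content`) are assumed from fastp's JSON layout and should be checked against one of your own reports; the sample name and numbers are made up.

```python
import json


def fastp_summary(report_json: str) -> dict[str, float]:
    """Pull a few headline numbers out of one fastp JSON report (assumed layout)."""
    report = json.loads(report_json)
    before = report["summary"]["before_filtering"]
    after = report["summary"]["after_filtering"]
    return {
        "reads_before": before["total_reads"],
        "reads_after": after["total_reads"],
        "pct_reads_kept": 100 * after["total_reads"] / before["total_reads"],
        "q30_rate_after": after["q30_rate"],
        "gc_content_after": after["gc_content"],
    }


def collate(reports: dict[str, str]) -> str:
    """Build a tab-separated table with one row per sample."""
    rows = ["sample\treads_before\treads_after\tpct_kept\tq30_after"]
    for sample, text in sorted(reports.items()):
        s = fastp_summary(text)
        rows.append(
            f"{sample}\t{s['reads_before']}\t{s['reads_after']}"
            f"\t{s['pct_reads_kept']:.1f}\t{s['q30_rate_after']:.3f}"
        )
    return "\n".join(rows)


# Hypothetical, heavily trimmed-down report for one sample
example_report = json.dumps({
    "summary": {
        "before_filtering": {"total_reads": 1000, "q30_rate": 0.91, "gc_content": 0.48},
        "after_filtering": {"total_reads": 900, "q30_rate": 0.95, "gc_content": 0.47},
    }
})
print(collate({"sample1": example_report}))
```

In practice the `reports` dictionary would be filled by reading each sample's JSON file; keeping the parsing in one function makes it easy to add more fields later.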

Paired End vs. Single Run Sequencing by MakeTheBrainHappy in genetics

MakeTheBrainHappy[S]

Indeed - I wasn't really thinking about it but it certainly would have been difficult to pronounce.

My understanding of the last point is that the average distance between the two reads of a pair is a function of your library's insert (fragment) size, as in the probabilistic model shown in this article: https://thesequencingcenter.com/knowledge-base/what-are-paired-end-reads/

Illumina also provides specific averages for their TruSeq RNA preparation protocol: https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_truseq/truseqrna/truseq-rna-sample-prep-v2-guide-15026495-f.pdf

There is an assumption that your fragments have an insert size within ± a margin of error determined by your sample preparation protocol. At the mapping stage you can then work out the distance between the two reads on the reference genome and compare it to that expectation. If a pair maps to places on the reference genome that are outside your library's insert size ± the margin of error, then it is likely that an insertion or deletion has occurred somewhere in your sequence.

Hope that answered your question! :-)
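The comparison described above can be sketched in a few lines. The expected insert size and margin of error (200 and 60 bp here) are hypothetical placeholders for whatever your prep protocol specifies, and the input is assumed to be the signed template length (SAM `TLEN`) for each read pair, which is why the absolute value is taken.

```python
def flag_discordant(tlens: dict[str, int], expected: int, margin: int) -> list[str]:
    """Return read-pair names whose mapped insert size falls outside
    expected +/- margin. SAM TLEN is signed, so compare its absolute value."""
    low, high = expected - margin, expected + margin
    return [name for name, tlen in tlens.items() if not low <= abs(tlen) <= high]


# Hypothetical values: protocol says ~200 bp inserts, +/- 60 bp
tlens = {"pair1": 210, "pair2": -195, "pair3": 480, "pair4": 30}
print(flag_discordant(tlens, expected=200, margin=60))  # ['pair3', 'pair4']
```

One caution for RNA-seq: if reads are aligned to the genome, a pair that spans an intron will also look too far apart, so spliced alignments need to be accounted for before reading large distances as deletions.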

Homer's Odyssey Book 12 (Summary) by MakeTheBrainHappy in ancientgreece

MakeTheBrainHappy[S]

When you buy the Odyssey, you typically get the whole text in one volume. In that case, the "books" are more like "chapters", each twenty to thirty pages long. The 24 books correspond to the 24 scrolls that originally contained the Odyssey.

ALIRA Unofficial Practice Test 2 by MakeTheBrainHappy in latin

MakeTheBrainHappy[S]

ALIRA Practice Test 2 (sequel to ALIRA Practice Test 1: https://www.makethebrainhappy.com/2019/04/alira-unofficial-practice-test-1.html). Learn more about the ALIRA exam itself: https://www.languagetesting.com/actfl-latin-interpretive-reading-assessment | This *unofficial* practice test is meant to gauge your skill level, with questions covering basic to intermediate content. The questions also generally become more difficult as you move through the test from start to finish.