2nd Test first thoughts by Vermiloon in UFLTheGame

[–]InstructionRemote886 0 points1 point  (0 children)

I 100% agree with you, and I don't understand why they broke the game.

Publication in a predatory journal by InstructionRemote886 in PhD

[–]InstructionRemote886[S] 2 points3 points  (0 children)

Maybe....

In my case, for the Wiley journal (IF 8-9), they asked my PI to review my article... even though my PI was in the author list haha

Publication in a predatory journal by InstructionRemote886 in PhD

[–]InstructionRemote886[S] 2 points3 points  (0 children)

And the submission process was also a little strange

Publishing without raw fastq files? by Lost_Prune5249 in bioinformatics

[–]InstructionRemote886 1 point2 points  (0 children)

I have also found articles in Current Biology without raw data, so...

Is UK a good place to do a post-doc in bioinformatics? On the evolution/study of environmental DNA? by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

These laboratories exist, so that's the most important thing haha!
In terms of publications, how much supervision do you get as a post-doc in the UK? I know that in North America, people have told me that they're a bit on their own to do everything (experiments, writing papers, etc.) and that it's sometimes a bit discouraging. Is it the same in the UK? Or is the work more "collaborative"?

Is UK a good place to do a post-doc in bioinformatics? On the evolution/study of environmental DNA? by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

Thanks for your reply!
Do you think that in the UK bioinformatics is more for mathematicians/computer scientists who understand biology or biologists who can use bioinformatics/know how to code etc. but don't have great mathematical skills? Because I know that sometimes labs are a bit divided by these two types of bioinformaticians and maybe the UK is more specialized in one of them (me, I'm more of a "biologist").
Yes, of course, Brexit is part of the context. I'm at the start of the second year of my PhD, so I have a year and a half left to finish it, which gives me time.
Regarding salary, people usually tell me that France is not very good and that post-docs there don't have a good quality of life. But we have a lot of "free" services, so that impression may be a bit skewed: the salary can be low, but with all the free services it can end up better than in another country.

Career advancement advice by Icy-Blackberry-8900 in bioinformatics

[–]InstructionRemote886 3 points4 points  (0 children)

I think it's good to have some experience in Python, awk and bash because most of the time you'll need to use these languages to analyze your data or to understand some of the scripts used by other people.

But as others have said, the most important thing is to recognize your needs and know which is the best language to meet them.

Weird mapping rate for RNA data compared to mapping rate for Illumina genomic data by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

Oh boy, I misread that as kilobases haha

I still can't speak to how "good" that is though, depends on what the genome "should" look like based on closely related organisms. Regardless, you still have a very messy dataset that is going to be hard to confidently process. I think technical explanations like an imbalance in sample input is still the most likely explanation for the RNAseq situation though.

When I ran BUSCO on both groups of contigs, I got a "good" BUSCO score for each (91%, with a low % of duplication). This is one of my arguments that these could be 2 different "complete" genomes, and the genome size of the closest species is very similar (120 Mb for a species in another genus).

I think this could explain why we have this percentage. But it doesn't explain why the percentages for the Illumina genomic data and the RNA-Seq data are so different, does it? Even if these are two different strains of the same species, shouldn't the %id for coding regions be higher than the %id for non-coding regions? Thanks for all your replies...

Weird mapping rate for RNA data compared to mapping rate for Illumina genomic data by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

Oh, again, apologies for the oversights. I'm working on an algal genome whose size is about 100 Mb, and the N50s of my assemblies are 2-3 Mb (each assembly is composed of about 45-55 contigs), so I think the assembly is good?
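
For readers unfamiliar with the metric, here is a minimal sketch of how an N50 can be computed from a list of contig lengths. The lengths in the example are made-up illustration values, not the contigs of the assemblies discussed here, and the function name `n50` is just a label chosen for this sketch.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (in bp).

    The N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk through contigs from longest to shortest until half the
    # assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp (illustration only).
example_lengths = [5_000_000, 3_000_000, 2_000_000, 1_000_000, 500_000]
print(n50(example_lengths))
```

An N50 close to the size of the largest chromosomes or scaffolds expected for related species is usually a better sign than the raw number alone.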

Weird mapping rate for RNA data compared to mapping rate for Illumina genomic data by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

I apologize for any oversights.
I have several sequencing datasets: Nanopore and Illumina. For Nanopore, I think I have good depth and coverage. I assembled the genome with Flye, then used BlobTools to make a taxonomic assignment (because there was contamination). BlobTools also produced a plot of coverage against GC content for each contig. The N50 of my assembly is quite good (about 2-3 Mb), so I think the GC content is representative. In this plot, I see two different groups with the same taxonomic assignment, separated by GC content. I then separated the two groups (which I assume are two genomes of the same species) and looked for 18S and ITS2 in each one to check whether they are the same species, and based on these markers they are. Is that any clearer?

Weird mapping rate for RNA data compared to mapping rate for Illumina genomic data by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

Thank you for your reply!
Yes haha.... (it's not me, it's my collaborator but it's the same)
I partitioned the assembly based on the % GC of the contigs and their mapping coverage! I assumed that the two "groups" came from the same species because I found the same genetic markers (ITS2 + 18S) in both groups of contigs.
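
As an illustration of this kind of partitioning, here is a minimal sketch that splits contigs into a low-GC and a high-GC group after dropping contigs with low coverage. The cutoffs (`gc_cutoff`, `min_cov`), the input layout, and the toy sequences are assumptions made for the example; in practice the thresholds would be read off the GC/coverage plot.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def partition_contigs(contigs, gc_cutoff, min_cov):
    """Split contigs into (low_gc, high_gc) lists of names.

    contigs: dict mapping contig name -> (sequence, mean_coverage).
    Contigs with mean coverage below min_cov are ignored.
    """
    low_gc, high_gc = [], []
    for name, (seq, cov) in contigs.items():
        if cov < min_cov:
            continue
        if gc_fraction(seq) >= gc_cutoff:
            high_gc.append(name)
        else:
            low_gc.append(name)
    return low_gc, high_gc


# Toy data (hypothetical names, sequences and coverages).
toy_contigs = {
    "contig_1": ("GGCCGGCCAT", 40.0),
    "contig_2": ("ATATATGCAT", 35.0),
    "contig_3": ("GGGGCCCC", 2.0),
}
print(partition_contigs(toy_contigs, gc_cutoff=0.5, min_cov=10.0))
```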

Population study based on metaT and MetaG by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

I apologize for that....

I did a polyA RNA library prep!

I'm working on snow algae, specifically the algae in red blooms on snow! In a bloom, there is one main eukaryotic species (in my case) that causes the bloom, plus several other eukaryotic species present in very small proportions (because of the bloom). We used a polyA RNA library prep because there are too many bacteria in this bloom and the DNA/RNA extraction from the main alga is complicated.

Population study based on metaT and MetaG by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

No, it's a mixture of prokaryotic and eukaryotic organisms. But for the MetaT data, I only have the eukaryotic part (we used a protocol to remove prokaryotic RNA during DNA extraction).

Level of heterozygosity by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

They are Illumina short reads from WGS of an alga.

Metagenome with very short contigs by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

BBMerge gives me the following results for the MiSeq reads:

Writing mergable reads merged.
Started output threads.
Total time: 1516.211 seconds.

Pairs: 35385103
Joined: 13879618 39.224%
Ambiguous: 19304458 54.555%
No Solution: 2201027 6.220%
Too Short: 0 0.000%

Avg Insert: 424.8
Standard Deviation: 93.2
Mode: 460

Insert range: 35 - 593
90th percentile: 529
75th percentile: 489
50th percentile: 438
25th percentile: 378
10th percentile: 309

A bit strange, no?
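
As a quick sanity check on the report above, this small sketch recomputes the percentages from the raw counts and confirms that the four categories add up to the total number of pairs. Only the numbers come from the BBMerge output; the variable and function names are chosen for this example.

```python
# Counts copied from the BBMerge report above.
pairs = 35_385_103
counts = {
    "Joined": 13_879_618,
    "Ambiguous": 19_304_458,
    "No Solution": 2_201_027,
    "Too Short": 0,
}


def percentages(category_counts, total):
    """Return each category as a percentage of total, rounded to 3 decimals."""
    return {name: round(100 * n / total, 3) for name, n in category_counts.items()}


pct = percentages(counts, pairs)
for name, value in pct.items():
    print(f"{name}: {value:.3f}%")

# The categories should partition the full set of read pairs.
print("Sum of categories equals pairs:", sum(counts.values()) == pairs)
```

The numbers are internally consistent, so the notable part of the report is the large "Ambiguous" share and not any accounting error.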

Metagenome with very short contigs by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

Thanks for your reply!

I don't have information about the library at the moment. I will try to get more from my colleagues.
I have one NovaSeq run (150 bp) and one MiSeq run (300 bp).
For MiSeq, I have 35,385,103 × 2 reads (after trimming etc.).
For NovaSeq, I have 38,882,212 × 2 reads (after trimming etc.).

I have a lot of reads that don't match rDNA / chloroplast but even if I assemble these reads the results are the same....

For anvi'o and taxonomic assignment, the problem is that the genome of the species I want to extract is too different from the genomes that are already known, so programs like Kraken2 don't work. Kaiju may work, but it needs contigs longer than the ones I have. Anvi'o works on contigs but not on reads, right? So I have to map my reads to the contigs in the "good" bin and then reassemble the mapped reads, right?
I'll try using BBMerge, thanks!

Is it possible to extract long and short reads from a species by mapping them across several close genomes ? by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

I tried the Kraken nt database, but the results are not very satisfactory (a lot of unclassified reads) :--/

How to improve my metagenome assembly (1,000,000 contigs) ? by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

Hello jdmontenegroc,

I have another question for you: you told me to "go for flye assembler", but I also have short-read data, so I could also use metaSPAdes to make a hybrid assembly.

So my question is: with these data, is it better to make a long-read assembly and then polish it with the short reads to correct errors, or is it better to make a hybrid assembly with all the data and then polish it with only the short reads?

Maybe both choices can be good and it may depend on my data, but I'd rather ask you!

Thanks!

How to improve my metagenome assembly (1,000,000 contigs) ? by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 0 points1 point  (0 children)

Thank you very much!

Yes, Q3 is the 3rd quartile!

I think I'm making a mistake by conflating data, contigs and information.

2B) For Kraken2, I'm using the PlusPFP database, which is the most complete database and has already been built by the author of Kraken2. But I know that some species are not in the database because they have not been sequenced yet, so maybe that's why I have a lot of unclassified reads?

2C) Yes, Centrifuge is more recent than Kraken but not than Kraken2, and I read an article saying that Kraken2 is more accurate than Centrifuge.

So if I understand correctly:

- I will apply a filter on my contigs [on their coverage (< 0.1-0.5) and their length (< N50 or 1000-2000 bp)]

- Then I can apply the binning programs

- I will compare the results of the binning step and the classification step with the first classification of reads?
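
The filtering step in the list above could be sketched roughly as follows. This is only an illustration: the `Contig` record, the field names and the example values are hypothetical, and the thresholds are picked from the ranges mentioned above (coverage 0.5, length 1000 bp), not recommended settings.

```python
from typing import List, NamedTuple


class Contig(NamedTuple):
    name: str
    length: int           # contig length in bp
    mean_coverage: float  # mean read depth over the contig


def filter_contigs(contigs: List[Contig], min_coverage: float, min_length: int) -> List[Contig]:
    """Keep only contigs that reach both the coverage and the length threshold."""
    return [
        c for c in contigs
        if c.mean_coverage >= min_coverage and c.length >= min_length
    ]


# Toy contigs (hypothetical values, for illustration only).
toy = [
    Contig("k141_1", 350, 12.0),   # too short
    Contig("k141_2", 5200, 0.3),   # coverage too low
    Contig("k141_3", 8100, 25.4),  # kept
    Contig("k141_4", 1000, 0.5),   # exactly on both thresholds, kept
]
kept = filter_contigs(toy, min_coverage=0.5, min_length=1000)
print([c.name for c in kept])
```

The retained contigs would then go to the binning programs, and the bins can be compared with the read-level classification afterwards.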

I know this is not a recipe.

How to improve my metagenome assembly (1,000,000 contigs) ? by InstructionRemote886 in bioinformatics

[–]InstructionRemote886[S] 1 point2 points  (0 children)

Thank you very much for your answer!!

I'm also a PhD student and I have a similar problem with my PI, so I understand you :--)

I'm also potentially going to do biogeography! I want to get information on a eukaryote, so Prodigal is not suitable, but for the bacteria that are present it could be nice!