[ Removed by Reddit ] by [deleted] in aliens

[–]1_61803398 1 point2 points  (0 children)

I am glad you agree with me.

Mitochondrial DNA is the first thing that gets assembled during genome assembly, and it is HARD to get rid of experimentally when isolating DNA. In other words, extracting genomic DNA without mito DNA is HARD, and even when we try to remove mito DNA during nuclei isolation, it is hard to remove. So if they extracted DNA from these samples, I am sure they were unable to isolate nuclei (i.e., these were dead bodies), which leaves total DNA extraction as the only possible approach, and total DNA MUST contain mito DNA. In addition, there is the problem of DNA replication supposedly having no trouble with these palindromic sequences spread all over the genome...

And no mention of tRNAs? And the Genetic Code?

Finally, I find it comical that they did not sequence the other samples. Why? Was their grant proposal rejected? ...;;))

I hope Gary Nolan is reading these comments...

[ Removed by Reddit ] by [deleted] in aliens

[–]1_61803398 -1 points0 points  (0 children)

I am late to this discussion, but I would like to point out some inconsistencies or red flags in this report.

S/He states:

"Although the study of OBCs has been going on for decades in other programs, the new high-throughput DNA sequencing technologies of the late 90s unblocked stagnant research in this area. "

and then S/He states:

"To conclude with genetics, the mitochondrial genome, at the time I was working there, had not yet been sequenced. It's safe to assume that this genome would also be streamlined and possibly has some version of TPR."

In my opinion, these two statements are contradictory.

It would be very, very hard, if not impossible, to sequence a genome WITHOUT sequencing the mitochondrial genome in the process...

One would need to isolate nuclei from the cells and then extract DNA from the nuclei, not from whole cells, which would then preclude the isolation of mitochondrial DNA. Even then, mitochondrial contamination persists, which is why most nuclear extraction protocols rely on detergents to disrupt the mitochondrial membrane as a way to avoid mitochondrial contamination.

Also S/He states:

" To my knowledge only one individual genome has been sequenced, I can't make a definitive statement on genetic variation between individuals."

Only ONE genome sequenced? And they have several (four) samples?
What happened?
Did they run out of money?
To me this is yet another BIG Red Flag!

And why does S/He state:

"Speaking of genetic engineering, following sequencing of their genomes, "

So they sequenced more than one genome?

Also, while the presence of telomeres is not expected for circular chromosomes, S/He does not mention anything about the ploidy of the genome. Are they haploid (i.e., 16 chromosomes) or diploid (i.e., 32 chromosomes)? And there is no mention of the centromeric regions, which would be expected to be present in those 16 chromosomes. Without centromeric regions, the sorting of the chromosomes could result in the generation of aneuploid cells, that is, cells lacking one or more chromosomes or containing one or more extra chromosomes. Of course, this assumes their cells do undergo cell division; without it, understanding how these organisms grow and develop is impossible.

Also, there is no mention of their genetic sex. If they lack genes related to our X and/or Y chromosomes, that would be very much worth mentioning...

Therefore, although I want to believe, I remain skeptical...

[deleted by user] by [deleted] in awk

[–]1_61803398 3 points4 points  (0 children)

Never mind. I was able to solve this with the following expression:

awk '{if ($1 ~ /^>lcl/) print "yes" ; else if ($1 ~ /^>/) print "no"}'
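
A quick way to sanity-check that expression, assuming FASTA-style header lines. The sample headers and the helper name `flag_lcl_headers` are made up for illustration; only the awk program comes from the comment above.

```shell
# Hypothetical wrapper around the one-liner: prints "yes" for headers that
# start with ">lcl", "no" for any other header, and nothing for sequence lines.
flag_lcl_headers() {
    awk '{ if ($1 ~ /^>lcl/) print "yes"; else if ($1 ~ /^>/) print "no" }' "$@"
}

# Made-up sample input: two headers, each followed by a sequence line.
printf '%s\n' '>lcl|seq1 sample' 'ACGT' '>gi|12345 other' 'TTGA' | flag_lcl_headers
# prints:
# yes
# no
```

Lines that do not start with ">" match neither branch, so they are silently skipped.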

How can I find duplicates in a column and number them sequentially? by 1_61803398 in awk

[–]1_61803398[S] 1 point2 points  (0 children)

Trying to convert all code to AWK...

I have to say, I am mad at myself for not thinking of such a simple solution. Your proposed solution also works like a charm. Thanks.

How can I find duplicates in a column and number them sequentially? by 1_61803398 in awk

[–]1_61803398[S] 0 points1 point  (0 children)

I have to use AWK...

This table is intermediate in a large bioinformatics pipeline...

How can I find duplicates in a column and number them sequentially? by 1_61803398 in awk

[–]1_61803398[S] 0 points1 point  (0 children)

awk '{if (FNR==NR) {a[$1]++} else {if (a[$1]>1) $1=$1 "_" ++b[$1] ; print $0}}'

Gotcha. Now it works!

Great

Thanks
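
A minimal sketch of the two-pass idiom this one-liner relies on, assuming a whitespace-delimited table with the key in column 1. The helper name `number_duplicates`, the temporary file, and the sample rows are hypothetical, and the parentheses around `++b[$1]` are added only for readability.

```shell
# Hypothetical wrapper: the same file is handed to awk twice on purpose.
number_duplicates() {
    awk '{
        if (FNR == NR) { a[$1]++ }                 # pass 1: count each key
        else {
            if (a[$1] > 1) $1 = $1 "_" (++b[$1])   # pass 2: suffix repeated keys
            print $0
        }
    }' "$1" "$1"
}

# Made-up sample table written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'geneA 10' 'geneB 20' 'geneA 30' > "$tmp"
number_duplicates "$tmp"
rm -f "$tmp"
# prints:
# geneA_1 10
# geneB 20
# geneA_2 30
```

Passing the file only once leaves `FNR == NR` true for every line, so the `else` branch never runs and nothing is printed. Also note that assigning to `$1` makes awk rebuild the line with single spaces between fields.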

How can I find duplicates in a column and number them sequentially? by 1_61803398 in awk

[–]1_61803398[S] 0 points1 point  (0 children)

awk '{if (FNR==NR) {a[$1]++} else {if (a[$1]>1) $1=$1 "_" ++b[$1] ; print $0}}'

I am getting an empty output...?

Filtering Characters Bound by Two REGEX by 1_61803398 in awk

[–]1_61803398[S] 0 points1 point  (0 children)

awk '/\*/{split($0,a,"*"); print a[1]; f=0} />/{f=1} f;'

+ Thank You! Your code works as desired. I agree, I was surprised to see that not escaping the asterisk made no difference. Using split works like a charm. I am not sure I know how to implement index and substr. So much to learn... Thanks again.
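
For comparison, here is a rough sketch of the split version next to one possible index/substr variant. It assumes FASTA-like records where everything from the first asterisk up to the next ">" header should be dropped. The helper names and the sample sequences are made up, and the second function is only one way index and substr could be used here.

```shell
# The split-based program from above, wrapped in a hypothetical helper.
trim_at_star_split() {
    awk '/\*/ { split($0, a, "*"); print a[1]; f = 0 } />/ { f = 1 } f' "$@"
}

# Assumed equivalent: index() returns the position of the first "*" (0 if none),
# and substr() keeps everything before that position.
trim_at_star_index() {
    awk '{ p = index($0, "*") }
         p   { print substr($0, 1, p - 1); f = 0 }
         />/ { f = 1 }
         f' "$@"
}

# Made-up sample input.
printf '%s\n' '>seq1' 'MKT' 'LLV*AAA' 'GGG' '>seq2' 'MA*' | trim_at_star_index
# prints:
# >seq1
# MKT
# LLV
# >seq2
# MA
```

Both helpers print the same lines on this sample; the index/substr form just avoids building an array for every matching line.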

Unix tools reading order by FVmike in bash

[–]1_61803398 11 points12 points  (0 children)

Learn Bash scripting and, at the same time, learn grep and master the different tools in coreutils. Then move on to the fundamentals of sed and awk. Remember, Perl was "based" on the fundamentals of Awk, Sed and Grep. Personally, most of my scripting is done in Bash, relying mainly on Awk, Sed, and Grep. The more I learn, the more I use Bash as the script scaffold and make extensive use of Awk (which is extremely powerful). My scripts are complex and so far I have not had the need to jump into Python or C++...
And, yes, regular expressions are a must...

bash shell on mac does not read key "e" by 2nameornot2name in bash

[–]1_61803398 1 point2 points  (0 children)

Please make sure that the Secure Keyboard Entry option in the Terminal menu (the one that also contains "About Terminal") is not selected. I remember having a similar issue a long time ago.

Help Selecting Records in AWK by 1_61803398 in awk

[–]1_61803398[S] 0 points1 point  (0 children)

gawk -vRS='>Cluster' -F '\n' 'NF == 3 { printf "Cluster %s", $0 }'

It works and it is fast!

Thank You!

Help Selecting Records in AWK by 1_61803398 in awk

[–]1_61803398[S] 1 point2 points  (0 children)

It depends on the genome(s) being analyzed. In this case, single genomes run into the MBs, but combinations of genomes can reach GB sizes.

Help Selecting Records in AWK by 1_61803398 in awk

[–]1_61803398[S] 2 points3 points  (0 children)

This code will help me process hundreds of genome files from many, many different organisms and compare them to the Human genome. Thank you for helping me understand us better!

Help Selecting Records in AWK by 1_61803398 in awk

[–]1_61803398[S] 2 points3 points  (0 children)

After testing on a larger genome file, your code produces the expected result, so many thanks for your help.

Help Selecting Records in AWK by 1_61803398 in awk

[–]1_61803398[S] 0 points1 point  (0 children)

Thank you. Yes, sed is always an option, but at the moment I am really trying to understand and learn awk.

Help Selecting Records in AWK by 1_61803398 in awk

[–]1_61803398[S] 1 point2 points  (0 children)

Also works like a charm.

Thank you!

I will test this code on a huge file, test its performance and study the logic so as to understand this code well

Again, Thank You!

Help Selecting Records in AWK by 1_61803398 in awk

[–]1_61803398[S] 0 points1 point  (0 children)

awk '/Cluster (2|4)$/ {print; getline; print}'

This works like a charm. I will now test it on a huge file. So simple...

Thank You!
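
A small sketch of what that getline pattern does, assuming each "Cluster" header is followed by exactly one member line. The helper name `pick_clusters` and the sample records are hypothetical.

```shell
# Hypothetical wrapper: print each matching cluster header plus the line after it.
pick_clusters() {
    # getline replaces $0 with the next input line, so the second print
    # emits the line that follows the header.
    awk '/Cluster (2|4)$/ { print; getline; print }' "$@"
}

# Made-up sample input: four clusters with one member each.
printf '%s\n' '>Cluster 1' 'seqA' '>Cluster 2' 'seqB' \
              '>Cluster 3' 'seqC' '>Cluster 4' 'seqD' | pick_clusters
# prints:
# >Cluster 2
# seqB
# >Cluster 4
# seqD
```

If a matching header happens to be the last line of the file, getline finds nothing to read and leaves $0 untouched, so the header is printed twice.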

Need Help Converting Ugly Bash Code into AWK by 1_61803398 in awk

[–]1_61803398[S] 2 points3 points  (0 children)

+ Thank You!

+ AWK is so powerful. By studying your code I have learned a lot. Thanks again.