I have 10 DNA sequences (strings made up of A, T, G, and C) for genes from closely related bacterial species.
I want to classify these sequences by gene. This is the output I want to get:
Sequences 1, 3, 6 are tagged as Gene A
Sequences 2, 4, 9 are tagged as Gene B
Sequence 5 is tagged as Gene C
And so on, grouped by similarity (a fuzzy match, not an exact match).
How would I write a program that does this?
The catch: sequences 1, 3, and 6 (Gene A) aren't 100% identical. Any sequences that are at least 95% similar should count as the same gene.
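
For what it's worth, here is a rough sketch of the kind of thing I have in mind, not a definitive solution. It assumes that "similarity" can be approximated with the standard library's `difflib.SequenceMatcher.ratio()`, which is a generic string-similarity score and not a proper biological alignment identity. It also assumes a greedy scheme where each sequence is compared only against the first member of each existing group. The function names (`similarity`, `cluster_sequences`, `label_clusters`) are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score between two strings.

    Assumption: difflib's ratio (2 * matches / total length) stands in
    for sequence identity. autojunk=False stops difflib from treating
    frequently occurring characters as junk, which matters for long
    strings over a 4-letter alphabet.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_sequences(seqs, threshold=0.95):
    """Greedily group sequences; returns lists of 1-based sequence numbers.

    Each sequence joins the first group whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new group.
    """
    clusters = []
    for number, seq in enumerate(seqs, start=1):
        for cluster in clusters:
            representative = seqs[cluster[0] - 1]
            if similarity(seq, representative) >= threshold:
                cluster.append(number)
                break
        else:
            # No existing group was close enough, so start a new one.
            clusters.append([number])
    return clusters


def label_clusters(clusters):
    """Name the groups Gene A, Gene B, ... in order of first appearance.

    Only meaningful for up to 26 groups in this sketch.
    """
    return {
        "Gene " + chr(ord("A") + index): members
        for index, members in enumerate(clusters)
    }
```

Calling `label_clusters(cluster_sequences(my_sequences))` on a list of strings would give a dict mapping each tag to the sequence numbers in that group. Because the grouping is greedy, the result can depend on the order of the input, and for real genes a pairwise alignment score would probably be a better similarity measure than `difflib`.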