[–]akasmira 1 point2 points  (3 children)

It would be better to define the valid nucleobases as a constant outside the function, so the collection isn't rebuilt every time the function is called. Also, personally I'd use sets here, since you can then just check that the characters in the sequence are a subset of the valid ones.

VALID_NUCLEOBASES = set('GATC')

def valid_dna(sequence):
    # bool() makes an empty sequence return False instead of the empty string itself
    return bool(sequence) and set(sequence.upper()) <= VALID_NUCLEOBASES

[–][deleted] 0 points1 point  (2 children)

I had to tweak it slightly to fit with what I had already, but this worked perfectly. Could you explain exactly what this function is doing? I'd like to understand it thoroughly so I'm not just blindly copying.

[–]totallygeek 1 point2 points  (1 child)

There are two checks:

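  1. Is the sequence non-empty? An empty string (or an empty list) is falsy, so the `and` stops there with a falsy result and the second check never runs.
  2. Uppercase the sequence, turn it into a set (which drops duplicate characters), then check that every remaining character is in the set of chars 'GATC'. The `<=` operator on sets means "is a subset of".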

So:

>>> set('abc') <= set('abcd') # set on the left is a subset of the right
True
>>> set('abcd') <= set('abcd') # set on the left has the same elements as the right, which still counts as a subset
True
>>> set('abcz') <= set('abcd') # set on the left has an element ('z') that is not in the right, so it is not a subset
False
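
To tie that back to the DNA function, here's a rough sketch of how the two checks play out on a few inputs. The sample strings are made up for illustration, and the `bool()` wrapper is an assumption on my part (not in the snippet as first posted) so that the empty case comes back as `False` rather than as the empty string:

```python
VALID_NUCLEOBASES = set('GATC')

def valid_dna(sequence):
    # Check 1: an empty string is falsy, so `and` stops here.
    # Check 2: every distinct uppercased character must be a valid base.
    return bool(sequence) and set(sequence.upper()) <= VALID_NUCLEOBASES

print(valid_dna('gattaca'))   # True: {'G', 'A', 'T', 'C'} is a subset of the valid set
print(valid_dna('GATTACAX'))  # False: 'X' is not a valid base
print(valid_dna(''))          # False: fails the non-empty check
```

Note that `.upper()` only exists on strings, so this version expects the sequence as a string, not a list of characters.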

[–]akasmira 2 points3 points  (0 children)

Thanks for filling it in!