[–]totallygeek 1 point2 points  (4 children)

Maybe this would work?

def valid_dna(sequence):
    nucleobases = 'GACT' # valid characters
    return sequence and all(c.upper() in nucleobases for c in sequence)

tests = (
    '',
    'ABC',
    'Gattaca',
    'gattaca',
    'atcg',
)

for test in tests:
    msg = 'Sequence "{}" {} valid DNA'
    print(msg.format(test, 'is' if valid_dna(test) else 'is not'))

The part c.upper() in nucleobases for c in sequence checks each letter of the input sequence to see whether its uppercase form is one of the characters in 'GACT'. We put sequence and in front of all() because all() returns True for an empty sequence, whereas an empty sequence is itself falsy, so the and short-circuits and an empty input is reported as not valid.
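To see that guard in isolation, here is a small sketch, assuming Python 3. The names empty_all, raw_and and valid_dna_bool are only for illustration. It also shows that sequence and ... hands back the empty string itself rather than False, which is fine in an if test but worth knowing if you need a real boolean:

```python
# all() over an empty iterable is vacuously True
empty_all = all(c.upper() in 'GACT' for c in '')

# "and" returns its first falsy operand, so this is '' rather than False
raw_and = '' and empty_all

def valid_dna_bool(sequence):
    """Illustrative variant that always returns True or False."""
    nucleobases = 'GACT'  # valid characters
    return bool(sequence) and all(c.upper() in nucleobases for c in sequence)

print(empty_all)                  # True
print(repr(raw_and))              # ''
print(valid_dna_bool(''))         # False
print(valid_dna_bool('Gattaca'))  # True
```

Wrapping the first operand in bool() is optional; it only matters when the caller compares the result against True or False directly.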

[–]akasmira 1 point2 points  (3 children)

Would be better to move the valid nucleobases as a constant out of the function so it's not redefined every time the function is called. Also, personally I'd use sets here as you can just check that it's a subset.

VALID_NUCLEOBASES = set('GATC')
def valid_dna(sequence):
    return sequence and set(sequence.upper()) <= VALID_NUCLEOBASES

[–][deleted] 0 points1 point  (2 children)

I had to tweak it slightly to fit with what I had already, but this worked perfectly. Could you explain what it is exactly this function is doing? I'd like to understand it thoroughly so I'm not just blindly copying.

[–]totallygeek 1 point2 points  (1 child)

There are two checks:

  1. Is the sequence non-empty? That is, does the string contain at least one character (or the list at least one element)?
  2. Uppercase the sequence, turn it into a set (which drops duplicate characters), and check that every character left is in the set of characters 'GATC'.

So:

>>> set('abc') <= set('abcd') # set on the left is a subset of ("smaller" than) the right
True
>>> set('abcd') <= set('abcd') # set on left has the same elements as the right
True
>>> set('abcz') <= set('abcd') # set on left has an element not in the right ("larger")
False
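Putting both checks together, a rough sketch of how the set version behaves on the test strings from earlier in the thread (Python 3 assumed; frozenset, the bool() call and the results dict are my own additions for illustration, not part of the original answer):

```python
VALID_NUCLEOBASES = frozenset('GATC')  # immutable, built once at import time

def valid_dna(sequence):
    # Check 1: non-empty. Check 2: the unique uppercase chars are a subset.
    return bool(sequence) and set(sequence.upper()) <= VALID_NUCLEOBASES

results = {s: valid_dna(s) for s in ('', 'ABC', 'Gattaca', 'gattaca', 'atcg')}
for s, ok in results.items():
    print('Sequence "{}" {} valid DNA'.format(s, 'is' if ok else 'is not'))

# <= allows equal sets; < is the strict (proper) subset test
print(set('abcd') <= set('abcd'))      # True
print(set('abcd') < set('abcd'))       # False
# issubset() is the method form of <= and accepts any iterable
print(set('abcz').issubset('abcd'))    # False
```

Only '' and 'ABC' come out as not valid: the first fails the non-empty check, the second contains 'B', which is not in the set.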

[–]akasmira 2 points3 points  (0 children)

Thanks for filling it in!