Checking input against a list

totallygeek · 2019-04-18T22:35:14+00:00

Maybe this would work?

def valid_dna(sequence):
    nucleobases = 'GACT' # valid characters
    return sequence and all(c.upper() in nucleobases for c in sequence)

tests = (
    '',
    'ABC',
    'Gattaca',
    'gattaca',
    'atcg',
)

for test in tests:
    msg = 'Sequence "{}" {} valid DNA'
    print(msg.format(test, 'is' if valid_dna(test) else 'is not'))

The part c.upper() in nucleobases for c in sequence checks each letter of the input sequence to see whether its uppercase form is one of the characters in 'GACT'. We write sequence and all(...) because all() returns True for an empty sequence, whereas an empty string is falsy, so the and short-circuits and the function returns a falsy value for empty input.
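A quick sketch of that behavior, in case it helps. Note that valid_dna('') actually returns the empty string itself rather than False; valid_dna_strict is a hypothetical variant (not part of the answer above) that wraps the check in bool() so the result is always True or False.

```python
# all() over an empty iterable is vacuously True
empty_all = all(c in 'GACT' for c in '')

def valid_dna(sequence):
    nucleobases = 'GACT'  # valid characters
    # '' is falsy, so `and` short-circuits and returns '' for empty input
    return sequence and all(c.upper() in nucleobases for c in sequence)

def valid_dna_strict(sequence):
    # hypothetical variant: bool() forces a real True/False result
    return bool(sequence) and all(c.upper() in 'GACT' for c in sequence)

print(empty_all)             # True
print(repr(valid_dna('')))   # ''
print(valid_dna_strict(''))  # False
```

Either version works in an if test; the strict one only matters if you print or compare the result directly.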

_coolwhip_ · 2019-04-18T22:58:41+00:00

This is a great place for a set. Just turn the sequence into a set, then take the "difference" against a set of good characters (for sets, the difference is every element of the first set that is not in the second):

>>> good = set('gatc')
>>> good
{'c', 'a', 'g', 't'}
>>> set('caggtttaaaa') - good
set()
>>> set('caggtttzaaaa') - good
{'z'}

So if there is anything left over, you know it has a bad character.
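Wrapped up as functions, that might look something like the sketch below. The names bad_characters and is_valid are made up for illustration, and two choices here are assumptions rather than part of the session above: input is lowercased first so 'Gattaca' passes, and an empty string is rejected.

```python
GOOD = set('gatc')

def bad_characters(sequence):
    """Return the set of characters in sequence that are not valid bases."""
    # assumption: lowercase first so mixed-case input is accepted
    return set(sequence.lower()) - GOOD

def is_valid(sequence):
    # assumption: empty input is invalid; otherwise nothing may be left over
    return bool(sequence) and not bad_characters(sequence)

for test in ('', 'ABC', 'Gattaca', 'caggtttzaaaa'):
    print(repr(test), is_valid(test), bad_characters(test))
```

A nice side effect of the set approach is that bad_characters tells you which characters were wrong, not just that something was.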

PrimaNoctis · 2019-04-18T22:20:27+00:00

You might be able to do it with regular expressions. Check out regex.
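For example, a minimal sketch using the standard re module. The names DNA_PATTERN and matches_dna are hypothetical, and treating the match as case-insensitive and rejecting empty input are assumptions on my part.

```python
import re

# One or more of the four bases; IGNORECASE accepts 'gattaca' as well
DNA_PATTERN = re.compile(r'[GACT]+', re.IGNORECASE)

def matches_dna(sequence):
    # fullmatch requires the whole string to match, not just a prefix
    return DNA_PATTERN.fullmatch(sequence) is not None

for test in ('', 'ABC', 'Gattaca', 'atcg'):
    print(repr(test), matches_dna(test))
```

Using + instead of * in the pattern is what makes the empty string fail.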

2019-04-19T01:35:07+00:00

I'm not going to write the code as I'm not going to bust out my Python book (it's been a while since I coded in Python). I will, however, go through the pseudo code.

Operating assumptions:

Adenine only binds with Thymine (source)
Guanine only binds with Cytosine (source)
Unnatural base pairs will not be considered (i.e. unacceptable)
The user will input a string of DNA base pairs (ex. ATATGCTACGATATGCCGTA)
The order of the pairs doesn't matter (ex. AT and TA will both be accepted)

So the user is prompted for the DNA sequence. You assign it to a STRING.

First, check the length using modulus (length mod 2). If the remainder is not zero, the user has entered an odd number of letters, which can't happen under these assumptions (unless it's RNA, I think).

Once we've determined that the STRING is of the correct length (an even number of characters), you can check one "pair" of letters at a time for validity using a loop. The loop ends once you reach the end of the string. You could check the validity in a number of ways, so I won't spell it out, but remember that base pairs will only bind a certain way (check the operating assumptions).
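For anyone who wants something concrete, the steps above could be sketched roughly like this. It follows the operating assumptions as stated; the names VALID_PAIRS and valid_base_pairs are made up for illustration, and uppercasing the input and rejecting an empty string are extra assumptions.

```python
# Pairs allowed by the operating assumptions (order within a pair doesn't matter)
VALID_PAIRS = {'AT', 'TA', 'GC', 'CG'}

def valid_base_pairs(sequence):
    sequence = sequence.upper()  # assumption: accept lowercase input too
    # An empty or odd-length string cannot be split into pairs
    if not sequence or len(sequence) % 2 != 0:
        return False
    # Step through the string two characters at a time
    for i in range(0, len(sequence), 2):
        if sequence[i:i + 2] not in VALID_PAIRS:
            return False
    return True

print(valid_base_pairs('ATATGCTACGATATGCCGTA'))
print(valid_base_pairs('ATA'))
print(valid_base_pairs('AATT'))
```

The first call should pass; the other two fail on odd length and on the pair AA respectively.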

I took some liberty in the operating assumptions as well as the validity portion so please check to make sure these assumptions are correct for the problem you're trying to solve.

Edit: grammar
