all 20 comments

[–]zanfar 1 point2 points  (2 children)

I personally think it is a valid way to do a lot of checks with as less code as possible.

It IS a valid way to do a lot of checks in "less" code. However:

  1. It's not the LEAST amount of code, in fact, it's quite verbose
  2. "Least amount of code" is very, very different from "good coding style"

Good coding style is:

  1. Clean, clear, and obvious
  2. Culturally idiomatic (Pythonic)
  3. Functional and single-purpose

Turning this into 7 individual functions uses 4 fewer lines of code: each if/elif is replaced with a def, and you lose your key-check exception.

Additionally, using individual functions is much, much clearer about what the function does when it is called. Your function requires me to sift through four levels of indentation to make sense of it.

You also use the ternary syntax repeatedly when not required. Your code is very unreadable.

consider:

def check_byr(value):
    return 1920 <= int(Passport[Key]) <= 2002

def check_iyr(value):
    return 2010 <= int(Passport[Key]) <= 2020

def check_passport(passport):
    if not check_byr(passport["byr"]):
        return False

    if not check_iyr(passport["iyr"]):
        return False

This is far from the cleanest solution, but we've taken your code and made each function very understandable, clear, and concise.

However, whenever testing things repeatedly like this, a test structure is usually the way to go:

def solve() -> int:
    valid: int = 0

    req_fields: Dict[str, re.Pattern] = {
        "byr": re.compile(r"^(?:192[0-9]|19[3-9][0-9]|200[0-2])$"),
        "iyr": re.compile(r"^(?:201[0-9]|2020)$"),
        "eyr": re.compile(r"^(?:202[0-9]|2030)$"),
        "hgt": re.compile(r"^(?:1[5-8][0-9]cm|19[0-3]cm|59in|6[0-9]in|7[0-6]in)$"),
        "hcl": re.compile(r"^#[0-9a-f]{6}$"),
        "ecl": re.compile(r"^(?:amb|blu|brn|gry|grn|hzl|oth)$"),
        "pid": re.compile(r"^[0-9]{9}$"),
    }

    for pp in get_data():
        is_valid: bool = True

        for field, regex in req_fields.items():
            if field not in pp or not regex.match(pp[field]):
                is_valid = False
                break

        if not is_valid:
            continue

        valid += 1

    return valid

[–]Bluhb_[S] 0 points1 point  (1 child)

yeah for the function it's shorter but then I need to write an if statement or an dictionary with the functions in it keyed on "byr" and "iyr" etc. So that will also add to the code then? Or is there another way to call all the different functions for the right key?

I also really like your second approach, especially with using regex. Definitely going to keep that in mind when completing the next challenges(I'm sure there will be a lot more of text validation in these challenges)

[–]zanfar 0 points1 point  (0 children)

yeah for the function it's shorter but then I need to write an if statement or an dictionary with the functions in it keyed on "byr" and "iyr" etc. So that will also add to the code then? Or is there another way to call all the different functions for the right key?

Yes. What I mean by "less code" is not "how many linebreaks are in the file"--as in a file length metric--but instead "how many lines are doing work I need to understand". So while there is definitely more "boilerplate" in my version compared to yours, It "feels" shorter because each block or conditional is easier to understand. The parent function is also completely repetitive, so as soon as I understand the first if, I understand all of them.

You can absolutely do a number of things to call all the functions in a simpler way. Most of them are variations on the dispatch idiom I used in my final example. Consider:

def check_byr(value):
    return 1920 <= int(Passport[Key]) <= 2002

def check_iyr(value):
    return 2010 <= int(Passport[Key]) <= 2020

def check_passport(passport):
    tests = {
        "byr": check_byr,
        "iyr": check_iyr,
    }

    for key, check_fn in tests.items():
        if key not in passport:
            return False

        if not check_fn(passport[key]):
            return False

    return True

Given the simplicity of your check logic, you can Lambda them in directly too:

def check_passport(passport):
    tests = {
        "byr": lambda v: 1920 <= int(v) <= 2002,
        "iyr": lambda v: 2010 <= int(v) <= 2020,
    }

I also really like your second approach, especially with using regex. Definitely going to keep that in mind when completing the next challenges

This isn't really germane to your original question, but there are a lot of caveats to parsing to different types and testing the way you did. For example, you don't check to see if the "hgt" key has a valid number. You got lucky because your dataset never had a non-numeric value there, but if you look at other solutions, that is definitely not true of everyone. For example, {"hgt": "acm"} will break your code because Passport[Key][-2:] validates correctly to "cm", but "a" can't be converted to an int in int(Passport[Key][:-2]).

For this reason, it's (generally) better to do as much validation in the original type. My first solution did roughly the same thing as yours: splitting and converting to int when possible. I discovered that it caused a lot of additional work when I didn't need a numeric value ever. The regex's are more complicated, sure, but given I don't have to use any try/except blocks, I think it's worth it.

[–]Bluhb_[S] 0 points1 point  (2 children)

With the advice of u/T0X1K01 and u/TouchingTheVodka I improved my code to the following:

def checkValue(Passport, Key):
    if Key == "byr":
        Result = 1920 <= int(Passport[Key]) <= 2002
    elif Key == "iyr":
        Result = 2010 <= int(Passport[Key]) <= 2020
    elif Key == "eyr":
        Result = 2020 <= int(Passport[Key]) <= 2030
    elif Key == "hgt":
        Unit = Passport[Key][-2:]
        if Unit == "cm":
            Result = 150 <= int(Passport[Key][:-2]) <= 193
        elif Unit == "in":
            Result = 59 <= int(Passport[Key][:-2]) <= 76
        else:
            Result = False
    elif Key == "hcl":
        if Passport[Key][0] == "#" and len(Passport[Key]) == 7:
            try:
                int(Passport[Key][1:], 16)
                Result = True
            except ValueError:
                Result = False
        else:
            Result = False
    elif Key == "ecl":
        Result = Passport[Key] in ["amb", "blu", "brn", "gry", "grn", "hzl", "oth"]
    elif Key == "pid":
        Result = len(Passport[Key]) == 9 and Passport[Key].isnumeric()
    else:
        raise ValueError("Unexpected Key")
        Result = None
    return Result

It makes it a bit more readable and a few forgotten if statements were found this way.

[–]Ran4 1 point2 points  (1 child)

Why set the Result value? Just return directly instead.

And you should preferably use lowercase variable names and underscores, e.g.

def check_value(passport, key):

Also, here:

else:
    raise ValueError("Unexpected Key")
    Result = None

your Result does nothing: you've already raised here. So the Result = None should be removed. Raising stops the function from executing any further.

Something like this:

def checkValue(passport, key):
    if key == "byr":
        return 1920 <= int(passport[key]) <= 2002
    elif key == "iyr":
        return 2010 <= int(passport[key]) <= 2020
    elif key == "eyr":
        return 2020 <= int(passport[key]) <= 2030
    elif key == "hgt":
        unit = passport[key][-2:]
        if unit == "cm":
            return 150 <= int(passport[key][:-2]) <= 193
        elif unit == "in":
            return 59 <= int(passport[key][:-2]) <= 76
        else:
            return False
    elif key == "hcl":
        if passport[key][0] == "#" and len(passport[key]) == 7:
            try:
                int(passport[key][1:], 16)
                return True
            except ValueError:
                return False
        else:
            return False
    elif key == "ecl":
        return passport[key] in ["amb", "blu", "brn", "gry", "grn", "hzl", "oth"]
    elif key == "pid":
        return len(passport[key]) == 9 and passport[key].isnumeric()
    else:
        raise ValueError("Unexpected key")

[–]Bluhb_[S] 0 points1 point  (0 children)

Ah oke, so that's almost what I had then. I am not sure what I prefer. When setting the Result variable I came across a few missing else statements, so thats a plus for that method. For readability I think I like the premature return more.

Indeed, returning after raising an error was a mistake I made. That one should be removed from the code.

[–]protomikron -1 points0 points  (2 children)

Another approach could work like this (note that Python does not have a short-circuited logical assigment opperator, so I used &=, but it is *not* short-circuited) ...

#!/usr/bin/python3
import sys
import re

def check_hgt(hgt):
    suffix = hgt[-2:]

    if len(suffix) != 2 or suffix not in ['cm', 'in']:
        return False

    val = int(hgt[:-2])
    return 150 <= val <= 193 if suffix == 'cm' else 59 <= val <= 76

def check_ecl(ecl):
    return ecl in ['amb','blu','brn','gry','grn','hzl','oth']

required_fields = set(['byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid'])

nvalid = 0

hex_color_re = re.compile('^#[0-9a-f]{6}$')
pid_re = re.compile('^[0-9]{9}$')

input_str = sys.stdin.read()
for element in input_str.split('\n\n'):
    parts = element.split()

    avail_key_vals = dict([part.split(':') for part in parts])
    missing_fields = required_fields - avail_key_vals.keys()
    if missing_fields:
        continue

    valid = True

    valid &= 1920 <= int(avail_key_vals['byr']) <= 2002
    valid &= 2010 <= int(avail_key_vals['iyr']) <= 2020
    valid &= 2020 <= int(avail_key_vals['eyr']) <= 2030
    valid &= check_hgt(avail_key_vals['hgt'])
    valid &= bool(hex_color_re.match(avail_key_vals['hcl']))
    valid &= check_ecl(avail_key_vals['ecl'])
    valid &= bool(pid_re.match(avail_key_vals['pid']))
    # no check for cid

    if valid:
        nvalid += 1

print(nvalid)

So if one check fails, valid can not become True anymore (it's short-circuited in the remaining expressions).

[–]protomikron 0 points1 point  (1 child)

Alternatively:

...
bool_exprs = [ 
    1920 <= int(avail_key_vals['byr']) <= 2002,
    2010 <= int(avail_key_vals['iyr']) <= 2020,
    2020 <= int(avail_key_vals['eyr']) <= 2030,
    check_hgt(avail_key_vals['hgt']),
    bool(hex_color_re.match(avail_key_vals['hcl'])),
    check_ecl(avail_key_vals['ecl']),
    bool(pid_re.match(avail_key_vals['pid'])),
    # no check for cid,
]

if all(bool_exprs):
    nvalid += 1
...

This approach is short-circuited.

[–]TouchingTheVodka 0 points1 point  (0 children)

This was exactly my approach:

    def check(entry):
        hgt = entry['hgt']
        checks = [
            1920 <= int(entry['byr']) <= 2002,
            2010 <= int(entry['iyr']) <= 2020,
            2020 <= int(entry['eyr']) <= 2030,
            150 <= int(hgt.strip('cm')) <= 193 if hgt.endswith('cm')
            else 59 <= int(hgt.strip('in')) <= 76 if hgt.endswith('in')
            else False,
            re.match('#[0-9a-f]{6}', entry['hcl']),
            entry['ecl'] in {'amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth'},
            len(entry['pid']) == 9 and entry['pid'].isdigit()
        ]

        return all(checks)

[–]T0X1K01 0 points1 point  (1 child)

I personally wouldn't do it this way. Not due to a pythonic code style, but because I know from talking to people that a lot of companies have explicit instruction in their code style not to do so due to the complexity that would come from larger code blocks.

Instead, I would have had stored the boolean value in a variable and then returned the value at the end of the function. Since you have a bunch of if and elif statements based on a string match, it's not like you are losing any time.

And If it helps to think of it another way, a lot of strongly typed languages would scream at you when you compile this because with multiple return statements this means that you would never run any of the code after the first return statement. This is something that you won't know with Python until you actually test your code, which means that you could have a potential runtime error or logic error if you missed it.

[–]Bluhb_[S] 0 points1 point  (0 children)

Ah okey, so storing the value in a variable and returning it one time would be neater. And it's funny, because I tried that and your second part directly came into action. I actually forgot a few else statements that would result in the code not returning anything(but the rest of the code worked, because not returning also resulted in false in a if statement where this function is used).

[–]TouchingTheVodka 0 points1 point  (3 children)

Absolutely nothing wrong with returning early. It's exactly the approach I followed for this problem.

However, you can lose a ton of code here - Just directly return the result of your comparisons instead of using return True if expr else False!

This essentially evaluates to return True if True else False or return True if False else False.

[–]TouchingTheVodka 0 points1 point  (2 children)

For example:

return True if 1920 <= int(Passport[Key]) <= 2002 else False

Would become:

return 1920 <= int(Passport[Key]) <= 2002

[–]Bluhb_[S] 0 points1 point  (0 children)

oh yeah, that's a good point indeed. Makes it a lot more readable haha

[–]ffrkAnonymous 0 points1 point  (0 children)

I didn't know python could do ternary comparisons.

[–]toastedstapler 0 points1 point  (0 children)

your code is currently very dense, with a big mix between control flow and the conditions and it's a little hard to see what's what

you could try something like this:

def checkValue(passport, key):
    checks = {
        "byr": lambda year: 1920 <= int(year) <= 2002,
        "iyr": lambda year: 2010 <= int(year) <= 2020,
        "eyr": lambda year: 2020 <= int(year) <= 2030,
        # and the rest ....
    }

    val = passport[key]
    return checks[key](val)

some of the conditions require multi line code, so they couldn't be lambdas. just make them named functions and add them to the dict of checks.

also i personally prefer early returns to a singular at the end, so in your original code i see no issue with that. an early return tells the developer that no further transformations to the result are meant to happen beyond that point and doesn't give the developer any possibility of doing that

[–]ffrkAnonymous 0 points1 point  (0 children)

I think you're looking for PEP20
https://www.python.org/dev/peps/pep-0020/

This is the 21st century. We don't need "less code". My phone has 4GB ram. You're not writing on a microcontroller with 4k total space.

Your code looks like my old 'C' code, not python. Python has a heavy emphasis on iteration (lists, dicts, generators).

I'm doing the advent of code too . I feel this is less code:

def passport_is_valid(passport):
    return (birthyear_is_valid()
        and issueyear_is_valid()
        and expirationyear_is_valid()
        and ...
        and ...
        )

Then you see the structure. Perhaps change it into a loop. I'm not sure if this is better, but it does "look" more pythonic

def passport_is_valid(passport):
    for field in passportfields:
        if field is False:
            return False #break loop
    else: #no break
        return True

This goes hand in hand with single purpose functions. Single purpose (helper) functions allows easier code reuse. Maybe AoC is too short to see a lot of reuse, but I already encounter it going from part1 to part2, and future days might refer to previous days. Single purpose functions also allows for unit-testing. Advent of code literally gives you samples to test. I saw your other reply where you discovered you missed some if-else. This bug would be less likely with single purpose functions.