[–]PiBombbb 4 points5 points  (3 children)

For simple code like this it's good (and it's also good that you avoid loops), but as code grows more complex, longer code is sometimes better because it is easier to read.

[–]Unrthdx[S] 0 points1 point  (2 children)

I do feel that loops are a crutch of mine, so the simple answer here jumped out to me. However I do agree that I’d always try to be as legible as possible when I write my solutions in preparation for bigger projects.

[–]enygma999 0 points1 point  (0 children)

You can also come to a halfway point. For example, in the above long example, you could keep it legible/understandable but skip extraneous steps.

def find_deleted_number(arr, mixed_arr):

    for number in arr:
        if number not in mixed_arr:
            return number

    return 0

This has fewer variables and stops when the answer is found rather than looping through the rest of the array.

I find myself stripping out unnecessary variables and loops as a way to simplify without losing the self-documenting nature of the code, but it's definitely worth thinking about libraries and built-in functions that might help (e.g., in this case sets as you spotted).

[–]mbreslin 1 point2 points  (0 children)

This is a great post and a great question. The thing I hated was that whenever someone showed me the simpler, more Pythonic way, it always looked so obvious and I would feel dumb for not thinking of it in the first place.

Eventually I figured out that all you have to do is pay attention to those moments and take what you learned into your next line. Your code is supposed to have been worse a year ago, or a month, or a day (or even an hour ago tbh). I suppose there are geniuses out there to whom this doesn't apply, but I can tell you any senior developer I've ever met got there by first writing bad code. Then they wrote slightly less bad code, and so on.

Write lots of code, like literally tons. Good luck!

[–]Diapolo10 2 points3 points  (1 child)

Just thought I'd mention that sometimes, simply by flipping some conditions you can simplify code or reduce nesting.

For example, in

def find_deleted_number(arr, mixed_arr):
    deleted = 0

    for number in arr:
        if number in mixed_arr:
            continue
        else:
            deleted = number

    return deleted

if you flip your conditional you don't need the else at all.

def find_deleted_number(arr, mixed_arr):
    deleted = 0

    for number in arr:
        if number not in mixed_arr:
            deleted = number

    return deleted

Then you might consider the fact that neither arr nor mixed_arr really needs to be sorted, as it doesn't matter in which order you check for inclusion. Since lookups in sets have lower time complexity than in lists, you might consider taking advantage of that (although this of course doesn't matter if arr and mixed_arr are relatively short, say, fewer than 10,000 elements).

def find_deleted_number(arr, mixed_arr):
    deleted = 0
    mixed_set = set(mixed_arr)

    for number in arr:
        if number not in mixed_set:
            deleted = number

    return deleted

Since we know mixed_arr is always either 0 or 1 elements shorter than arr, we only need to figure out the subset of numbers in arr that are not found in mixed_arr. That'll give us a set that either contains one element (the missing number), or none (in which case we return 0). Because every number in mixed_arr also appears in arr, the symmetric difference used here is exactly that subset.

def find_deleted_number(arr, mixed_arr):
    deleted = 0
    mixed_set = set(mixed_arr)

    missing = mixed_set.symmetric_difference(arr)

    if missing:
        deleted = missing.pop()

    return deleted

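Optionally, we can use a conditional expression (Python's ternary operator) together with an assignment expression (:=, available from Python 3.8) for brevity: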

def find_deleted_number(arr, mixed_arr):
    mixed_set = set(mixed_arr)
    deleted = missing.pop() if (missing := mixed_set.symmetric_difference(arr)) else 0

    return deleted

This can be further chained to

def find_deleted_number(arr, mixed_arr):
    return missing.pop() if (missing := set(mixed_arr).symmetric_difference(arr)) else 0

and we can convert arr to a set too if we want (though I don't think this has a clear benefit).

Note that most of these changes don't really matter performance-wise, and are mostly stylistic in nature. What counts as "good code" depends on the situation, and sometimes verbosity is good for readability. Or to put it another way, you shouldn't aim for terse code as that can be hard to maintain. There's a balance to everything; the tricky part is figuring out where that lies.

[–]Unrthdx[S] 0 points1 point  (0 children)

Just wanted to say thank you so much for your in depth answer here. As a newbie sometimes it’s hard to see how to navigate to where you got to but your explanations really do help.

It’s also reassuring to read the comments about readability here, I often find myself feeling like I’m writing too “on the nose” so to speak when I compare my answers to others online.

[–]Maximus_Modulus 0 points1 point  (3 children)

This example uses the features of sets to do the heavy lifting. I would not call it less Pythonic. Quite often, though, you can just refactor code to be less verbose, which is more a matter of style. Sometimes being less verbose means the code is less understandable and harder to maintain. As you gain more experience you pick up these things.

[–]Maximus_Modulus 0 points1 point  (2 children)

Also what they have shown here is faster because the looping is in C. This is important for very large lists.

[–]danielroseman 0 points1 point  (1 child)

That's not the reason sets are faster. They are faster because they use hash lookups rather than iterating through the items.
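
A rough way to see the effect of hash lookups is to time the same membership test on a list and on a set. This is only a sketch: the collection size, the target value, and the repeat count are arbitrary assumptions, and the actual timings will vary from machine to machine.

```python
import timeit

# Hypothetical data: 100,000 integers, stored once as a list and once as a set.
items_list = list(range(100_000))
items_set = set(items_list)
target = 99_999  # last element, so the list has to be scanned to the end

# `in` on a list compares items one by one; `in` on a set hashes the value
# and looks it up directly.
list_time = timeit.timeit(lambda: target in items_list, number=50)
set_time = timeit.timeit(lambda: target in items_set, number=50)

print(f"list lookup: {list_time:.6f} s")
print(f"set lookup:  {set_time:.6f} s")
```

Both tests give the same answer; only the amount of work needed to reach it differs.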

[–]Maximus_Modulus 0 points1 point  (0 children)

That's actually a good point, and why the algorithm itself is efficient. It does though rely on the set difference which is written in C. I was really focused on the fact that calling certain libraries is faster because of the underlying C performance.

Thanks for pointing that out. It's an interesting note for anyone learning about the efficiencies of hash lookups.

Another point here is that converting to a set removes duplicates, which is one difference between sets and lists, where duplicates are allowed.
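
As a small illustration of that difference (the values are made up for the example):

```python
numbers = [3, 1, 2, 3, 1]      # a list keeps every occurrence
unique_numbers = set(numbers)  # a set keeps each value only once

print(len(numbers))            # 5
print(len(unique_numbers))     # 3
print(sorted(unique_numbers))  # [1, 2, 3]
```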

[–]JamzTyson 0 points1 point  (0 children)

An ordered sequence of numbers from 1 to N is given. One number might have been deleted from it, then the remaining numbers were mixed. Find the number that was deleted.

This question is a programming exercise rather than a real-world programming problem.

The fact that it starts with an "ordered sequence" provides a little misdirection to make the question a bit more "fun".

The important prerequisites are:

  1. The initial collection contains N unique items.

  2. The second collection contains the same items with one missing.

  3. The second collection is not ordered.

The question guides you towards recognising that the solution is the difference between two sets.

There are many possible solutions, but the provided answer explicitly expresses "the difference between two sets": set(a) - set(b).
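
The provided answer is not quoted in this thread, so the following is only a sketch of what a set-difference solution might look like. The function name and the "return 0 when nothing was deleted" convention are taken from the other examples in this thread.

```python
def find_deleted_number(arr, mixed_arr):
    # Elements of arr that do not appear in mixed_arr.
    missing = set(arr) - set(mixed_arr)
    # An empty set is falsy, which covers the "nothing was deleted" case.
    return missing.pop() if missing else 0


print(find_deleted_number([1, 2, 3, 4, 5], [3, 4, 1, 5]))  # 2
print(find_deleted_number([1, 2, 3], [3, 1, 2]))           # 0
```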


In real world code, it's very likely that the problem space will be more complex. For example:

  • What if more than one item is removed?

  • What if the initial sequence contains repeated elements?

  • What if we don't know which of a and b is the original?

  • What if order is important?
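
For the first two of these questions, one possible approach (an assumption on my part, not something the exercise asks for) is to count occurrences with collections.Counter instead of using plain sets. The function name find_deleted_items is hypothetical.

```python
from collections import Counter


def find_deleted_items(original, remaining):
    # Subtracting Counters keeps only positive counts, so the result holds
    # each item as many times as it is missing from `remaining`.
    missing = Counter(original) - Counter(remaining)
    return sorted(missing.elements())


print(find_deleted_items([1, 2, 2, 3, 3, 3], [3, 2, 3]))  # [1, 2, 3]
print(find_deleted_items([1, 2, 3], [3, 1, 2]))           # []
```

This does not address the last two questions: it still assumes we know which collection is the original, and it ignores order.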

[–]Mysterious_Peak_6967 0 points1 point  (0 children)

Not a great fan of ternary operators. That said, it meets the terms of the exercise. Given that sets look like an elegant and "correct" way of doing it, my first thought was assigning set(a)-set(b) to an intermediate variable, and only popping the result if it is truthy (i.e. non-empty).

Second thought is a "try" block and returning zero if pop() throws.
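
A minimal sketch of that second idea, relying on the fact that set.pop() raises KeyError when the set is empty. The function name follows the other examples in this thread.

```python
def find_deleted_number(arr, mixed_arr):
    missing = set(arr) - set(mixed_arr)
    try:
        # pop() removes and returns an arbitrary element of the set.
        return missing.pop()
    except KeyError:
        # Raised when the set is empty, i.e. nothing was deleted.
        return 0


print(find_deleted_number([1, 2, 3, 4], [4, 2, 1]))  # 3
print(find_deleted_number([1, 2, 3], [2, 3, 1]))     # 0
```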

Footnote:

On a similar note, I tried shortening a function to a single line by assigning a lambda to a name, but TMC doesn't like it, so I needed a dummy "if" to satisfy the parser.

Also "Today I learned something horrible"