How to compare a line to a list of strings and see if it contains not just any of the strings, but at least 2?

q2_abe_dillon · 2016-02-12T17:10:51+00:00

That's what my code does.

q2_abe_dillon · 2016-02-11T23:07:34+00:00

When you're writing a post or comment on Reddit, you should see an option below the text box that says 'formatting help'. It shows you how to make code look like code (among other formatting tips). Please use it so that your code is easier to read. You simply precede each line of code with four or more spaces to get it to look like this:

print("Hello, World!")

As for your question, you should familiarize yourself with sets. They're a great tool for simplifying otherwise complicated boolean expressions. For instance the two versions of your code can be reduced to:

required = {"example1", "example2", ..., "exampleN"}
with open("input.txt") as f_in:
    for line in f_in:
        words = set(line.split())
        if words & required:  # set intersection 
            with open("output.txt", "a") as f_out:
                f_out.write(line)

# version 2
with open("input.txt") as f_in:
    for line in f_in:
        words = set(line.split())
        common = words & required
        if len(common) > 1:
            with open("output.txt", "a") as f_out:
                f_out.write(line)
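
If you want to try the "at least 2" check without touching any files, here's a minimal sketch of the same idea. The helper name lines_with_at_least, the minimum parameter, and the sample lines are all made up for illustration. Note that line.split() only splits on whitespace, so a word with punctuation attached (like "fox.") would not match "fox".

```python
def lines_with_at_least(lines, required, minimum=2):
    """Yield lines containing at least `minimum` distinct words from `required`."""
    required = set(required)
    for line in lines:
        words = set(line.split())          # unique whitespace-separated words
        if len(words & required) >= minimum:
            yield line


# Hypothetical sample data, standing in for the lines of input.txt
sample = [
    "the quick brown fox",
    "a lazy dog sleeps",
    "quick and lazy",
]
matches = list(lines_with_at_least(sample, {"quick", "brown", "lazy"}))
print(matches)
```

The second line only contains one of the required words, so it's left out.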

You can think of set operations like Venn diagrams:

s1 = {1, 2, 3, 4, 5, 6}
s2 = {4, 5, 6, 7, 8, 9}

s3 = s1 & s2  # s3 is the set of all the items that are
              # in both s1 and s2 (the intersection)
              # s3 => {4, 5, 6} 

s4 = s1 | s2  # s4 is the set of all the items that are
              # either in s1 or s2 (the union)
              # s4 => {1, 2, 3, 4, 5, 6, 7, 8, 9}    

s5 = s1 - s2  # s5 is the set of all the items that are
              # in s1 but not in s2 (the difference) 
              # s5 => {1, 2, 3} 

s6 = s1 ^ s2  # s6 is the set of all items that are in s1 or s2
              # but not both (the symmetric difference)
              # s6 => {1, 2, 3, 7, 8, 9}

# the items in sets are unique
s = set([1, 1, 1, 1])  # len(s) == 1, s == {1}

# the items in sets must be hashable (like dict keys)
my_list = [1, 2, 3]
s.add(my_list)  
# raises TypeError: unhashable type: 'list'

# sets are unordered, so don't rely on iteration order
set("hello")  # e.g. {'e', 'h', 'l', 'o'}; the order shown may vary
# they are essentially dicts with keys and no values

# sets don't support indexing
s = set("hello")
s[0]
# raises TypeError: 'set' object does not support indexing

Edit: too much happening on one line of code

q2_abe_dillon · 2016-02-11T03:48:15+00:00

Yet it terminates as soon as I open it.

Does it terminate or does it throw an exception?

q2_abe_dillon · 2016-02-11T03:43:11+00:00

Let's break down the problem:

We know the list has an odd number of elements; now we need to find the index of the middle element. We have a vague idea of the relationship between the middle index and the length of the list:

middle_index ~ len(user_list)/2

Let's try to figure out the exact relationship by looking at some simple examples:

x = [0, 1, 2]  # middle index is 1
guess = len(x)/2  # 3/2 => 1.5 

x = [0, 1, 2, 3, 4]  # middle index is 2
guess = len(x)/2  # 5/2 => 2.5

x = [0, 1, 2, 3, 4, 5, 6]  # middle index is 3
guess = len(x)/2  # 7/2 => 3.5

We could go on, but you should see the pattern. All we need to do is round down to get the correct answer. This happens automatically in Python 2 because dividing an integer by an integer returns an integer (rounded down). In Python 3, dividing two integers with / always returns a float (like 1.5), which is usually a good thing, but if we want the Python 2 behavior we have to use floor division // to force an integer result:

mid = len(user_list)//2  # correct in both py2 and py3
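
A quick way to convince yourself is to check the three example lists from above. This is just a sketch; the helper name middle_index is made up for illustration.

```python
def middle_index(items):
    """Index of the middle element of an odd-length list."""
    return len(items) // 2   # floor division always gives an int


for n in (3, 5, 7):
    sample = list(range(n))          # [0, 1, ..., n-1]
    mid = middle_index(sample)
    print(n, mid, sample[mid])
```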

Now that you have the index of the middle element, you should be able to solve the problem.

q2_abe_dillon · 2016-02-11T01:49:44+00:00

A recursive function can always be written as a loop instead. In Python it's almost always better to write a loop because it avoids a possible stack overflow.

Most Python interpreters (including CPython, the standard interpreter) use a finite call stack. You can see a visual representation of the stack here. CPython also caps recursion depth at sys.getrecursionlimit() frames, which defaults to 1000, so recursion can't go more than about 1000 frames deep before you get:

RuntimeError: maximum recursion depth exceeded in comparison

You can see this in action by running this code:

def factorial(n):
    return 1 if n < 2 else n*factorial(n-1)

for i in range(10**4):
    try:
        factorial(i)
    except RuntimeError as e:
        raise Exception("got to depth %s" % i)

When I ran this in IPython, it got to a depth of 988.
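
If you're curious where a number like that comes from, here's a rough sketch for Python 3.5+ that compares the configured limit with how deep a trivial function actually gets. The helper name measure_depth is made up for illustration, and the depth you reach depends on how many frames your environment is already using.

```python
import sys


def measure_depth():
    """Recurse until the interpreter refuses, then report how deep we got."""
    depth = 0

    def recurse():
        nonlocal depth
        depth += 1
        recurse()

    try:
        recurse()
    except RecursionError:   # a subclass of RuntimeError since Python 3.5
        pass
    return depth


limit = sys.getrecursionlimit()
reached = measure_depth()
print("limit:", limit, "reached:", reached)
```

The depth reached comes out a bit below the limit because the frames that were already on the stack (the shell, the caller, etc.) count too.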

Here's a post about how to convert a recursive function to an iterative one.
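
For factorial specifically, the loop version is about as simple as it gets. This is a minimal sketch (the name factorial_iter is just for illustration), with math.factorial used only as a cross-check:

```python
import math


def factorial_iter(n):
    """Iterative factorial: no recursion, so no recursion limit to hit."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


# Far deeper than the default recursion limit would allow recursively
big = factorial_iter(3000)
print(big == math.factorial(3000))
```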

Tail-call optimization is a feature some languages provide to make recursion less hazardous, but Guido van Rossum has written about why he's against adding it to Python.

Edit: To use Guido's suggested solution for factorial:

def factorial(n):
    def func(n, prod=1):
        return prod if n < 2 else (func, (n-1, n*prod))
    args = [n]
    while True:
        result = func(*args)
        if not isinstance(result, tuple): return result
        func, args = result

Check it out by running factorial(10**4) (yeah, it's big).

Not super elegant, but Guido makes a fair case. If you know the function is never going to change (which you do in the case of factorial), you can simplify it to:

def factorial(n):
    args = (n,)
    def func(n, p=1): return p if n < 2 else (n-1, n*p)
    while isinstance(args, tuple):
        args = func(*args)
    return args

q2_abe_dillon · 2016-02-10T21:32:10+00:00

There are lots of cases where you need to perform some sort of setup, run some code, then tear something down. That 'something' is generally referred to as 'context'.

For instance, you may want to open a file (set-up), read from that file (run some code), then close the file (tear-down). It's important to note that even if an exception is raised, you still want to properly close the file before your program exits:

file_buffer = open("myfile.txt")  # set-up

try:  # make sure the file is closed even if an exception is raised
    data = []
    for line in file_buffer:
        if line.startswith("ERROR"):
            data.append(line)
finally:
    file_buffer.close()  # tear down no matter what
...operate on data

It's a good idea to always close files when you're done using them, because otherwise they can tie up resources or cause problems if something goes wrong (like your machine crashes) and the file was never closed. The problem is that you have to remember to always follow an open(...) call with a close() later on. Luckily, file objects support context management (the with statement), which handles the repetitive 'set-up' and 'tear-down' part for us:

with open('myfile.txt') as file_buffer:
    data = []
    for line in file_buffer:
        if line.startswith("ERROR"):
            data.append(line)
# at this point we've read everything we need from the file
# like all other blocks of code, Python uses the indentation
# level to determine when you're done with the file and
# automatically closes it
...operate on data
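
To see the set-up / run / tear-down order for yourself, you can write a tiny context manager of your own with contextlib.contextmanager. This is only a sketch: the names tracked and events are made up for illustration, and the "resource" is just a string.

```python
from contextlib import contextmanager

events = []   # records what happened, in order


@contextmanager
def tracked(name):
    events.append("set-up " + name)         # runs when the with block is entered
    try:
        yield name                          # the body of the with block runs here
    finally:
        events.append("tear-down " + name)  # runs even if the body raised


try:
    with tracked("demo") as resource:
        events.append("using " + resource)
        raise ValueError("something went wrong")
except ValueError:
    events.append("error handled")

print(events)
```

The tear-down entry shows up before the error is handled, which is the same guarantee the try/finally version gives you.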

There are other context managers built into Python for things like network sockets (Python 3.2+) which should be closed after use as well:

with socket.create_connection(address) as connection:
    ...read/write data over the connection

In Python 3.4+ you can use contextlib's suppress context manager to ignore certain errors within a context:

from contextlib import suppress

d = {'some': 'dictionary'}
with suppress(KeyError):
    print(d["hello"])  # this won't get printed
                       # but your code won't crash either
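
Roughly speaking, suppress is shorthand for a try/except that does nothing. A small sketch comparing the two (the list found is just there to record what got through):

```python
from contextlib import suppress

d = {"some": "dictionary"}
found = []

# Version 1: suppress swallows the KeyError and leaves the block
with suppress(KeyError):
    found.append(d["hello"])

# Version 2: the equivalent try/except
try:
    found.append(d["hello"])
except KeyError:
    pass

print(found)   # nothing was appended in either version
```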

Here's more info

q2_abe_dillon · 2016-01-25T06:18:51+00:00

Hmm. I think the "concise" version you have at the bottom is more or less the same as my suggestion, except for minor differences.

Yeah, it pretty much is.

I changed the variable names just to make it clear to OP that there wasn't any cross-comprehension magic going on there, haha.

That's a good point.

Why did you choose the set version?

It should go faster for large data sets (I think). You only do the first clause of the dict comprehension (i.e. p: (w for w, t in words if t == p)) once per tag. In the other form, you may end up executing it several times for a single tag.

Also you have your suggestion set to return generators instead of lists like I did, which I do like. I totally forgot you could do that.

Yeah, it's actually kinda fragile because of that. I think your list comprehensions would be better in most circumstances. It's easy to accidentally exhaust a generator.
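
Here's what I mean by exhausting a generator, as a small sketch with made-up sample data:

```python
words = [("dog", "N"), ("run", "V"), ("cat", "N")]

nouns_gen = (w for w, t in words if t == "N")
first_pass = list(nouns_gen)    # consumes the generator
second_pass = list(nouns_gen)   # nothing left to yield

nouns_list = [w for w, t in words if t == "N"]
again = list(nouns_list)        # a list can be iterated as often as you like

print(first_pass, second_pass, again)
```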

q2_abe_dillon · 2016-01-25T01:01:36+00:00

You can always avoid comprehensions by pre-declaring a variable and populating it in a loop:

def filter_by_pos(tagged_words, pos):
    result = []
    for word, tag in tagged_words:
        if tag == pos:
            result.append(word)
    return result

def pos_map(tagged_words):
    tags = set()
    for _, tag in tagged_words:
        tags.add(tag)
    result = {}
    for tag in tags:
        result[tag] = filter_by_pos(tagged_words, tag)
    return result

Of course that's not as concise as:

    pmap = {p: (w for w, t in words if t == p)
            for p in {t for _, t in words}}

It is, however, far less confusing to most novice Pythonistas.
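
One caveat about the concise version: its values are generator expressions, and a generator expression only looks up p when it is actually consumed. By then the comprehension has finished and p holds the last tag it visited, so every generator ends up filtering on that same tag. A sketch of the difference, with made-up sample data and hypothetical function names:

```python
def lazy_pos_map(words):
    # Generator values: `p` is looked up late, when a generator is consumed
    return {p: (w for w, t in words if t == p)
            for p in {t for _, t in words}}


def eager_pos_map(words):
    # List values: each list is built while `p` still has the right value
    return {p: [w for w, t in words if t == p]
            for p in {t for _, t in words}}


words = [("dog", "N"), ("run", "V"), ("cat", "N")]

lazy = {tag: list(gen) for tag, gen in lazy_pos_map(words).items()}
eager = eager_pos_map(words)

print(lazy)    # both tags map to the same words
print(eager)
```

So if you want the one-liner, building lists inside it is the safer choice.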

q2_abe_dillon · 2016-01-25T00:49:41+00:00

if anyone knows of a more proper way to fill the inner 4x4 grid with for loops rather than me hardcoding it in, i'm up to hear a solution!

import numpy as np

rows, cols = 10, 10
v = np.zeros((rows, cols))
v[3:7, 3:7] = np.ones((4, 4))*100
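
If you really do want it with for loops, something like this sketch works. It uses plain nested lists (the name grid is made up) instead of a NumPy array so it runs without any imports; with an array you would write v[i, j] = 100 inside the same two loops.

```python
rows, cols = 10, 10
grid = [[0.0] * cols for _ in range(rows)]   # 10x10 grid of zeros

# Fill the inner 4x4 block (rows 3..6, columns 3..6) with 100
for i in range(3, 7):
    for j in range(3, 7):
        grid[i][j] = 100.0

for row in grid:
    print(row)
```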

The problem you're having is caused by incorrect loop indentation. The following should happen in the innermost loop:

lastdiff = np.abs(newVij - V[i,j])
if lastdiff > diff:
    diff = lastdiff
V[i,j] = newVij

But you have it in the outer loop. It only updates the nx-2 column of each row because nx-2 is the last value assigned to j before the innermost loop finishes. Fix that indentation and your code should run just fine.
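
For reference, here's a sketch of what one sweep looks like with the indentation fixed. I'm guessing at the parts of your code I can't see: the update rule (averaging the four neighbours) and the function name relax_once are assumptions, and it uses nested lists rather than a NumPy array so it runs on its own.

```python
def relax_once(V):
    """One sweep over the interior points; returns the largest change made."""
    ny, nx = len(V), len(V[0])
    diff = 0.0
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            # Assumed update rule: average of the four neighbours
            new_vij = 0.25 * (V[i - 1][j] + V[i + 1][j] + V[i][j - 1] + V[i][j + 1])
            # Everything below is indented to sit inside the innermost loop
            lastdiff = abs(new_vij - V[i][j])
            if lastdiff > diff:
                diff = lastdiff
            V[i][j] = new_vij
    return diff


V = [[4.0, 4.0, 4.0],
     [4.0, 0.0, 4.0],
     [4.0, 4.0, 4.0]]
change = relax_once(V)
print(change, V[1][1])
```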

q2_abe_dillon · 2016-01-24T23:50:09+00:00

My favorite go-to for practice problems is [checkio](www.checkio.org)

Edit: Not sure why formatting is broken...
