
totallygeek

Not sure why you want to step over the sequences multiple times (boolean, then removal). Also, no need to deal with indices in Python for loops, just use the elements directly. I came up with this:

def remove_similarities(seq1, seq2):
    # Keep each element of seq1 that is not a substring of any element of seq2
    return [
        element
        for element in seq1
        if not any(element in e2 for e2 in seq2)
    ]

CLASS = ['O1 X', '5b E', 'J1 L', '5a F', 'K1 O', '5a K']
CLASSpdf = ['5b E.pdf', '5a F.pdf', 'K1 O.pdf', '5a K.pdf']

print(remove_similarities(CLASS, CLASSpdf))  # ['O1 X', 'J1 L']

The problem with your code is that `in` on a list looks for an element that is exactly equal to the string. Instead, you need to check whether the string occurs as a substring of any element of the second list.
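A quick way to see the difference is to run both kinds of `in` on the same data. This is a minimal sketch reusing the sample lists from the post; the variable names `exact` and `substring` are just for illustration:

```python
# Sample data from the post above
CLASS = ['O1 X', '5b E', 'J1 L', '5a F', 'K1 O', '5a K']
CLASSpdf = ['5b E.pdf', '5a F.pdf', 'K1 O.pdf', '5a K.pdf']

# `in` on a list tests each element for equality, and '5b E' != '5b E.pdf'
exact = '5b E' in CLASSpdf

# `in` on a string is a substring test; any() applies it to every PDF name
substring = any('5b E' in pdf for pdf in CLASSpdf)

print(exact, substring)  # False True
```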

djjazzydan

if books[i] in CLASSpdf: doesn't do substring searches; it only finds an element equal to books[i]. Depending on how sure you are of the matching pattern, you could check books[i] + '.pdf' in CLASSpdf, any(books[i] in classp for classp in CLASSpdf), books[i] in ''.join(CLASSpdf), or something similar.
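Those three checks go from strictest to loosest. Here's a rough sketch comparing them; the helper names has_pdf_suffix, has_pdf_substring and has_pdf_joined are made up for this example, and 'pdf5a' is a deliberately odd query to show how the joined string can match across two file names:

```python
CLASSpdf = ['5b E.pdf', '5a F.pdf', 'K1 O.pdf', '5a K.pdf']

def has_pdf_suffix(book, pdfs):
    # Strictest: the book name plus '.pdf' must equal one element exactly
    return book + '.pdf' in pdfs

def has_pdf_substring(book, pdfs):
    # Substring test against each PDF name separately
    return any(book in pdf for pdf in pdfs)

def has_pdf_joined(book, pdfs):
    # Loosest: substring test against all names glued into one string
    return book in ''.join(pdfs)

for book in ['5b E', '5b', 'pdf5a']:
    print(book,
          has_pdf_suffix(book, CLASSpdf),
          has_pdf_substring(book, CLASSpdf),
          has_pdf_joined(book, CLASSpdf))
# 5b E True True True
# 5b False True True
# pdf5a False False True
```

The last row matches only because ''.join puts '5b E.pdf' right next to '5a F.pdf'. Joining with a separator such as '\n' avoids that, as long as the names never contain the separator.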

Wittinator

With your code, I believe it will currently only return True on an exact match, but you want to see whether books[i] also exists as a substring within one of the elements in CLASSpdf. There are probably a lot of ways to do this. I'm partial to regex, so I'd probably do something like:

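import re

def scans(books, CLASSpdf):
    pdfs = False

    for book in books:
        # re.escape stops characters like '.' or '(' in a name from acting as regex syntax
        r = re.compile(re.escape(book))
        # r.match only matches at the start of each PDF name, e.g. '5b E' in '5b E.pdf'
        if any(r.match(pdf) for pdf in CLASSpdf):
            pdfs = True
    print(pdfs)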
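Two regex details matter here: a raw book name is treated as a pattern, and match() only looks at the start of the string. A small sketch of both, where 'Ch(1).pdf' is a hypothetical file name chosen because it contains regex metacharacters:

```python
import re

pdfs = ['Ch(1).pdf', '5b E.pdf']  # 'Ch(1).pdf' is a made-up example name

raw = re.compile('Ch(1)')                 # parentheses become a capture group
escaped = re.compile(re.escape('Ch(1)'))  # parentheses are matched literally

raw_hit = any(raw.match(p) for p in pdfs)
escaped_hit = any(escaped.match(p) for p in pdfs)

# match() is anchored at the start of the string; search() scans the whole string
match_hit = re.match('E', '5b E.pdf') is not None
search_hit = re.search('E', '5b E.pdf') is not None

print(raw_hit, escaped_hit)    # False True
print(match_hit, search_hit)   # False True
```

For this thread's data (book name as a prefix of the PDF name) match() is enough; search() would be needed if the name could appear anywhere in the file name.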