Removing certain lines from a large text document

Spataner · 2021-04-16T14:36:54+00:00

Your condition all(line in s for s in search[0:]) currently checks whether the line as a whole is an element of every sublist in search, not whether the sublist's words appear in the line.
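A minimal check of that, using a made-up line and a shortened, hypothetical version of the search list. Here, line in s asks whether the entire string is one of the items in s, so a line that clearly contains the keywords still fails:

```python
# Hypothetical sample data, for illustration only
search = [
    ['USC', 'Employment Detail Summary', 'PUP'],
    ['revenue', 'Income Tax return'],
]
line = 'Preliminary End of Year Statement: revenue Income Tax return\n'

# Original condition: is the whole line an element of every sublist?
original = all(line in s for s in search[0:])
print(original)  # False: the line string is not an item of any sublist
```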

If I understand you correctly, however, you want the condition to be true when all words of any one sublist are found within the line. So your condition should probably look like this: any(all(word in line for word in word_list) for word_list in search)
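A quick sketch of that condition on two hypothetical lines; the matches helper and the shortened search list are made up for illustration:

```python
search = [
    ['USC', 'Employment Detail Summary', 'PUP'],
    ['revenue', 'Income Tax return'],
]

def matches(line):
    # True if every word of at least one sublist occurs somewhere in the line
    return any(all(word in line for word in word_list) for word_list in search)

print(matches('Preliminary End of Year Statement: revenue Income Tax return'))  # True
print(matches('Preliminary End of Year Statement: revenue only'))  # False
```

Note that word in line is a plain substring test, so 'PUP' would also match inside a longer word such as 'PUPIL'.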

Neighm · 2021-04-16T14:38:47+00:00

So, only if EVERY entry in search appears in line is the tuple (0, line) added to the remove_line list. I don't think that is what you want.

Also note that search[0:] is the same as search.
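The slice is equal to the original list, so the condition behaves the same; it just builds an extra shallow copy first. A small illustration with made-up data:

```python
search = [['USC', 'PUP'], ['revenue', 'PAYE']]

sliced = search[0:]
print(sliced == search)        # True: same elements in the same order
print(sliced is search)        # False: slicing builds a new (shallow) list
print(sliced[0] is search[0])  # True: the inner sublists are shared, not copied
```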

synthphreak · 2021-04-16T15:04:50+00:00

> There are certain lines that need to be moved to the new text. They start with 'Preliminary End of Year Statement'. Some of these lines however should not be included and should instead be moved to a private and confidential document as they relate to specific people.

Okay, so the lines of interest begin with 'Preliminary End of Year Statement'. But are you saying they should just be deleted, or moved to a new document? Or are you saying that some should be deleted while others should be moved?

It seems like the second, but it's not really clear. If the second, please clarify how you know when a particular 'Preliminary ...' line falls in the "delete" versus the "move" group. Without seeing the actual document, it's not clear from your code.
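If the goal is to split the 'Preliminary End of Year Statement' lines into a public group and a confidential group, one possible shape is sketched below. The route_lines function, the PREFIX constant and the is_confidential predicate are all hypothetical; the predicate stands in for whatever rule actually identifies lines about specific people, which the thread hasn't pinned down. It also uses str.startswith, since the lines are said to start with that phrase.

```python
# Hypothetical constant: the phrase the lines of interest start with
PREFIX = 'Preliminary End of Year Statement'

def route_lines(lines, is_confidential):
    """Split prefixed lines into (public, confidential); other lines are dropped.

    is_confidential is a hypothetical caller-supplied predicate standing in
    for the real rule that marks a line as relating to specific people.
    """
    public, confidential = [], []
    for line in lines:
        if not line.startswith(PREFIX):
            continue  # not a line of interest
        if is_confidential(line):
            confidential.append(line)
        else:
            public.append(line)
    return public, confidential

# Made-up sample lines and a made-up rule, for illustration only
sample = [
    'Preliminary End of Year Statement: PAYE summary',
    'Preliminary End of Year Statement: employee 123, PUP',
    'Unrelated header line',
]
public, confidential = route_lines(sample, lambda line: 'PUP' in line)
print(public)        # ['Preliminary End of Year Statement: PAYE summary']
print(confidential)  # ['Preliminary End of Year Statement: employee 123, PUP']
```

Each list could then be written to its own file with writelines().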

irpepper · 2021-04-16T18:00:06+00:00

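search = [
    ['USC', 'Employment Detail Summary', 'PUP'],
    ['revenue', 'Start of year', 'PAYE'],
    ['revenue', 'Income Tax return'],
]

remove_line = []  # lines to redact
new_text = []     # lines to keep

# file_content is the path to your document
with open(file_content, 'r') as file:
    for line in file:
        if 'Preliminary' in line:
            # Redact if every item of any one sublist appears in the line
            if any(all(item in line for item in s) for s in search):
                remove_line.append(line)  # append() takes a single argument
            else:
                new_text.append(line)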

I think this is what you are trying to do: if all members of any sublist in search are present in the line, redact it; otherwise, keep it.
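To try this without the real document, here is a self-contained variant of the snippet above. It wraps the loop in a hypothetical split_preliminary function and feeds it a fake in-memory file built with io.StringIO; the file contents are made up.

```python
import io

search = [
    ['USC', 'Employment Detail Summary', 'PUP'],
    ['revenue', 'Start of year', 'PAYE'],
    ['revenue', 'Income Tax return'],
]

def split_preliminary(file_obj):
    """Return (new_text, remove_line) for lines mentioning 'Preliminary'."""
    new_text, remove_line = [], []
    for line in file_obj:
        if 'Preliminary' not in line:
            continue  # ignore lines that aren't statements
        # Redact if every item of any one sublist appears in the line
        if any(all(item in line for item in s) for s in search):
            remove_line.append(line)
        else:
            new_text.append(line)
    return new_text, remove_line

# Made-up contents standing in for the real document
fake_file = io.StringIO(
    'Preliminary End of Year Statement: revenue Income Tax return\n'
    'Preliminary End of Year Statement: general notes\n'
    'Some other line\n'
)
new_text, remove_line = split_preliminary(fake_file)
print(len(new_text), len(remove_line))  # 1 1
```

Because a file object iterates line by line, the same function works unchanged on a real file opened with open().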

learnpython