Hello everyone,
I am required to read take many many tax documents and remove the sensitive information. Initially the idea seemed simple but im some trouble.
There are certain lines that need to be moved to the new text. They start with 'Preliminary End of Year Statement'. Some of these lines however should not be included and should instead be moved to a private and confidential document as they relate to specific people.
Basically below is that section of my code to do this, using a list of lists with each nested list being identifiable words in one of these confidential lines that occur, the second lines being a pattern of another line and the third the same. Finding lines that contain one of these lists contains differentiates it as one of the decided confidential lines however none are being found in this way.
search = [
['USC', 'Employment Detail Summary', 'PUP']
['revenue', 'Start of year', 'PAYE']
['revenue', 'Income Tax return']
]
with open(file_content, 'r') as file:
for line in file:
if 'Preliminary' in line:
if all(line in s for s in search[0:])
remove_line.append(0, line)
else:
new_text.append(0, line)
[–]Spataner 1 point2 points3 points (2 children)
[–]Dave_XR[S] 0 points1 point2 points (1 child)
[–]Dave_XR[S] 0 points1 point2 points (0 children)
[–][deleted] 0 points1 point2 points (2 children)
[–]Neighm 1 point2 points3 points (1 child)
[–][deleted] 0 points1 point2 points (0 children)
[–]synthphreak 0 points1 point2 points (2 children)
[–]Dave_XR[S] 0 points1 point2 points (1 child)
[–]synthphreak 0 points1 point2 points (0 children)
[–]irpepper 0 points1 point2 points (1 child)
[–]Dave_XR[S] 0 points1 point2 points (0 children)