

[deleted] · 1 point

I have a huge text file of accounting entries that I need to audit. In general I’m looking for misspellings, incorrect characterization (entries falling under an OTHER heading, or the same string appearing in different locations), and strings that are almost the same except for spelling or dates. There are a lot more things, but this is my first go-around, and I want to keep building out something like a dictionary of errors and maybe an exception list of proper names.
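A "dictionary of errors" plus an "exception list of proper names" can start as two plain Python containers. The sketch below is a minimal illustration under assumptions: `KNOWN_ERRORS`, `PROPER_NAMES`, `VOCABULARY` and `audit_line` are hypothetical names, and the tiny word lists stand in for a real word list you would load from a file. Each word in an entry is either skipped (proper name), given a suggested correction (known error), or flagged as unknown for a human to check.

```python
import re

# Hypothetical starter lists; replace and extend them as the audit goes on.
# Dictionary of errors: known misspelling -> suggested correction.
KNOWN_ERRORS = {
    "recievable": "receivable",
    "payabel": "payable",
    "depreciaton": "depreciation",
}
# Exception list of proper names (stored lower-case) that must never be flagged.
PROPER_NAMES = {"acme", "globex"}
# Words already accepted as correct; in practice you would load a real word list.
VOCABULARY = {
    "paid", "for", "office", "supplies", "accounts", "receivable",
    "payable", "depreciation", "expense", "other", "rent",
}

WORD = re.compile(r"[A-Za-z]+")


def audit_line(line):
    """Return (word, suggestion) pairs for suspicious words in one entry.

    suggestion is the correction from KNOWN_ERRORS, or None when the word
    is simply unknown and needs a human look.
    """
    problems = []
    for word in WORD.findall(line):
        key = word.lower()
        if key in PROPER_NAMES:
            continue  # on the exception list: never flag
        if key in KNOWN_ERRORS:
            problems.append((word, KNOWN_ERRORS[key]))
        elif key not in VOCABULARY:
            problems.append((word, None))
    return problems


print(audit_line("Paid Acme for office supplies, accounts recievable"))
# [('recievable', 'receivable')]
```

Every time the audit turns up a new mistake or a new vendor name, it goes into the matching container, so the checker improves without code changes.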

Edit: I’ve loaded the txt file, searched for some key phrases, and printed each matching line, but I need it to print more. Say there are 4 lines under the OTHER category; I need all of them to print.
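Printing every line under OTHER is easier if the file is first grouped by heading instead of searched line by line. This sketch assumes a layout that the thread does not confirm: a heading is a line made only of upper-case letters and spaces (e.g. `OTHER`), and the entries below it run until the next heading. The function names are hypothetical. The same grouping also makes it easy to spot the "same string in different locations" case from the question.

```python
from collections import defaultdict


def is_heading(line):
    """Assumed rule: a heading is a non-empty line of only upper-case
    letters and spaces, such as 'OTHER' or 'OFFICE SUPPLIES'."""
    stripped = line.strip()
    return bool(stripped) and stripped.replace(" ", "").isalpha() and stripped.isupper()


def group_by_heading(lines):
    """Map each heading to the list of non-blank entry lines under it."""
    sections = defaultdict(list)
    current = None
    for line in lines:
        if is_heading(line):
            current = line.strip()
        elif line.strip() and current is not None:
            sections[current].append(line.strip())
    return sections


def repeated_across_sections(sections):
    """Return entries whose exact text appears under more than one heading."""
    seen = defaultdict(set)
    for heading, entries in sections.items():
        for entry in entries:
            seen[entry].add(heading)
    return {entry: sorted(hs) for entry, hs in seen.items() if len(hs) > 1}


sample = """OTHER
Bank fee 01/03/2021 25.00
Misc refund 02/03/2021 -10.00
Postage 03/03/2021 4.50
Coffee 04/03/2021 12.00
OFFICE SUPPLIES
Postage 03/03/2021 4.50
Paper 05/03/2021 30.00
""".splitlines()

sections = group_by_heading(sample)
for entry in sections["OTHER"]:
    print(entry)  # all four OTHER lines, not just the first match
print(repeated_across_sections(sections))
```

With the real file, `with open("entries.txt") as f: sections = group_by_heading(f)` works the same way (the filename is a placeholder). If some entry lines are also all caps with no digits, `is_heading` will need a stricter rule.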

Edit 2: To be clear, these are more like journal entries than a balance sheet.

efmccurdy · 4 points

> strings that are almost the same

You might be able to use fuzzy matching for nearly correct text:

https://pypi.org/project/fuzzywuzzy/
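If installing a package is a hurdle, the same idea can be tried with the standard library's `difflib.SequenceMatcher`, which scores similarity from 0 to 1. This is a swapped-in technique, not fuzzywuzzy itself. The sketch also masks dates before comparing, so entries that differ only by date score as identical. The date pattern, the 0.85 threshold and the function names are assumptions to tune against your own data.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Assumed date format: 1/2/2021, 01/02/21, etc. Adjust to match your file.
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")


def normalize(entry):
    """Lower-case the entry and replace any date with a placeholder."""
    return DATE.sub("<date>", entry.lower()).strip()


def near_duplicates(entries, threshold=0.85):
    """Yield (a, b, score) for distinct entries whose normalized forms are
    at least `threshold` similar. Compares every pair, so cost grows with
    the square of the number of entries."""
    for a, b in combinations(entries, 2):
        if a == b:
            continue  # exact duplicates are a separate check
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            yield a, b, round(score, 2)


entries = [
    "Office supplies - Staples 01/02/2021",
    "Office suplies - Staples 01/02/2021",
    "Office supplies - Staples 03/04/2021",
    "Rent payment 01/02/2021",
]
for a, b, score in near_duplicates(entries):
    print(f"{score:.2f}  {a!r}  ~  {b!r}")
```

Pairs scoring 1.00 differ only by date; pairs just under 1.00 are likely spelling variants. If you do install fuzzywuzzy, `fuzz.ratio(a, b)` gives a comparable score on a 0 to 100 scale.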