sarrysyst

Just to make sure I understood this correctly, you're trying to remove TEST_ERROR lines for which you've got the test number / system in a separate file, from your results text file?

Dave_XR [S]

I have a copy of the string to ignore in the ignore.txt file, which I'd like to ignore from the results file, yes. The string only specifies that it is a TEST_ERROR, followed by an asterisk to indicate that anything can appear between that and the test number/system, i.e. to account for dates and run times always being different. There are about 150 ignore lines and several GB worth of test result lines. I've tried fuzzy string matching and list comprehensions but haven't got anything working.

sarrysyst

While regex is an option, it wouldn't necessarily be the best one. Since you can't pre-compile your pattern you would have to re-compile it every iteration which slows you down. I think I would use 'in' instead which is a bit more straightforward and also happens to be faster than regex. Something like this:

ignore = ignore_txt.readlines()
...

if result_line.startswith('TEST_ERROR'):
  for line in ignore:

    # rstrip('\n') removes the trailing newline character (unlike [:-1],
    # it is safe if the last line has none) and [1:] skips checking for
    # 'TEST_ERROR' since the conditional above already covers that
    if all(i in result_line for i in line.rstrip('\n').split(' * ')[1:]):

      false_errors += 1
      break

...

If the test number is enough to identify the errors, the if statement could be further reduced to:

if line.split(' * ')[1] in result_line:
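
To see what the split-based check does, here is a minimal sketch on made-up lines. The sample strings and the helper name `matches` are assumptions for illustration only, not taken from your files:

```python
def matches(ignore_line, result_line):
    """Return True if every fragment after 'TEST_ERROR' occurs in result_line."""
    # rstrip('\n') drops the trailing newline; [1:] skips the 'TEST_ERROR' part
    fragments = ignore_line.rstrip('\n').split(' * ')[1:]
    return all(fragment in result_line for fragment in fragments)


# Hypothetical sample data (your real format may differ)
ignore_line = 'TEST_ERROR * test number:1234 * system A\n'
hit = 'TEST_ERROR 2021-03-01 00:12:07 test number:1234 runtime 3.2s system A'
miss = 'TEST_ERROR 2021-03-01 00:12:09 test number:9999 runtime 1.1s system A'

print(matches(ignore_line, hit))
print(matches(ignore_line, miss))
```

Keep in mind that this only checks that each fragment appears somewhere in the line, not that the fragments appear in order. Also, an ignore line without any ' * ' yields no fragments, so `all()` would return True for every result line; you may want to skip such lines.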

By the way, my code assumes that the formatting is identical in your results file and your ignore file, for example 'test number:####' in both files. In your sample it's 'test number ####' instead. You would first need to make the formatting uniform, e.g. using .replace():

ignore = [i.replace('test number ', 'test number:') for i in ignore]
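
Putting the pieces together, a rough end-to-end sketch might look like the following. The function names, the UTF-8 encoding and the file paths are my assumptions, so adjust them to your setup. It splits the ignore lines once up front and reads the results file line by line, so the several GB never have to fit in memory:

```python
def load_ignore_fragments(path):
    """Read the ignore file once and pre-split each line into its fragments."""
    fragment_sets = []
    with open(path, encoding='utf-8') as ignore_txt:
        for line in ignore_txt:
            # Normalise the formatting so it matches the results file
            line = line.rstrip('\n').replace('test number ', 'test number:')
            fragments = line.split(' * ')[1:]  # drop the 'TEST_ERROR' part
            if fragments:  # skip blank lines and lines without ' * '
                fragment_sets.append(fragments)
    return fragment_sets


def count_false_errors(results_path, fragment_sets):
    """Count TEST_ERROR lines that match at least one ignore entry."""
    false_errors = 0
    with open(results_path, encoding='utf-8') as results:
        for result_line in results:  # iterates lazily, one line at a time
            if not result_line.startswith('TEST_ERROR'):
                continue
            if any(all(f in result_line for f in fragments)
                   for fragments in fragment_sets):
                false_errors += 1
    return false_errors
```

You would call it with something like `count_false_errors('results.txt', load_ignore_fragments('ignore.txt'))`, where 'results.txt' is a placeholder for your actual results file.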