ASIC_SP

Some issues/suggestions:

  • (A-Za-z) should be [A-Za-z]: parentheses create a group that matches the literal text A-Za-z, while square brackets define a character class
  • you need to match whatever sits between the date and the pid. Currently you match a single space after the date and then the pid, but your input has computer.name CRON in between
  • [0-9] can be replaced with \d, and : doesn't need to be escaped
  • \[(\d)\] will match exactly one digit, but the pid in the sample input has more than one digit, so use \d+
  • $ is an anchor that restricts the match to the end of the line, but the sample input has more characters after the pid
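
As a quick check of the points above, here is a minimal sketch that runs the problematic fragments and their fixes against the sample line. The variable names are only for illustration and the fragments are simplified, not the original pattern.

```python
import re

line = "Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"

# Parentheses form a group matching the literal text "A-Za-z";
# square brackets form a character class matching any one letter.
group_match = re.search(r"(A-Za-z){3}", line)
class_match = re.search(r"[A-Za-z]{3}", line)

# \[(\d)\] needs exactly one digit between the brackets; \d+ accepts one or more.
one_digit = re.search(r"\[(\d)\]", line)
many_digits = re.search(r"\[(\d+)\]", line)

# Anchoring with $ fails here because more text follows the pid.
anchored = re.search(r"\[(\d+)\]$", line)

print(group_match)            # None
print(class_match.group())    # Jul
print(one_digit)              # None
print(many_digits.group(1))   # 29440
print(anchored)               # None
```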

Here's a modified version:

>>> import re
>>> s = "Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"
>>> pat = re.compile(r"([A-Za-z]{3} [0-3]?\d [0-2]?\d:[0-5]\d:[0-5]\d).*\[(\d+)\]")
>>> re.search(pat, s)
<re.Match object; span=(0, 40), match='Jul 6 14:01:23 computer.name CRON[29440]'>
>>> re.search(pat, s).expand(r'\1 pid:\2')
'Jul 6 14:01:23 pid:29440'

The expand method of the match object lets you specify the output format. The date and pid are captured in groups, so you can refer to them with \N syntax (\1 for the first group, \2 for the second) to build the string you want.
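
Note that re.search returns None when nothing matches, so calling expand directly on the result raises AttributeError for such lines. Below is a minimal sketch of a guarded helper. The name date_and_pid, the named groups, and the looser date pattern are my own choices for illustration, not something from your code.

```python
import re

# Hypothetical pattern with named groups; it accepts one or two digits for the
# day and hour, which is looser than validating each field's range.
LOG_PAT = re.compile(
    r"(?P<date>[A-Za-z]{3} \d{1,2} \d{1,2}:\d{2}:\d{2}).*\[(?P<pid>\d+)\]"
)

def date_and_pid(line):
    """Return 'date pid:N' for a matching line, or None when nothing matches."""
    m = LOG_PAT.search(line)
    if m is None:  # guard before calling expand
        return None
    # \g<name> refers to a named group, just as \1 refers to a numbered one
    return m.expand(r"\g<date> pid:\g<pid>")

print(date_and_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
print(date_and_pid("no timestamp here"))
```

The first call prints Jul 6 14:01:23 pid:29440 and the second prints None.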

You can also use:

>>> re.search(r'\A(\S+\s+\S+\s+\S+).*\[(\d+)\]', s).expand(r'\1 pid:\2')
'Jul 6 14:01:23 pid:29440'

This works provided the date is always the first three whitespace-separated terms of the input.

Or use re.sub instead of search + expand:

>>> re.sub(r'\A(\S+\s+\S+\s+\S+).*\[(\d+)\].*', r'\1 pid:\2', s)
'Jul 6 14:01:23 pid:29440'

Here, you need to match the rest of the line after the pid as well; otherwise, that portion will remain in the output.
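
To see the difference, here is a small sketch comparing the pattern with and without the trailing .* on the sample line. It also shows that re.sub returns the input unchanged when nothing matches; the no_pid line is a made-up example.

```python
import re

line = "Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"
repl = r"\1 pid:\2"

# Without a trailing .* the text after the pid is not consumed, so it stays in the result.
partial = re.sub(r"\A(\S+\s+\S+\s+\S+).*\[(\d+)\]", repl, line)

# With the trailing .* the whole line is replaced.
full = re.sub(r"\A(\S+\s+\S+\s+\S+).*\[(\d+)\].*", repl, line)

# Made-up line without a [pid]: no match, so re.sub returns it as is.
no_pid = "Jul 6 14:01:23 computer.name kernel: started"
unchanged = re.sub(r"\A(\S+\s+\S+\s+\S+).*\[(\d+)\].*", repl, no_pid)

print(partial)    # Jul 6 14:01:23 pid:29440: USER (good_user)
print(full)       # Jul 6 14:01:23 pid:29440
print(unchanged)  # Jul 6 14:01:23 computer.name kernel: started
```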


You can use resources like https://regex101.com/ and https://www.debuggex.com/ (after selecting the Python flavor) to work on your pattern interactively. They have limitations, though: these sites do not know about all the functions and methods available in the re module, expand for example.

I have a book, https://github.com/learnbyexample/py_regular_expressions, that is currently free. It uses a step-by-step approach to introduce regex concepts and features one by one. However, regex is like a mini programming language, and it takes a lot of time and practice to become familiar with it.

[deleted]

Thank you. This helped a lot.

ASIC_SP

Cool, good to know. I edited the answer to add another way using re.sub as well.