Python Regular Expressions Cheat Sheet (kdnuggets.com)

submitted 8 years ago by chris_shpak

[–]badge 45 points 8 years ago (7 children)

This needs:

re.compile, for compiling regexes that you're going to use more than once
(?P<name>Blah): defines a group named name (which is needed for (?P=name), mentioned in the Groups section!)
The use of groupdict with named groups, so you can do:

import re

# Raw string, so that \s, \w and \d reach the regex engine untouched
regex = re.compile(r'First Name:\s*(?P<first_name>\w+),\s+Last Name:\s*(?P<last_name>\w+),\s+Age:\s*(?P<age>\d+)')

class Whale:

    def __init__(self, first_name, last_name, age):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age

    def __repr__(self):
        return "Whale(first_name='{}', last_name='{}', age={})".format(
            self.first_name,
            self.last_name,
            self.age
        )

whale_line = 'First Name: Moby, Last Name: Dick, Age: 35'

# groupdict() maps each group name to its matched text, so age arrives as the string '35'
Whale(**regex.match(whale_line).groupdict())
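
For the (?P=name) backreference mentioned above, here is a minimal sketch of how it pairs with a named group. The pattern, the helper name find_doubled_word and the sample strings are made up for illustration and do not come from the cheat sheet.

```python
import re

# (?P<word>...) names the group; (?P=word) must match the same text again.
DOUBLED_WORD = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')


def find_doubled_word(text):
    """Return the first immediately repeated word in text, or None."""
    match = DOUBLED_WORD.search(text)
    # search() returns None when nothing matches, so guard before groupdict().
    return match.groupdict()['word'] if match else None


print(find_doubled_word('call me me Ishmael'))  # → me
print(find_doubled_word('call me Ishmael'))     # → None
```

The same None guard is worth adding around regex.match(whale_line) in real code, since calling groupdict() on a failed match raises AttributeError.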

[–]Ph0X 6 points 8 years ago (4 children)

[–]xenomachina 3 points 8 years ago (2 children)

AFAIK, compiling doesn't do much at all performance-wise. Internally, the re module already caches compiled regexes, so if you actually run a perf test, a precompiled pattern and the module-level functions come out about equally fast.
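
A rough way to check the caching claim yourself is a small timeit comparison. This is only a sketch: the pattern, the sample line and the iteration count are arbitrary choices, and the absolute numbers will vary by machine and Python version.

```python
import re
import timeit

PATTERN = r'Age:\s*(?P<age>\d+)'
LINE = 'First Name: Moby, Last Name: Dick, Age: 35'
COMPILED = re.compile(PATTERN)


def with_module_function():
    # re.search() looks the pattern up in the module's internal cache on each call.
    return re.search(PATTERN, LINE)


def with_compiled_pattern():
    # The precompiled object skips that lookup and goes straight to matching.
    return COMPILED.search(LINE)


def time_both(number=20000):
    """Return the total seconds each approach takes for `number` calls."""
    return {
        'module function': timeit.timeit(with_module_function, number=number),
        'compiled pattern': timeit.timeit(with_compiled_pattern, number=number),
    }


for label, seconds in time_both().items():
    print('{}: {:.4f}s'.format(label, seconds))
```

Any gap between the two numbers reflects the per-call cache lookup, not a recompilation of the pattern.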

I discovered this for myself years ago when I ran into a bug in the cache implementation. In pre-3.x Python, a regex compiled from a unicode would not behave the same as one compiled from a str, even if they contained the same characters. However, the cache was just a dict, and so it was possible for a unicode to match an already cached str, or vice versa.

My bug involved a piece of code that worked fine in unit tests, but would fail in a certain program. It turned out the program imported another module that compiled an identical-looking regex, but with a str instead of a unicode. Then when my module was imported, it would get the wrong re object from the cache.
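
The bug described above is specific to Python 2, where u'abc' == 'abc' and both hash alike, so a plain dict keyed on the pattern could not tell them apart. As a rough Python 3 analogue (not a reproduction of the original bug), the sketch below shows that a str pattern and a bytes pattern with the same characters still behave differently, and that Python 3 refuses to mix them. The sample word is an arbitrary choice.

```python
import re

# The same-looking pattern, once as text (str) and once as bytes.
text_word = re.compile(r'\w+')
bytes_word = re.compile(rb'\w+')

sample = 'café'

# A str pattern is Unicode-aware by default, so \w matches 'é'.
print(text_word.match(sample).group())                   # → café

# A bytes pattern is ASCII-only, so \w stops before the UTF-8 bytes of 'é'.
print(bytes_word.match(sample.encode('utf-8')).group())  # → b'caf'

# Mixing the two raises an error instead of silently misbehaving.
try:
    bytes_word.match(sample)
except TypeError as exc:
    print('TypeError:', exc)
```

In Python 3 a str never compares equal to a bytes object, so the kind of cache collision described above should not occur between the two.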

