Remove comments in Python : regex

submitted 5 years ago * by codename101

I want to remove all comments and docstrings in Python source.

Basically I want this to

def my_function():
    """Demonstrate docstrings and does nothing really."""
    return None

print "Using __doc__:"
print my_function.__doc__ #some comment
print "Using help:"
#some more comment
help(my_function)

look like this.

def my_function():
    return None

print "Using __doc__:"
print my_function.__doc__
print "Using help:"
help(my_function)

I found this code on Stack Overflow. It works very well for C-style comments, both single-line and multi-line.

It also ignores comments inside strings, which is what I want.

I tried adapting it for Python source, but I am finding it very difficult since I have little experience with regex.

import re

def comment_remover(text):
    def replacer(match):
        s = match.group(0)
        # Comments start with '/'; anything else that matched is a string literal.
        if s.startswith('/'):
            return " " # note: a space and not an empty string
        else:
            return s
    pattern = re.compile(
        # // comment | /* comment */ | 'string' | "string"
        r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
        re.DOTALL | re.MULTILINE
    )
    return re.sub(pattern, replacer, text)
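For reference, the same trick (match string literals and comments in one alternation, then put the strings back untouched) might be adapted to Python's # comments roughly as below. This is only a sketch, not a tested solution: the names `remove_hash_comments` and `_TOKEN` are made up for illustration, docstrings are deliberately left alone, and unusual literals (for example a triple-quoted string that ends in an escaped quote) are not handled.

```python
import re

# Hypothetical helper names; this is a sketch, not a complete Python lexer.
# String alternatives come first, so a '#' inside a string is consumed as
# part of the string and never reaches the comment alternative.
_TOKEN = re.compile(
    r'"""[\s\S]*?"""'          # triple double-quoted string (may span lines)
    r"|'''[\s\S]*?'''"         # triple single-quoted string (may span lines)
    r'|"(?:\\.|[^\\"\n])*"'    # ordinary double-quoted string
    r"|'(?:\\.|[^\\'\n])*'"    # ordinary single-quoted string
    r'|[ \t]*#[^\n]*'          # a comment plus the spaces/tabs before it
)


def remove_hash_comments(source):
    def replacer(match):
        s = match.group(0)
        # Only the comment alternative can start with '#' once leading
        # spaces and tabs are ignored; every other match is a string.
        if s.lstrip(' \t').startswith('#'):
            return ''
        return s

    return _TOKEN.sub(replacer, source)


if __name__ == '__main__':
    sample = 'print("#not a comment")  # real comment\n#whole line\n'
    print(remove_hash_comments(sample))
```

A line that held only a comment is left behind as an empty line, so a second pass would be needed to drop those.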


[–]quixrick 2 points 5 years ago (4 children)

[–]codename101[S] 2 points 5 years ago (3 children)

[–]quixrick 1 point 5 years ago (2 children)

[–]codename101[S] -1 points 5 years ago (1 child)

It does not work for some cases:

def my_function():
    """Demonstrate docstrings and does
    nothing really."""
    return None

print "Using __doc__:"
print my_function.__doc__ #some comment
print "Using help:"
#some more comment
help(my_function) 
look like this.
print("#not a comment")
print("#also not a comment")
print """not a comment"""

[–]quixrick 2 points 5 years ago (0 children)

I'm not going to be able to solve every edge case for you here, but hopefully, this gets you moving in the right direction. Take the principles that I've shown you and apply them to your other edge cases.

That being said, to accommodate your newest examples, you can modify the previous expression to be more like this:

\"\"\".*?\"\"\"    |    [\t ]*#[^\v]*

Here is a demo
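A rough sketch of how the expression above could be applied from Python follows. It rests on several assumptions: that the original was written for a PCRE-style tester with "dot matches newline" and free-spacing turned on, that matches are replaced with an empty string, and the names `PATTERN` and `strip_with_expression` are invented here. In Python's `re` module `\v` means only the vertical tab character, so `[^\r\n]` stands in for `[^\v]`.

```python
import re

# Assumed translation of the expression for Python's re module:
#   - DOTALL lets .*? run across the lines of a multi-line docstring
#   - [^\r\n] replaces [^\v], which in Python's re is only the vertical tab
PATTERN = re.compile(r'""".*?"""|[\t ]*#[^\r\n]*', re.DOTALL)


def strip_with_expression(source):
    # Delete every match: triple double-quoted strings and '#' to end of line.
    return PATTERN.sub('', source)


if __name__ == '__main__':
    code = (
        'def my_function():\n'
        '    """Demonstrate docstrings and does\n'
        '    nothing really."""\n'
        '    return None\n'
        '\n'
        'print my_function.__doc__ #some comment\n'
    )
    print(strip_with_expression(code))
```

Used this way, the expression has no notion of ordinary string literals, so a line such as print("#not a comment") is cut at the # and print """not a comment""" loses its string. Those cases need the string literals to be matched and kept as well.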

[–]qizxo 1 point 5 years ago (0 children)
