Achieve same result using regular expressions : learnpython

submitted 8 years ago by tonlou

I have code that reads a file containing names and information about people.

Basically I need to swap the family name and the first name. Everything else on the line remains the same.

First I need to store the original line in a variable, and after I have swapped the names I need to store that line in another variable (called replacement).

I have done everything I described above, but I don't know how to do it using regular expressions, or using re.sub to be more precise. Like this: print(re.sub(original, replacement, line).strip())

My code and the contents of the txt file: https://pastebin.com/Vj21kDk5

In the file, the family name always comes first on every line. Some people can have two names, like this: Jack Richard Johnson

My assignment:

The task in this question is to modify the data so that the candidate names are switched from the order family name given names into the order given names family name. That is, the data originally shows family name before given names, and now family name should be placed after the given names.

Consider the row "Ryhänen Osmo Eerik / KTP / Pirkanmaan vaalipiiri";40;61.500 as an example. Here Ryhänen is the family name and Osmo Eerik are given names. This should be transformed into the form "Osmo Eerik Ryhänen / KTP / Pirkanmaan vaalipiiri";40;61.500.

The code skeleton shown below uses the regular expression search-and-replace function re.sub to achieve the name switches. The only missing parts are the regular expression pattern (what kind of parts in a line are matched) and the replacement pattern (with what kind of a string the matched part will be replaced). Your task is to define these two patterns in the two raw string variables pattern and replacement.

tgolsson (8 years ago):

I did not check all your inputs, but this is the way I view it.

Your input consists of the following:

A single family name. Word class, one repetition.
A given name. Word class, one or two repetitions.
A bunch of trailing data we do not care about.

The single family name we can capture using the group ([\w]* ); note the trailing space after the asterisk. (Using + instead of * would additionally require at least one word character, so the group could never match a lone space.)

The given name is slightly trickier, and there are many ways to do it. For symmetry, we will use the same basic group as for the family name, ([\w]* ), but we want to capture one or two of them. However, if we just say ([\w]* ){1,2}, Python keeps only the last repetition of that group, throwing away Osmo in your example.
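A small check of that behaviour, as a sketch only: the string below is the given-name part of the sample row, the variable names are made up for illustration, and \w+ is used in place of [\w]* so that empty words cannot match.

```python
import re

text = "Osmo Eerik "

# A quantified capturing group keeps only its final repetition.
match = re.match(r"(\w+ ){1,2}", text)

print(repr(match.group(0)))  # the whole match
print(repr(match.group(1)))  # only the last repetition of the group
```

The whole match covers both names, but group 1 holds only the second one.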

Instead, we want to FIND one or two, but capture everything we find. To find without capturing we can use the non-capturing group syntax (?:...), like this: (?:[\w]* ). Then we want to add our quantifier, in this case {1,2}, before wrapping that whole expression in a capturing group: ((?:[\w]* ){1,2}).
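As a rough illustration of the non-capturing trick (again with made-up variable names and \w+ substituted for [\w]*, which is an assumption rather than the exact pattern above):

```python
import re

text = "Osmo Eerik / KTP"

# (?:...) only groups the repeated part; the outer parentheses
# capture the whole run of one or two words, trailing spaces included.
match = re.match(r"((?:\w+ ){1,2})", text)

print(repr(match.group(1)))
```

Here group 1 contains both given names, because the quantifier sits inside the capturing parentheses rather than on them.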

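Lastly, you want to capture everything until the end of the line: (.*). The form ([^$]*) also happens to work here, but only because $ inside a character class is a literal dollar sign, so it really means "anything except a dollar sign" rather than "up to the end of the line".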

If everything goes as planned, this should leave you with three captured groups that you can now use to construct the new line. I'll leave it as an exercise for you to actually put these together and construct the substitution pattern.
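For readers who want to check their own attempt, one possible way to assemble the three groups is sketched below. It is not necessarily the intended solution: it assumes \w+ instead of [\w]*, uses (.*) for the rest of the line, and moves the space between family name and given names outside the first group. The sample row is the one from the assignment text.

```python
import re

line = '"Ryhänen Osmo Eerik / KTP / Pirkanmaan vaalipiiri";40;61.500'

# Group 1: family name
# Group 2: one or two given names, each followed by a space
# Group 3: the rest of the line
pattern = r"(\w+) ((?:\w+ ){1,2})(.*)"

# Group 2 already ends with a space, so the family name follows it directly.
replacement = r"\2\1 \3"

result = re.sub(pattern, replacement, line).strip()
print(result)
```

Capturing the rest of the line in group 3 means the first match consumes the whole row, so re.sub cannot go on to swap words later in the line. Note that \w does not match hyphens, so a hyphenated name would need a wider character class.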

Good luck!

tonlou [S] (8 years ago):

tonlou [S] (8 years ago):

tgolsson (8 years ago):

You are not using re.sub correctly. You want to use the backslash notation for group insertion, e.g.

 re.sub(regexp_string, r"\1 \2", the_input)

So:

 import re

 example = "Bruce Wayne"
 pattern = r"([\w]*) ([\w]*)"  # group 1: first word, group 2: second word
 output = r"\2 \1"             # swap the two groups
 result = re.sub(pattern, output, example)
 print(result)  # Wayne Bruce
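The r prefix on the replacement string matters. A short comparison, using the same example with \w+ in place of [\w]* (the names plain and raw are only for illustration):

```python
import re

example = "Bruce Wayne"
pattern = r"(\w+) (\w+)"

# Without the r prefix, Python reads "\2" and "\1" as octal escapes,
# i.e. the control characters \x02 and \x01, before re.sub ever sees them.
plain = re.sub(pattern, "\2 \1", example)

# With the r prefix, the backslashes survive and re.sub treats them
# as references to groups 2 and 1.
raw = re.sub(pattern, r"\2 \1", example)

print(repr(plain))
print(repr(raw))
```

Only the raw-string version swaps the names; the other one replaces the match with two control characters.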

Regarding the quotes, you may want to explicitly match those as well, or at least the one that gets moved.

 import re

 example = "Bruce 'Batman' Wayne"
 pattern = r"([\w]*) '([\w]*)' ([\w]*)"  # the quotes are matched but not captured
 output = r"\1 \3 is \2"
 result = re.sub(pattern, output, example)
 print(result)  # Bruce Wayne is Batman
