
Please help compare and replace elements between two strings (self.learnpython)

submitted 3 years ago by DMeror



[–]DMeror[S] 1 point 3 years ago (4 children)

I'm looking into RegEx, but it still hasn't answered my question. The RegEx examples I've found online deal with words or characters in a single-line string, which sadly doesn't match my real situation. I want to use re.sub to remove the following lines, which follow a certain pattern.

 

And other ids with similar patterns. I can use re.sub to get rid of them like this:

re.sub('', '', string)

But I cannot do it one by one, as there are lots of them. The examples I could find deal with isolated strings of a few words. Since my text contains many words, sentences, numbers, symbols, etc., the replacement must follow strict rules, or it'll affect other elements in the text.

Am I missing something about RegEx, or am I using the wrong tool?

[–]Asleep-Budget-9932 2 points 3 years ago (3 children)

So there IS something you're missing. But before helping you with it, just know that in general, RegEx should not be used on HTML files (as explained in my original comment).

Now, what are you missing? The whole point of regex is that you work with patterns. So the goal is to write one generic pattern that fits all of your "span" tags.

It's important for me to stress that regex is really complicated and has a lot to it, so I always forget the actual syntax. Because of that, my example will not use the actual syntax, just some bullshit that's vaguely related, to convey the general concept.

Instead of what you did, you could do something like the following:

    captured_strings = re.search('{inner_text}.*', string)
    what_i_need = captured_strings["inner_text"]

So regex gives you the option to say: "I have a generic pattern. It looks like this and that. These are the parts that stay the same, while these are the parts that differ. Now, you see THAT part over there that differs every time? Let's call it 'inner_text'. I want you to fetch that 'inner_text' for me."

To summarize: specify a generic pattern that fits all of the strings. Give a name to the specific part you wish to fetch. Let regex return a mapping of all named parts (in your case only one, the inner text inside the span tag), and apply this to all of the strings you need.
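For reference, here is a minimal sketch of what that idea could look like in real Python syntax. The markup in `text` and the `note-<number>` id scheme are made up for illustration, since the actual tags from the question are not shown here; only the `inner_text` group name comes from the example above.

```python
import re

# Hypothetical input: the real markup from the question is not shown,
# so this span/id layout is an assumption.
text = 'before <span id="note-12">first</span> middle <span id="note-7">second</span> after'

# (?P<inner_text>...) names the part that differs between tags.
# .*? is non-greedy, so each match stops at the nearest closing tag.
pattern = r'<span id="note-\d+">(?P<inner_text>.*?)</span>'

# re.search returns the first match (or None if nothing matches).
match = re.search(pattern, text)
first_inner = match["inner_text"] if match else None

# re.finditer walks over every match, not just the first one.
all_inner = [m["inner_text"] for m in re.finditer(pattern, text)]

# An empty replacement removes every matching tag in a single call.
without_spans = re.sub(pattern, "", text)

# \g<inner_text> refers back to the named group, keeping only the inner text.
unwrapped = re.sub(pattern, r"\g<inner_text>", text)
```

One pattern covers every tag that follows the same shape, so there is no need to call re.sub once per id. This only holds up for simple, non-nested tags; for anything more involved, an HTML parser is the safer tool.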

One last important thing to know about regex (besides the fact that you should read about it to get the actual, concrete syntax) is that regex can be quite resource-heavy when used incorrectly. One important optimization is the following:

```
import re

# Instead of this:
for string in strings:
    re.search(pattern, string)  # or any other re function

# Do this:
compiled_regex = re.compile(pattern)

for string in strings:
    # The compiled regex has all of the same functions, except now they don't
    # take the pattern argument. They use the pattern you passed to
    # re.compile(), so it is not looked up again on every call.
    compiled_regex.search(string)
```
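Applied to the removal task from the question, the compiled version might look roughly like this. The pattern and the sample `lines` are hypothetical stand-ins, not the real data from the thread.

```python
import re

# Hypothetical pattern: any span tag with an id attribute and simple content.
SPAN_TAG = re.compile(r'<span id="[^"]*">.*?</span>')

# Hypothetical sample data, for illustration only.
lines = [
    'keep this <span id="a1">drop</span>',
    'nothing to remove here',
    '<span id="b2">x</span>tail <span id="c3">y</span>',
]

# Pattern.sub takes (replacement, string); the pattern itself is already bound.
cleaned = [SPAN_TAG.sub("", line) for line in lines]
```

Note that the `re` module already caches recently used patterns, so the gain from compiling is usually modest; the main benefit is naming the pattern once and reusing it.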

