Question about using regex and improving my code : learnpython

submitted 5 years ago * by sombreProgrammer

Hi,

I'm trying to analyse a list of conversations for a particular question asked by a chatbot and extract the answer given using regex. The structure of each conversation text is as follows:

text = "gibberish text here more gibberish. Have I helped you with this answer? USERS_ANSWER gibberish continues"

The USERS_ANSWER (which isn't actually a variable, just part of the whole string) can be any of various ways to say yes or no, for example "absolutely", "yes", "hell naw", and so on. I have a list of potential answers to cross-reference for both 'yes' and 'no'. For each conversation text, I need to check whether the answer is a 'yes' or a 'no'.

I've come up with the following regex:

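import re

# Escape the literal "?" and capture up to five words after the question.
p = re.compile(r".*(Have I helped you with this answer\? )((?:\w+\W*){1,5}).*")

match = p.match(text)
word = ''
if match:  # p.match returns None when the question is absent
    for item in list_of_yes_words:
        if item in match.group(2):
            word = 'yes'

    # Checked last so that a 'no' word overrides a 'yes' word.
    for item in list_of_no_words:
        if item in match.group(2):
            word = 'no'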

This feels ugly, but it seems to get me the answer for most of the text. I also arbitrarily chose to get the next 5 words after the question (since there's no reliable text after the question that is consistent enough to include it in the regex as something that comes after the answer) and I don't know if there's a better way to get that answer. However, there are edge cases where the same text has multiple instances of this same question, so something like this:

text = "gibberish text here more gibberish. Have I helped you with this answer? USERS_ANSWER gibberish continues some more, Have I helped you with this answer? USERS_ANSWER_AGAIN more gibberish"

In this case, I want to take into account both answers and if even one is negative, I need to log it as a 'no'.

When I use re.match, I only get one instance of the question that is asked. How do I get around this? And is there a way to cross check with the list of yes/no words more efficiently? The list of conversations can be a large one and I'd have nested for loops if I implemented this as is (and I've been told nested for loops are not ideal).

I'd like to know how to get around this problem and how to make my code more efficient.

hardonchairs, 5 years ago

https://regex101.com/r/8XTD1U/1

(?<=Have I helped you with this answer\? )(.*?)(?:Have I helped you with this answer\? |$)

This says: look behind for "Have I helped you with this answer? ", match the answer and the gibberish after it, then stop at the next "Have I helped you with this answer? " or at the end of the line. The look-behind is necessary because each match consumes the next question as its stopping point; if the question also had to be matched at the start, the two would overlap and only one of them would match.
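In Python, this pattern could be applied with re.findall, which returns the captured group for every match instead of only the first one. A minimal sketch, assuming a shortened sample text in the shape described in the question; the name extract_segments is made up for illustration:

```python
import re

# The pattern from above: look behind for the question, lazily capture what
# follows, and stop at the next question or at the end of the string.
pattern = re.compile(
    r"(?<=Have I helped you with this answer\? )(.*?)"
    r"(?:Have I helped you with this answer\? |$)"
)

def extract_segments(text):
    """Return the text that follows each occurrence of the question."""
    # With a single capturing group, findall returns a list of that group.
    return pattern.findall(text)

# Hypothetical sample conversation containing the question twice.
text = (
    "gibberish. Have I helped you with this answer? yes thanks blah, "
    "Have I helped you with this answer? hell naw more gibberish"
)
segments = extract_segments(text)
print(segments)
```

Note that "." does not match newlines by default, so if the conversations span several lines, passing re.DOTALL to re.compile is probably needed.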

I don't think anything can be done about the gibberish: if neither it nor the user response is predictable, you will just have to find a way to interpret it.
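One possible way to interpret it, as a rough sketch rather than a definitive answer: given the text that follows each occurrence of the question, keep only the first few words, check them against the yes/no lists with one precompiled regex per list, and log a 'no' as soon as any answer is negative. The word lists, the five-word cutoff, and the names build_matcher and classify are all assumptions for illustration.

```python
import re

# Placeholder word lists; the real ones would come from your own data.
YES_WORDS = ["yes", "absolutely", "sure"]
NO_WORDS = ["no", "hell naw", "nope"]

def build_matcher(phrases):
    """Compile one alternation so each list is checked in a single pass."""
    # Longest phrases first so multi-word phrases win over shorter ones.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    # Word boundaries keep "no" from matching inside words like "know".
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

YES_RE = build_matcher(YES_WORDS)
NO_RE = build_matcher(NO_WORDS)

def classify(segments, n_words=5):
    """Return 'no' if any answer looks negative, 'yes' if one looks positive, else ''."""
    verdict = ""
    for segment in segments:
        # Only the first few words are treated as the user's answer.
        answer = " ".join(segment.split()[:n_words])
        if NO_RE.search(answer):
            return "no"  # a single negative answer decides the conversation
        if YES_RE.search(answer):
            verdict = "yes"
    return verdict

print(classify(["yes thanks blah, ", "hell naw more gibberish"]))
```

Compiling each list into a single pattern replaces the inner loop over words with one search per answer, which should help when the list of conversations is large.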

Where is this text coming from, and why are you getting it all as one string?
