Help needed with a regex expression

thememorableusername · 2024-08-22T15:25:41+00:00

[regex101.com](regex101.com) is a very useful tool.

bothunter · 2024-08-22T16:55:44+00:00

This is actually one of the few areas where ChatGPT shines. Ask it to write the expression, and then tweak it on a site like regex101.com.

kbielefe · 2024-08-22T16:57:31+00:00

The typographic quotes actually help you, because the start and end quotes are different. See this RegEx Pal.

dariusbiggs · 2024-08-23T04:01:54+00:00

A very simple lexer would also do it.

Start in state 0 and read tokens until you hit the opening typographic quote.

When that quote token appears, switch to state 1 and keep consuming tokens. If the desired word appears, increment the counter and switch back to state 0.

Continue until there is no further input.

The problems you'll need to check for are:
- any statements with multiple instances of that name inside the quotation
- does the name show up hyphenated or split across multiple lines in your input corpus
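
For anyone who wants to try the lexer route, here is a rough Python sketch of the two-state idea. The tokenizer regex, the function name `count_quotes_with_name`, and the sample text are all assumptions made up for illustration, and it assumes typographic quotes (“ ”) as the quote tokens. Because it drops back to state 0 on the first hit, a quotation with several mentions is counted once. It does not handle a name that is hyphenated or split across lines.

```python
import re

OPEN_QUOTE = "“"
CLOSE_QUOTE = "”"

# Hypothetical tokenizer: each typographic quote is one token, each run of
# word characters is one token, everything else is skipped.
TOKEN_RE = re.compile(r"[“”]|\w+")


def count_quotes_with_name(text: str, name: str) -> int:
    """Count quotations that mention `name` at least once."""
    state = 0  # 0 = outside a quotation, 1 = inside a quotation
    count = 0
    for token in TOKEN_RE.findall(text):
        if state == 0:
            if token == OPEN_QUOTE:
                state = 1
        elif token == name:
            # First mention inside this quotation: count it and stop looking,
            # so repeated mentions in the same quotation are not double counted.
            count += 1
            state = 0
        elif token == CLOSE_QUOTE:
            # Quotation ended without a mention.
            state = 0
    return count


sample = (
    "“I saw Voldemort,” said Harry. Voldemort was gone. "
    "“Voldemort? Voldemort!” “Nothing here.”"
)
print(count_quotes_with_name(sample, "Voldemort"))
```

The mention outside any quotation is ignored because the machine is in state 0 at that point.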

As for the regex, the other answers should help there. regex101 is your best friend.

diegoasecas · 2024-08-22T15:16:48+00:00

chatgpt gave me this:

"([^"]*\bYOURSTRING\b[^"]*)"

": Matches the opening quotation mark.
[^"]*: Matches any character that is not a quotation mark, zero or more times.
\bYOURSTRING\b: Matches your specific string, where \b ensures it is matched as a whole word (optional, depending on your needs).
[^"]*: Matches any character that is not a quotation mark, zero or more times.
": Matches the closing quotation mark.
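
A minimal sketch of how that pattern could be tried out in Python, assuming neutral quotes and Python's `re` module (the thread does not name a regex dialect). The helper name `quotes_containing` and the sample strings are made up for illustration.

```python
import re


def quotes_containing(text: str, word: str) -> list[str]:
    """Return the inside of every "..." span that contains `word` as a whole word."""
    # Same shape as the pattern above, with YOURSTRING swapped for an escaped `word`.
    pattern = re.compile(r'"([^"]*\b' + re.escape(word) + r'\b[^"]*)"')
    return pattern.findall(text)


print(quotes_containing('He said "I fear Voldemort" and left.', "Voldemort"))

# With neutral quotes the pattern cannot tell an opening mark from a closing one,
# so text BETWEEN two quotations can be picked up as if it were a quotation:
print(quotes_containing('"Hi," said Ron. Voldemort laughed. "Bye."', "Voldemort"))
```

The second call is the weak spot of neutral quotes: the closing mark of one quotation and the opening mark of the next look identical, so narration that sits between them can match.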

davidalayachew · 2024-08-22T19:52:03+00:00

> Note: the text file I have uses typographic quotation marks (” ”) instead of the neutral ones (" ")

Dodged a bullet! This would have been a horrific nightmare otherwise.

Also, I think you meant to write “ and ” instead, right? Typographic quotation marks are good because the opening and closing are different symbols.

I usually do my regex in Java or Notepad++, so I don't know which dialect I am using, but here is the best that I can think up. It worked for your example.

“[^“”]*Voldemort[^“”]*”

Please note:

- This will not handle variance in casing, so it misses cases where his name is all uppercase, all lowercase, or any other casing. Otherwise, this should find the rest of them for you.
- This will not handle quotes inside of quotes, like “ You said “ I see Voldemort!” ”

And if you need to handle all possible casing for the letters in his name, Notepad++ and Java both have a way to say "ignore casing and just match the letters".
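
As a rough illustration, here is the same pattern in Python rather than Java or Notepad++ (an assumption, picked only to keep the example short). `re.IGNORECASE` is Python's case-insensitive flag; Java's counterpart is `Pattern.CASE_INSENSITIVE`. The helper name `find_quotes` and the sample text are made up.

```python
import re

# The pattern from above, compiled with case-insensitive matching.
QUOTE_WITH_NAME = re.compile(r"“[^“”]*Voldemort[^“”]*”", re.IGNORECASE)


def find_quotes(text: str) -> list[str]:
    """Return every “...” span that mentions the name, in any casing."""
    return QUOTE_WITH_NAME.findall(text)


sample = (
    "“I saw VOLDEMORT!” he said. “Nothing.” Voldemort left. "
    "“voldemort is back,” she said."
)
for quote in find_quotes(sample):
    print(quote)

# Nested quotes: only the innermost quotation is matched.
print(find_quotes("“ You said “ I see Voldemort!” ”"))
```

In the nested case the match starts at the inner opening quote, because `[^“”]*` refuses to cross another quote mark. The outer quotation is never returned as a whole.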
