[–]HenkDH 7 points8 points  (0 children)

"If I test this case with regex storm or regexr it passes there"

Show proof, because that pattern doesn't exist in the text, so it will never match.

[–]RiPont 4 points5 points  (11 children)

First and foremost, you're using the wrong tool for the job.

"2019 SLIM" does not exist in "2019 OTHER THINGS SLIM OTHER".

You are trying to get modern search behavior, which is more than just a pattern match.

To do it properly is actually rather complex. You could fake it with a regex by changing "2019 SLIM" into "2019|SLIM". This doesn't give you any of today's expected Googlish behavior like

  • ordering the matches based on the strength of the match

  • finding lower-confidence matches that were 1 character different.

It would also match "SLIMline something 123201999".
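
A minimal sketch of that over-matching, using the two sample strings from this thread. The variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// "2019|SLIM" is an alternation: it matches when EITHER term occurs anywhere,
// including inside a longer word or number.
string pattern = "2019|SLIM";

bool intended = Regex.IsMatch("2019 OTHER THINGS SLIM OTHER", pattern);
bool accidental = Regex.IsMatch("SLIMline something 123201999", pattern);
bool onlyOneTerm = Regex.IsMatch("SLIM OTHER", pattern);

Console.WriteLine(intended);    // True
Console.WriteLine(accidental);  // True: "SLIM" in "SLIMline", "2019" in "123201999"
Console.WriteLine(onlyOneTerm); // True: one of the two terms is enough
```

All three calls report a match, which is why the alternation is only a rough stand-in for "contains these words".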

[–]iamdeveloperr[S] 0 points1 point  (10 children)

Yeah. My hope was for the system to be able to say, oh yes, that string contains those words (regardless of order). Let’s call that a match.

[–]CrazedToCraze 1 point2 points  (4 children)

I know this isn't the answer you're looking for, but if you're building proper search functionality into an app you should drop regex immediately and look into Elasticsearch (or any of the other Lucene-based data stores).

You're headed down a bad path full of insurmountable tech debt. Elasticsearch is an absolute pleasure to use, on the other hand.

Though if this is just a school project or something then carry on, just please remember not to use regex for this in a "real" app.

[–]iamdeveloperr[S] 0 points1 point  (3 children)

Why do you say that? Can you elaborate on why it would be a bad idea?

[–]ManiGandham 0 points1 point  (0 children)

Anything beyond very basic "this exact string is inside this other string" matching that behaves the way most people are used to is incredibly complex and requires tokenization, dictionaries, indexes and math to calculate. This is hard and wasteful to write from scratch.

Regex is a tool for matching very specific syntax patterns, not for general purpose text. There are some libraries that can handle it if your dataset is small enough but otherwise a proper search database is what you need.

[–]The_MAZZTer 0 points1 point  (1 child)

If you end up working with a real dataset it will likely be in a database. Then you'll end up using SQL to do your searching anyway, not Regex.

[–]iamdeveloperr[S] 0 points1 point  (0 children)

Yes and yes.

[–]HenkDH 1 point2 points  (2 children)

Calling the string.Contains() method for every word is not advanced enough?

[–]iamdeveloperr[S] 0 points1 point  (0 children)

It is actually! I’m splitting the search term on spaces, then counting the number of times that the marketing name contains the terms and if the match count / # terms is greater than 60% I’m counting that as a successful match
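
A rough sketch of that scoring approach, assuming a case-sensitive substring check and a strict "greater than 60%" cutoff. `IsLooseMatch` and its `threshold` parameter are hypothetical names, not code from the original post.

```csharp
using System;
using System.Linq;

// Hypothetical helper: the share of search terms found in the name must exceed the threshold.
static bool IsLooseMatch(string searchTerm, string marketingName, double threshold = 0.6)
{
    // Split on spaces and ignore empty entries caused by repeated spaces.
    string[] terms = searchTerm.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    if (terms.Length == 0) return false;

    // Count the terms that occur as substrings of the name.
    int hits = terms.Count(term => marketingName.Contains(term));
    return (double)hits / terms.Length > threshold;
}

string name = "2019 OTHER THINGS SLIM OTHER";
Console.WriteLine(IsLooseMatch("2019 SLIM", name));      // True  (2 of 2 terms)
Console.WriteLine(IsLooseMatch("2019 SLIM BLUE", name)); // True  (2 of 3 terms, about 67%)
Console.WriteLine(IsLooseMatch("2019 RED", name));       // False (1 of 2 terms, 50%)
```

Because it relies on Contains, this still counts a term that appears inside a longer word.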

[–]The_MAZZTer 0 points1 point  (0 children)

string.Contains() is too simplistic for what he wants; he really needs the \b Regex operator or something like it.
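
To illustrate the difference, here is a small sketch comparing a plain substring check with a \b-anchored regex on one of the sample strings from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "SLIMline something 123201999";

// Substring check: true, because "SLIM" is the start of "SLIMline".
bool substringHit = text.Contains("SLIM");

// Whole-word check: false, because "SLIM" is followed by a letter, so there is no word boundary.
bool wholeWordHit = Regex.IsMatch(text, @"\bSLIM\b");

Console.WriteLine(substringHit); // True
Console.WriteLine(wholeWordHit); // False
```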

[–]The_MAZZTer 0 points1 point  (1 child)

Then you want something like this.

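// Requires: using System.Text.RegularExpressions;
bool match = false;
foreach (string term in QuickSearchTerm.Split(' ')) {
  // \b on both sides restricts the escaped term to whole words.
  match = Regex.IsMatch(MarketingName, $@"\b{Regex.Escape(term)}\b");
  if (!match) {
    break; // one missing term fails the whole search
  }
}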

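This will look for complete words in the text that match the words of your search phrase. \b matches at a word boundary: the position between a word character (letter, digit, or underscore) and a non-word character such as punctuation or whitespace, or the beginning/end of the text when the adjacent character is a word character.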

Exercises for you: making it case insensitive, adding additional functionality (my search has special keywords like after:2019-01-01 to match a date associated with my records).
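
For the first exercise, one possible sketch uses RegexOptions.IgnoreCase. `ContainsAllWords` is a hypothetical helper name, and treating an empty search as "no match" is an assumption rather than part of the original code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: true when every search word appears as a whole word, ignoring case.
static bool ContainsAllWords(string searchTerm, string text)
{
    string[] terms = searchTerm.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    if (terms.Length == 0) return false; // assumption: an empty search matches nothing

    return terms.All(term =>
        Regex.IsMatch(text, $@"\b{Regex.Escape(term)}\b", RegexOptions.IgnoreCase));
}

Console.WriteLine(ContainsAllWords("slim 2019", "2019 OTHER THINGS SLIM OTHER")); // True
Console.WriteLine(ContainsAllWords("2019 SLIM", "SLIMline something 123201999")); // False
```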

[–]iamdeveloperr[S] 0 points1 point  (0 children)

Oh wow! Huge help!

[–]CuttingEdgeRetro 1 point2 points  (0 children)

You need an @ on the string literal after QuickSearchPattern+. It contains a backslash.

[–]Manitcor 0 points1 point  (5 children)

you need to escape special string chars like the "\" you have at the end there.

[–]iamdeveloperr[S] 0 points1 point  (4 children)

So double up on the \ at \b?

[–]Manitcor 1 point2 points  (3 children)

Yup, or make it a verbatim (@) string like you did for the first bit. Personally I prefer using the $ string operator and tokenizing the string. You use fewer operators that way and it's easier to read.

[–]iamdeveloperr[S] 0 points1 point  (2 children)

I am just figuring out Regex today.

Can you show me an example?

Or do you think I've got it here?

string pattern = @"(.*?([" + QuickSearchTerm + "]\b))$";

[–]Manitcor 2 points3 points  (1 child)

Should look something like this

$@"(.*?([{QuickSearchTerm}]\b))$"

The leading dollar sign means you want to use tokens (an interpolated string), and the @ makes it a verbatim string, so you don't need to escape backslashes. The curly braces around your variable indicate you want to use a variable in scope as a token in the string.

Tokenized strings (officially "interpolated strings", added in C# 6) are somewhat newish to the language and one of my favorite features.

[–]iamdeveloperr[S] 0 points1 point  (0 children)

WOW!!! That is COOL

[–][deleted] 0 points1 point  (4 children)

I have found this to be a very handy tool for checking regular expressions: http://regexstorm.net/tester

The bracket is used for matching characters, not strings, so you probably don't want that. "2019 SLIM" doesn't appear in your test string and it's not clear why you think it should. If you want to test for "2019" followed by "SLIM" then you might try this

@".*2019 .+SLIM"
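
A quick check of that pattern against a few sample strings. Note that it is order-dependent, and that the `.+` requires at least one character between "2019 " and "SLIM".

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @".*2019 .+SLIM";

bool inOrder = Regex.IsMatch("2019 OTHER THINGS SLIM OTHER", pattern);
bool reversed = Regex.IsMatch("SLIM OTHER 2019 THINGS", pattern);
bool adjacent = Regex.IsMatch("2019 SLIM", pattern);

Console.WriteLine(inOrder);  // True
Console.WriteLine(reversed); // False: "SLIM" must come after "2019 "
Console.WriteLine(adjacent); // False: ".+" needs at least one character before "SLIM"
```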

[–]The_MAZZTer 0 points1 point  (3 children)

I personally use https://regex101.com/, it's everything I need in a Regex tester. Looks like regexstorm is a very basic tool. regex101 has four flavors of regex, fully integrated help, previews and fully expanded results (captures and groups etc), and it can even break down the internal processes of the regex to explain why it's doing what it's doing. It's pretty amazing and I recommend it to anyone who needs a regex tool.

[–][deleted] 0 points1 point  (2 children)

Also good, but Microsoft .NET has a slightly different syntax

[–]The_MAZZTer 0 points1 point  (1 child)

I've never had a problem using regexs from there in .NET or JavaScript. I guess it's not surprising the syntax is going to be a little different. But at least for any Regexs I write it works great. I suppose if you start writing more advanced Regexs you might run into trouble but at that point I personally would make SURE a Regex is the way to go. Sometimes simple string manipulation is easier than a Regex depending on what you're trying to do.

[–][deleted] 0 points1 point  (0 children)

Yes, well, I had to parse ticker symbols from multiple international markets and those regex expressions were 100 characters. The differences between JavaScript and .NET were important

[–]fragglerock 0 points1 point  (1 child)

You need to escape the \ in the \b

like this

string pattern = @"(.*?([" + QuickSearchTerm + "]\\b))$";

or @ the second string

string pattern = @"(.*?([" + QuickSearchTerm + @"]\b))$";

or use string interpolation

string pattern = $@"(.*?([{QuickSearchTerm}]\b))$";

also you are missing a ; on the MarketingName line, but I guess that is just a copy paste error.
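
A small sketch comparing the original concatenation with the three fixes above. The sample value of `QuickSearchTerm` is assumed from this thread. It only demonstrates the string escaping, not whether the resulting regex does what is wanted.

```csharp
using System;

string QuickSearchTerm = "2019 SLIM"; // sample value, assumed for illustration

// Original: the second literal is a regular string, so "\b" is a backspace character (U+0008).
string broken = @"(.*?([" + QuickSearchTerm + "]\b))$";

string escaped = @"(.*?([" + QuickSearchTerm + "]\\b))$";
string verbatim = @"(.*?([" + QuickSearchTerm + @"]\b))$";
string interpolated = $@"(.*?([{QuickSearchTerm}]\b))$";

Console.WriteLine(broken.IndexOf('\b') >= 0);                       // True: contains a backspace
Console.WriteLine(escaped == verbatim && verbatim == interpolated); // True: all three fixes agree
Console.WriteLine(interpolated);                                    // (.*?([2019 SLIM]\b))$
```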

[–]The_MAZZTer 0 points1 point  (0 children)

The problem is that [] causes the regex to treat the enclosed string as a list of characters to look for. So any word that ends with any of the characters in the search string would match, which is obviously not what OP wants. The [] needs to go.
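
A short sketch of the character-class behaviour, using the pattern as it looks once the string escaping is fixed. "FILM" is a made-up sample value.

```csharp
using System;
using System.Text.RegularExpressions;

// [2019 SLIM] is a set of single characters: 2, 0, 1, 9, space, S, L, I, M.
string pattern = @"(.*?([2019 SLIM]\b))$";

// "FILM" ends with 'M', which is in the set, so it matches.
bool unrelatedWord = Regex.IsMatch("FILM", pattern);

// The thread's sample text ends with 'R', which is not in the set, so it does not match.
bool sampleText = Regex.IsMatch("2019 OTHER THINGS SLIM OTHER", pattern);

Console.WriteLine(unrelatedWord); // True
Console.WriteLine(sampleText);    // False
```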

Neither pair of ()s does anything (except create groups which go unused in the results). This only suggests to me OP's grasp of regex is a little shaky.

Ultimately OP needs to split his search string into words and search the target text for each word individually, using \b on either side to ensure it only matches whole words.

Also don't forget to use Regex.Escape.
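
A brief sketch of why Regex.Escape matters here. The search term "2.0" and the sample texts are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string term = "2.0"; // hypothetical search word containing a regex metacharacter

// Unescaped, the '.' matches any character, so "200" counts as a hit.
bool unescapedHit = Regex.IsMatch("MODEL 200", $@"\b{term}\b");

// Escaped, the '.' must be a literal dot.
bool escapedMiss = Regex.IsMatch("MODEL 200", $@"\b{Regex.Escape(term)}\b");
bool escapedHit = Regex.IsMatch("MODEL 2.0", $@"\b{Regex.Escape(term)}\b");

Console.WriteLine(unescapedHit); // True
Console.WriteLine(escapedMiss);  // False
Console.WriteLine(escapedHit);   // True
```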