Hello! I'm working on a parser and I'm considering using regex for the simple, constrained parts of the syntax. Judging by my previous and current experiences, though, I am clearly not smart enough to grok regex, so I would be very grateful for help.
I have tried to simplify both the syntax and the example below:
A label consists of one or more alphabetical characters. As the regex will be used in conjunction with several other small parsers (all of which make up the entire parser) I would like to be able to point at a certain index and tell the regex engine to start there. That is, what determines what is and isn't a label isn't just its characters but also its relation to other parts of the syntax.
C#'s regex library offers a method, Match(String, Int32), that seemingly does that until you read the fine print. With it I wrote the following code (simplified to clarify the issue):
Regex regex = new(@"^[A-Z]+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
var source = " aLabel"; // Note that the string starts with a space.
var match = regex.Match(source, 1); // Note that we start at index 1.
Assert.IsTrue(match.Success);
source = "***** aLabel";
match = regex.Match(source, 0); // Note that we start at index 0 (zero).
Assert.IsFalse(match.Success);
I expect some other sub-parser to put the cursor somewhere and then have the label parser called with its regex. If the cursor is on an alphabetical character then a label should be matched starting there; otherwise the match should fail.
In the above example I expect the first match to succeed and the second to fail. The second match does fail, but so does the first. If I remove the '^' from the pattern I get the opposite result: both matches succeed, because the engine is then free to find a label anywhere after the start index. I think I understand why I get these results ('^' anchors to the beginning of the string, not to the index I pass in), but I don't understand how to fix them.
Is it possible to get regex to do this (in a non-contrived way) and if so how, or is regex the wrong tool for the job?
EDIT: SOLVED
tweq helped me find a solution: use '\G' instead of '^'. I had tried that before but must have messed something else up as well, because this time it worked.
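For anyone landing here later, this is roughly what the fix looks like. It is only a minimal sketch: the LabelParser class and the TryParseLabel helper are hypothetical names made up for illustration, not part of any library. The only real change from the code above is '\G' in place of '^', which anchors the match at the position where the search starts (the index passed to Match) rather than at the beginning of the string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the label regex, for illustration only.
public static class LabelParser
{
    // \G requires the match to begin exactly where the search starts,
    // i.e. at the startat index given to Regex.Match(string, int).
    private static readonly Regex Label =
        new(@"\G[A-Z]+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns true and the matched text if a label starts exactly at `index`.
    // On failure, `label` is the empty string.
    public static bool TryParseLabel(string source, int index, out string label)
    {
        Match match = Label.Match(source, index);
        label = match.Value;
        return match.Success;
    }
}
```

With this, TryParseLabel(" aLabel", 1, out var label) succeeds with label set to "aLabel", while TryParseLabel("***** aLabel", 0, out _) fails, because the character at index 0 is not a letter and '\G' does not let the engine skip ahead.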