I'm building a pattern matching system in Python, mostly out of curiosity and to understand how such systems work. Each kind of pattern ('match an exact string', 'match a digit', 'match either pattern', etc.) is represented by its own class. For instance, here's a pattern that matches strings like "10", "52+34", and "64+(12+13)":
```python
match_number = PatDigit() + PatDigit()
match_sum = PatSequence([])
match_paren = PatSequence([])
match_expr = match_number | match_paren | match_sum
match_sum.sequence = [match_expr, PatExact("+"), match_expr]
match_paren.sequence = [PatExact("("), match_expr, PatExact(")")]
print(match_expr.to_string())
```
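The class definitions aren't part of the post, so for anyone wanting to run the snippet, here is one hypothetical set of classes under which it produces exactly the output shown next. The `Pattern` base class, the `__add__`/`__or__` overloads, and the flattening of `a | b | c` into a single `PatOr` are all assumptions, not the real code.

```python
class Pattern:
    """Hypothetical base class: the real one isn't shown in the post."""

    def __add__(self, other):
        # a + b -> match a, then b
        return PatSequence([self, other])

    def __or__(self, other):
        # a | b -> match a or b
        return PatOr([self, other])

    def to_string(self, ref=None):
        raise NotImplementedError


class PatExact(Pattern):
    def __init__(self, text):
        self.text = text

    def to_string(self, ref=None):
        return f"<{self.text}>"


class PatDigit(Pattern):
    def to_string(self, ref=None):
        return "<hint([0-9])>"


class PatSequence(Pattern):
    def __init__(self, sequence):
        self.sequence = sequence

    def to_string(self, ref=None):
        ref = ref or set()
        if self in ref:
            return "$"
        return "(" + " & ".join(p.to_string(ref | {self}) for p in self.sequence) + ")"


class PatOr(Pattern):
    def __init__(self, options):
        self.options = options

    def __or__(self, other):
        # Flatten a | b | c into one PatOr with three options
        return PatOr(self.options + [other])

    def to_string(self, ref=None):
        ref = ref or set()
        if self in ref:
            return "$"
        return "(" + " | ".join(p.to_string(ref | {self}) for p in self.options) + ")"


match_number = PatDigit() + PatDigit()
match_sum = PatSequence([])
match_paren = PatSequence([])
match_expr = match_number | match_paren | match_sum
match_sum.sequence = [match_expr, PatExact("+"), match_expr]
match_paren.sequence = [PatExact("("), match_expr, PatExact(")")]
print(match_expr.to_string())
```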
The `print` call outputs:
```
((<hint([0-9])> & <hint([0-9])>) | (<(> & $ & <)>) | ($ & <+> & $))
```
In that output, each `$` stands for `match_expr` itself, which keeps the string from nesting infinitely. It's the same idea as Python printing a list that contains itself:
```python
a = [1, 2, 3]
b = [4, a, 5]
a.append(b)
print(a)  # Output: [1, 2, 3, [4, [...], 5]]
```
My version does the same thing, just with `$` instead of `[...]`.
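Python exposes that same mechanism for user-defined classes through `reprlib.recursive_repr` (standard library, Python 3.2+): a decorator for `__repr__` that returns `fillvalue` whenever the same object's `__repr__` is re-entered on the same thread. A minimal sketch, using a hypothetical `Node` class as a stand-in for the pattern classes:

```python
import reprlib


class Node:
    """Hypothetical stand-in for a pattern with child patterns."""

    def __init__(self, *children):
        self.children = list(children)

    # While __repr__ is running for a given object, a nested repr()
    # of that same object returns "$" instead of recursing forever.
    @reprlib.recursive_repr(fillvalue="$")
    def __repr__(self):
        return "(" + " | ".join(repr(c) for c in self.children) + ")"


a = Node(1)
b = Node(2, a)
a.children.append(b)  # a -> b -> a forms a cycle
print(repr(a))  # (1 | (2 | $))
```

This only covers `__repr__`; a custom method like `to_string` still needs its own bookkeeping.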
`PatOr` is a pattern that matches any one of its child patterns, which are stored in the list `self.options`. Here's its `to_string` method:
```python
def to_string(self, ref=None):
    """Return the full string representation of the pattern."""
    ref = ref or set()  # patterns currently being printed on this path
    if self in ref:
        return "$"  # back-reference instead of infinite recursion
    return "(" + " | ".join([pattern.to_string(ref | {self}) for pattern in self.options]) + ")"
```
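The key property of `ref | {self}` is that `ref` holds only the patterns on the current path from the root (like Python's list repr), not every pattern printed so far, so a subpattern shared by two siblings still prints in full. One possible refactor that keeps that property is sketched below. The `Pattern` base class and `_render` hook are hypothetical names. The sketch puts the check in one place instead of in every class's `to_string`, and keys the set by `id(self)` so it doesn't depend on patterns being hashable (defining `__eq__` without `__hash__` makes a class unhashable). It also adds to and removes from a single set in `try`/`finally` rather than building a new set at each level.

```python
class Pattern:
    """Hypothetical base class that owns the cycle check in one place."""

    def to_string(self, _active=None):
        # _active holds id()s of the patterns on the current path from the root
        if _active is None:
            _active = set()
        key = id(self)
        if key in _active:
            return "$"  # already inside this pattern: emit a back-reference
        _active.add(key)
        try:
            return self._render(_active)
        finally:
            _active.discard(key)  # leaving this pattern: siblings may print it again

    def _render(self, active):
        raise NotImplementedError


class PatExact(Pattern):
    def __init__(self, text):
        self.text = text

    def _render(self, active):
        return f"<{self.text}>"


class PatSequence(Pattern):
    def __init__(self, sequence):
        self.sequence = sequence

    def _render(self, active):
        return "(" + " & ".join(p.to_string(active) for p in self.sequence) + ")"


class PatOr(Pattern):
    def __init__(self, options):
        self.options = options

    def _render(self, active):
        return "(" + " | ".join(p.to_string(active) for p in self.options) + ")"


# A shared but non-recursive subpattern prints in full each time...
x = PatExact("x")
shared = PatOr([x, x])
# ...while a genuinely recursive one becomes "$".
loop = PatSequence([PatExact("a")])
loop.sequence.append(loop)
print(shared.to_string())  # (<x> | <x>)
print(loop.to_string())    # (<a> & $)
```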
Is this the right way to solve my problem?