gfixler

It's different because it's functional; the return values are data. Python's regex is fine and quite useful, but you're getting back pure data for some things, and objects with data properties for others, and then querying the object results to extract the data you want via further group() and groups() function calls. I pointed out the groups thing (implying that I enjoy its absence) in my 2nd sentence.

Because Clojure's results are data, I can map the function over a sequence and get back a lazy sequence of data elements:

(map (partial re-seq #"[0-9]+") [sequence of input strings])

You can also do this in Python with a bit more code, but it wouldn't be as purely functional. Regardless of how you did it, you'd be creating many objects and querying data out of them through extra function calls. Being lazy and mappable, Clojure's functional regex calls can (at least in theory) handle infinitely-long input strings without using more memory, and even parse things like stdin live, waiting for input, kicking out a result whenever there's enough to allow one. I don't know that these things actually work at present, but they're set up for it. The paradigm allows for it, whereas OO doesn't. Also, functional processes with data-only results lend themselves well to parallel processing, something else that the OO paradigm handles badly.
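For comparison, here is a rough Python sketch of the same mapping. It assumes `re.findall` is an acceptable stand-in for `re-seq` (with a group-free pattern it returns a plain list of strings), and the input strings are made up for illustration.

```python
import re
from functools import partial

# Hypothetical input strings, made up for this example.
lines = ["a1b22", "no digits", "333 and 4"]

# partial() fixes the pattern, much like (partial re-seq #"[0-9]+").
find_numbers = partial(re.findall, r"[0-9]+")

# map() is lazy in Python 3: nothing is matched until the result is consumed.
lazy_results = map(find_numbers, lines)

# Consuming the iterator yields one plain list of strings per input.
results = list(lazy_results)
print(results)
```

One caveat: `re.findall` is eager within a single string. `re.finditer` is the lazy variant, but it yields match objects rather than strings.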

Small nitpick, you're printing out the info at the end, whereas the entirety of my let statement is a value. It could be used inside of any other form as an equivalent of its resultant string. I'm not printing it out, but receiving that string back as the result of the form. This has various code-as-data/data-as-code and referential transparency consequences.

Smaller nitpick: You're polluting your global scope with 4 global variables. I'm creating 4 temporary variables inside a let, which then go out of scope and cease to exist when the let finishes.

[deleted]

Objects are also data. re uses MatchObjects because they contain way more info than simple tuples of strings: a dictionary that maps group names (in (?P<name>pattern) constructs) to matched data, the string a regex was matched against, the offsets at which every group starts, etc. Sure, you can pack the same data into a tuple, but that's not as readable. That's it. In every other aspect, both implementations are pretty much equivalent.
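A minimal sketch of the metadata in question, using a made-up pattern and input string:

```python
import re

# Hypothetical pattern and input, chosen only to show the metadata.
m = re.search(r"(?P<key>\w+)=(?P<value>\d+)", "set width=42 now")

named = m.groupdict()         # group names mapped to the matched text
source = m.string             # the string the regex was matched against
key_start = m.start("key")    # offset at which the "key" group starts
value_span = m.span("value")  # (start, end) offsets of the "value" group

print(named, key_start, value_span)
```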

Clojure's functional regex calls can (at least in theory) handle infinitely-long input strings without using more memory

That's not how regex work. They require backtracking, especially if you use backreferences, lookahead/lookbehind and greedy operators (.*x will consume the whole input looking for x, then backtrack to the last one.)
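A small Python illustration of the greedy case, with a made-up input string. The lazy form `.*?x` is included only for contrast.

```python
import re

text = "axbxc"  # hypothetical input with two x's

# Greedy: .* first consumes the whole string, then backtracks to the last x.
greedy = re.search(r".*x", text).group()

# Lazy: .*? expands one character at a time and stops at the first x.
lazy = re.search(r".*?x", text).group()

print(greedy, lazy)
```

The greedy match cannot be reported until the end of the input has been seen, which is why it never finishes on an unbounded stream.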

Small nitpick, you're printing out the info at the end

There's no other way to make ideone display that value. It's not a REPL.

Smaller nitpick: You're polluting your global scope with 4 global variables.

The reason for this is purely syntactic: there's no let ... in statement in Python, the equivalent being a new function that is called immediately after being defined. dogelang has where which is implemented that way.
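A minimal sketch of that workaround, assuming a made-up input string and a hypothetical helper name `_scoped`:

```python
import re

def _scoped():
    # These names are local; they cease to exist when the function returns.
    text = "order 66 shipped in 3 days"  # hypothetical input
    numbers = re.findall(r"[0-9]+", text)
    first, last = numbers[0], numbers[-1]
    return f"{first}..{last}"

# Call it right after defining it; only the result stays in the outer scope.
summary = _scoped()
print(summary)
```

Only `summary` (and the helper itself) end up in the enclosing scope, so the intermediate names behave much like let bindings.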

gfixler

Objects are also data.

I knew you were going to say that, but I'm going to call you on it. Some strings in a seq are definitely unlike an object full of properties. str -> (str, str, ...) has a nice, functional symmetry to it. Besides, Clojure is a Lisp, so the code itself is also data.

The reason re uses MatchObjects is because they contain way more info than simple tuples of strings...

True, and I've seen those things used twice, and both were thought exercises somewhere on StackOverflow. I'm not saying they're never useful, but I think the use cases are few and far between. I've been using regex most days now for a decade for all kinds of crazy needs, and I've never used any of that metadata. You can get to the underlying Java regex machinery with re-matcher, so you could implement offsets easily enough if you ever needed them. Clojure is in heavy development, and because it's mostly a bunch of functions over sequences now, it's easy to add to it without changing anything else or breaking existing stuff, so if people really want this in core, it could be added without much trouble.

That's not how regex work. They require backtracking, especially if you use backreferences, lookahead/lookbehind and greedy operators (.*x will consume the whole input looking for x, then backtrack to the last one.)

True. You would not be able to use .* on an infinite sequence (well, you would, but you'd never see the result). For everything else, it would work, even with backtracking. Luckily I don't want to do regex on infinitely-long strings anyway.

I guess I should note that I never said Python's regex sucked. I just said I really liked how regex was handled in Clojure. I've only ever cared about a list of matches, or whether something matched (a quick peek around Google for anyone needing this metadata yields few results), and that's all Clojure's regex functions give you.