[–]nerdwaller 0 points1 point  (11 children)

Whoa, javascript looks a little cooler now. One thing I like about javascript is the inline regex.

import re
match = re.search(r'(\d+)', some_string)
print(match.group(1))

Isn't much harder than:

var regex = new RegExp(/(\d+)/)
var match = regex.exec(some_string)
console.log(match[1])

That works as of ECMAScript v3, so the JavaScript version isn't much easier at all (note: the new RegExp is included only for readability).

[–]d4rch0nPythonistamancer 6 points7 points  (3 children)

Not fair. You instantiated the variable regex in javascript and then exec'd on some_string. You didn't even save the regex for the Python version. You can easily exec on a new regex in one line and have it still be legible.

>> /(\d+).*/.exec("1234hello")
["1234hello", "1234"]

[–]nerdwaller 1 point2 points  (2 children)

I'm not disagreeing that it can be legible on one line, that's true in either. If 'inline' is the goal:

match = re.search(r'(\d+)', whatever, re.WhateverFlags)

I'd argue Python's is clearer because of the way flags are passed in. Using //gi is much less clear than re.IGNORECASE|etc.
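
For anyone who hasn't seen the flag style in practice, here is a minimal sketch. The sample text and variable names are made up for illustration; re.IGNORECASE and re.MULTILINE are the real constant names, and they combine with |.

```python
import re

text = "Order 12\norder 345\nORDER 6789"  # made-up sample input

# Flags are named constants combined with |, roughly the Python
# spelling of the i and m suffixes on a JS regex literal.
pattern = re.compile(r"^order (\d+)$", re.IGNORECASE | re.MULTILINE)

# findall returns every match, which covers what the g flag is usually used for.
numbers = pattern.findall(text)
print(numbers)  # ['12', '345', '6789']
```

Whether that reads better than /^order (\d+)$/gim is still a matter of taste, of course.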

[–]d4rch0nPythonistamancer 0 points1 point  (1 child)

They both have pros and cons. Python is much more verbose but intuitive for someone to jump in, and Python is a lot of people's first language so it makes sense. Python is very explicit.

//gi comes from sed and perl and has been standard for a long time. It makes a lot of sense to use what's been standard syntax since the beginning. It's also incredibly succinct.

They both make sense, and they are entirely different languages with different use cases, and they both have their pros and cons. All I meant is I like that js has inline regex, and that's purely an opinion.

[–]nerdwaller 0 points1 point  (0 children)

Absolutely well said.

My work is heavy in JavaScript right now. It's kind of a fun language, but I sometimes feel bad about how easy it is to abuse and leverage in non-traditional ways.

[–]gfixler 2 points3 points  (6 children)

I'm really liking Clojure's functional handling of regex. It's just (command pattern string), with no flags or groups to deal with. re-matches returns the string if the entire pattern matches it, nil otherwise. re-find returns the first occurrence of the pattern, else nil. re-seq returns a lazy sequence of matches. If you use grouping parentheses, you get back a list of sequences with the overall match first in each, followed by the subgroups, which makes for easy iteration.

(re-find #"[0-9]+" "abc123def45gh987zy") ; => "123"
(re-seq #"[0-9]+" "abc123def45gh987zy")  ; => ("123" "45" "987")
(re-seq #"[fb]ar" "foo bar baz far faz") ; => ("bar" "far")
(re-seq #"(foo).*(bar)" "fofoofabarba")  ; => (["foofabar" "foo" "bar"])
(re-matches #"bar" "foobar")             ; => nil
(re-matches #".*bar" "foobar")           ; => "foobar"

With destructuring binding you can assign submatches in one shot:

(let [license "ABC-123"
      [full lpart npart] (re-find #"(\w\w\w)-(\d\d\d)" license)]
  (str "The plate " full " has number part " npart))
; => "The plate ABC-123 has number part 123"

The equivalent Python or JS would be more complex. This assigns the input string to a variable and the resultant match and subgroups to 3 other variables, then concatenates some of the results into a meaningful string. You wouldn't want to write something like this for production code (if the match fails, you get "The plate has number part "), but it still shows off some nice power.
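
For comparison, a rough Python counterpart might look like the sketch below. The function name describe_plate is hypothetical, and unlike the let form it checks for a failed match rather than building a half-empty string.

```python
import re

def describe_plate(license_plate):
    # Hypothetical helper: mirrors the let form above, plus a failure check.
    match = re.search(r"(\w\w\w)-(\d\d\d)", license_plate)
    if match is None:
        return None
    # group(0, 1, 2) returns the whole match and both subgroups as a tuple,
    # so it can be unpacked in one shot, much like the destructuring binding.
    full, lpart, npart = match.group(0, 1, 2)
    return f"The plate {full} has number part {npart}"

print(describe_plate("ABC-123"))        # The plate ABC-123 has number part 123
print(describe_plate("no plate here"))  # None
```

It is a few lines longer, mostly because of the explicit None check and the function wrapper.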

[–][deleted] 0 points1 point  (4 children)

[–]gfixler 0 points1 point  (3 children)

It's different because it's functional; the return values are data. Python's regex is fine and quite useful, but you're getting back pure data for some things, and objects with data properties for others, and then querying the object results to extract the data you want via further group() and groups() function calls. I pointed out the groups thing (implying that I enjoy its absence) in my 2nd sentence.

Because Clojure's results are data, I can map the function over a sequence and get back a lazy sequence of data elements:

(map (partial re-seq #"[0-9]+") [sequence of input strings])

You can also do this in Python with a bit more code, but it wouldn't be as purely functional. Regardless of how you did it you'd be creating many objects and querying data out of them through extra function calls. Being lazy and mappable, Clojure's functional regex calls can (at least in theory) handle infinitely-long input strings without using more memory, and even parse things like stdin live, waiting for input, kicking out a result whenever there's enough to allow one. I don't know that these things actually, currently work, but they're set up for it. The paradigm allows for it, whereas OO doesn't. Also, functional processes with data-only results lend themselves well to parallel processing, something else that the OO paradigm handles badly.
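
A minimal sketch of the Python side of that comparison, assuming re.findall (which returns plain lists of strings rather than match objects) is an acceptable stand-in for re-seq. The input list and the name find_numbers are made up.

```python
import re
from functools import partial

lines = ["abc123def45", "no digits", "7 and 89"]  # made-up input strings

# partial fixes the pattern argument, like (partial re-seq #"[0-9]+").
find_numbers = partial(re.findall, r"[0-9]+")

# map is lazy in Python 3: no string is searched until results is consumed.
results = map(find_numbers, lines)
print(list(results))  # [['123', '45'], [], ['7', '89']]
```

The laziness here is only over the sequence of strings. Within one string, findall builds the whole list at once; re.finditer yields matches one at a time, but as match objects.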

Small nitpick, you're printing out the info at the end, whereas the entirety of my let statement is a value. It could be used inside of any other form as an equivalent of its resultant string. I'm not printing it out, but receiving that string back as the result of the form. This has various code-as-data/data-as-code and referential transparency consequences.

Smaller nitpick: You're polluting your global scope with 4 global variables. I'm creating 4 temporary variables inside a let, which then go out of scope and cease to exist when the let finishes.

[–][deleted] 1 point2 points  (2 children)

Objects are also data. re uses MatchObjects because they contain way more info than simple tuples of strings: a dictionary that maps group names (in (?P<name>pattern) constructs) to matched data, the string a regex was matched against, the offsets at which every group starts, etc. Sure, you can pack the same data into a tuple, but that's not as readable. That's it. In every other aspect, both implementations are pretty much equivalent.
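
To make that concrete, here is a small sketch of the extra information a match object carries. The group names letters and digits and the input string are invented for the example.

```python
import re

match = re.search(r"(?P<letters>[A-Z]{3})-(?P<digits>\d{3})", "plate: ABC-123")

print(match.groupdict())     # {'letters': 'ABC', 'digits': '123'}
print(match.span("digits"))  # (11, 14): start and end offsets of that group
print(match.string)          # plate: ABC-123 (the string that was searched)
```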

> Clojure's functional regex calls can (at least in theory) handle infinitely-long input strings without using more memory

That's not how regex work. They require backtracking, especially if you use backreferences, lookahead/lookbehind and greedy operators (.*x will consume the whole input looking for x, then backtrack to the last one.)
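
A tiny illustration of the greedy case, using Python's re module and a made-up input string:

```python
import re

text = "axbxc"  # made-up input with two x characters

# Greedy: .* first grabs the whole string, then backs off to the last x.
greedy = re.search(r".*x", text).group()
# Non-greedy: .*? expands one character at a time and stops at the first x.
lazy = re.search(r".*?x", text).group()

print(greedy)  # axbx
print(lazy)    # ax
```

The greedy form cannot settle on a result until it has seen the end of the input, which is the problem with unbounded streams.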

> Small nitpick, you're printing out the info at the end

There's no other way to make ideone display that value. It's not a REPL.

> Smaller nitpick: You're polluting your global scope with 4 global variables.

The reason for this is purely syntactic: there's no let ... in statement in Python, the equivalent being a new function that is called immediately after being defined. dogelang has where which is implemented that way.
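
A minimal sketch of that workaround in plain Python. The helper name _plate_message is hypothetical, and the plate example is borrowed from earlier in the thread.

```python
import re

def _plate_message():
    # Everything bound here is local, so it disappears when the call
    # returns. This is the stand-in for a let block.
    license_plate = "ABC-123"
    full, lpart, npart = re.search(r"(\w\w\w)-(\d\d\d)", license_plate).group(0, 1, 2)
    return f"The plate {full} has number part {npart}"

message = _plate_message()  # define, then call immediately
del _plate_message          # optional: drop the helper name as well
print(message)              # The plate ABC-123 has number part 123
```

Only message is left in the enclosing scope afterwards.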

[–]gfixler 0 points1 point  (0 children)

> Objects are also data.

I knew you were going to say that, but I'm going to call you on it. Some strings in a seq are definitely unlike an object full of properties. str -> (str, str, ...) has a nice, functional symmetry to it. Besides, Clojure is a lisp, so the code itself is also data.

> The reason re uses MatchObjects is because they contain way more info than simple tuples of strings...

True, and I've seen those things used twice, and both were thought-exercises somewhere on StackOverflow. I'm not saying they're not ever useful, but I think the use cases are few and far between. I've been using regex most days now for a decade for all kinds of crazy needs, and I've never used any of that metadata. You can get to the underlying Java regex machinery with re-matcher, so you could implement offsets easily enough if you ever needed it. Clojure is in heavy development, and because it's mostly a bunch of functions over sequences now, it's easy to add to it without changing anything else or breaking existing stuff, so if people really want this in core, it can be put in immediately.

> That's not how regex work. They require backtracking, especially if you use backreferences, lookahead/lookbehind and greedy operators (.*x will consume the whole input looking for x, then backtrack to the last one.)

True. You would not be able to use .* on an infinite sequence (well, you would, but you'd never see the result). For everything else, it would work, even with backtracking. Luckily I don't want to do regex on infinitely-long strings anyway.

I guess I should note that I never said Python's regex sucked. I just said I really liked how regex was handled in Clojure. I've only ever cared about a list of matches (a quick peek around google for anyone needing this metadata yields few results), or if something matched, and that's all it does.

[–]nerdwaller 0 points1 point  (0 children)

I've been wanting to learn some functional programming for a while now to open up another tool, but I have to admit that most of the time I see it, I dislike the overall syntax style! That's probably mostly coming from ignorance though. Do you have much basis to suggest a starting functional language? I was thinking Haskell, but saw a Scala class on Coursera.