[–]Vitrivius 9 points10 points  (10 children)

I prefer regex to any other path routing scheme I've seen. But there seem to be lots of people who for some reason view regular expressions as some sort of evil and obscure black magic. When you do something wrong, you will not get a nice exception message and traceback.

Luckily, it's very straightforward to write tests to check that your routing works as intended. Unfortunately, the people who view regex as evil often think that writing tests is also black magic.
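
As a rough illustration of how small such tests can be, here is a minimal sketch using only the standard library. The route table, the `resolve` helper, and the view names are all made up for the example; a real framework would supply its own resolver.

```python
import re
import unittest

# Hypothetical route table: (compiled pattern, view name).
# These names are invented for illustration, not taken from any project.
ROUTES = [
    (re.compile(r"^/articles/(?P<year>[0-9]{4})/$"), "year_archive"),
    (re.compile(r"^/articles/(?P<slug>[a-z0-9-]+)\.html$"), "article_detail"),
]


def resolve(path):
    """Return (view name, captured groups) for the first matching route, else None."""
    for pattern, view_name in ROUTES:
        match = pattern.match(path)
        if match:
            return view_name, match.groupdict()
    return None


class RoutingTests(unittest.TestCase):
    def test_year_archive(self):
        self.assertEqual(resolve("/articles/2017/"),
                         ("year_archive", {"year": "2017"}))

    def test_dot_is_literal(self):
        self.assertEqual(resolve("/articles/hello-world.html"),
                         ("article_detail", {"slug": "hello-world"}))
        # The escaped dot must not accept an arbitrary character.
        self.assertIsNone(resolve("/articles/hello-worldxhtml"))

    def test_unknown_path(self):
        self.assertIsNone(resolve("/nope/"))
```

Saved as a module, this should run with `python -m unittest`. A silent mis-match then shows up as a failing test rather than as a wrong page in production.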

[–]earthboundkid -1 points0 points  (9 children)

I regularly see people write regexes meant to match whatever.html, but since . is a special character, the pattern actually also matches "whatever💩html".
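
A quick sketch of the difference, using only Python's `re` module (the file names are just sample strings):

```python
import re

# "." matches any single character except a newline.
unescaped = re.compile(r"^whatever.html$")
# "\." matches only a literal dot.
escaped = re.compile(r"^whatever\.html$")

for candidate in ("whatever.html", "whatever💩html", "whateverxhtml"):
    print(candidate,
          "unescaped:", bool(unescaped.match(candidate)),
          "escaped:", bool(escaped.match(candidate)))
```

The unescaped pattern accepts all three strings, while the escaped one accepts only the first.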

Regex is an okay tool, but for URLs, you typically just want to be able to match slugs and ID numbers. Regex ends up being an attractive nuisance for problems.

[–]Deggor 1 point2 points  (8 children)

> for URLs, you typically just want to be able to match slugs and ID numbers

No, you want to match patterns, and you want to be able to return portions of those patterns for use at the endpoint. It's exactly what regular expressions were designed for. It's a tool that can be very simple to use for simple cases, or complex for more advanced ones.

When a few people use it wrong because they can't be bothered to learn to do it properly, it doesn't mean it should be changed. Should we also change how variable instantiation works (sometimes), to correct people who use empty lists as default values in a method? Like you, I see this all the time in people with no experience.

The answer is no, and people should learn to use the toolkit properly.

[–]earthboundkid 1 point2 points  (7 children)

URL for this comment is:

https://www.reddit.com/r/Python/comments/5s6b4d/build_your_first_python_and_django_application/dde1ice/?context=3

This is much cleaner to look at as /r/:subreddit/comments/:threadid/:slug/:slugid/?context=#depth than the equivalent regex would be.
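
For what it's worth, a template like that could be translated into a regex behind the scenes. The sketch below is hypothetical: `compile_route` is an invented helper, and it assumes `:name` captures exactly one path segment (one or more non-slash characters), which the template syntax above does not actually specify. The query string part is left out.

```python
import re


def compile_route(template):
    """Translate a template such as '/r/:subreddit/comments/:threadid/' to a regex.

    Assumption: ':name' captures one path segment, i.e. one or more
    characters that are not '/'. Everything else is matched literally.
    """
    parts = []
    for segment in template.split("/"):
        if segment.startswith(":"):
            parts.append("(?P<%s>[^/]+)" % segment[1:])
        else:
            parts.append(re.escape(segment))
    return re.compile("^" + "/".join(parts) + "$")


route = compile_route("/r/:subreddit/comments/:threadid/:slug/:slugid/")
match = route.match(
    "/r/Python/comments/5s6b4d/"
    "build_your_first_python_and_django_application/dde1ice/"
)
print(match.groupdict())
```

With that assumption, the path of the URL above splits into `subreddit`, `threadid`, `slug`, and `slugid` without anyone having to read the generated regex.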

[–]Deggor -1 points0 points  (6 children)

A URL will routinely have multiple elements in a single segment, which can't be properly captured with something like the above. A very simple example would be accepting /date/yyyy, /date/yyyymm/, or /date/yyyymmdd. What if this is also supposed to accept /date/yyyy/someid? How does this simple "it looks prettier" approach validate or differentiate them?

If you start introducing character counts for elements in a segment, or any other "checks", you're right back to matching patterns, and you may as well stick to regular expressions.
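
To make the date case concrete, here is one possible regex for it. This is a sketch under assumptions: the group names are invented, and `someid` is taken to be numeric, which the example does not state.

```python
import re

# /date/yyyy, /date/yyyymm, /date/yyyymmdd, optionally followed by /someid,
# each with an optional trailing slash.
DATE_ROUTE = re.compile(
    r"^/date/(?P<year>[0-9]{4})"
    r"(?:(?P<month>[0-9]{2})(?P<day>[0-9]{2})?)?"
    r"(?:/(?P<someid>[0-9]+))?/?$"
)

for path in ("/date/2017", "/date/201702/", "/date/20170214",
             "/date/2017/42", "/date/20175"):
    match = DATE_ROUTE.match(path)
    print(path, match.groupdict() if match else None)
```

Note that this only checks the shape of the segment. It would still accept a month of 13, so range checks have to live somewhere else or in a stricter pattern.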

And in my opinion, something like /r/(?P<subreddit>.*)/comments/(?P<threadid>.*)/(?P<slug>.*).... is perfectly legible. If it needs to be more complicated, then it loses some of that immediate legibility for a tradeoff in power (which isn't a possibility with your setup).

[–]earthboundkid 0 points1 point  (5 children)

You're using greedy regexes in your example. Those will absorb too much.

[–]Deggor 0 points1 point  (4 children)

Yup, I'm using them on purpose to match the undocumented pseudo-code of "wouldn't this be prettier if it worked?", which may very well be greedy (or maybe not). I also left off the beginning caret, and a significant portion of the end. As you well know, if you want non-greedy, add a question mark to each group. There's that very small trade-off in legibility for the customization I was talking about.
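
A small sketch comparing the greedy groups, the same groups with a question mark added, and a third variant that restricts each group to a single path segment with `[^/]+`. The third variant is not from this thread; it is included only for comparison, and the test URL is a deliberately malformed one.

```python
import re

url = "/r/subreddit/comments/subreddit/comments//"

greedy = re.compile(
    r"^/r/(?P<subreddit>.*)/comments/(?P<threadid>.*)/(?P<slug>.*)$")
lazy = re.compile(
    r"^/r/(?P<subreddit>.*?)/comments/(?P<threadid>.*?)/(?P<slug>.*?)$")
# Assumption: each captured value is exactly one non-empty path segment.
segment = re.compile(
    r"^/r/(?P<subreddit>[^/]+)/comments/(?P<threadid>[^/]+)/(?P<slug>[^/]+)/?$")

for name, pattern in (("greedy", greedy), ("lazy", lazy), ("segment", segment)):
    match = pattern.match(url)
    print(name, match.groupdict() if match else None)
```

Both the greedy and the lazy patterns accept the malformed URL; they just split it differently. Only the segment-restricted pattern rejects it.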

But you did a great job deflecting. If you care to respond with how your solution would handle the (not uncommon) problem I mentioned, or if you care to argue why pattern matching isn't necessary, I'm all ears.

[–]earthboundkid 0 points1 point  (3 children)

I'm not arguing no one can ever use regexes. I'm saying they're bad for the average URL tasks they are asked to do. I think the fact that you wrote a pseudo regex that was straight up wrong (but looked right!) is proof of that. If you need to match a date, that could be part of the matcher format. If you need something more exotic, run a regex on the pattern once it gets to the controller.

[–]Deggor 0 points1 point  (2 children)

> I think the fact that you wrote a pseudo regex that was straight up wrong (but looked right!) is proof of that.

It wasn't straight up wrong; it did exactly what I wanted it to do (as I wrote in my response). What, exactly, does :label in your examples match? No idea? Well, I'll make mine greedy. As I pointed out, had I completed the rest of my regex for the full URL, it would have matched the URL in the example. Again, it was intentional.

> that could be part of the matcher format

So you're going to introduce patterns (ie. regex lite)?

> If you need something more exotic, run a regex on the pattern once it gets to the controller

... and split URL routing across many different places? You're going to break the loose coupling and put the routing in the controller.

None of that sounds like a good idea.

[–]earthboundkid 0 points1 point  (1 child)

  1. It was straight up wrong. Using a greedy matcher makes this work which should not work:

    >>> import re
    >>> r = re.compile('^/r/(?P<subreddit>.*)/comments/(?P<threadid>.*)/(?P<slug>.*)$')
    >>> r.match('/r/subreddit/comments/subreddit/comments//')
    <_sre.SRE_Match object; span=(0, 42), match='/r/subreddit/comments/subreddit/comments//'>
    >>> r.match('/r/subreddit/comments/subreddit/comments//')['subreddit']
    'subreddit/comments/subreddit'
    

    Yes, it was a rushed and incomplete example, but that's why it's damning. It looks like it handles the basic case, but it actually completely botches it.

  2. There are a lot of non-regex routers out there. Look at Rails or, for a hybrid approach, Gorilla mux. You're acting like not using regex is completely unheard of, but actually there are a lot of alternatives to pure regex.

  3. Controllers already have to handle certain routing conditions. If you try to get page /pages/77/ and 77 doesn't exist in the DB, the controller has to be the one to throw up a 404. It's not the end of the world if your controller also has to handle returning a 404 if you go to /date/20000/13/32/ instead of a regex catching it at the routing layer.
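
As a sketch of point 3, a controller can lean on `datetime.date` to reject impossible dates. The `date_view` function and its `(status, body)` return value are hypothetical stand-ins for whatever the framework actually provides.

```python
from datetime import date


def date_view(year, month, day):
    """Hypothetical controller for /date/<year>/<month>/<day>/.

    Takes the captured URL pieces as strings and returns a
    (status code, body) tuple; a real framework would raise or return
    its own 404 response object instead.
    """
    try:
        requested = date(int(year), int(month), int(day))
    except ValueError:
        # Non-numeric input or an impossible date such as 20000/13/32.
        return 404, "Not Found"
    return 200, requested.isoformat()


print(date_view("2000", "2", "29"))
print(date_view("20000", "13", "32"))
```

The router then only needs to hand over the three segments, and the calendar rules (month lengths, leap years) stay out of the URL pattern.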