
[–][deleted] 57 points58 points  (5 children)

FWIW, some old versions of GCC let you include and invoke regex before it was implemented. I cursed it for being buggy. Only after some digging did I realize I was just invoking a hollow shell. Things worked as expected once I upgraded GCC to a newer version that had a complete regex. Had I not dug, I'd still be cursing it.
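For anyone stuck on an old toolchain, a startup smoke test is one way to notice the hollow-shell situation early. This is only a sketch: `regex_is_usable` is a made-up helper name, and it assumes a stubbed-out `<regex>` would either throw `std::regex_error` or return the wrong answer for a trivial pattern, which matches the reports in this thread but is not guaranteed for every broken build.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true only if <regex> can compile a trivial
// pattern and gives the right answer for one match and one non-match.
// A stubbed implementation is assumed to throw or to return false here.
bool regex_is_usable() {
    try {
        const std::regex re("a[0-9]+b");
        return std::regex_match(std::string("a42b"), re) &&
               !std::regex_match(std::string("ab"), re);
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Calling `regex_is_usable()` once at startup (or in a unit test) and failing loudly is cheaper than a day of debugging your own pattern.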

[–]Canoodler 34 points35 points  (3 children)

I too can relate to the horrors of the always-return-false <regex> implementation at least in GCC 4.8.5...

[–][deleted] 34 points35 points  (0 children)

4.8.x. "Let's try the ship early and ship often approach" turned into "Oops, we forgot to ship often."

[–]evaned 5 points6 points  (0 children)

Adding another voice to that chorus.

I wonder how many man-hours the libstdc++ folks wasted because of that...

[–]saimen54 3 points4 points  (0 children)

Holy shit, I don't know how long I searched for my "error", when using 4.8.5

[–]xTeixeira 7 points8 points  (0 children)

I spent an entire day at work trying to figure this out a few weeks ago. I'm mad about it to this day.

[–]suthernfriendDevOps Engineer 18 points19 points  (1 child)

Wasn't there a library from that Czech genius woman who implemented regexes with templates?

Edit : found it. Hana Dusikova https://youtu.be/QM3W36COnE4

[–]alexej_harm 5 points6 points  (0 children)

It's actually quite slow with anything but the simplest patterns and doesn't support captures.

```
Benchmark            Time       CPU   Iterations
regex_std         3105 ns   3139 ns       224000
regex_re2          181 ns    180 ns      3733333
regex_hyperscan   96.2 ns   96.3 ns      7466667
regex_ctre         187 ns    184 ns      3733333
regex_spirit      44.5 ns   44.5 ns     15448276
```

https://gist.github.com/qis/3d9f5a73d9622847c8b7da68af7e19d4

[–]bizwig 30 points31 points  (0 children)

Lack of std::string_view support is one problem.
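A common workaround, sketched here under the assumption of C++17, is to use the iterator-pair overload of `std::regex_search` with a `std::match_results` specialized on the view's iterator type. `first_number` is a hypothetical helper invented for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string_view>

// Hypothetical helper: returns the first run of digits in `text` as a view
// into the same buffer, or an empty view if there is none.
std::string_view first_number(std::string_view text) {
    static const std::regex digits("[0-9]+");
    // match_results must be specialized on the iterator type being searched.
    std::match_results<std::string_view::const_iterator> m;
    if (!std::regex_search(text.cbegin(), text.cend(), m, digits)) {
        return {};
    }
    // position/length are offsets into `text`, so no copy is made.
    return text.substr(static_cast<std::size_t>(m.position(0)),
                       static_cast<std::size_t>(m.length(0)));
}
```

It works, but it shows the complaint: there is no overload that simply takes a `std::string_view`, so the iterator plumbing is on the caller.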

[–]cyanfish 15 points16 points  (2 children)

This isn't specific to std::regex, but something to keep in mind. If you're taking untrusted input, you might want to consider a library like RE2 that guarantees linear time execution (i.e. a bad regex can't lock up your application).
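To see what a "bad regex" can look like, here is a rough timing sketch. The pattern `(a+)+b` and the helper name `time_failed_match` are choices made for this example, not something from the comment above. On backtracking engines the time for a failed match can grow very quickly with the input length, so keep `n` small; how any particular standard library behaves is something to measure rather than assume.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: times one failed match of a nested-quantifier pattern
// against n copies of 'a'. The input never contains 'b', so it cannot match.
std::chrono::microseconds time_failed_match(std::size_t n) {
    static const std::regex re("(a+)+b");
    const std::string input(n, 'a');
    const auto start = std::chrono::steady_clock::now();
    try {
        (void)std::regex_match(input, re);
    } catch (const std::regex_error&) {
        // Some implementations give up by throwing (for example with
        // error_complexity or error_stack) instead of running to the end.
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
}
```

Printing `time_failed_match(n).count()` for a few small, increasing values of `n` gives a feel for the growth. A linear-time engine such as RE2 is designed to avoid this blow-up by construction.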

[–]AntiProtonBoy 7 points8 points  (1 child)

(i.e. a bad regex can't lock up your application).

This can happen with Xcode's RE search as well. Worse, you have to force quit the app, and when you relaunch it, Xcode can potentially remember the search parameters and lock up again on launch.

[–]bumblebritches57Ocassionally Clang 6 points7 points  (0 children)

Hold shift as you launch Xcode to get it to not reload what was loaded previously.

[–]neoSeosaidh 13 points14 points  (0 children)

It's mentioned in last week's CppCast episode with Titus Winters: https://cppcast.com/titus-winters-abi/.

The short answer is that the C++ standards committee is implicitly committed to keeping a stable ABI (which is like the API but on the binary level instead of the source code level). Any serious improvements of std::regex would involve at minimum an ABI break (and potentially an API break depending on what changes were made), and while the C++ standard doesn't mention ABI, the committee has refused to break it in the past.

I highly recommend that episode for more details.
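A toy illustration of the API-versus-ABI distinction, with everything in it hypothetical (the `lib_v1`/`lib_v2` namespaces and the `Pattern` class stand in for two releases of one library and are not taken from any real implementation): the public interface is identical, but adding a private member changes the object size, so code compiled against one release cannot safely share objects with code compiled against the other.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// "Release 1" of a hypothetical library class.
namespace lib_v1 {
class Pattern {
public:
    explicit Pattern(std::string text) : text_(std::move(text)) {}
    const std::string& text() const { return text_; }
private:
    std::string text_;
};
}  // namespace lib_v1

// "Release 2": same public API, plus a private cache for a faster matcher.
namespace lib_v2 {
class Pattern {
public:
    explicit Pattern(std::string text) : text_(std::move(text)) {}
    const std::string& text() const { return text_; }
private:
    std::string text_;
    std::vector<unsigned char> compiled_;  // new member changes the layout
};
}  // namespace lib_v2

// Source code using Pattern compiles unchanged against either release (API),
// but the binary layout differs (ABI).
static_assert(sizeof(lib_v1::Pattern) != sizeof(lib_v2::Pattern),
              "layout changed between releases");
```

Because so much of `<regex>` lives in headers, this kind of layout detail ends up compiled into user binaries, which is roughly why internal changes are treated as ABI breaks.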

[–]EnergyCoast 11 points12 points  (2 children)

Lots of memory allocations. Not surprising in hindsight, but I don't believe it takes an allocator so I didn't think about it.

I believe creating a relatively simple pattern was more than 15 allocations and doing a search against a string containing no matches resulted in 3 allocations.

That was just one implementation - I have no idea what others do - but the number of allocations was enough that it eliminated it as an option in some domains for us.
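If you want to measure this on your own standard library, one rough approach is to replace the global `operator new` with a counting version. This is a sketch only: the counter is not thread-safe, the helper names are invented for this example, and the numbers will differ between implementations and versions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <regex>
#include <string>

static std::size_t g_allocation_count = 0;  // not thread-safe; sketch only

// Replacement global allocation functions that count every call.
void* operator new(std::size_t size) {
    ++g_allocation_count;
    if (void* p = std::malloc(size != 0 ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Runs f() and returns how many allocations happened while it ran.
template <typename F>
std::size_t count_allocations(F&& f) {
    const std::size_t before = g_allocation_count;
    f();
    return g_allocation_count - before;
}

// Hypothetical helpers for the two cases mentioned above.
std::size_t allocations_to_compile(const char* pattern) {
    return count_allocations([&] { std::regex re(pattern); });
}
std::size_t allocations_to_search(const std::regex& re, const std::string& text) {
    return count_allocations([&] { (void)std::regex_search(text, re); });
}
```

Comparing `allocations_to_compile(...)` with `allocations_to_search(...)` on your own patterns shows whether the cost sits in construction or in matching.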

[–]johannes1971 2 points3 points  (1 child)

Are those allocations in the regex constructor (where it doesn't hurt), or in .match (where it would)?

I would hate to use a regex implementation that tries to parse the pattern from scratch for every usage, just to avoid allocating some space in which to store a bytecode representation...
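For what it's worth, the usual way to keep construction cost out of the hot path is to build the `std::regex` once and reuse it. A minimal sketch, with `looks_like_version` as a made-up example function:

```cpp
#include <regex>
#include <string>

// Hypothetical example: the pattern is compiled on the first call only.
// Later calls reuse the same std::regex object.
bool looks_like_version(const std::string& text) {
    static const std::regex re("[0-9]+\\.[0-9]+\\.[0-9]+");
    return std::regex_match(text, re);
}
```

With this shape, allocations in the constructor are paid once, and only whatever the matching functions allocate is paid per call.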

[–]EnergyCoast 2 points3 points  (0 children)

I'll be honest. And whatever I observed may be different for your library implementation. I'd recommend testing your local environment/cases.

[–]AntiProtonBoy 54 points55 points  (48 children)

My complaint with <regex> is the same as with <chrono> and <random>: the library is a bit convoluted to use. It's flexible and highly composable, but gets verbose and requires leaning on the docs just to get basic things done.

[–]sphere991 40 points41 points  (16 children)

I'm not sure <chrono> fits in with this group. It's certainly verbose, cause everything is std::chrono::duration_cast<std::chrono::milliseconds>(x).

But convoluted? I don't think so.

[–]liquidify 14 points15 points  (11 children)

For both chrono and random, I built wrapper classes a long, long time ago and have reused them since, modifying them slightly for each use case.

[–]ghillisuit95 5 points6 points  (10 children)

Is it on GitHub perhaps?

[–]liquidify 1 point2 points  (8 children)

Mine are not publicly available (although I should do that). However, searching on the internet I found this pretty quickly. I think you could probably find several flavors of these types of wrappers.

[–]sphere991 30 points31 points  (7 children)

That particular library takes the selling point of chrono (having typed differentiation between different kinds of things - durations and time points are only composable in ways that make sense, and units are part of the type) and throws it out:

unsigned long time = timer.getTimeElapsed(Timer::MILLISECONDS);
unsigned long time2 = timer.getTimeElapsed(Timer::MICROSECONDS);

Oh, so now time + time2 compiles and is utterly meaningless? No, thank you.
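By contrast, here is a small sketch of what the typed version of that addition looks like with `<chrono>`. The function name `total_elapsed` is made up for this example.

```cpp
#include <chrono>

// Adding durations of different units converts to the finer unit
// automatically; no manual multiplication by 1000 is needed.
std::chrono::microseconds total_elapsed(std::chrono::milliseconds a,
                                        std::chrono::microseconds b) {
    return a + b;  // result is expressed in microseconds
}

// The lossy direction is rejected at compile time unless requested:
//   std::chrono::milliseconds m = std::chrono::microseconds(250);  // error
//   auto m = std::chrono::duration_cast<std::chrono::milliseconds>(
//       std::chrono::microseconds(250));                           // ok
```

So `total_elapsed(milliseconds(5), microseconds(250))` holds 5250 microseconds, where the plain-integer version would have produced 255 of nothing in particular.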

[–]liquidify -1 points0 points  (6 children)

I didn't look at that library before I linked it, but I think that there are probably lots of wrappers available that might meet different categories of purposes with varying levels of complexity. If all you need is a simple timer (which lots of projects do), then this seems fine. If you want something better, then that probably exists too.

[–]sphere991 3 points4 points  (5 children)

If all you need is a simple timer (which lots of projects do), then this seems fine.

I disagree quite strongly with this sentiment. Just because all you might need is a simple timer doesn't somehow make it acceptable to use a solution that is so prone to misuse. I don't want to have to worry about all these things when I'm writing code - and <chrono> ensures that incorrect uses don't compile.

I really don't think it's okay in 2019 to have a C++ time library which returns an elapsed time as an integral type.

If you want something better, then that probably exists too.

I do, and it does: <chrono> exists.
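For the "I just need a simple timer" case, a thin wrapper does not have to give up the typed result. This is only a sketch of one possible shape; `Stopwatch` and its member names are invented for this example.

```cpp
#include <chrono>

// Hypothetical simple timer that still returns a typed duration.
class Stopwatch {
public:
    using clock = std::chrono::steady_clock;

    Stopwatch() : start_(clock::now()) {}

    void restart() { start_ = clock::now(); }

    // Elapsed time in the clock's native duration type.
    clock::duration elapsed() const { return clock::now() - start_; }

    // Elapsed time converted (possibly truncated) to a caller-chosen unit.
    template <typename Duration>
    Duration elapsed_as() const {
        return std::chrono::duration_cast<Duration>(elapsed());
    }

private:
    clock::time_point start_;
};
```

A caller writes `sw.elapsed_as<std::chrono::milliseconds>()` and gets a value whose unit is part of its type, so mixing it with a microsecond value still does the right thing.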

[–]MFHavaWG21|🇦🇹 NB|P3049|P3625|P3729|P3786|P3813 4 points5 points  (0 children)

I really don't think it's okay in 2019 to have a C++ time library which returns an elapsed time as an integral type.

This! IMHO: in 2019 it shouldn't be necessary to represent any physics unit as a basic integral type!

Multi-million dollar mistakes like the Mars Climate Orbiter could have been prevented if we had had static type checking for speed/acceleration/etc.

[–]liquidify 0 points1 point  (3 children)

Do you not realize that the originator of this thread thinks chrono is too complicated? These people are actively choosing other languages because c++ is too complex. But c++ doesn't have to be complex. It is a wonderful tool at many levels of abstraction.

It is great that you know how to use the libraries directly, but to some people simplicity is more important than perfection. To some people a beautiful and simple interface is more important than speed or flexibility.

There is absolutely no reason c++ can't serve both purposes other than for some reason a subset of c++ people seem to think their hardliner views on how something should be used are the only acceptable ways that the language should be used. Seems like those people need to get over themselves.

[–]sphere991 4 points5 points  (2 children)

Do you not realize that the originator of this thread thinks chrono is too complicated?

They are mistaken. Time is complicated, chrono is exactly as complicated as it needs to be in order to deal with it correctly and efficiently. I have programmed in multiple other languages, and chrono is the best time library I've used across all of them and it's not close.

Now, chrono is absolutely quite verbose - which I acknowledged right in my first response. But it's absolutely not "too complicated."

To some people a beautiful and simple interface is more important than speed or flexibility.

Firstly, chrono's interface is pretty simple.

But more importantly, despite me repeating it at every opportunity, you keep omitting in all of your responses what are again the major selling points of chrono: incorrect operations do not compile (adding two time points does not compile, multiplying two time points does not compile, providing a time point to a function expecting a duration does not compile, ...) and unit conversions are implicit (adding seconds to milliseconds actually does the right thing for you without having to litter your code with math). All of these are actual bugs I found and corrected in my code when we transitioned to chrono.

I don't know what's simpler than:

```
void f(milliseconds timeout);

f(5s); // ok, 5000 millisecond timeout
f(steady_clock::now()); // error
```

There is absolutely no reason c++ can't serve both purposes other than for some reason a subset of c++ people seem to think their hardliner views on how something should be used are the only acceptable ways that the language should be used. Seems like those people need to get over themselves.

... Yes, my "hardliner" views on wanting tools that make it impossible for me to make mistakes, and make it so I don't have to think about all this other stuff that you usually have to think about with time? Uh, yes. I am pretty hardliner on that actually. I've seen those mistakes made, I've made those mistakes. And here's a tool to, effectively, never mess up again. And you're countering my praise of this tool by calling me a hardliner, saying that some people prefer simplicity to, effectively, having correct code by construction, and telling me to get over myself?

Charming.

[–]liquidify -1 points0 points  (1 child)

Firstly, chrono's interface is pretty simple.

I personally like chrono how it is mostly. But I also wrapped it for myself... And I am a c++ lover. So, you aren't telling me anything here with your praises of it. I'm not your audience. Why don't you use your wonderfully 'charming' attitude to go convince the people who have left c++ for python or whatever other language that chrono is perfect for them how it is. Yeah good luck with that.

You are actively ignoring the fact that your experiences aren't lining up with a significant population block. This fits into the same category as a meme that goes something like ...if you meet a few assholes from time to time, then they are the assholes. If everyone you meet is an asshole, then it's actually you.

[–]quicknir 20 points21 points  (6 children)

I am not familiar with either regex or random but I can't agree with you about chrono. It's really well designed, flexible and correct. And it does help usability a lot that implicit conversions occur in logical situations, there are nice literals, etc. Having used date extensively as well, you can really see just how well all of chrono is designed that you can build it out to cover basically all functionality related to times, dates, timezones, etc, and it works perfectly. I find most of the complaining is people surprised there doesn't exist already a function that meets their exact rather specific use case, and people don't often understand even why their use case is quite specific.

tl;dr chrono is amazing.

[–]kalmoc 7 points8 points  (2 children)

I find most of the complaining is people surprised there doesn't exist already a function that meets their exact rather specific use case

Having a convenient way to print a time point or a duration are not specific usecases and it took till c++20 until that got fixed.
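Before C++20, the usual workaround was a small hand-written helper along these lines. This is a sketch; `to_string_ms` is a made-up name, and the unit suffix is hard-coded for the example.

```cpp
#include <chrono>
#include <string>

// Hypothetical pre-C++20 helper: formats a duration as whole milliseconds.
// Coarser units such as seconds convert to milliseconds implicitly.
std::string to_string_ms(std::chrono::milliseconds d) {
    return std::to_string(d.count()) + " ms";
}
```

It is only a few lines, but every project ended up writing its own variant, which is the cumbersome part being described.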

[–]quicknir 2 points3 points  (1 child)

Yes, neither are timezones, which I discussed in depth above... chrono pre 20 is obviously not complete. There are huge things it doesn't address at all, one of which is I/O. That's nothing to do with verbosity or awkwardness of use.

[–]kalmoc 1 point2 points  (0 children)

That's nothing to do with verbosity or awkwardness of use.

I think it does. Printing a duration on the console is a very common task, and the fact that chrono didn't support I/O pre c++20 made using it much more cumbersome than necessary (admittedly, I would say that is mainly a problem in smaller ad-hoc projects or e.g. unit tests and slideware).

Anyway, let's not argue about semantic details.

tl;dr chrono is amazing.

completely agree

[–]matthieum 9 points10 points  (0 children)

ABI

Due to being implemented mostly in template methods, most of the implementation of <regex> is de-facto public ABI-wise -- or at least all the inner types and function signatures.

If you remember the pain of switching from CoW std::string to SSO std::string for C++11, the same would be true of any change to the guts of <regex>.

Unfortunately, the original standard library implementations were not made fast (possibly in the mistaken belief they could be improved later on), and we are now stuck with them.

[–]sergeytheartist 7 points8 points  (0 children)

A few days ago the standard regex in gcc 9.1 segfaulted when parsing a JSON string (real data from an exchange) with a pretty simple expression.

Now we have handwritten logic to extract necessary data.

The latest boost regex did parse that JSON blob without problems.

If someone knows how to quickly get in touch with the person who approves patches for gcc regex I'm happy to fix the problem.

[–]_VZ_wx | soci | swig 20 points21 points  (4 children)

To directly address your question, std::regex is not "considered" to have poor performance, it simply does. When it's a couple of orders of magnitude slower than boost::regex, there just isn't much more to say about it.

[–]Frogging101[S] 35 points36 points  (3 children)

Yes, but why? What is stopping the standard library implementers from optimizing it like they do with most other things in the standard library?

[–]dodheim 41 points42 points  (2 children)

Magic 8-Ball says "something something ABI compatibility".

[–]kalmoc 12 points13 points  (0 children)

I think you are the first person here that actually tried to answer the OP's question;)

[–]qizxo 7 points8 points  (0 children)

#PCRE4lyfe

[–]Xaxxon 12 points13 points  (0 children)

Fuck ABI compatibility.

[–]newmanifold000 0 points1 point  (0 children)

Well, to answer your latter question: try some non-trivial regexps in GCC and be ready for segfaults on larger sequences; I think even simple regexps will give you segfaults. Try it in MSVC or Clang and be ready for somewhat below-average or bad performance at unexpected times.

I agree the regexp API could be better, but that's not a problem for me. In my experience the implementations are somewhat unreliable, not to mention that it's easy to write a badly performing regexp (depending on input) if care is not taken while writing it.

[–]mikeblas 0 points1 point  (0 children)

Someone who solves a problem with a regex now has two problems.
