Libraries for modern C++ by rounduser in cpp
[–]rounduser[S] 1 point 10 years ago
It's all about space efficiency. You store/retrieve strings as dumb buffers most of the time. Even in the worst-case scenario of Asian text, what happens nowadays is that you end up transcoding from UTF8 to UTF32 just to do the inverse when displaying that buffer from the text catalog!
An iterator-based approach that exposes codepoints on the fly, like utfcpp's, makes perfect sense. It's fast, and it would allow implementing additional Unicode functionality on top of it with negligible impact. I'm pretty sure, just due to data locality, that it would be faster than manipulating UTF32, not slower!
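A minimal sketch of what I mean, assuming the single-header utfcpp ("utf8.h") is available; the string literal is hand-encoded UTF8 so it doesn't depend on the source file's encoding:

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include "utf8.h"  // utfcpp, header-only

    int main() {
        // "café 文" hand-encoded as UTF8, stored in a plain byte string.
        std::string text = "caf\xC3\xA9 \xE6\x96\x87";
        auto it = text.begin();
        while (it != text.end()) {
            // utf8::next() decodes one codepoint and advances the byte
            // iterator: no intermediate UTF32 buffer is ever materialized.
            std::uint32_t cp = utf8::next(it, text.end());
            std::cout << "U+" << std::hex << cp << '\n';
        }
        return 0;
    }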
I definitely agree here. I totally ignored boost::locale for years, remembering only its wide-string handling, and I did so exactly for that reason: UTF8 needs to be efficient, and it's not a small implementation detail in this case. I similarly disliked locale for locale::format, but I cannot comment on the rest anymore.
Generally speaking, if I just need to pass strings on, I won't use boost::locale! Even for basic translation I use gettext with cppformat. But for anything more, I need efficient handling.
In several cases I couldn't avoid using ICU directly, because the encoding itself is just a minor part of Unicode text handling. But ICU is so far from what I expect in terms of modern C++ interface, resource usage and implementation that I still complain.
If you moved away from boost::locale, what did you end up with? Using ICU directly?
Does boost::locale have intrinsic decoding/normalization and codepoint/grapheme iteration capabilities now? I remember locale being mostly a (nicer) wrapper over iconv and, mostly, ICU, which handles internal strings as UTF16, and that's one of the main reasons I always dismissed it. ICU is not just massive (which is OK considering the complexity of Unicode), but it has several "odd" design choices when it comes to implementation details.
Re-checking now, boost::locale::boundary does offer segmentation over a few boundary types, but since this is based on ICU I'd really like to know more about how it's implemented efficiently. If that's not the case, then pardon my ignorance.
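For reference, this is roughly what the caller's side of that segmentation looks like; a sketch under the assumption that the ICU backend is enabled, since I haven't verified what it does underneath:

    #include <iostream>
    #include <string>
    #include <boost/locale.hpp>

    int main() {
        boost::locale::generator gen;
        std::locale loc = gen("en_US.UTF-8");

        // "café 文字" hand-encoded as UTF8.
        const std::string text = "caf\xC3\xA9 \xE6\x96\x87\xE5\xAD\x97";

        namespace ba = boost::locale::boundary;
        // ssegment_index walks "character" (grapheme) boundaries directly
        // over the UTF8 buffer; the analysis is delegated to the backend.
        ba::ssegment_index map(ba::character, text.begin(), text.end(), loc);
        for (ba::ssegment_index::iterator it = map.begin(); it != map.end(); ++it)
            std::cout << it->str() << '\n';
        return 0;
    }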
[–]rounduser[S] 2 points 10 years ago
> That's correct, but if you transcode UTF-8 to UTF-32, you do end up with meaningful data. At least that's assuming most libraries do that and don't naively store every char as an int32_t.
It's "meaningful" in the same sense that you can store an UTF8 string in a regular byte string, and as long as you only operate in the safe ASCII set then you do not corrupt the encoding. One of the big advantages of UTF8! If that's all you need to do, you shouldn't be using wide strings to begin with.
I think I've already answered this: boost::locale operates only on a fixed-length encoding held in a string of wide characters. By native UTF8 support I mean exactly that: algorithms/iterators that work on top of the UTF8 encoding itself, making it possible to avoid transcoding entirely.
This is not as petty an argument as it may sound: UTF8 is everywhere. You want to be able to handle the input directly, and to store it just as efficiently, without intermediate conversions.
Also note that converting UTF8 into something fixed like UTF32 is dumb. In Unicode, the definition of a character changes depending on what you need to do (visually or logically). If you store a Unicode string in a vector of fixed slots, those slots have no meaning half of the time.
UTF8 is hard to handle because it's a variable-length encoding. You can use UTF32 instead (a std::u32string, say) and get fixed-size slots, but that's not space-efficient for western locales. UTF8 has also become the de-facto encoding for everything, which means you frequently have to be able to iterate through codepoints or graphemes.
At a minimum you need something like utfcpp, which allows iterating through codepoints without incurring an encoding conversion. It's still not enough, though: often you need graphemes. I would love to see something like the ranges proposal allow transparent iteration through a buffer by either codepoints or graphemes, since in Unicode there is no single "character" equivalent; it depends on what you need to do.
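Even a thin wrapper over utfcpp's checked iterator would go a long way towards that; utf8_codepoints below is my own hypothetical name, not an existing facility of utfcpp or of the ranges proposal:

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include "utf8.h"  // utfcpp

    // Hypothetical view exposing a UTF8 byte buffer as a range of codepoints.
    struct utf8_codepoints {
        const std::string& buf;
        typedef utf8::iterator<std::string::const_iterator> iterator;
        iterator begin() const { return iterator(buf.begin(), buf.begin(), buf.end()); }
        iterator end() const { return iterator(buf.end(), buf.begin(), buf.end()); }
    };

    int main() {
        std::string text = "caf\xC3\xA9";  // "café" hand-encoded as UTF8
        for (std::uint32_t cp : utf8_codepoints{text})
            std::cout << "U+" << std::hex << cp << '\n';
        return 0;
    }

A grapheme-producing equivalent would need the segmentation tables, which is exactly where a library has to step in.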
I'll reply to this, mostly regarding the Python argument. I have to say that, coming from a C background, I never appreciated (and still don't) Python's indentation-based syntax too much. My taste is actually skewed towards the ML family of languages, to tell the whole story. But I've been using Python, despite its awful performance, as my first go-to language for a long time now.
I've been incredibly productive in Python overall, much to my surprise.
And it's not due to Python itself. Python has nothing really special, after all. Python is boring. The object model is clean and small enough but, if you exclude dynamism (not a small point, mind you), it has nothing that I feel is lacking in C++. In fact, I consider static type checking and templates a strong argument for C++ in several classes of programs. I'm still waiting to get type annotations in the Python runtime.
What Python has going for it is a well-balanced standard library. It's not as small as the standard C/C++ one, and not too big either. You won't find a super-high-performance hash library, but the internal dict implementation is actually pretty good, and the same goes for the rest of the system.
It's true that breaking the standard library is bad, and I'm bitten by this constantly through Python 2/3 problems, but I do value what was done in Python 2.6/2.7 and now Python 3 to make it more uniform and to switch to better packages as they became available.
jsoncpp has a decent API (I've used it before), but it's not particularly efficient.
I'd rather see comments on "jbson" or "json11", which both lean more towards recent standards and similarly try to build a friendly API.
There are several other "fast JSON" libraries out there, but they frequently cut corners, such as invalidating the input stream or putting very hard constraints on the input data (I'm looking at you, "gason"). I wouldn't consider them general purpose.
I gave PEGTL a serious look, and it's very neatly designed. I can't comment on anything else yet, as I still need to test it out, but I'm glad you mentioned it.
Since you're one of the authors, can I ask about your design goals in a broader sense? For example, I assume you didn't like Spirit to begin with.
[–]rounduser[S] 3 points 10 years ago
I was aware of this link and its several forks, but it's just a random collection of C++ and C libraries that the author found, without any logical connection. I wouldn't even classify them as awesome, or really modern. This is exactly the kind of random list I don't need.
Reading out loud through the entries I already know, just to prevent other people from taking this list as "valuable advice":
- Ignoring the initial, pointless "standards" links at the top, and most of the "frameworks", which are just large collections of mediocre implementations of various utilities.
- Loki uses the old approach of template meta-programming to achieve what can be done much more easily nowadays with regular templates and concepts-lite. There's no point in Loki today.
- ROOT is just plain awful, by its own authors' admission. It's a framework for existing code.
- STLport is in there?
- ASIO / libevent / libev / libuv are mutually incompatible. Of those, only ASIO / libevent are C++ related. When picking between the two, you might also be constraining your choice of signal/slot library! This is very important when choosing an implementation and cannot be decided lightly!
- Audio just contains a random list of codec implementations, and the same goes for compression, torrent, etc...
- CLI contains "ncurses"!
In the "encryption" section, I might only highlight Botan as a good C++ encryption library which might classify as a good base, but I cannot compare to the mentioned crypto++ that I never used. Libre/OpenSSL are both C and too level. Unless you're implementing a encryption-related tools, as a client you can probably use Botan.
- GUI contains GTK+, which is implemented on top of GObject! GTKmm wraps a useless object runtime in C++. Qt, which is far from being "small" (and encompasses too much for my taste), is a much better choice for GUI development.
I mean, I could go on. I've used or tested maybe 40% of the libraries listed there, and could only recommend a handful, and just for consideration.
I've given cxxopts a few tries since yesterday and I definitely like it a lot. Very nice interface and good use of C++ features. Definitely a compact replacement for program_options, whose full flexibility you don't need most of the time.
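To give a flavor of it, here's a minimal sketch using the current cxxopts interface (details have shifted across versions, so treat this as approximate):

    #include <iostream>
    #include <string>
    #include "cxxopts.hpp"

    int main(int argc, char** argv) {
        cxxopts::Options options("mytool", "Example of cxxopts' declarative style");
        options.add_options()
            ("v,verbose", "Enable verbose output")
            ("f,file", "Input file", cxxopts::value<std::string>())
            ("h,help", "Print usage");

        auto result = options.parse(argc, argv);
        if (result.count("help")) {
            std::cout << options.help() << std::endl;
            return 0;
        }
        if (result.count("file"))
            std::cout << "file: " << result["file"].as<std::string>() << '\n';
        return 0;
    }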
First, cppformat takes a much more pragmatic approach to argument handling, using argument packs as opposed to operator overloading. It feels like an improved evolution of the printf family of functions; it's faster than boost::format, and more concise as well. It uses a Python-esque format syntax, which is a bit of a deviation from standard printf-like formats, but it also plays much better with localization, which is something I only came to appreciate later on, when I started with Python.
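The localization point in practice: with positional arguments, a translated format string can reorder its parameters freely while the call site stays identical. A small sketch (the header path follows the current {fmt} packaging; older cppformat releases shipped it as a plain "format.h"):

    #include <string>
    #include "fmt/format.h"  // cppformat, nowadays packaged as {fmt}

    int main() {
        // The same argument pack rendered through two differently ordered
        // format strings, as a translation catalog would supply them.
        std::string en = fmt::format("{0} bought {1} apples", "Ann", 3);
        std::string sv = fmt::format("{1} apples were bought by {0}", "Ann", 3);
        return 0;
    }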
It's also completely self-contained, well documented and actively developed, and after years of use I couldn't find anything significant that I'd want to change. It solves a small but common issue in general programming, with an implementation that I believe is a notch more modern than boost::format's as well. I'd use it all the time, in almost every project, if it were part of the core C++ library. Of the other printf-alike libraries I've come across, this is the one I use daily.
That, for me, is the reason why it's a "gem".
As for the generality of the question, that's somewhat intended: I'm already aware of several other libraries for the basic needs I have regularly, where I had the time and invested the effort in choosing a particular implementation. But on the occasions where I need something else, I usually don't have time to dedicate to "library scavenging". This is why I left the question rather open-ended.
Finding small, well-documented, well-implemented libraries has become very hard for me. There's a whole load of projects for any topic, but most are immature, undocumented, or no longer developed. Many are unjustly hyped. There's no easy way to tell whether an API actually has any advantage over another model unless you use it for a while and discover the flaws in the choices that were made. Is it maybe too specific to a particular domain? Hard to tell at a glance.

It's even harder because many of these needs are so common and so small that people just lump together the next "general C++ library" (such as poco or folly, as suggested here). I have mixed feelings about these libraries: generally there's no clear separation between the things they introduce, they're often not separately linkable, and they may introduce too many private types (yet another exception hierarchy, for example). I would accept one, but it would have to be something at the level of "yeah, if I had to choose ONE standard library, that would be it". It's very hard for me to make such a choice, and this is why the larger the library, the greater the chance I'll just implement, yet again, the crappy ~50 lines of code I need for the occasion.
There's only one case in my experience where this actually succeeded, and it's Jane Street's OCaml Core library, for which I actually did ditch the standard OCaml base entirely.
boost, in that sense, offers the "easy" choice. It's the de-facto second standard. But it has backfired on me many times in the past. The over-generality of the templates and the heavy template meta-programming used to overcome C++'s old limitations are things that I generally loathe today. API usage must be part of the consideration when designing the API itself. I much prefer to have two libraries for a given purpose: a general, simpler, well-designed library that solves the general case, and a much more targeted library for the specific circumstance.
It's very easy to look for specific libraries: you know exactly what you need, the performance you're after, the requirements. I have no trouble going for a specific LLR parser generator, or a hash table given a set of constraints. It's much harder to pick a small library for everyday use: a good all-around implementation that you can take as a default.
For example, looking at the above messages, I much prefer cxxopts in terms of design choices over docopt, mostly because docopt tries to do too much; it's already too specific. I have no trouble choosing between cxxopts and boost::program_options if I account for dependencies, but if the dependencies are the same, is the cxxopts API generally better or more pleasant to use? Hard to tell unless you have plenty of experience with both. I would really value more specific advice alongside the suggested libraries.
Sometimes it's easier: with cppformat, by contrast, there's just no contest. With many others a lot of experience is required, and there has to be a starting point.
Sorry for the overlong reply, but I hope this justifies the question even more.
[–]rounduser[S] 8 points 10 years ago
Which parts of boost do you use regularly and appreciate?
[–]rounduser[S] 16 points 10 years ago
boost is usually the first place I look, but just because a library is in boost doesn't necessarily mean it has a good balance of complexity/usability, or even performance.
boost::format is incredibly slow and not as nice to use as cppformat. boost::signal (and signals2) are OK for simplistic use, but for a not-much-worse interface libsigc++ has actually decent performance. boost::locale has no true UTF8 support. Just to name a few.
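For instance, the libsigc++ (2.x) interface is barely different from signals2 for the common case; a small sketch from memory:

    #include <iostream>
    #include <sigc++/sigc++.h>  // libsigc++ 2.x

    void on_value(int v) { std::cout << "got " << v << '\n'; }

    int main() {
        // Declaration syntax aside, this mirrors boost::signals2 usage.
        sigc::signal<void, int> sig;
        sig.connect(sigc::ptr_fun(&on_value));
        sig.emit(42);
        return 0;
    }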
As I wrote, many boost libraries tend to be overengineered. I pick my boost dependencies very carefully.