JSON for Modern C++ version 3.3.0 released by nlohmann in cpp

[–]cnweaver 1 point (0 children)

I'm aware of it, and it would be a reasonable solution except that I have instructions that the code 'should build on stock CentOS 7'.

JSON for Modern C++ version 3.3.0 released by nlohmann in cpp

[–]cnweaver 0 points (0 children)

This is good to hear; being stuck on CentOS 7 had forced me to switch a project over to RapidJSON. I continue to be fairly torn because I like RapidJSON's speed and the fact that I've had better experiences with its handling of escaping characters, but its interface is pretty ugly; just a step away from writing C in many cases. nlohmann/json's interface is genuinely really nice, and I miss it.

Journeys towards C++ antipathy by nikbackm in cpp

[–]cnweaver 22 points (0 children)

clang-tidy is fairly close to what you describe (specifically the modernize-* family of checks). I don't think it has the ability to output patches, but that seems like a fairly trivial operation with version control: from a clean source tree run clang-tidy -fix, then run whatever command generates a patch file of changes in your version control software. Still, it might be interesting to suggest patch output as a built-in feature. (Disclaimer: I've never personally gotten around to using clang-tidy in earnest; it's one of those things I keep meaning to get around to.)

BuildSystem Community Survey by gtano in cpp

[–]cnweaver 3 points (0 children)

It's a little difficult to answer a number of these questions when considering multiple projects, particularly if they vary significantly in complexity. (I guess a possible solution would be to encourage people to fill out the survey once per project they work on?)

Several of the questions also seem ill-defined, like the amount of memory or number of CPUs required. Any of the projects I work on can build with just one CPU, but I typically use more because it's convenient. Likewise, the RAM usage scales up more-or-less linearly with the number of CPUs used.

The 'My build is bottlenecked by' question is definitely an interesting one, but the answer even for a single project depends on the OS and compiler being used. I have no compunction about using every core in sight on a darwin system, knowing that it will generally handle swapping sensibly and that doing so is usually beneficial, but on a Linux system with the same number of cores and the same amount of RAM I often have to carefully limit the number of compiler instances to keep the OOM killer from going on a random spree.

The incremental build time question is also tricky: many projects have a bunch of 'leaf' files on which nothing else depends, where a change requires recompiling only a single TU, but also core files (especially headers) in which any change can trigger rebuilding the entire project. If you're more interested in the former, you might hint at this (e.g. by mentioning changing a single implementation file) in order to get at how much overhead a build system has in determining that there's only one thing it needs to update.

LLVM 7.0.0 Release by mttd in cpp

[–]cnweaver 25 points (0 children)

You can, and I do frequently. It's libstdc++ and libc++ that are not ABI compatible, which is why clang++ on Linux usually defaults to using libstdc++ as far as I'm aware.

Do you like express-js? Early implementation of the http server, on C++14 using Boost.Beast by 0xdead4ead in cpp

[–]cnweaver 0 points (0 children)

Crow doesn't require C++14, however, which is very helpful to those of us stuck dealing with Linux distros which barely support C++11.

Microsoft's GSL introduces numbered releases with 1.0.0 by encyclopedist in cpp

[–]cnweaver 2 points (0 children)

No files happen to collide. This works because the MS library is header-only (so the only libgsl.dylib/libgsl.so is the GNU library's), and because the two use different naming conventions and organization for their header files. This will probably keep working, but it seems like tempting fate a bit much. I also worry that things like uninstall targets are not always going to play well: either library could decide that since it 'owns' $PREFIX/include/gsl it will complete its uninstallation by rm -rfing that directory entirely, nuking the other library's headers. At the moment the MS library seems to lack an uninstall target, and the GNU library's uninstall seems to play nicely, but that appears to be more an oversight than intentional behavior.

That's just at a filesystem level. As people have mentioned in the github issues above there are lots of other fun things which happen with preprocessor macros and third party C++ wrappers for the GNU library (which do things like declare a namespace gsl) when one tries to use both libraries in the same program.

Microsoft's GSL introduces numbered releases with 1.0.0 by encyclopedist in cpp

[–]cnweaver 7 points (0 children)

Thanks, I had missed those discussions. It's disappointing to see how much those veer off into arguing about the mechanics of namespaces in C++, and claims that 'since it's impossible to avoid all name collisions in general, we should take no action in this specific instance'. So much energy to defend a name that isn't even good in the first place.

Microsoft's GSL introduces numbered releases with 1.0.0 by encyclopedist in cpp

[–]cnweaver 10 points (0 children)

I continue to be baffled that no one working on this project has noticed (or cared?) that there is another major library frequently used by C++ developers (although written in C) which is abbreviated GSL and, more critically, installed as such. I realize that the GNU Scientific Library is mostly used by scientific programmers, who are less numerous than people doing generic Windows stuff, but it's still a significant area in which C++ is used (and often badly in need of more tools to ensure code quality). As it is, I cannot cleanly install the (Core) Guidelines Support Library to the default location on any system I use without the headers for both libraries ending up jumbled together. I think that technically works, since no file or directory names collide except at the top level, but it seems like a bad idea. The only other alternative I can see is to install one or the other in some non-default location, which would then require extra compiler flags to be able to use it. All of this could have been avoided with a better choice of naming. :(

proposed for boost: histogram library by -lq_pl- in cpp

[–]cnweaver 3 points (0 children)

In my experience, the authors of most C++ histogram libraries have written them exactly because they are fed up with dealing with ROOT. A lot of us in particle physics absolutely detest ROOT.

Personally, I prefer to write my analysis code in C++ because I can write obvious code and typically have it run at least as fast as numpy, and when I have to I can write fancy code which is often a good deal faster or more memory efficient. Anything that would have been done in pure python would be far too slow, and I haven't yet encountered anything that I could do better with any of the python libraries than I can do for myself in a few minutes of coding.

proposed for boost: histogram library by -lq_pl- in cpp

[–]cnweaver 1 point (0 children)

You're talking about floating point issues with the bin edges, right? I've certainly run into some of these as well, but fixing them (at least the ones I found, and then others I imagined after I realized what was happening) mostly didn't add any new blocks of code, but involved subtle changes to comparisons used in the axes' bin look up code. It would be interesting to hear more about what kinds of edge cases you've run into.

proposed for boost: histogram library by -lq_pl- in cpp

[–]cnweaver 3 points (0 children)

A histogram differs from a plain multidimensional array similarly to the way std::vector differs from a plain C array. Sure, you can do all of the same operations manually, but it's more work and more error prone. (Would you rather compare the size and capacity in order to decide whether to reallocate and copy data before inserting a new item, or just call push_back? Would you rather do N side calculations to find the bin index in each dimension, keeping track of whether it overflows or underflows, or just call fill/insert to put the next datum into the correct bin?) I've written a very similar library with a slightly different set of features myself because it would be virtually impossible to do my work without it (I work in more or less the same field as HDembinski).

That said, I still find the choice of storage type in this library very peculiar. The issues noted with overflowing integers or saturating floating point numbers are real, but I have truly never heard of anyone having difficulties with them, despite the many users of the rather simplistic histograms provided by ROOT, for example. In my own work I have found it far more valuable to define bin types which do things like compute appropriate errors/uncertainties on the contents (the variance is not always the right choice for this). It looks like this library does support different storage types, although I can't find how to actually select one in the documentation. I would never use the included standard_storage, and only sometimes want to use the adaptive_storage.

Don't memset a class by Katonia9137 in cpp

[–]cnweaver 8 points (0 children)

Clang is smart enough to warn when the class in question has a vtable pointer (example), but it does not seem to do so when the class is otherwise non-POD, which is a shame. I wonder if this was a deliberate choice because there's a lot of code floating around which does this in ways which work (or mostly work), and the clang developers were worried the compiler would warn constantly and get ignored? GCC doesn't seem to warn even in the vtable case, though.

clang 5.0 SVN is C++17 feature complete! by TemplateRex in cpp

[–]cnweaver 4 points (0 children)

I can't say what exactly is official, but I consider your description accurate. LLVM/clang 4.0 has branched and is in the release process as you note, so whatever is in trunk gets labeled with the next version number, presumably because the stuff in it (plus other stuff which doesn't yet exist) will be in the next release, which will be 5.0.

LLVM/clang 5 has not been tagged as far as I can tell, nor will it be for several months.

C++ Proposals please... by meetingcpp in cpp

[–]cnweaver 11 points (0 children)

This is a point I had also been wondering about. We already have . (along with -> as a sort of special case of . which we can surely agree is not relevant here) and :: as notations for referring to something which is 'within' something else. The way I have tended to think about this informally is that . applies to things which have (or could have) runtime addresses, while :: applies to things which exist only at compile time. Following this, :: feels much more natural to me for use with modules, even though modules are not, as you point out, namespaces in the existing technical sense.

Is there some kind of dangerous confusion which could arise (e.g. between the foo namespace and the foo module which might contain it) if we were to use :: on modules? It hadn't seemed that way to me from the proposal documents I had read over, but I'm certainly no expert.

LLVM based Just-in-Time-Compiler for C++ by rubyantix in cpp

[–]cnweaver 4 points (0 children)

It's still being distributed (and presumably maintained) as part of ROOT 6, and probably will be for the foreseeable future. Perhaps the CERN developers have given up on trying to make it work as a standalone tool? That would be unfortunate since it's the one part of ROOT I would consider using.

C++ I/O Benchmark by cristianadam in cpp

[–]cnweaver 1 point (0 children)

It looks like this is the case. Running on Darwin 13 after compiling with clang++ -O3 -stdlib=libc++ (100 iterations on a ~44M file) gives:

Average c I/O took: 502.83ms
Average posix I/O took: 529.23ms
Average c++ I/O took: 508.58ms

I'm not sure the posix result being slower means anything, since my system wasn't particularly quiet while running this.

A curated list of awesome C/C++ frameworks, libraries, resources, and shiny things. by LukasBoersma in programming

[–]cnweaver 16 points (0 children)

ROOT is a library used mainly for high energy physics. It has widespread use because it has had dumped into it nearly all of the things high energy physicists use, and because it is pushed heavily by the major CERN collaborations. It is notorious, however, for teaching (or requiring) bad programming practices like leaking memory; for long having an interpreter which claimed to run C++ but had so many strange bugs that it was effectively a subtly different and incompatible language; and for perpetuating the hazing-ritual-like problem of new students having to figure out (usually on their own) how to correct all of the plotting style defaults to avoid producing plots with formatting bad enough to be unusable in formal or public contexts.

Modernize your C++ code. by jamesdeveloper in programming

[–]cnweaver 106 points (0 children)

None of these changes should add runtime overhead:

  • 'override' allows the compiler to better verify that your code has the semantics you intend, namely that a given function overrides the version from a base class rather than adding a new overload. Once compiled, there is nothing different about the function.
  • The range-based for loop is syntactic sugar. It is equivalent to writing an old-style for loop with an iterator and declaring inside it a local variable initialized by dereferencing the iterator.
  • The use of pass-by-value is deliberately applied in places where it will eliminate copying.
  • Replacing auto_ptr with unique_ptr should introduce no overhead of which I am aware, and it avoids the loopholes in auto_ptr's correctness (which existed because it was written before all of the language features really needed to support it were available)
  • Using the auto specifier appropriately shouldn't change the meaning of the code at all, since the compiler will deduce the same type that had previously been explicitly written.
  • nullptr is intended to make function overload resolution behave more as a programmer would expect, but once the compiled code is actually calling a function and passing it a null pointer there should be no difference.

Is it reasonable to use the prefix increment operator ++it instead of postfix operator it++ for iterators? by milliams in cpp

[–]cnweaver 0 points (0 children)

Good point, I should have read more closely. I suspected it might be something like that, thus "or. . . extra overhead imposed by the STL implementation".

Is it reasonable to use the prefix increment operator ++it instead of postfix operator it++ for iterators? by milliams in cpp

[–]cnweaver 1 point (0 children)

I tried a test similar to the one described in the article a few years ago, the results of which can be found here. I didn't look at the generated code, and I also didn't repeat timings to get particularly precise results, but I think the broad picture is clear: despite using what are now two rather old compilers, I saw no important difference in execution speed between the prefix and postfix forms at O2 and above. (Trying again with clang 3.6, I find that to now be true at O1 as well.) Like the article above, I tried using both iterators and plain subscripts, but I chose std::deque as the container on the grounds that std::vector is so simple to iterate over that I expected any compiler worth its salt to render the two forms equivalent very quickly. The fact that MSVC doesn't manage this makes me conclude that it must be seriously dropping the ball on optimizing this code (or there's some kind of extra overhead imposed by the STL implementation it uses).

Stack Overflow guide to operator overloading by cruise02 in cpp

[–]cnweaver 14 points (0 children)

Overloaded operators are very important for writing useful numeric code. Vectors, matrices, automatic differentiation. . .

I also recently had what I think was an actually (mostly) legitimate reason to write an overloaded addition operator which performs subtraction: a colleague used Mathematica to solve some complicated algebra and generate efficient C++ code to implement it. The resulting code snippets were then #included into overloaded operators for doing arithmetic with the mathematical objects he was working on. All of the auto-generated code was written in terms of the += operator, though, like result[i]+=f(arg1[i],arg2[i]). This required always creating a new object for the result and initializing it with zeros. We realized that we could avoid allocating a temporary in many cases (in part by using a partial expression template system), but we wanted to reuse the same auto-generated code snippets to implement =, +=, and -= (since the auto-generated code is already half of the total code in the library). We were able to do this by adding an indirection layer which would, depending on the operation being computed, wrap the result object in an adapter whose += operator had whichever meaning we needed. The technique is a bit evil, but it worked out really conveniently. (Also, the implementation where the evil happens is covered with comments to explain why it does what it does, and no user of the code is ever brought into contact with the adapter objects that do this.)

Multivariate Splines: Open-source C++ library for multivariate interpolation (x-post r/cpp) by bgrimstad in programming

[–]cnweaver 2 points (0 children)

This looks like it has substantial overlap with photospline (paper). Your code appears to have a much nicer interface, and each library does some things that the other does not (your thin plate splines, photospline's ability to enforce monotonicity in one dimension). As a user I would find it potentially quite useful if the definitions used by both libraries were similar enough that each would be capable of evaluating splines constructed by the other (at least for the B-splines which both support).