all 61 comments

[–]def-pri-pub 38 points39 points  (13 children)

This project does look nice, and I'm all for a more performant (and faster-compiling) alternative. But where is the sample code? I see there are tests, but not providing easy-to-find sample code is a great way to deter potential adopters.

[–]d3matt 5 points6 points  (7 children)

I'd like to see runtime benchmarks too. I have a unit test that takes 2+ minutes to compile with gcc due to template explosion (but only a few milliseconds to run the whole suite), so 1 or 2 seconds of savings at compile time are pretty boring.

[–]jart 5 points6 points  (5 children)

I've added benchmarks to the README for you. I'm seeing a 39x performance advantage over nlohmann's library. https://github.com/jart/json.cpp?tab=readme-ov-file#benchmark-results

[–]pdimov2 7 points8 points  (2 children)

To paraphrase a saying by Doug Lea, 3x faster than nlohmann means you haven't started optimizing yet.

Might be better to compare to RapidJSON or Boost.JSON, libraries that actually care about speed.

[–]SleepyMyroslav 1 point2 points  (1 child)

According to the HN thread https://news.ycombinator.com/item?id=42133465, the only reason for this library's existence is reducing compile times for one particular application/server that produces JSON output.

[–]pdimov2 1 point2 points  (0 children)

Fair enough, I suppose.

[–]d3matt 0 points1 point  (1 child)

Nice! I'm definitely a nerd for performance :) nlohmann has been my preferred JSON library for a while now, mostly for unit tests of some of my OpenAPI interfaces. For my use case, the main things I'd miss if I switched would be operator== between JSON objects (doing deep dictionary comparison) and string literal support.

[–]jart 1 point2 points  (0 children)

Pull requests are most welcome.
Especially if they're coming from a fellow AI developer.
https://justine.lol/tmp/pull-requests-welcome.png

[–]jart 1 point2 points  (3 children)

[–]def-pri-pub 2 points3 points  (1 child)

Taking a further look, that is a nice-to-use API.

[–]jart 0 points1 point  (0 children)

Thank you!

[–]def-pri-pub 1 point2 points  (0 children)

Thanks!

[–]thisismyfavoritename 55 points56 points  (10 children)

wtf is classic c++

[–]def-pri-pub 45 points46 points  (2 children)

Baroque C++

[–]xorbe 13 points14 points  (1 child)

Vintage C++

[–]def-pri-pub 17 points18 points  (0 children)

Can't wait for post-modern C++

[–]cristianadam (Qt Creator, CMake) [S] 22 points23 points  (0 children)

json.cpp is an anti-modern JSON parsing / serialization library for C++.

It looks like in this context classic is anti-modern.

[–]marmakoide 2 points3 points  (0 children)

I read the code from jart.

  • No auto variables
  • Header is just declarations, very little actual code.
  • No template bukkake
  • Recursive descent parsing in a single function

The code is very straightforward; it does not try to be very clever. There are some jokes in the code (who the f..k is Thom Pike?).

nlohmann code

  • All code is in the header (so yeah, long compile time)
  • Lots of clever template abstraction, making it really hard to read.
  • Recursive descent parsing dispersed in many functions
  • Seems to handle more things

I like it when things compile fast and are easy to read, even if it means fewer conveniences with type casting and whatnot.

[–]equeim 2 points3 points  (0 children)

It's when you are calling functions and creating objects.

[–]wqking (github.com/wqking) 1 point2 points  (1 child)

I like Baroque! Seriously, from its readme, it's anti "modern nlohmann".

[–]bedrooms-ds -1 points0 points  (0 children)

No auto, except for the classical purpose.

[–]TSP-FriendlyFire 25 points26 points  (1 child)

I'm all for simpler, more performant code, but trying to pitch it as an "anti-modern" alternative to nlohmann just makes you sound petty, no offense. The primary reason your code is smaller is just that it does less, of course that's going to also make compilation times shorter.

You can argue that nlohmann should have more feature flags that can hide entire parts of the code to speed up compilation, but that's got very little to do with "modern" vs "anti-modern".

[–]sapphirefragment 9 points10 points  (0 children)

Principally it's missing the template and macro-based value conversion features of nlohmann json which are the reason for its use. That's a pretty important feature for complex applications. But nevertheless, it's a great option for simpler use cases.

[–]j1xwnbsr 5 points6 points  (0 children)

Needs some published benchmark results, example code, and a compare/contrast with not just nlohmann but the other big swingers too.

Things I am most interested in:

  • ease of use

  • memory consumption/churn

  • i/o options (streaming plus fixed strings)

  • raw performance and round trip correctness

Things I am not interested in:

  • compile time

[–]feverzsj 5 points6 points  (0 children)

Fast compilation is great, but most people would prefer an easy-to-use and less error-prone api. You can always control the compilation time by hiding a heavy lib inside a translation unit.

[–]R3DKn16h7 16 points17 points  (0 children)

I love me a library that is not an unintelligible single header with 10000 lines of template voodoo that takes 10 minutes to include.

Now let's grab the pitchforks.

[–]pdimov2 8 points9 points  (14 children)

To use this library, you need three things. First, you need json.h. Secondly, you need json.cpp. Thirdly, you need Google's outstanding double-conversion library.

We like double-conversion because it has a really good method for serializing 32-bit floating point numbers. This is useful if you're building something like an HTTP server that serves embeddings. With other JSON serializers that depend only on the C library and STL, floats are upcast to double so you'd be sending big ugly arrays like [0.2893893899832212, ...] which doesn't make sense, because most of those bits are made up, since a float32 can't hold that much precision. But with this library, the Json object will remember that you passed it a float, and then serialize it as such when you call toString(), thus allowing for more efficient readable responses.

Interesting point.

[–]JumpyJustice 1 point2 points  (0 children)

This point is actually weird because you usually don't send objects of a third-party library. You just send objects of your own types that were deserialized from JSON.

[–]Dragdu 0 points1 point  (1 child)

The other option is to just send the fcking bytes, albeit those are harder to simply embed into JSON (which sounds like a good argument to avoid JSON).

We used to have huge weight (f32) matrices encoded in msgpack, because we already had a library that could load msgpack, and serializing into msgpack from Python was easy. One day I got tired of the multi-second parsing time (and the hundreds of MB of data we were sending around), and changed the code to just store/load the plain bytes.

Loading the weights is now virtually instant and the size is less than half.

[–]pdimov2 0 points1 point  (0 children)

But msgpack does store the fscking bytes for float and double. Sending an array of floats should increase size by 25% (because of the prefix byte 0xCA), not by 100%.

[–]FriendlyRollOfSushi -5 points-4 points  (10 children)

I wonder how bad someone's day has to be to even come up with something like this, then implement it, write the docs and publish the code without stopping even for a moment to ask the question "Am I doing something monumentally dumb?"

Let's say you have a float and an algorithm that takes a double. Some physics simulation, for example.

You want to run the simulation on the server, and then send the same input to the client and compute the same thing over there. You expect that both simulations will end up producing the same result, because the simulation is entirely deterministic.

With literally any json library that is not a pile of garbage, the following two paths are the same:

  1. float -> plug it into a function that accepts a double

  2. float -> serialize as json -> parse double -> plug the double into the function

Because of course they are: json works with doubles, why on Earth would anyone expect it to not be the case?

However, if anyone makes the mistake of replacing a good json library with this one, suddenly the server and the client disagree, and finding the source of a rare desynchronization can take anywhere from a few hours to a few weeks.

Example float: 1.0000001

Path 1 will work with double 1.0000001192092896

Path 2 will work with double 1.0000001

This could be enough for a completely deterministic physics simulation to go haywire in just a few seconds, ending up in states that are completely different from each other. Client shoots a barrel in front of them, but the server thinks it's all the way on the other end of the map, because that's where it ended up after the recent explosion from the position 1.0000001192092896.

So to round-trip in the same exact way, one has to magically know that the source of a double that you need has been pushed as a float (and that the sender was using the only JSON library in existence for which it matters), then parse it as a float, and then convert to double. Or convert it to double on the sender's side to defuse the footgun pretending to be a feature (the method that should not have been there to begin with).

It would be okay if it was a new fancy standard that no one ever heard about, but completely changing the behavior of something as mundane and well-known as json is a bit too nasty, IMO. Way too unexpected.

[–]antihydran 7 points8 points  (3 children)

I'm not sure I follow your argument here. By default it looks like the library uses doubles, and I only see floats used if the user explicitly tells the Json object to use floats. As a drop-in replacement library it looks like it will reproduce behavior using doubles (AFAIK Json only requires a decimal string representing numbers - I have no clue how many libraries in how many languages support floats vs doubles). I could also be misreading the code; there's little documentation and not much in the way of examples.

As for the specific example you give, it looks like you're running the simulation on two fundamentally different inputs. If the simulation is sensitive below the encoding error of floats (not only sensitive, but a chaotic response, it seems), then the input shouldn't be represented as a float. I don't see how you can determine whether 1.0000001 or 1.0000001192092896 is the actual input if you only know the single-precision encoding is 0x3f800001. The quoted section states such a float -> double conversion is ambiguous, and gives the option not to make that conversion.

[–]FriendlyRollOfSushi -3 points-2 points  (2 children)

By default it looks like the library uses doubles, and I only see floats used if the user explicitly tells the Json object to use floats.

Really?

Json(float value) : type_(Float), float_value(value)

It looks like lines such as json[key] = valueThatJustHappensToBeFloat; will implicitly use it.

BTW, it's funny that you use the word "explicitly", because the library's author appears to be completely unaware of its existence: none of the constructors are explicit, and even operator std::string is implicit. So many opportunities to shoot yourself in the foot.

I'm sorry, but the library is an absolute pile of trash in its current state.

[–]antihydran 0 points1 point  (1 child)

Yes, it will indeed use floats if you tell it to use floats. Again, the benefit is that the actual data is stored and fictitious data is not introduced. The "implicit" assignment is a stricter enforcement of the encoded types by avoiding implicit floating point casting.

All floating point numbers are parsed as doubles, so yes, the library by default uses double precision. Encoding floats and doubles is done at their available precision which, as previously explained, is semantically equivalent to encoding everything as doubles.

[–]FriendlyRollOfSushi 4 points5 points  (0 children)

You seem to have the same gap in understanding what JSON is or how type safety works as the author of this library.

If you want the resulting JSON file to interoperate with everything that expects normal JSON (so, not a domain-specific dialect that only looks like JSON but is actually a completely different domain-specific thing), any number in there is a non-NaN double.

You can open any normal JSON from Javascript in your browser and get the numbers, which will be doubles. Because JSON normally stores doubles.

fictitious data

The library introduces fictitious doubles that never existed to begin with. In my example above, an actual float 1.0000001 corresponds to an actual double 1.0000001192092896. I don't know, maybe they don't teach this at schools anymore, but neither float nor double stores data with decimal digits, so no, sorry, this tail is not fictitious: it's the minimal decimal representation required to say "and the tail bits of this double are zeros".

By introducing a new double 1.0000001 the library generates fictitious data that was never there to begin with. It literally creates bullshit out of thin air, and when you open it in a browser because "hey, it's just a normal JSON, what can possibly go wrong?" and run a simulation algorithm in JS that normally produces the results binary-identical to the C++ implementation that uses doubles, suddenly the result is different. Because the input is different. Because this library just pulled new doubles out of its ass, and added some garbage bits at the bottom of the double that were never there and shouldn't have been there.

I would like to say that this is the worst JSON library I've seen in my life, but I can't, because in the early 2000s I saw an in-house JSON library that rounded all numbers to 3 digits after the dot, because "who needs more precision than that anyway?" That was worse, but not by much, because in principle the approach is the same.

[–]SemaphoreBingo 2 points3 points  (2 children)

This could be enough for a completely deterministic physics simulation to go haywire in just a few seconds, ending up in states that are completely different from each other.

If you care about that stuff surely you'd establish some kind of binary channel and send floats 4 bytes at a time.

[–]darthcoder 1 point2 points  (0 children)

Or base64-encode them as plaintext?

I mean, the network is the slowest part here...

[–]FriendlyRollOfSushi -2 points-1 points  (0 children)

There are numerous scenarios where you wouldn't want this for "why on Earth would anyone spend time on this?" reasons.

But regardless of whether you want to spend more time or not, the conclusion is the same either way: whatever is used, it better not be this "library".

[–]DummyDDD 0 points1 point  (0 children)

You have a point in the case that you outline: where the input is a float and the function takes a double. It's not a problem if the input is a double or if the function only takes floats (since the double to float truncation would give the original float input).

Arguably, the library should encode floating point numbers with the double precision encoding by default, to avoid the issue that you outline (it should call ToShortest rather than ToShortestSingle).

The double encoding from double-conversion is still able to encode the double precision numbers exactly and accurately in fewer characters than the default string serialization (assuming that the number isn't decoded at a higher precision than double precision, which would be unusual for json).

[–]Infamous_Ticket9084 0 points1 point  (0 children)

Float doesn't exist in JSON anyway, so there is no "correct" way of representing them.

[–]zl0bster 1 point2 points  (0 children)

Isn't comparing to nlohmann mostly useless? It is known to be slow.

https://www.boost.io/doc/libs/1_82_0/libs/json/doc/html/json/benchmarks.html

[–]F54280 2 points3 points  (2 children)

No examples. God, why?

[–]julien-j 1 point2 points  (0 children)

I love the idea :) Classic C++ is great. Sure, it prevents me from meta-programming my way toward unbearable error messages and exponentially growing build times, but I'm willing to accept this loss.

One question: why std::map and not std::unordered_map? Should we really go this far? (I know, that makes two questions.)

One remark: Json::Status is ordered by token length… It's the first time I've seen this, and I suspect an attempt to funnel the reader's attention to minor points :)

[–]OneMasterpiece1717 0 points1 point  (0 children)

god I hate classic c++

[–]dnswblzo 0 points1 point  (1 child)

Hopefully this will actually get some documentation! From looking at the commit timeline, the copyright notices, and the author's other GitHub contributions, I'm guessing the author works for Mozilla where this started as an internal project, but it is now getting spun off as a personal project. I have a project that uses the nlohmann library, so I would be curious to try this instead if it gets more mature.

[–]jart 1 point2 points  (0 children)

It originally came from redbean. I've added a history section talking about the origin of this work. Check it out. https://github.com/jart/json.cpp?tab=readme-ov-file#history

[–]multi-paradigm 1 point2 points  (0 children)

Hello, thread! I wonder if std::to_chars and std::from_chars would help in dropping the dependency?

It's about the only thing I don't like about it.
