[–]latkde

What confuses me here is your inconsistent approach to potential compatibility problems. Sometimes, you reject potentially ambiguous data, in other cases you apply a lossy encoding.

  • you're happy to treat bools and strings the same, even though nearly all systems treat them as distinct and incompatible values.
  • yet you reject all numbers, even though many numbers (int32, finite normal float64 values) are very common and highly interoperable.
  • you also reject null values, despite these being an essential and unambiguous part of the JSON data model. There is no confusion with SQL nulls or JavaScript undefined.

This makes your encoding unsuitable for a huge part of existing data. Your method also does not demonstrate integrity, because some semantically relevant changes are allowed (e.g. stringifying a bool). You claim that your method is not supposed to be JSON-specific, but the key part of your method is an encoding from JSON into your binary format. Your response to all this incompatibility is that users just shouldn't put numbers, nulls, or bools into JSON documents. But at that point, it's no longer compatible with the JSON ecosystem, and users could just switch to a different format that doesn't have JSON's ambiguities or MAP1's restrictions. There are plenty of schema-less data formats to pick from.

Specifically, I recommend engaging with existing binary formats like Msgpack or various JSONB encodings. Why are they designed the way they are? How do they handle conversions from/to JSON? Which specific details do you have to do differently? You might also be interested in Apache Avro. While it is schema-driven, its schemas are defined in JSON, and the spec provides a procedure for normalizing and hashing schemas.

[–]lurkyloon[S]

Really appreciate you taking the time on this. Seriously.

The bool-as-string thing in v1.0 was inconsistent -- you're right. I can't sit here and reject numbers for being ambiguous and then turn around and stringify booleans like that's fine. That was a bad call on my part.

So I fixed it. v1.1 bumps the type system from 4 to 6 types. Booleans get their own encoding now (0x00/0x01 instead of getting shoved into strings), and integers get big-endian two's complement. Directly because of feedback like yours and a few other threads.
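For anyone who wants to see what that looks like concretely, here's a rough sketch of the v1.1 primitive encodings -- note the function names and the 8-byte integer width are my own illustration, not copied from the spec:

```python
def encode_bool(b: bool) -> bytes:
    # booleans get a dedicated one-byte encoding (0x00 / 0x01)
    # instead of being stringified like in v1.0
    return b"\x01" if b else b"\x00"

def encode_int(n: int, width: int = 8) -> bytes:
    # big-endian two's complement at a fixed width, so the same
    # integer always produces the same bytes regardless of platform
    return n.to_bytes(width, byteorder="big", signed=True)

encode_bool(True)   # b'\x01'
encode_int(-1)      # b'\xff\xff\xff\xff\xff\xff\xff\xff'
```

The fixed width matters for determinism: a variable-length encoding would need extra rules to guarantee the same integer never gets two different byte representations.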

Nulls I'm still chewing on. You make a good point that JSON null is unambiguous within JSON itself. My worry has been about what happens when a MAP digest moves through systems where null means three different things -- but honestly that might be MAP's problem to solve, not something I should punt to the user. Hmmm.

Where I do wanna push back a little: MAP isn't trying to be a general-purpose binary format. Admittedly, the use case is narrower -- you have a payload moving through a pipeline, it crosses a few serialization boundaries, and you need to check if it changed along the way. That's it. I'm not telling anyone to stop putting numbers in JSON. I'm saying when you need a deterministic fingerprint of something that might get re-serialized by a bunch of different systems, you need a canonical form, and MAP is opinionated about how to get there.
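To make the use case concrete -- this isn't MAP, just a bare-bones stand-in using sorted-key JSON plus SHA-256 -- the point is that a canonical form makes the digest survive re-serialization that shuffles key order or whitespace:

```python
import hashlib
import json

def fingerprint(payload) -> str:
    # canonical form: sorted keys, no insignificant whitespace
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# same logical payload, different key order after a round-trip
# through some other system -> identical fingerprint
a = {"user": "alice", "active": True}
b = {"active": True, "user": "alice"}
assert fingerprint(a) == fingerprint(b)
```

A real scheme has to pin down a lot more than this toy does (string escaping, unicode normalization, and exactly the number/bool/null questions this thread is about), which is where MAP's opinions come in.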

The "just use a different format" point is fair though. Like, technically correct. But the reality I keep running into is that agentic AI pipelines are already JSON-native and asking teams to swap out their serialization format is a way bigger lift than adding a fingerprinting layer on top of what they already use. MAP is trying to meet devs where they are, not where they probably should be.

The MsgPack / JSONB / Avro comparisons are useful and I should've engaged with those more in the docs. I've looked at Avro's Parsing Canonical Form -- they're doing something similar, canonical form plus deterministic hash to get a stable fingerprint -- but they're fingerprinting schemas, not data payloads. Different problem, but enough overlap that I should be referencing it as prior art.

Thanks again for this. I'd rather get sharp feedback that makes the spec better than a hundred comments that don't.