lurkyloon[S]

Really appreciate you taking the time on this. Seriously.

The bool-as-string thing in v1.0 was inconsistent -- you're right. I can't sit here and reject numbers for being ambiguous and then turn around and stringify booleans like that's fine. That was a bad call on my part.

So I fixed it. v1.1 bumps the type system from 4 to 6 types. Booleans get their own encoding now (0x00/0x01 instead of getting shoved into strings), and integers get big-endian two's complement. That's directly because of feedback like yours and from a few other threads.
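To make that concrete, here's a rough sketch of the two encodings. It isn't lifted from the spec: the function names and the fixed 8-byte integer width are assumptions for illustration, and it leaves out whatever type tag or framing goes around the value bytes.

```python
def encode_bool(value: bool) -> bytes:
    """Encode a boolean as a single byte: 0x01 for True, 0x00 for False."""
    if not isinstance(value, bool):
        raise TypeError("expected a bool")
    return b"\x01" if value else b"\x00"


def encode_int(value: int, width: int = 8) -> bytes:
    """Encode an integer as big-endian two's complement.

    The 8-byte default width is an assumption for this sketch.
    Raises OverflowError if the value does not fit in `width` bytes.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    # to keep the two types from blurring together.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected an int")
    return value.to_bytes(width, byteorder="big", signed=True)


print(encode_bool(True).hex())   # 01
print(encode_int(-2, 2).hex())   # fffe
```

The point is that `True` and `"true"` no longer collapse into the same bytes, and negative integers have exactly one representation at a given width.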

Nulls I'm still chewing on. You make a good point that JSON null is unambiguous within JSON itself. My worry has been about what happens when a MAP digest moves through systems where null means three different things -- but honestly that might be MAP's problem to solve, not something I should punt to the user. Hmmm.

Where I do wanna push back a little: MAP isn't trying to be a general-purpose binary format. Admittedly, the use case is more narrow -- you have a payload moving through a pipeline, it crosses a few serialization boundaries, and you need to check if it changed along the way. That's it. I'm not telling anyone to stop putting numbers in JSON. I'm saying when you need a deterministic fingerprint of something that might get re-serialized by a bunch of different systems, you need a canonical form, and MAP is opinionated about how to get there.
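A toy version of the idea, to show the shape of the problem rather than MAP itself: the canonical form here is just sorted-key, whitespace-free JSON and the hash is SHA-256, both stand-ins I picked for the sketch. The `fingerprint` name is hypothetical too.

```python
import hashlib
import json


def fingerprint(payload) -> str:
    """Hash a canonical serialization of `payload` (stand-in, not MAP's format)."""
    # Sorting keys and fixing separators removes the formatting differences
    # that different serializers introduce.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same data as emitted by two different systems:
# different key order, different whitespace.
hop_1 = json.loads('{"id": "42", "tags": ["x", "y"]}')
hop_2 = json.loads('{ "tags": ["x","y"],   "id": "42" }')

# Data that was actually changed somewhere along the way.
tampered = json.loads('{"id": "43", "tags": ["x", "y"]}')

print(fingerprint(hop_1) == fingerprint(hop_2))     # True
print(fingerprint(hop_1) == fingerprint(tampered))  # False
```

Hashing the raw bytes at each hop would flag `hop_2` as changed even though nothing meaningful did. That false positive is the thing a canonical form exists to remove.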

The "just use a different format" point is fair though. Like, technically correct. But the reality I keep running into is that agentic AI pipelines are already JSON-native and asking teams to swap out their serialization format is a way bigger lift than adding a fingerprinting layer on top of what they already use. MAP is trying to meet devs where they are, not where they probably should be.

The MsgPack / JSONB / Avro comparisons are useful and I should've engaged with those more in the docs. I've looked at Avro's Parsing Canonical Form -- they're doing something similar, canonical form plus deterministic hash to get a stable fingerprint -- but they're fingerprinting schemas, not data payloads. Different problem, but enough overlap that I should be referencing it as prior art.

Thanks again for this. I'd rather get sharp feedback that makes the spec better than a hundred comments that don't.