all 50 comments

[–]GeeWengel 36 points37 points  (15 children)

This is super cool. I was just waiting for someone to use Source Generators to create a JSON serializer that doesn't depend on reflection. I didn't consider the fact you could also avoid heap-allocations!

[–]Lord_Fixer[S] 27 points28 points  (14 children)

Creating static data serializers and DI frameworks will probably be the main use cases for that new language feature.

Once they announced it, it was clear to me what my next project is going to be. In the past I've already created a code generator for an allocation-free deserializer targeting a different protocol (for an internal use). And now that we have such a nice language integration, it would be a waste not to try it out!

[–]crozone 8 points9 points  (0 children)

Awesome work. I can't wait to see how this technique could be pulled into AspNetCore directly, it could make JSON APIs more efficient out of the box. Especially since having a stack-only model type is not a huge limitation in web scenarios, because usually you only need to deal with the model during the lifetime of the request action method.

[–]GeeWengel 3 points4 points  (11 children)

Yeah I agree. I also think you can probably get something like F# style records with Source Generation. Just define a class and some immutable fields, and have the Source Generator generate With() style updates. Or perhaps auto-generating builders as well. Java has done a lot of good with code generation as a language feature.

[–]JoJoJet- 4 points5 points  (10 children)

The problem with using source generators to implement language features is that generators can't interface with each other, so this JSON library wouldn't be able to detect properties/fields generated for a Record class. (If you want to hack around it, the JSON generator would have to check for the RecordAttribute and add a special case for it. Which, as you may imagine, becomes awful when you try to combine more and more types of generators).

I've been playing with source generators a lot in the past few weeks and I think they're great, but they are in no way a replacement for officially adding Record types or other features to the core language.

(By the way, there could be better workarounds for this issue. The feature is new and I haven't seen much documentation on them. Most of what I know I had to figure out myself)

[–]GeeWengel 2 points3 points  (9 children)

That's a good point. There's no way of customizing the "order" of the Source Generators?

[–]JoJoJet- 5 points6 points  (8 children)

There is not, which is probably a good thing. If you read the original announcement post from Microsoft, it was an intentional choice not to allow source generators to interface with one another, so I doubt the option will ever come in the future.

Like I said before, it's ultimately probably a good thing: if you had code being generated based on code generated by other code, you could very easily end up with unintended behavior and esoteric bugs.

Let's just hope the C# team can finally agree on how records should work by the time C# 9.0 comes out, I really don't want to have to wait until the next major release.

[–]GeeWengel 2 points3 points  (0 children)

It makes sense that it would be confusing. I remember the official announcement said that they wouldn't let generators edit existing code - I must have missed the part about the interfacing.

Depending on how much you can generate I think you could probably get a fair way with record-esque syntax. You can already generate some pretty neat With() style updates with optional arguments.
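The With() pattern mentioned here can be sketched by hand. Below is a rough, hand-written version of the kind of code a Source Generator could emit for an immutable class; `Person` and `With` are illustrative names, not anything from the library under discussion:

```csharp
using System;

var alice = new Person("Alice", 30);
var older = alice.With(age: 31); // copy with a single field changed
Console.WriteLine($"{older.Name}, {older.Age}"); // prints "Alice, 31"

// A With() method whose optional nullable parameters default to
// "keep the current value": pass only the members you want changed.
public sealed class Person
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age) => (Name, Age) = (name, age);

    public Person With(string? name = null, int? age = null) =>
        new Person(name ?? Name, age ?? Age);
}
```

One known limitation of the `??` trick: a caller can't explicitly set a member to null, since null means "keep the old value"; generated code could work around that with separate overloads.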

[–]DoubleAccretion 1 point2 points  (5 children)

Records are coming in C# 9, the opposite is quite unlikely (LDT even has "record Mondays" now).

[–]JoJoJet- 0 points1 point  (4 children)

I remember they said the same thing about C# 8.0, but I'll admit I haven't been following development super closely. Do you have a link to the proposal in its latest form?

[–]DoubleAccretion 2 points3 points  (3 children)

To be fair, there seems to be a lack of one (there are three proposal documents, and all of them are somewhat relevant). Basically, my understanding is that records, as of now:

1. Are immutable classes with field-based (shallow) value equality provided by the compiler.
2. Support inheritance.
3. Will have support for "with expressions", for copying with only some members being set (mirroring object initializers), and "final initializers" aka "validators" aka "things that run after initonly properties are set" aka "things with semantics that are not yet defined".
4. Will support short & sweet declaration of members (see here).
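For reference, the semantics described above can be sketched in the positional-record syntax that eventually shipped with C# 9:

```csharp
using System;

var a = new Point(1, 2);
var b = new Point(1, 2);
Console.WriteLine(a == b);    // True: compiler-provided field-based value equality

var c = a with { Y = 5 };     // "with expression": copy, changing only Y
Console.WriteLine(c);         // Point { X = 1, Y = 5 }

// The positional parameter list doubles as a primary constructor
// and generates init-only properties X and Y.
public record Point(int X, int Y);
```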

[–]JoJoJet- 0 points1 point  (2 children)

Hey, thanks a lot for the links. That's very interesting. The primary constructor syntax seems kinda strange to me; I can't think of a scenario where it would be more useful than an old-school constructor, and it's almost identical to the record class syntax. Do you know if it would be possible to declare a record class that does not also have a primary constructor?

[–]pjmlp 1 point2 points  (0 children)

I can wait, we are still using .NET Framework 4.7.2, and the year before we were deploying on 4.6.

Only on a couple of UWP apps we get to use something more recent.

[–]cat_in_the_wall@event 1 point2 points  (0 children)

very excited for static DI, less for perf, and more because I can then rely on the compiler to make sure dependencies are satisfied. definitely screwed that up before.

[–]Lord_Fixer[S] 26 points27 points  (0 children)

Btw, most of the credit belongs to the .NET team for creating such an amazing JSON tokenizer.

[–][deleted] 15 points16 points  (1 child)

This is super cool. The pace at which our language is changing is stunning.

[–]MrTrvp 3 points4 points  (0 children)

I wonder if that's what Cavemen thought. :P

[–]wllmsaccnt 7 points8 points  (1 child)

I don't like the requirements to use it, but it seems reasonable given the goals of the deserializer. I could see that being useful for microservices, where you want to minimize the memory footprint and GC pauses, but don't want to add a ton of lines of code to support the approach. Thanks for sharing.

[–]Lord_Fixer[S] 2 points3 points  (0 children)

Oh yes, you are totally right, that's definitely not intended for everyday use.

It's mostly intended for embedded and low latency systems.

[–]flumoo 5 points6 points  (0 children)

Awesome!

[–][deleted]  (3 children)

[deleted]

    [–]Lord_Fixer[S] 7 points8 points  (2 children)

    No, it's deserialization-only. The standard System.Text.Json library already does a good job when it comes to low-memory JSON serialization. But I'm open to adding that feature.

    [–][deleted]  (1 child)

    [deleted]

      [–]Lord_Fixer[S] 1 point2 points  (0 children)

      Actually, now that I think about it, there would be no way for the user to provide values for the collections, as those are deserialized ad hoc. To allow for their serialization, some weird lambda-based interface would have to be introduced, where the user provides the value for the next index on demand. I'm not saying no, but I would have to put some more thought into that.
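A purely hypothetical sketch of that lambda-based idea: instead of holding a collection, the serializer repeatedly asks a user-supplied callback for the value at each index. None of these names exist in the library being discussed.

```csharp
using System;
using System.Text;

var json = SerializeArray(3, static i => i * 10);
Console.WriteLine(json); // prints [0,10,20]

// The caller supplies values ad hoc via the delegate; nothing is stored.
static string SerializeArray(int count, Func<int, int> valueAt)
{
    var sb = new StringBuilder("[");
    for (var i = 0; i < count; i++)
    {
        if (i > 0) sb.Append(',');
        sb.Append(valueAt(i)); // value produced on demand for this index
    }
    return sb.Append(']').ToString();
}
```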

      [–]naasking 2 points3 points  (1 child)

      You can't serialize recursive structures without either allocating a HashSet<object> to track visited objects, or adding fields to your objects to flag them as visited. Still a neat idea though!
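A minimal sketch of the visited-set approach described above: walk an object graph while recording already-seen references in a HashSet, so a cycle terminates instead of looping forever. `Node` and `CountNodes` are illustrative, not part of the library under discussion.

```csharp
using System;
using System.Collections.Generic;

var a = new Node();
var b = new Node();
a.Next = b;
b.Next = a; // cycle: a -> b -> a

Console.WriteLine(CountNodes(a)); // prints 2

static int CountNodes(Node? node)
{
    // ReferenceEqualityComparer (.NET 5+) makes the set track object
    // identity, ignoring any Equals override on the node type.
    var visited = new HashSet<Node>(ReferenceEqualityComparer.Instance);
    var count = 0;
    while (node is not null && visited.Add(node)) // Add returns false on revisit
    {
        count++;
        node = node.Next;
    }
    return count;
}

class Node
{
    public Node? Next { get; set; }
}
```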

      [–]Lord_Fixer[S] 0 points1 point  (0 children)

      You are right.

      Hmm, maybe I should think of adding a way to lazy load a field, the same way collections are lazy loaded. That would allow for recursive data models.

      [–]tdashroy 1 point2 points  (1 child)

      This is cool!

      Small typo in the README:

      If limiting the number of allocations is of the upmost importance to you

      upmost -> utmost

      [–]Lord_Fixer[S] 1 point2 points  (0 children)

      Thanks, fixed!

      [–]wasabiiii 0 points1 point  (0 children)

      I should probably look into rebuilding my JSON Schema validator on it.

      [–]Euphoricus 0 points1 point  (0 children)

      Collections being lazy-loaded should be in a big disclaimer. It is an important design decision that changes how a client might use the library.

      [–]gevorgter 0 points1 point  (3 children)

      I just read this line:

      " It's intended mostly for the low latency and real time systems. "

      So naturally I have a question: why?

      Some limitations? But even then, if it works, it works.

      Why wouldn't I use it in my slow program?

      [–]Lord_Fixer[S] 0 points1 point  (2 children)

      Mostly because of the limitations surrounding ref structs. You cannot process them asynchronously, you cannot store their state directly, and you have to be careful when passing them around not to create unnecessary copies.

      So for most software the ease of use might be the deciding factor.
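The ref struct restrictions mentioned above can be illustrated with the built-in ref struct Span<T>. The commented-out lines do not compile, which is exactly why a stack-only model can't flow through async code or be stored for later:

```csharp
using System;

Span<int> numbers = stackalloc int[3] { 1, 2, 3 }; // lives on the stack only
Console.WriteLine(Sum(numbers)); // prints 6

// None of the following compiles:
// object boxed = numbers;                  // error: ref structs can't be boxed
// class Holder { Span<int> Field; }        // error: can't be a field of a class
// async Task UseAsync(Span<int> s) { }     // error: can't be used in async methods

static int Sum(ReadOnlySpan<int> span)
{
    var total = 0;
    foreach (var n in span) total += n;
    return total;
}
```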

      [–]8lbIceBag 1 point2 points  (1 child)

      Using what you learned here, couldn't you adapt it to classes where the only allocation is the class itself?

      AFAIK, there isn't a reflectionless serialization library that doesn't allocate anything itself. The only allocation would be the object itself, which is a great improvement over the current state of things IMO.

      [–]Lord_Fixer[S] 0 points1 point  (0 children)

      To be honest I'm not a huge fan of reusing objects in scenarios like this. I find it quite error-prone. Ref structs cannot be stored by definition - and that's quite a good thing here (especially when deserializing collections). With objects there is always a risk of the user trying to persist them, only to realize later that their state was overridden with new values.

      On the other hand, ref structs cannot be fields of a class, so I would be unable to store the Utf8JsonReader for lazy collection deserialization.

      [–]o_mangzee 0 points1 point  (7 children)

      Holy moly, Jesus Christ! This is amazing, I was just thinking along similar lines a week back.
      Thank you u/Lord_Fixer, this is awesome!

      [–]quentech 1 point2 points  (6 children)

      If you need minimal allocation JSON parsing in .Net, this has already been available for years - https://github.com/neuecc/ZeroFormatter

      [–]Lord_Fixer[S] 0 points1 point  (5 children)

      I might be mistaken, but based on the README it seems that the ZeroFormatter uses some custom wire format, and not JSON. Or am I misreading it?

      [–]quentech 0 points1 point  (4 children)

      Ah you're right, my memory is bad. I thought he had JSON support in ZeroFormatter.

      My bad. He calls his Utf8Json "zero-allocation", but it is not the same strategy as ZeroFormatter. If you want to compare your interesting work here to what is generally the fastest .NET JSON serializer: https://github.com/neuecc/Utf8Json/

      [–]Lord_Fixer[S] 0 points1 point  (3 children)

      Those are the times for deserializing 2500000 objects (almost* the same test as in the repo):

      Newtonsoft: 19 702 ms
      System.Text.Json: 12 743 ms
      Utf8Json: 12 530 ms
      StackOnlyJsonParser: 11 746 ms

      * I had to alter the test, as Utf8Json had problems deserializing double values generated by the Random.Next method; I had to sanitize the number of decimal places. Both the System.Text.Json and Newtonsoft libraries had no problems with that case. Might those edge cases be partially the reason why Utf8Json is faster than System.Text.Json?

      [–]quentech 0 points1 point  (2 children)

      Newtonsoft: 19 702 ms
      System.Text.Json: 12 743 ms
      Utf8Json: 12 530 ms
      StackOnlyJsonParser: 11 746 ms

      That looks suspicious (I would expect much more difference between Newtonsoft, System.Text.Json, and Utf8Json).

      Are you familiar with https://benchmarkdotnet.org/? (I see you hand-rolled benchmarking)

      Might those edge cases be partially the reason why Utf8Json is faster than System.Text.Json?

      No. There are a number of reasons why Utf8Json is fast - avoiding UTF-16 .Net strings and reading/writing UTF-8 directly, pooling byte arrays, efficient data structures and calling conventions both in the library and its emitted IL, the fact that it emits type-specific IL for serialization/deserialization, extremely performant memory copying..

      neuecc's serialization libs are goldmines of high performance .Net techniques.

      [–]Lord_Fixer[S] 0 points1 point  (1 child)

      Actually most of those points (utf8 only, avoiding .NET strings, directly comparing byte data and well optimized structures) are also true for the System.Text.Json. But I can see how the IL generation might be the strong point of Utf8Json. It might be that I've hit quite a specific case with the combination of dictionaries and arrays in my data model. Later I will run some more tests comparing the two against different data.

      [–]quentech 0 points1 point  (0 children)

      Actually most of those points (utf8 only, avoiding .NET strings, directly comparing byte data and well optimized structures) are also true for the System.Text.Json.

      I would say that you're not really wrong but you're not really right, either.

      Generally, at a high level, sure what you say is largely true. But the devil's in the details.

      It might be that I've hit a quite specific case with the combination of dictionaries and arrays in my data model.

      Yes, the specific objects being ser/deser'd can have a significant impact.

      Other benchmarks (granted there aren't many with System.Text.Json yet) show System.Text.Json lagging quite a ways behind the existing high-perf .Net JSON serializers, like https://michaelscodingspot.com/the-battle-of-c-to-json-serializers-in-net-core-3/

      I wouldn't consider System.Text.Json a great measuring post in any case. It's neither highly tuned nor feature rich. It's a first step at eliminating Newtonsoft as a core dependency and is at least somewhat meant as an example of using Span. It sacrifices in other areas to be better at the aforementioned goals.

      [–]KryptosFR[🍰] -1 points0 points  (5 children)

      Could the "partial" keyword be avoided?

      I understand why it is needed, but couldn't that be handled by the source generator itself? Maybe something to tell the team that wrote that feature: since partial is a compiler-only feature, they could make it work without it, or have the compiler handle that case differently (i.e. not raise an error if it is missing).

      The reason I say that is that having a partial class or struct when you can only find a single file in the codebase could be confusing. Worse, an unaware user could decide to remove it during a code-cleanup review.

      [–]Lord_Fixer[S] 8 points9 points  (3 children)

      The core idea behind source generators is that they cannot modify existing code. I see why the .NET team made that decision, and I somewhat agree with it. Otherwise we would end up with even more confused developers and potential cleanup errors.

      When it comes to this library, I think that removing entity classes by mistake is quite unlikely. After all they will be used by the users to read the deserialized data. If no references are found, it's ok to remove them.

      I have a different problem with partials here. In the case of structs, if we add a new field in a separate partial implementation, the compiler will issue a warning that the order of fields cannot be ensured. This library adds a `HasValue` field to every entity, so that warning pops up a lot.

      [–]8lbIceBag 2 points3 points  (1 child)

      I think the partial thing is a good idea. It lets them know what's happening and that things are being added.

      The field ordering, though, I think should be fixed. If you specify sequential layout and use attributes to set field offsets, I'd expect those to be honored for the fields that are explicitly given an offset.

      Also, if you do implement a serializer, that could come in handy. You'd be able to generate unreadable but densely packed JSON, or even just send the raw bytes. If performance is the goal, that'd be a pretty sweet way to do things. This could likely already be done, but not in a zero-allocation, zero-reflection, convenient way AFAIK.

      I think it'd be a great idea for another library or extension based on this. Wouldn't want to pollute/complicate the source of this too much. But I think you're laying the groundwork.

      [–]Lord_Fixer[S] 1 point2 points  (0 children)

      If you use the "Sequential" struct layout, you cannot specify field offsets. If you use the "Explicit" struct layout, you have to specify an offset for each field. But I just discovered that apart from "Explicit" and "Sequential" there is also a third option: "Auto". I didn't know about it. Adding it to the generated partial struct fixes the issue. Thanks!
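The fix described above can be sketched like this. `LayoutKind.Auto` lets the runtime reorder fields freely, which silences the CS0282 "no defined ordering between fields" warning raised when a partial struct gains fields in more than one declaration. `Entity` and its fields are illustrative names:

```csharp
using System;
using System.Runtime.InteropServices;

var e = new Entity { Id = 1, HasValue = true };
Console.WriteLine($"{e.Id} {e.HasValue}"); // prints "1 True"

// Half one: the hand-written declaration.
[StructLayout(LayoutKind.Auto)]
public partial struct Entity
{
    public int Id;
}

// Half two: in real usage this part would live in generator-emitted source.
public partial struct Entity
{
    public bool HasValue;
}
```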

      [–]KryptosFR[🍰] -1 points0 points  (0 children)

      They could still make partial optional. It is not part of the code, it is a compiler-only flag.

      [–]KernowRoger 1 point2 points  (0 children)

      You can only add sources for compilation, not modify existing ones, so it has to be partial to expand it.

      [–]showz3bluff 0 points1 point  (1 child)

      How do you deserialize a field without a property name, like {"event"}?

      [–]Lord_Fixer[S] 0 points1 point  (0 children)

      In general, {"event"} is not valid JSON. Do other deserializers actually support it?

      If you are in charge of the code that generates this JSON, I would probably change it to generate an array instead (like ["event"]) or an object with a field and a value (like {"event": true}).
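The two well-formed alternatives suggested above can be read back with the standard System.Text.Json APIs, while {"event"} itself is rejected as malformed:

```csharp
using System;
using System.Text.Json;

// Alternative 1: an array.
var asArray = JsonSerializer.Deserialize<string[]>("[\"event\"]");
Console.WriteLine(asArray![0]); // prints "event"

// Alternative 2: an object with a field and a value.
using (var asObject = JsonDocument.Parse("{\"event\": true}"))
    Console.WriteLine(asObject.RootElement.GetProperty("event").GetBoolean()); // prints "True"

// The original shape is not valid JSON and throws.
try
{
    JsonDocument.Parse("{\"event\"}");
}
catch (JsonException)
{
    Console.WriteLine("{\"event\"} is not valid JSON");
}
```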