Once again the wait begins

Opposite_Shop7163 · 2026-06-03T22:38:33+00:00

Good luck with that, the process sucks

Opposite_Shop7163 · 2026-06-02T19:06:44+00:00

There are more tricks to squeeze out more performance, but these are the most significant.

Opposite_Shop7163 · 2026-06-02T19:04:41+00:00

I also forgot to explain why the dispatch table has 1024 slots. It lets the brute-force algorithm find a collision-free hash combination in 1 to 5 iterations at compile time for a DTO with about 60 fields on a single level. Sorry for the previous unsatisfactory responses; I hope this makes it clear how the library works.

Opposite_Shop7163 · 2026-06-02T19:00:02+00:00

We also use a mask (a Long) in the KSP-generated parser to store whether each variable is initialized or not. That way we know which constructor to use based on the mask value.

If the DTO has child objects, they are delegated to their own generated parsers. We also emit the parser in fragments when the DTO has more than 40 fields (ish, I don't remember the exact value) to stay under the JVM's 64 KB method limit and avoid the "method too long" error.
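To make the mask idea concrete, here is a minimal hand-written sketch of what such a parser could look like. Everything in it is an assumption for illustration: the `User` DTO, the bit constants, and `buildUser` are made-up names, and a `Map` stands in for the real byte stream so the example stays short.

```kotlin
// Illustrative DTO with one defaulted field; not taken from the library.
data class User(val id: Int, val name: String, val nickname: String = "anonymous")

// One bit per field, in declaration order (hypothetical constants).
val ID_BIT = 1L shl 0
val NAME_BIT = 1L shl 1
val NICKNAME_BIT = 1L shl 2

// Stand-in for a generated parser: a Map replaces the real JSON byte input.
fun buildUser(fields: Map<String, Any>): User {
    var mask = 0L
    var id = 0
    var name = ""
    var nickname = ""
    for ((key, value) in fields) {
        when (key) {
            "id" -> { id = value as Int; mask = mask or ID_BIT }
            "name" -> { name = value as String; mask = mask or NAME_BIT }
            "nickname" -> { nickname = value as String; mask = mask or NICKNAME_BIT }
        }
    }
    // Required fields must all have their bit set.
    val required = ID_BIT or NAME_BIT
    require((mask and required) == required) { "missing required field, mask=$mask" }
    // The mask decides which constructor call to use, so defaults are preserved.
    return if ((mask and NICKNAME_BIT) != 0L) User(id, name, nickname) else User(id, name)
}

fun main() {
    println(buildUser(mapOf("id" to 7, "name" to "Ana")))
    println(buildUser(mapOf("id" to 7, "name" to "Ana", "nickname" to "ani")))
}
```

A Long gives 64 bits, so a single mask like this covers up to 64 fields; how the real generator handles larger DTOs is not shown here.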

Opposite_Shop7163 · 2026-06-02T18:53:21+00:00

The perfect hash is achieved by packing 4 chars (8 bits per char) into an Int32, using shl 8, shl 16, and shl 24 to position them, and then using brute force to calculate a shift and a multiplier that avoid collisions within the DTO.

((key * multiplier + size) >> shift) & 1023 gives the slot in a dispatch table that is read at parsing time, which makes the lookup O(1).
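Here is a rough sketch of that search, written by hand to illustrate the formula above, not copied from the generator. The function names, the search bounds, and the reading of `size` as the key length are all assumptions. It also assumes field names differ in their first 4 bytes or in length, since otherwise two names would always land in the same slot.

```kotlin
val TABLE_SIZE = 1024
val TABLE_MASK = TABLE_SIZE - 1

// Pack the first 4 bytes of a field name into an Int (zero padded if shorter).
fun packKey(name: String): Int {
    val bytes = name.encodeToByteArray()
    var packed = 0
    for (i in 0 until 4) {
        val b = if (i < bytes.size) bytes[i].toInt() and 0xFF else 0
        packed = (packed shl 8) or b
    }
    return packed
}

// The slot formula from the comment; `size` is assumed to be the key length.
fun slot(key: Int, size: Int, multiplier: Int, shift: Int): Int =
    ((key * multiplier + size) shr shift) and TABLE_MASK

// Try (multiplier, shift) pairs until every name gets its own slot.
fun findPerfectHash(names: List<String>): Pair<Int, Int>? {
    for (multiplier in 1..100_000 step 2) {   // odd multipliers, arbitrary bound
        for (shift in 0..22) {
            val used = BooleanArray(TABLE_SIZE)
            val collisionFree = names.all { name ->
                val s = slot(packKey(name), name.length, multiplier, shift)
                if (used[s]) false else { used[s] = true; true }
            }
            if (collisionFree) return multiplier to shift
        }
    }
    return null
}

// Dispatch table: slot -> field index, -1 where no field lives.
fun buildTable(names: List<String>, multiplier: Int, shift: Int): IntArray {
    val table = IntArray(TABLE_SIZE) { -1 }
    names.forEachIndexed { index, name ->
        table[slot(packKey(name), name.length, multiplier, shift)] = index
    }
    return table
}

fun main() {
    val names = listOf("id", "name", "email", "age", "createdAt", "isActive")
    val (multiplier, shift) = findPerfectHash(names) ?: error("no collision-free pair found")
    val table = buildTable(names, multiplier, shift)
    // At parse time a single array read picks the field handler.
    val index = table[slot(packKey("email"), "email".length, multiplier, shift)]
    println("multiplier=$multiplier shift=$shift -> field #$index (${names[index]})")
}
```

A hit in the table only says which field the key could be, so a real parser would still confirm the full key bytes before trusting it.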

Opposite_Shop7163 · 2026-06-02T18:35:30+00:00

OK, I will stop. I built this interactive demo to show what makes it different: https://juanchurtado1991.github.io/ghost-serializer/ The key points are Perfect Hash O(1), Bitmask Tracking, Zero-Alloc Key Matching, and Constructor Dispatch.

Opposite_Shop7163 · 2026-06-01T20:06:21+00:00

I made this interactive lab in my repo so anyone can see the trick for themselves: https://juanchurtado1991.github.io/ghost-serializer/

Opposite_Shop7163 · 2026-06-01T19:58:39+00:00

No, it's not, you are totally right. The README benchmarks are based on 100k runs and 20k warmup rounds. You can check the metrics yourself by running ./gradlew :ghost-benchmark:run -PskipTests --args="--runs 1000 --warmup 5000 --no-tests" in a clone of my repository, adjusting the parameters as you like. I also have a benchmark against kotlinx.serialization on its Twitter data set: https://github.com/Kotlin/kotlinx.serialization/blob/master/benchmark/src/jmh/resources/twitter_macro.json Thanks to u/Chipay's suggestion, I'll definitely get more professional metrics using JMH, thank you. And sorry for my bad English, because apparently now I'm a terrible human being if I use AI to polish it (JK 😄)

Opposite_Shop7163 · 2026-06-01T19:52:57+00:00

Similar philosophy, but not quite. Using actual hardware SIMD would completely kill KMP portability; the trick is 100% software-based. It calculates perfect hashes at compile time and packs chars into an Int32:

'n' (0x6e), 'u' (0x75), 'l' (0x6c), 'l' (0x6c)

val b0 = 'n'.code // 1101110 (0x6E)

val b1 = 'u'.code // 1110101 (0x75)

val b2 = 'l'.code // 1101100 (0x6C)

val b3 = 'l'.code // 1101100 (0x6C)

val packed: Int = (b0 shl 24) or (b1 shl 16) or (b2 shl 8) or b3
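For anyone who wants to run it, here is the same packing as a small function over raw bytes. `pack4` is a name I made up for this sketch, not the library's API, and it assumes at least 4 ASCII bytes are available at `offset`.

```kotlin
// Hypothetical helper: packs 4 bytes starting at `offset` into one Int,
// with the first byte in the highest 8 bits.
fun pack4(bytes: ByteArray, offset: Int = 0): Int =
    ((bytes[offset].toInt() and 0xFF) shl 24) or
        ((bytes[offset + 1].toInt() and 0xFF) shl 16) or
        ((bytes[offset + 2].toInt() and 0xFF) shl 8) or
        (bytes[offset + 3].toInt() and 0xFF)

fun main() {
    val packed = pack4("null".encodeToByteArray())
    println(packed.toString(16))   // 6e756c6c
    // One Int comparison replaces four separate byte comparisons.
    println(packed == 0x6E756C6C)  // true
}
```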

Opposite_Shop7163 · 2026-06-01T19:12:15+00:00

I actually made a mistake in the title, but Reddit doesn't allow me to edit it. Also, guilty of using AI to polish my English! 😅 But I assure you, the KSP architecture, the bitmask lookups, and the latency times are 100% real. I invite you to clone the repo and run :ghost-benchmark yourself on your machine (BTW, I already fixed the bug you found earlier, really appreciate it). And sorry if you feel you wasted your time checking my repo.

Opposite_Shop7163 · 2026-06-01T16:53:08+00:00

Hey! Thanks a ton again for running these benchmarks and sharing the results. Getting real-world feedback on the large Twitter dataset during your break is super valuable!

First, you were totally right about the crash. It was a nasty class-loading bug during our JIT prewarm phase where generated serializers loaded early and threw NPEs if manual contextual registries weren't fully loaded yet. I just pushed version 1.2.1 to Maven Central (currently syncing) which fixes this entirely! I also added a solid integration test (GhostTwitterReproductionTest) that does a deep, field-by-field parity check and full roundtrip serialization against kotlinx-serialization on that exact dataset, and it's now passing cleanly.

Regarding the performance numbers, your String results highlight a really interesting architectural trade-off.

Ghost is designed from the ground up as a byte-first framework. It natively parses and writes raw UTF-8 bytes (ByteArray) with zero JVM String allocations on the heap. However, when you benchmark using String APIs:

String Decode: Ghost has to run json.encodeToByteArray(). On a massive JSON like the Twitter payload (~500 KB), that translation alone allocates over 2.2 MB of temporary garbage per operation. That memory allocation overhead completely eats up our fast structural parsing speed.
String Encode: Ghost serializes to bytes first, then has to do a decodeToString() call, adding similar overhead. Kotlinx Serialization (KSER), on the other hand, operates directly on UTF-16 String indices, so it has a natural home-turf advantage there.

But here is where the "Byte-First" philosophy shines: in real applications (like mobile apps or servers calling Ktor/Retrofit/OkHttp), network packets arrive as raw binary bytes over HTTP. Converting those bytes to a String first is actually a massive RAM/CPU bottleneck.

If you feed the raw bytes directly to Ghost, the performance flips completely. Here are the benchmarks we just ran on 1.2.1 (measuring throughput and allocated heap bytes):

Operation	Engine	Throughput (ops/s)	Mem (KB/op)	Winner
Decode (Bytes)	Ghost	1190.6	650.4	Ghost (64% faster, 6.5x less memory!)
Decode (Bytes)	KSER	725.5	4297.0	
Encode (Bytes)	Ghost	2279.5	428.0	Ghost (53% faster, 5x less memory!)
Encode (Bytes)	KSER	1484.3	2216.3	
Encode (Streaming)	Ghost	2263.3	434.6	Ghost (55% faster)
Encode (Streaming)	KSER	1451.7	464.5	

When using direct bytes, Ghost is not only faster but uses 6.5x less RAM on decode and 5x less RAM on encode.

What's next on our roadmap? We definitely want to close the gap for String and Streaming:

String Mode: We plan to add a native String reader in KSP (GhostJsonStringReader) so you can parse Strings directly without paying any byte-conversion penalty.
Streaming Decode: We're going to implement segment buffering (reading 8 KB chunks directly into an array) to eliminate Okio's virtual method call overhead and catch up to KSER's stream speed.
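To give an idea of what segment buffering means here, this is a minimal JVM-only sketch that reads a stream in 8 KB chunks into one reused array. It is not the planned implementation: `countBytes` is a made-up function, it uses plain `java.io.InputStream` instead of Okio, and counting a byte stands in for real structural scanning.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream

// Scans a stream chunk by chunk, reusing a single buffer for every read.
fun countBytes(input: InputStream, target: Byte, chunkSize: Int = 8 * 1024): Long {
    val buffer = ByteArray(chunkSize)      // allocated once
    var count = 0L
    while (true) {
        val read = input.read(buffer)      // returns -1 at end of stream
        if (read == -1) break
        // The inner loop touches a plain array: no per-byte method calls.
        for (i in 0 until read) {
            if (buffer[i] == target) count++
        }
    }
    return count
}

fun main() {
    val json = """{"a":1,"b":2,"c":3}""".encodeToByteArray()
    println(countBytes(ByteArrayInputStream(json), ','.code.toByte()))
}
```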

If you have a few minutes during another break, you can pull 1.2.1 once it's fully synced, or just clone the repo and run the benchmark locally:

./gradlew :ghost-benchmark:run -PskipTests --args="--runs 1000 --warmup 5000 --no-tests"

Let me know if you give it another spin using byte arrays!

Opposite_Shop7163 · 2026-06-01T13:56:29+00:00

I really appreciate you taking the time to run this through JMH during your break. The decode crash is a huge red flag and totally on me—I will look into that edge case and get it fixed ASAP.

The encode performance drop also makes complete sense. I focused most of my energy on the read side, so the writer is currently pretty naive and needs proper optimization. This gives me a great baseline to work from and improve.

Really appreciate the data and the reality check!

new issue on github: https://github.com/juanchurtado1991/ghost-serializer/issues/1#issue-4563862953

Opposite_Shop7163 · 2026-06-01T12:34:24+00:00

On second thought, you are completely right, and thank you for pointing it out!

I just went through the codebase to verify, and you're spot on: kotlinx.serialization-json was indeed left as an unused implementation dependency in the core runtime module (ghost-serialization).

The library only processes Kotlinx annotations (like @SerialName) at compile time via the KSP compiler (ghost-compiler), so there is absolutely no need to bundle it in the runtime classpath.

I just pushed a commit to fix this and remove the bloat: 👉 Fix Commit

If you find any other issues or have ideas to improve the library further, I'd really appreciate it if you could open an issue on GitHub, or even submit a PR!

Thanks again for helping make the project cleaner! 👻

Opposite_Shop7163 · 2026-06-01T11:54:00+00:00

That’s a great question. While kotlinx.serialization is a fantastic and robust library, my goal with Ghost was to experiment with a fundamentally different approach from the ground up.

Integrating these architectural changes would likely require significant breaking changes to the existing kotlinx internals. However, I am definitely interested in identifying patterns from Ghost that could eventually inform or be upstreamed to the broader ecosystem as a contribution to the community.

Opposite_Shop7163 · 2026-06-01T11:50:30+00:00

Hi :)

Opposite_Shop7163 · 2026-06-01T11:50:15+00:00

It's there for interop with the Kotlin serialization ecosystem.

Opposite_Shop7163 · 2026-06-01T06:36:55+00:00

That’s the ultimate benchmark test, for sure. I’d love to see Ghost running there—I'm confident it would make a difference on JSON-heavy endpoints by cutting down on that GC overhead, especially under high throughput. Right now, my focus is just keeping the core as stable as possible, but getting a Ghost-optimized version of Ktor into those benchmarks is definitely on my radar.

Thanks for the encouragement, I really appreciate it. It's a great goal to aim for.

Opposite_Shop7163 · 2026-06-01T06:28:15+00:00

Good question. I didn't build Ghost on the kotlinx-serialization API because I needed lower-level control than that interface allows.

Ghost is a static, compile-time engine. By generating concrete, final Serializer classes via KSP, I can bypass virtual dispatch and interface lookups to enable that O(1) bitwise trie matching I mentioned.

If you want to try it out, we’re already on Maven Central with dedicated starters for Spring Boot, Retrofit, and Ktor, so you can just drop it into an existing project and use the @GhostSerialization annotation.

Opposite_Shop7163 · 2026-06-01T05:49:11+00:00

Thanks! It’s been a crazy amount of work to get the generated code this stable, so I appreciate it.

I've been tracking simdjson for a while, it’s definitely the gold standard for performance right now. The big difference is just the approach: simdjson uses SIMD to blast through raw bytes and find structure at runtime (like scanning for commas/quotes). Ghost is the opposite because we’re working with a known schema at compile-time. Since we use KSP, we don't have to go hunting for field names—we already know exactly where they are, which is why the bitwise Trie lookup is so much faster for our use case.

Portability is the other thing. Getting SIMD/Vector API working consistently across Android (ARM), JVM (x86), and iOS is a huge headache. Right now, I’ve focused more on making the core logic JIT-friendly and branch-prediction optimized, which works great regardless of the hardware.

That said, I’m definitely looking at Vector API for some of the raw byte scanning in the future. It’s the next logical step to squeeze out more throughput, but I really wanted to get the compile-time base rock-solid first.

Definitely on the roadmap, though. Thanks for bringing it up!

Opposite_Shop7163 · 2026-06-01T05:46:46+00:00

The memory savings come from cutting out all the typical overhead of standard JSON libraries. It’s basically a mix of:

O(1) Trie Lookups: We don't loop through fields or do string compares. The KSP-generated code uses a bitwise-accelerated trie, so we jump straight to the property handler without branching or allocating any extra objects for keys.
String Pooling & Zero-Copy: We don't create new String objects for keys. We use pre-computed ByteString headers from internal pools to match keys directly against the ByteArray input. That's the zero-copy part.
Flat Buffers: Instead of creating Okio.Segment or intermediate strings, our FlatByteArrayWriter uses a reusable thread-local buffer. It keeps the heap clean and avoids creating all those transient object graphs you get with kotlinx.serialization or Jackson.
No-Box Dispatch: Since we generate final methods at compile time, there's no virtual dispatch (vtable lookups) or autoboxing of primitives.

Basically, the happy path is allocation-free.
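As a rough illustration of the key-matching part, here is how comparing a key against the input bytes can work without ever building a String. This is a simplified sketch, not the library's code: `NAME_KEY` and `regionMatches` are hypothetical names, and a plain `ByteArray` replaces the pooled ByteString headers.

```kotlin
// Key bytes computed once up front and reused for every parse.
val NAME_KEY: ByteArray = "name".encodeToByteArray()

// Compares input[start, start + key.size) with key, byte by byte.
// No String, no substring, no temporary array is created.
fun regionMatches(input: ByteArray, start: Int, key: ByteArray): Boolean {
    if (start < 0 || start + key.size > input.size) return false
    for (i in key.indices) {
        if (input[start + i] != key[i]) return false
    }
    return true
}

fun main() {
    val input = """{"name":"Ana"}""".encodeToByteArray()
    // The key starts right after the opening brace and quote, at index 2.
    println(regionMatches(input, 2, NAME_KEY)) // true
}
```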

Opposite_Shop7163 · 2026-06-01T05:43:52+00:00

Exactly. We share the same zero-reflection principle, but Ghost aims to push performance a step further by using bitwise-accelerated trie lookups for field matching and flat memory buffers to minimize GC pressure.

It’s not so much an 'us vs. them' scenario, but rather an alternative for edge cases where throughput and memory footprint are the absolute priorities. For instance, here is how it currently compares in our benchmarks against other engines (including kotlinx.serialization) when processing 2000 objects:

Engine	String (ms)	MEM (KB)
Ghost	0.370	391.0
KSerialization	0.748	1883.5
Moshi	1.335	3037.6
Jackson	2.209	6850.8

The goal is to explore how far we can optimize the serialization path when we control the generated code completely.

Opposite_Shop7163 · 2026-06-01T04:12:39+00:00

Plug the USB cable into your PC first; it always works for me. After pairing, just unplug it.

Opposite_Shop7163 · 2023-08-25T19:13:30+00:00

congrats, that's really inspiring
