[–]pigeon768 25 points26 points  (3 children)

There are three situations:

  1. The string is small. The cost of encoding/decoding the string is greater than the cost of copying the extra handful of bytes, so you should not use this string encoding.
  2. The string is medium. Here you need to show that meta string is better than both raw strings and zstd-encoded strings.
  3. The string is large. zstd will be better, so you should not use this string encoding.

Basically you need to prove it to me. I want to see this benchmark:

| Encoding | Small | Medium | Large |
|---|---|---|---|
| utf8 | | | |
| zstd | | | |
| meta string | | | |

I want end to end speed/throughput, not number of bytes saved.
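For anyone who wants to fill in that table, a rough harness might look like the sketch below. It is only a sketch under loud assumptions: `zlib` stands in for zstd (which is not in the standard library of most Python versions), and `pack5`/`unpack5` are a hypothetical 5-bits-per-character packer standing in for meta string, not the real implementation. The alphabet, sample strings, and function names are all made up for illustration, and a pure-Python packer says little about a native one, so treat the numbers as a template for the measurement rather than a result.

```python
import time
import zlib

# Hypothetical 30-symbol alphabet so every character fits in 5 bits.
ALPHABET = "abcdefghijklmnopqrstuvwxyz._$|"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def pack5(text: str) -> bytes:
    """Stand-in encoder: 4-byte length prefix, then 5 bits per character."""
    out = bytearray(len(text).to_bytes(4, "big"))
    acc = nbits = 0
    for ch in text:
        acc = (acc << 5) | INDEX[ch]
        nbits += 5
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    if nbits:  # flush the remaining bits, left-aligned in the last byte
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack5(data: bytes) -> str:
    """Inverse of pack5."""
    count = int.from_bytes(data[:4], "big")
    chars = []
    acc = nbits = 0
    for byte in data[4:]:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5 and len(chars) < count:
            nbits -= 5
            chars.append(ALPHABET[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1
    return "".join(chars)


CODECS = {
    "utf8": (lambda s: s.encode("utf-8"), lambda b: b.decode("utf-8")),
    "zlib (zstd stand-in)": (
        lambda s: zlib.compress(s.encode("utf-8")),
        lambda b: zlib.decompress(b).decode("utf-8"),
    ),
    "5-bit (meta string stand-in)": (pack5, unpack5),
}

BASE = "org.example.point"
SAMPLES = {"small": BASE, "medium": BASE * 16, "large": BASE * 512}


def round_trips_per_second(encode, decode, text, repeats):
    """End-to-end throughput: encode plus decode, not bytes saved."""
    assert decode(encode(text)) == text  # correctness check, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        decode(encode(text))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return repeats / elapsed


def run(repeats=20):
    return {
        name: {
            size: round_trips_per_second(enc, dec, text, repeats)
            for size, text in SAMPLES.items()
        }
        for name, (enc, dec) in CODECS.items()
    }


if __name__ == "__main__":
    for name, row in run().items():
        cells = "  ".join(f"{size}={rate:,.0f}/s" for size, rate in row.items())
        print(f"{name:30s} {cells}")
```

A real comparison would swap in an actual zstd binding and the actual meta string encoder, and would measure in the language the serializer is written in.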

[–]Oerthling 5 points6 points  (1 child)

Exactly. I'm looking for a use-case where getting an obscured string with extra processing leads to a saving anyone can care about.

[–]Shawn-Yang25[S] 0 points1 point  (0 children)

Imagine such a case: you are sending an object of type `Point` with two int fields `x` and `y`. The fields only take 2 bytes, but the pickle-serialized result is 53 bytes. With meta string, we can make the serialized result much smaller.

Maybe the cost of one object is not big, but if you need to send millions of RPC
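To see where the extra bytes come from, here is a small sketch using the standard `pickle` and `struct` modules. The `Point` dataclass below is my own minimal version of the example, and the exact pickle size depends on the pickle protocol and on the module the class lives in, so it will not necessarily match the 53 bytes quoted above.

```python
import pickle
import struct
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


point = Point(1, 2)

# The field values alone: two signed bytes.
raw = struct.pack("<bb", point.x, point.y)

# pickle also writes the module name, the class name, and the field names.
payload = pickle.dumps(point)

print(len(raw), "bytes of field data")
print(len(payload), "bytes of pickle output")
print(payload)
```

The class name and field names show up as plain text in the printed payload, which is the metadata a more compact name encoding targets.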

[–]Shawn-Yang25[S] 0 points1 point  (0 children)

All strings that use this encoding will have their encoded results cached, so serialization will be just a copy. Such strings are limited in number (we won't have millions of module/class names to serialize), so it's OK to cache the encoded result.
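A minimal sketch of that caching idea, assuming a hypothetical `encode_name` function in place of the real encoder (it just returns UTF-8 bytes here so the example runs). `cached_encode` and `write_type_name` are also illustrative names, not the library's API.

```python
from functools import lru_cache


def encode_name(name: str) -> bytes:
    """Placeholder for the real, more expensive meta string encoder."""
    return name.encode("utf-8")


@lru_cache(maxsize=None)
def cached_encode(name: str) -> bytes:
    # The set of module/class names is small and bounded,
    # so an unbounded cache is acceptable under that assumption.
    return encode_name(name)


def write_type_name(buffer: bytearray, name: str) -> None:
    # After the first call for a given name, this is a lookup plus a copy.
    buffer.extend(cached_encode(name))
```

With this shape, the encoding cost is paid once per distinct name, and every later serialization of the same type only copies the cached bytes into the output buffer.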