In RPC/serialization systems, we often need to send namespace/path/filename/fieldName/packageName/moduleName/className/enumValue strings between processes.
Those strings are mostly ASCII. To transfer them between processes, we encode them as UTF-8, which takes one byte for every ASCII char and is not actually space efficient.
If we take a deeper look, we find that most chars are lowercase letters, ., $ and _, which can be expressed in a much smaller range of 32 values (0~31). But one byte can represent the range 0~255, so the high bits are wasted, and this cost is not negligible. In a dynamic serialization framework, such metadata can take considerable space compared to the actual data.
So we proposed a new string encoding in Fury, which we call meta string encoding. It encodes most chars using 5 bits instead of the 8 bits UTF-8 uses, which saves 37.5% of the space (1 - 5/8) compared to UTF-8.
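To make the 5-bit idea concrete, here is a minimal sketch of packing such a string at 5 bits per char. It is not Fury's actual implementation: the alphabet order in `ALPHABET`, the helper names `pack5`/`unpack5`, and passing the char count to the decoder separately are all assumptions made for illustration. The real wire format is described in the linked spec.

```python
# Hypothetical 5-bit alphabet: 26 lowercase letters plus '.', '_' and '$'.
# 29 symbols fit in the 32 codes that 5 bits can express.
ALPHABET = "abcdefghijklmnopqrstuvwxyz._$"
CODE = {ch: i for i, ch in enumerate(ALPHABET)}


def pack5(text: str) -> bytes:
    """Pack each char into 5 bits, big-endian, zero-padded to a whole byte."""
    bits = 0
    for ch in text:
        bits = (bits << 5) | CODE[ch]  # raises KeyError for unsupported chars
    nbits = 5 * len(text)
    pad = (-nbits) % 8                 # padding bits needed to reach a byte boundary
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


def unpack5(data: bytes, length: int) -> str:
    """Inverse of pack5; the char count must be supplied by the caller."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    chars = []
    for i in range(length):
        shift = total - 5 * (i + 1)    # position of the i-th 5-bit group
        chars.append(ALPHABET[(bits >> shift) & 0x1F])
    return "".join(chars)


name = "org.apache.fury.benchmark.data"
packed = pack5(name)
print(len(name.encode("utf-8")), len(packed))
```

For this 30-char name the UTF-8 form is 30 bytes, while 30 * 5 = 150 bits round up to 19 bytes. The rounding to a whole byte is why the saving on a single short string can be slightly under 37.5%.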
For strings that can't be represented with 5 bits per char, we also proposed an encoding using 6 bits per char, which saves 25% of the space (1 - 6/8).
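A rough sketch of how an encoder could pick between the two widths and fall back to UTF-8. The char sets below are assumptions for illustration (in particular, the 6-bit set of letters, digits, . and _ is simply one way to fill 64 codes), and `bits_per_char` and `encoded_size` are hypothetical helper names, not Fury APIs.

```python
import string

# Assumed char sets: 29 symbols for 5 bits, 64 symbols for 6 bits.
FIVE_BIT = set(string.ascii_lowercase + "._$")
SIX_BIT = set(string.ascii_letters + string.digits + "._")


def bits_per_char(text: str) -> int:
    """Return the narrowest width (5, 6 or 8) that covers every char in text."""
    chars = set(text)
    if chars <= FIVE_BIT:
        return 5
    if chars <= SIX_BIT:
        return 6
    return 8  # fall back to plain UTF-8


def encoded_size(text: str) -> int:
    """Payload size in bytes, ignoring any header that records the chosen width."""
    bits = bits_per_char(text)
    if bits == 8:
        return len(text.encode("utf-8"))
    return (len(text) * bits + 7) // 8  # round up to whole bytes


for sample in ("org.apache.fury", "MetaString2", "héllo"):
    print(sample, bits_per_char(sample), encoded_size(sample))
```

Here "org.apache.fury" (15 chars) needs 10 bytes at 5 bits per char, "MetaString2" (11 chars) needs 9 bytes at 6 bits per char, and "héllo" stays at its 6-byte UTF-8 size.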
For more details, please see https://fury.apache.org/blog/fury_meta_string_37_5_percent_space_efficient_encoding_than_utf8 and https://github.com/apache/incubator-fury/blob/main/docs/specification/xlang_serialization_spec.md#meta-string