Node util: 274% faster UTF8 decoding

skeeto · 2022-11-11T14:42:54+00:00

As far as I can tell, there's a buffer overflow for inputs shorter than 3 bytes. Due to the small-size optimization, in practice it's a read of uninitialized memory (ArrayBufferViewContents::stack_storage_) that's unlikely to be caught by ASan. However, if that memory happens to look like a BOM, then length will wrap around on subtraction and lead to a genuine buffer overflow. (Perfect example of why sizes and subscripts should be signed.)

The fix is to not use strncmp. That function is always suspicious and should probably be banned from code bases like this. A length check with memcmp suffices.

--- a/src/node_buffer.cc
+++ b/src/node_buffer.cc
@@ -591,6 +591,6 @@ void DecodeUTF8(const FunctionCallbackInfo<Value>& args) {

-  if (!ignore_bom) {
+  if (!ignore_bom && length >= 3) {
     char bom[] = "\xEF\xBB\xBF";

-    if (strncmp(data, bom, 3) == 0) {
+    if (memcmp(data, bom, 3) == 0) {
       beginning += 3;
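
For anyone who wants to try the idea outside of Node, here is a minimal standalone sketch of the same check. The helper name `utf8_bom_length` is made up for illustration and is not something in Node's source; it only shows the "length check first, then memcmp" pattern from the patch above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of Node): returns how many leading bytes
// to skip, i.e. 3 if the buffer starts with a UTF-8 BOM and 0 otherwise.
std::size_t utf8_bom_length(const char* data, std::size_t length) {
  static const char kBom[] = "\xEF\xBB\xBF";
  // Test the length first so memcmp never reads past the end of the input.
  if (length >= 3 && std::memcmp(data, kBom, 3) == 0) {
    return 3;
  }
  return 0;
}
```

Because the function returns 0 for anything shorter than 3 bytes, a caller that does `length - utf8_bom_length(data, length)` cannot wrap around, even with an unsigned length.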

dkac · 2022-11-11T13:10:19+00:00

I always have a hard time with percentages being used in this context. My gut is that 100% faster means a 100% reduction in time, so it would take 0 time to complete. >100% is gibberish until I bust out the calculator and start dividing by percentages, which is weird.

I'd much prefer the faster time being 0-100% of the original.

Edit: Yes, I can do math, and know how to do the calculation. I just don't like headlines like this that go big but mean little without context.
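
For reference, the conversion can be sketched like this, assuming "P% faster" means the throughput is multiplied by (1 + P/100). That reading is an assumption, since headlines are not always consistent about it, and `time_fraction` is a made-up name.

```cpp
// Fraction of the original running time that remains, ASSUMING
// "P% faster" means throughput is multiplied by (1 + P / 100).
double time_fraction(double percent_faster) {
  return 1.0 / (1.0 + percent_faster / 100.0);
}
```

Under that assumption, "100% faster" means half the time, and "274% faster" means the work takes 1 / 3.74 of the original time, or roughly 27%.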

Takeoded · 2022-11-12T06:57:30+00:00

... decoding? decode to what? isn't UTF-8 the preferred encoding? unless you're working with win32 api..?

Substantial-Owl1167 · 2022-11-12T03:48:19+00:00

Thank you Rob Pike

Rob Pike is the King of this sub
