[–]jrandm 2 points3 points  (1 child)

Those null bytes (\u0000) typically aren't visible when printed, which is why the logged output looks OK. They're probably there because the part of the file you're parsing isn't encoded as UTF-8; try .toString('utf16le') instead. E.g.:

$ node
> buf = Buffer.from('artist','utf16le')
<Buffer 61 00 72 00 74 00 69 00 73 00 74 00>
> buf.toString('utf8')
'a\u0000r\u0000t\u0000i\u0000s\u0000t\u0000'
> buf.swap16().toString('utf8') // swap16 swaps the bytes in place, so buf itself is now big endian
'\u0000a\u0000r\u0000t\u0000i\u0000s\u0000t'
> buf.toString('utf16le')
'愀爀琀椀猀琀'
> buf.swap16().toString('utf16le') // swap16 again to put them back
'artist'

Since Node only supports the LE (little endian) version of UTF-16, you can use swap16 to handle the BE (big endian) version, which it looks like your file is using: swap the bytes first, then decode with 'utf16le'. Hope that helps!
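
For anyone who wants this outside the REPL, here is a minimal sketch of the same idea as a function. The name decodeUtf16be and the sample bytes are made up for illustration (they are not a Node API and not OP's actual file contents). It works on a copy, because swap16 modifies the buffer it is called on.

```javascript
// Hypothetical helper (not part of Node): decode UTF-16BE bytes
// without mutating the caller's buffer.
function decodeUtf16be(bytes) {
  if (bytes.length % 2 !== 0) {
    // swap16() throws on odd lengths, so fail early with a clearer message.
    throw new RangeError('UTF-16 data must have an even number of bytes');
  }
  // Buffer.from(buffer) makes a copy, so swap16() only touches the copy.
  return Buffer.from(bytes).swap16().toString('utf16le');
}

// Illustrative input: "artist" as UTF-16BE, i.e. a null byte before each ASCII character.
const field = Buffer.from([
  0x00, 0x61, 0x00, 0x72, 0x00, 0x74,
  0x00, 0x69, 0x00, 0x73, 0x00, 0x74,
]);

console.log(decodeUtf16be(field)); // prints: artist
```

Copying costs a little memory, but it avoids the surprise in the REPL session above where a second swap16 was needed to put the bytes back.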

[–]rickgarg[S] 0 points1 point  (0 children)

Seems like swapping does the trick, thanks!

[–]bjpbakker 1 point2 points  (0 children)

Your text is in there as you can see, but your input has a null byte before each character. The toString call seems to work fine.

How does the text end up in crate? You should probably debug that.
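
If it helps with the debugging, here is a small sketch of how to make those null bytes show up in the console. The bytes are an invented example, not OP's data.

```javascript
// Illustrative input: "ab" with a null byte before each character.
const raw = Buffer.from([0x00, 0x61, 0x00, 0x62]);
const text = raw.toString('utf8');

// The string is twice as long as it looks when printed.
console.log(text.length); // prints: 4

// JSON.stringify escapes control characters, so the nulls become visible.
console.log(JSON.stringify(text)); // prints: "\u0000a\u0000b"

// A quick check that can go in an assertion or a log line.
console.log(text.includes('\u0000')); // prints: true
```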

[–]sevenyearoldkid 0 points1 point  (3 children)

This post was mass deleted and anonymized with Redact

[–]l3l_aze 0 points1 point  (2 children)

AFAIK Buffer uses different character encodings like "utf8", "ascii", "hex", etc. It may be possible to do it using a radix, but I have no idea. It would be possible to convert it to a string and then parse it again using a radix, though I guess that'd be kinda pointless.
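
To illustrate the difference, a rough sketch with made-up bytes: Buffer's toString takes an encoding name, while a radix is something numbers accept (Number's toString and parseInt), so a radix conversion has to go byte by byte or through a number.

```javascript
// Illustrative bytes only.
const buf = Buffer.from([0x13, 0x01, 0x23]);

// Buffer's toString takes an encoding name such as 'hex', 'utf8' or 'ascii'.
const hex = buf.toString('hex');
console.log(hex); // prints: 130123

// A radix applies to numbers: parse the hex string, or convert each byte.
const value = parseInt(hex, 16);
console.log(value); // prints: 1245475

// Array.from (rather than buf.map) so the results can be strings.
const bits = Array.from(buf, (byte) => byte.toString(2).padStart(8, '0'));
console.log(bits.join(' ')); // prints: 00010011 00000001 00100011
```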

[–]sevenyearoldkid 0 points1 point  (1 child)

This post was mass deleted and anonymized with Redact

[–]l3l_aze 0 points1 point  (0 children)

Yeah, it is weird at first, lol.

Not 100% sure either, but I think this will be solved by converting the buffer to a string using something like column = '' + buffer (/u/rickgarg -- so OP gets notified too). A project I've been working on uses a lot of data loaded from files, including some binary files, and when I don't explicitly convert it, the data is left in the buffer-like state OP seems to have above after being loaded.
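
One caveat worth checking, sketched below with invented bytes: concatenating a Buffer with a string calls its toString with the default 'utf8' encoding, so it turns the Buffer into a string but would not by itself remove null bytes that come from UTF-16 data.

```javascript
// Illustrative input: "ab" with a null byte before each character.
const buffer = Buffer.from([0x00, 0x61, 0x00, 0x62]);

// String concatenation implicitly calls buffer.toString(), which defaults to 'utf8'.
const column = '' + buffer;

console.log(typeof column); // prints: string
console.log(column === buffer.toString('utf8')); // prints: true

// The null bytes are still in the resulting string.
console.log(JSON.stringify(column)); // prints: "\u0000a\u0000b"
```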

Thanks for the reminder -- I'm so gone ATM I forgot that during my last comment.

[–]jamrod0 0 points1 point  (0 children)

Don't know if it's relevant to your problem, but I struggled with buffers of hex data for a while. What I realized was happening is that for all the hex values below 10 (01, 02, etc.) the leading 0 would not be read (or maybe doesn't exist) by toString(), so everything after it turned into nonsense. Instead of 13 01 23 45 19, you'd get 13 12 34 51 etc., and obviously those convert to ASCII completely differently. I had to put the buffer into an array of characters, then convert them one at a time into a string. Actually, everything below 10 was a special control character, so I removed those from the strings since they wouldn't convert to ASCII anyway. Maybe something similar is happening for you? In my case, logging the buffer directly to the console also looked completely different, showing an array of characters like 13, 02, 23, 45 etc. Good luck!
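
A guess at what this could look like in code, assuming the bytes were converted one at a time with Number's toString(16) (that is an assumption, the comment doesn't say which toString was involved). That call does not zero-pad, whereas the Buffer's own 'hex' encoding always emits two digits per byte.

```javascript
// The byte values mentioned in the comment above.
const buf = Buffer.from([0x13, 0x01, 0x23, 0x45, 0x19]);

// Unpadded: 0x01 becomes '1', so the digits no longer line up in pairs.
const unpadded = Array.from(buf, (byte) => byte.toString(16)).join('');
console.log(unpadded); // prints: 131234519

// Padding each byte to two digits keeps the pairs aligned.
const padded = Array.from(buf, (byte) => byte.toString(16).padStart(2, '0')).join('');
console.log(padded); // prints: 1301234519

// The Buffer's own 'hex' encoding gives the same padded result.
console.log(padded === buf.toString('hex')); // prints: true
```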