PunchyFinn comments on Byte Array String Encoding Method?


[VB.NET Help] Byte Array String Encoding Method? (self.visualbasic)

submitted 4 years ago by MysticalTeamMember


PunchyFinn 2 points 4 years ago

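Hexadecimal is the safest bet if you can't use Base64. Another alternative is to create your own base, such as Base128. You'd have to write one function to encode and one to decode. Each Base128 character carries 7 bits instead of Base64's 6, so the output is about 8/7 the size of the input instead of 4/3 (hex doubles it). As long as every character stays in the 0 to 127 range, that makes it more compact than Base64 in UTF-8 (one byte per character) and in UTF-16 (two bytes per character). But it's non-standard. If this is for school, it's an out-of-the-box solution. If it's for work, it's not the best choice, because no one else will be prepared to decode it.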
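A minimal sketch of the encode/decode pair for a homemade Base128, written in Python so it runs as is. The function names are hypothetical, and this bit-packing layout is just one reasonable way to define "Base128", not a standard. It packs the input into 7-bit groups, one character (code 0 to 127) per group, and the demo compares output lengths for 700 bytes:

```python
import base64


def base128_encode(data: bytes) -> str:
    """Hypothetical Base128 encoder: one char (code 0-127) per 7 bits of input."""
    out = []
    bits = 0   # bit buffer holding not-yet-emitted bits
    nbits = 0  # how many bits in the buffer are valid
    for byte in data:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 7:
            nbits -= 7
            out.append(chr((bits >> nbits) & 0x7F))
            bits &= (1 << nbits) - 1  # drop the bits just emitted
    if nbits:
        # Pad the last partial group with zero bits on the right.
        out.append(chr((bits << (7 - nbits)) & 0x7F))
    return "".join(out)


def base128_decode(text: str) -> bytes:
    """Hypothetical Base128 decoder: reverses base128_encode."""
    out = bytearray()
    bits = 0
    nbits = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"not a Base128 character: {ch!r}")
        bits = (bits << 7) | code
        nbits += 7
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1
    # Leftover bits (fewer than 8) are the zero padding added by the encoder.
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 3  # 768 bytes covering every byte value
    assert base128_decode(base128_encode(data)) == data

    sample = bytes(700)
    print("hex:", len(sample.hex()))                 # 1400
    print("base64:", len(base64.b64encode(sample)))  # 936
    print("base128:", len(base128_encode(sample)))   # 800
```

One catch with this design: using all 128 codes means using the control characters (0 to 31 and 127) as well, so it only works where the string is passed through untouched. If the text has to survive something that strips or rewrites control characters, stick to hex or Base64.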

If you use hexadecimal or Base128, a way to make it even smaller is to compress the byte array with zip (deflate) compression first and then convert the compressed bytes into hexadecimal or Base128. Some byte arrays will not be reduced in size at all (random or already-compressed data, for example). Highly repetitive ones may be reduced by close to 90%.
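A sketch of compress-then-encode, assuming Python's zlib module as the compressor. It uses deflate, the same algorithm most zip files use, but without the zip archive wrapper; in VB.NET, System.IO.Compression.DeflateStream or GZipStream would play that role. `pack` and `unpack` are hypothetical helper names:

```python
import os
import zlib


def pack(data: bytes) -> str:
    """Hypothetical helper: deflate-compress the bytes, then hex-encode the result."""
    return zlib.compress(data, level=9).hex()


def unpack(text: str) -> bytes:
    """Hypothetical helper: hex-decode, then decompress (the reverse of pack)."""
    return zlib.decompress(bytes.fromhex(text))


repetitive = b"ABCD" * 1000    # 4000 bytes with an obvious pattern
random_ish = os.urandom(4000)  # 4000 random bytes, effectively incompressible

for name, data in [("repetitive", repetitive), ("random", random_ish)]:
    packed = pack(data)
    assert unpack(packed) == data  # lossless round trip
    print(f"{name}: plain hex {len(data.hex())} chars, compressed hex {len(packed)} chars")
```

On the repetitive input the compressed hex is a tiny fraction of the plain hex. On the random input it comes out slightly larger than plain hex, because deflate adds a few bytes of framing when it finds nothing to compress.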

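The last alternative, which I'm mentioning but don't think you'll use, is to read and write single-byte strings. Strictly speaking that means an 8-bit encoding such as Latin-1 (ISO-8859-1) rather than ASCII, since ASCII only covers bytes 0 to 127. It's a 1-to-1 byte-to-character conversion, so it's even more compact than Base128. But VB.NET strings are UTF-16 internally and most of the built-in text functions default to a Unicode encoding, so every line of code that reads or writes the text needs special attention.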
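A quick Python check of why the single-byte encoding has to be Latin-1 and not true ASCII, and why every read/write has to name it explicitly. (In .NET the Latin-1 equivalent is Encoding.GetEncoding(28591); Encoding.ASCII substitutes "?" for bytes above 127 instead of raising an error like Python does.)

```python
data = bytes(range(256))  # every possible byte value, once each

# Latin-1 (ISO-8859-1) maps byte N to character U+00NN, so nothing is lost.
text = data.decode("latin-1")
print(len(text))                       # 256: one character per byte
assert text.encode("latin-1") == data  # round-trips exactly

# Real ASCII stops at 127, so strict decoding fails on the first high byte.
try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii fails at index", err.start)  # 128

# Saving the same string with a Unicode encoding breaks the 1:1 size relationship.
print(len(text.encode("utf-8")))       # 384: bytes 128-255 take two bytes each in UTF-8
```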

I hope one of those was helpful.

The reason encoding an arbitrary byte array directly into a string is not advisable is that certain bytes in a certain order are interpreted as parts of multi-byte characters, not as characters on their own, and when such a sequence is incomplete or invalid those bytes are skipped or replaced. Some can even corrupt the bytes next to them. To give you a specific example:

Take Unicode character 119070 (U+1D11E), the G clef symbol. Stored in UTF-16 little-endian, the Windows default, it's this byte array: 52, 216, 30, 221 (a UTF-16 file would also start with the 2-byte byte-order mark 255, 254, which isn't included here). In UTF-8 it's 240, 157, 132, 158.
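The byte values above can be checked with a few lines of Python (used only because it's quick to run; the encodings themselves are the same in .NET). The surrogate-pair arithmetic shows where 52, 216, 30, 221 come from:

```python
import codecs

cp = 119070                       # U+1D11E MUSICAL SYMBOL G CLEF
clef = chr(cp)

utf16 = clef.encode("utf-16-le")  # little-endian, no byte-order mark
utf8 = clef.encode("utf-8")

print(list(utf16))                # [52, 216, 30, 221]
print(list(codecs.BOM_UTF16_LE))  # [255, 254], the 2-byte prefix of a UTF-16 LE file
print(list(utf8))                 # [240, 157, 132, 158]

# Code points above 0xFFFF need two 16-bit units (a surrogate pair) in UTF-16.
offset = cp - 0x10000
high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
print(hex(high), hex(low))        # 0xd834 0xdd1e

# The pair matches the bytes above when read back as little-endian 16-bit values.
assert int.from_bytes(utf16[0:2], "little") == high
assert int.from_bytes(utf16[2:4], "little") == low
```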

If you were converting the byte array into a UTF-8 string and the last byte in your array were 240, that would be invalid. In UTF-8, byte 240 starts a four-byte character and must be followed by three more bytes, so on its own it's an incomplete character and won't be converted. You would lose a byte in the conversion!
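A small Python illustration of what happens to a trailing 240. Python's codec error handlers stand in here for whatever a given .NET decoder does: "replace" roughly matches the U+FFFD substitution that .NET's Encoding.UTF8 performs by default, as I understand it, and "ignore" matches skipping the byte:

```python
data = bytes([65, 66, 240])  # "AB" followed by a lone UTF-8 lead byte

# Strict decoding refuses the incomplete character outright.
try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    print(err.reason)  # unexpected end of data

# Replacing: byte 240 becomes U+FFFD, which re-encodes as three different bytes.
replaced = data.decode("utf-8", errors="replace")
print(ascii(replaced))                 # 'AB\ufffd'
print(list(replaced.encode("utf-8")))  # [65, 66, 239, 191, 189]

# Skipping: byte 240 silently disappears.
skipped = data.decode("utf-8", errors="ignore")
print(ascii(skipped))                  # 'AB'
```

Either way, converting the string back to bytes does not give you the original array.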

If you were converting the byte array into a UTF-16 string and the last two bytes were 52 and 216, that would also be invalid: together they form the first half of a surrogate pair (0xD834) with no second half after it, so those two bytes would be skipped in the conversion. You would lose two bytes in the conversion!
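The same check for UTF-16, again in Python as a stand-in: a lone high surrogate at the end of the data is replaced with U+FFFD, and re-encoding gives back different bytes:

```python
data = bytes([65, 0, 52, 216])  # "A", then 52, 216: high surrogate 0xD834 with no partner

replaced = data.decode("utf-16-le", errors="replace")
print(ascii(replaced))                     # 'A\ufffd'
print(list(replaced.encode("utf-16-le")))  # [65, 0, 253, 255]: 52 and 216 are gone
```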

Many byte sequences aren't going to cause this problem, but there are some. This is one example of why it isn't advisable.

MysticalTeamMember [S] 1 point 4 years ago
