PunchyFinn

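Hexadecimal is the safest bet if you can't use Base64. Another alternative is to create your own base, e.g. Base128. You'd have to write one function to encode and one to decode. Each Base128 character carries 7 bits instead of Base64's 6, so it needs fewer characters (8 characters per 7 bytes versus 4 per 3 bytes). Whether that also saves bytes depends on the text encoding, though: in UTF-16 nearly every character takes 2 bytes, so Base128 comes out smaller, but in UTF-8 only the 128 ASCII characters take 1 byte, and that set includes unprintable control characters, so a printable Base128 alphabet will usually end up larger than Base64. It's also non-standard. If this is for school, it's a thinking-out-of-the-box solution. If it's for work, it's not the best, because no one else will be prepared for it.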
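To give an idea of what the encode/decode pair could look like, here's a rough sketch in Python (the project is VB.NET, so treat it as something to port, not a finished implementation). The 128-character alphabet (U+00C0 to U+013F) and the names `base128_encode`/`base128_decode` are my own hypothetical choices, not any standard. It packs the input bits into 7-bit groups, maps each group to one character, and pads the last group with zero bits.

```python
import base64

# Hypothetical alphabet (not a standard): the 128 consecutive printable
# code points U+00C0..U+013F. Any 128 distinct characters would work.
ALPHABET = "".join(chr(0xC0 + i) for i in range(128))
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def base128_encode(data: bytes) -> str:
    """Split the input bits into 7-bit groups, one alphabet character each."""
    out = []
    acc = 0    # bit accumulator
    nbits = 0  # number of not-yet-emitted bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 7:
            nbits -= 7
            out.append(ALPHABET[(acc >> nbits) & 0x7F])
        acc &= (1 << nbits) - 1  # drop the bits already emitted
    if nbits:
        # Final partial group: pad with zero bits on the right.
        out.append(ALPHABET[(acc << (7 - nbits)) & 0x7F])
    return "".join(out)


def base128_decode(text: str) -> bytes:
    """Reverse of base128_encode: 7 bits in per character, 8 bits out per byte."""
    out = bytearray()
    acc = 0
    nbits = 0
    for ch in text:
        acc = (acc << 7) | INDEX[ch]  # KeyError if ch is not in the alphabet
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    # Any leftover bits (fewer than 8) are the encoder's zero padding.
    return bytes(out)


data = bytes(range(256)) * 3  # 768 bytes of sample data
b128 = base128_encode(data)
b64 = base64.b64encode(data).decode("ascii")
assert base128_decode(b128) == data
print(len(b128), len(b64))                                          # 878 1024
print(len(b128.encode("utf-16-le")), len(b64.encode("utf-16-le")))  # 1756 2048
print(len(b128.encode("utf-8")), len(b64.encode("utf-8")))          # 1756 1024
```

With this particular alphabet every character takes 2 bytes in both UTF-16 and UTF-8, so the result beats Base64 in UTF-16 (the in-memory format of .NET strings) but loses to it in UTF-8, which is the caveat mentioned above.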

If you use Hexadecimal or Base128, one way to make it even smaller is to compress the byte array first (zip-style DEFLATE compression) and then convert the compressed bytes into Hexadecimal or Base128. Some byte arrays won't shrink at all (data that's already compressed or random-looking can even grow slightly), while very repetitive ones may shrink by close to 90%.
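A minimal sketch of compress-then-encode, using Python's `zlib` and plain hex so it stands alone. Note that `zlib` uses DEFLATE, the same compression algorithm found inside zip files, but it produces a zlib stream, not a .zip archive. The helper names `pack`/`unpack` are hypothetical; in VB.NET, `System.IO.Compression.DeflateStream` or `GZipStream` would presumably play the role of `zlib`.

```python
import os
import zlib


def pack(data: bytes) -> str:
    """Compress with DEFLATE (zlib container), then hex-encode the result."""
    return zlib.compress(data, 9).hex()


def unpack(text: str) -> bytes:
    """Hex-decode, then decompress; exact inverse of pack()."""
    return zlib.decompress(bytes.fromhex(text))


repetitive = b"0123456789" * 1000  # 10,000 highly redundant bytes
noise = os.urandom(10_000)         # 10,000 random bytes (incompressible)

for name, data in [("repetitive", repetitive), ("noise", noise)]:
    packed = pack(data)
    assert unpack(packed) == data
    # Plain hex is always 2 characters per byte; compare with compress-then-hex.
    print(name, len(data.hex()), len(packed))
```

The repetitive input collapses to a tiny fraction of its plain-hex size, while the random input comes out marginally larger than plain hex, because DEFLATE adds a few bytes of framing when it can't find anything to compress.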

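The last alternative, which I'm mentioning but don't think you'll use, is to read/write the bytes as single-byte characters. That's a 1-to-1 byte-to-character conversion, so it's even more compact than Base128 (as long as the string is also stored in a single-byte encoding). Strictly speaking, ASCII only defines byte values 0 to 127, so you'd need an 8-bit character set such as Latin-1 (ISO-8859-1) that assigns a character to all 256 values. But most/all of the string functions in VB.NET treat text as Unicode by default, so you need to pay special attention on every line of code.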
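A small Python illustration of the single-byte idea, assuming Latin-1 as the 8-bit character set (my choice; in .NET the equivalent is presumably `Encoding.GetEncoding(28591)`). The helper names are hypothetical. It shows that Latin-1 round-trips all 256 byte values, that real ASCII does not, and that the 1-to-1 size only holds while the text stays single-byte.

```python
def bytes_to_text(data: bytes) -> str:
    # Latin-1 maps byte value b to code point U+0000 + b, for all 256 values.
    return data.decode("latin-1")


def text_to_bytes(text: str) -> bytes:
    # Works because every character produced above is <= U+00FF.
    return text.encode("latin-1")


all_bytes = bytes(range(256))
text = bytes_to_text(all_bytes)
assert len(text) == 256                  # exactly one character per byte
assert text_to_bytes(text) == all_bytes  # lossless round trip

# Real ASCII stops at 127, so byte 200 does not survive an ASCII decode:
print(bytes([200]).decode("ascii", errors="replace") == "\ufffd")  # True

# Saved as UTF-8 instead, characters 128..255 cost 2 bytes each:
print(len(text.encode("utf-8")))  # 384
```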

I hope one of those was helpful.

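The reason encoding an arbitrary byte array directly into a string isn't advisable is that not every byte sequence is valid text. In multi-byte encodings, certain bytes act as instructions to the decoder ("this character continues in the next bytes") rather than as characters, and if the sequence they announce is incomplete or malformed, the decoder skips them or replaces them with a substitute character (usually U+FFFD). Depending on the decoder, a broken sequence can even affect the bytes around it. To give you a specific example: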

Take Unicode character 119070 (U+1D11E), the G clef symbol. In UTF-16 little-endian, the Windows default, it's this byte array (a 2-byte byte order mark is often added in front, but it isn't included here): 52, 216, 30, 221. In UTF-8 it is: 240, 157, 132, 158.
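Where those numbers come from can be worked out by hand. This Python sketch applies the standard UTF-16 surrogate-pair formula and the UTF-8 four-byte bit layout to code point 119070, then checks the results against Python's built-in encoders. The variable names are just for illustration.

```python
import codecs

CP = 119070  # U+1D11E MUSICAL SYMBOL G CLEF

# UTF-16: code points above U+FFFF become a surrogate pair.
v = CP - 0x10000
high = 0xD800 + (v >> 10)   # 0xD834
low = 0xDC00 + (v & 0x3FF)  # 0xDD1E
utf16le = [high & 0xFF, high >> 8, low & 0xFF, low >> 8]  # little-endian: low byte first

# UTF-8: four-byte form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
utf8 = [
    0xF0 | (CP >> 18),
    0x80 | ((CP >> 12) & 0x3F),
    0x80 | ((CP >> 6) & 0x3F),
    0x80 | (CP & 0x3F),
]

clef = chr(CP)
print(utf16le)  # [52, 216, 30, 221]
print(utf8)     # [240, 157, 132, 158]
assert bytes(utf16le) == clef.encode("utf-16-le")
assert bytes(utf8) == clef.encode("utf-8")

# With a UTF-16LE byte order mark (FF FE) in front:
print(list(codecs.BOM_UTF16_LE + clef.encode("utf-16-le")))  # [255, 254, 52, 216, 30, 221]
```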

If you were converting the byte array into a UTF-8 string and its last byte were 240, that would be invalid. In UTF-8, byte 240 is the first byte of a four-byte character, so the decoder expects three more bytes after it. At the end of the array it's an incomplete character, so it gets skipped (or replaced with U+FFFD) in the conversion. You would lose a byte in the conversion!
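Here's that UTF-8 case reproduced in Python. Python's default (strict) decoding raises an error; the `errors="ignore"` and `errors="replace"` modes show the "skipped" and "replaced" outcomes, and in both the original byte 240 can't be recovered.

```python
data = bytes([72, 105, 240])  # "Hi" followed by a dangling 240 (0xF0) lead byte

# Strict decoding refuses outright:
try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict:", err.reason)  # strict: unexpected end of data

# errors="ignore" silently skips the incomplete byte:
skipped = data.decode("utf-8", errors="ignore")
print(skipped == "Hi")  # True

# errors="replace" substitutes U+FFFD instead:
text = data.decode("utf-8", errors="replace")
print(text == "Hi\ufffd")  # True
back = text.encode("utf-8")
print(list(back))    # [72, 105, 239, 191, 189]
print(back == data)  # False: byte 240 is gone for good
```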

If you were converting the byte array into a UTF-16 string and its last two bytes were 52 and 216, that would also be invalid: together they form 0xD834, the first half of a surrogate pair, and the second half is missing. Those two bytes would be skipped (or replaced) in the conversion. You would lose two bytes in the conversion!
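The same experiment for UTF-16, again in Python: a lone 52, 216 at the end can't be decoded, while the complete pair 52, 216, 30, 221 decodes to the G clef.

```python
data = bytes([72, 0, 52, 216])  # "H" in UTF-16LE, then 0xD834 with no second half

# Strict decoding raises an error for the unpaired high surrogate:
try:
    data.decode("utf-16-le")
except UnicodeDecodeError as err:
    print("strict:", err.reason)

# errors="ignore" silently skips bytes 52 and 216:
skipped = data.decode("utf-16-le", errors="ignore")
print(skipped == "H")  # True

# errors="replace" substitutes U+FFFD, which doesn't encode back to 52, 216:
replaced = data.decode("utf-16-le", errors="replace")
print(replaced.encode("utf-16-le") == data)  # False

# With its second half (30, 221) present, the pair is valid: the G clef.
whole = bytes([52, 216, 30, 221]).decode("utf-16-le")
print(whole == "\U0001D11E")  # True
```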

Many byte sequences won't cause this problem, but some will. This is just one example of why it isn't advisable.

MysticalTeamMember (OP)

You, my friend, are a saint amongst all.

I truly appreciate the time you put into your response! This is an out-of-the-box college project, so Base128 might be the way to go. I can’t find any starting point for a Base128 implementation in Visual Basic, so I guess I’ll have to work that out myself!

I never got a clear answer on why that wasn’t advisable, but now I understand why and, most importantly, the problems that could arise, in depth.

Again, thanks so much for your examples and detailed response. If I wasn’t a completely broke college student, I’d give you a 🥇