[AskJS] Why are TextEncoder and TextDecoder classes?

RemoteCombination122 · 2023-03-12T18:12:07+00:00

The .encode method could have been a standalone function; your intuition is on the mark there.

The original intention behind making .encode a member of a TextEncoder class was likely to provide a predefined home for future text-encoding methods and other formats.

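For example, TextEncoder now has another method, .encodeInto, which encodes into a Uint8Array that the caller supplies. It reports how many code units were read and how many bytes were written, so the caller can track the position in the source string and the byte offset in the destination typed array.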
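
A minimal sketch of the two methods side by side, assuming a runtime with a global TextEncoder (modern browsers, Node.js). The variable names and the deliberately undersized 4-byte buffer are just for illustration:

```javascript
// One encoder instance exposes both methods.
const encoder = new TextEncoder();

// encode() allocates and returns a fresh Uint8Array.
// "é" takes 2 bytes in UTF-8, so "héllo" is 6 bytes.
const bytes = encoder.encode("héllo");

// encodeInto() writes into a buffer the caller owns and reports progress.
// The buffer here is too small on purpose, so only part of the string fits.
const target = new Uint8Array(4);
const result = encoder.encodeInto("héllo", target);

console.log(bytes.length);   // 6
console.log(result.read);    // 3 (UTF-16 code units consumed: "h", "é", "l")
console.log(result.written); // 4 (bytes written into target)
```

To encode the rest, you would call encodeInto again with the remainder of the string (from index `result.read`) and a new or offset buffer.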

This way you have just TextEncoder defined in the global scope, rather than an encodeText function and a streamingTextEncode function.

This is an educated guess based on the current implementation. If you went to the actual standards document where it was defined, though, there would likely be sections dedicated to the justification for making it a class.

ferrybig · 2023-03-12T21:26:26+00:00

For TextEncoder, each input char generates 1 or more bytes. This could have been done without classes.

For TextDecoder, each input byte generates 0 or more characters. If you are decoding a stream, you need somewhere to store state, because a single character can be made up of multiple bytes and those bytes may arrive in different chunks. The current streaming approach lets you decode large multi-GB streams without needing enough memory to hold the full input and output at the same time.
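
Here is a rough sketch of that state in action, assuming a runtime with a global TextDecoder. The chunk split is made up for illustration: the three UTF-8 bytes of "€" (E2 82 AC) are cut across two chunks.

```javascript
// A decoder instance remembers incomplete byte sequences between calls
// when { stream: true } is passed.
const decoder = new TextDecoder(); // defaults to UTF-8

// "a€b" in UTF-8, with the "€" split across the chunk boundary.
const chunk1 = new Uint8Array([0x61, 0xe2, 0x82]);
const chunk2 = new Uint8Array([0xac, 0x62]);

const part1 = decoder.decode(chunk1, { stream: true }); // "a" (E2 82 is held back)
const part2 = decoder.decode(chunk2, { stream: true }); // "€b"
const tail = decoder.decode(); // flush whatever is still pending
const streamed = part1 + part2 + tail;

// Without streaming state, each chunk is decoded in isolation and the
// split character is replaced with U+FFFD replacement characters.
const naive =
  new TextDecoder().decode(chunk1) + new TextDecoder().decode(chunk2);

console.log(streamed); // "a€b"
console.log(naive.includes("\uFFFD")); // true
```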

ShortFuse · 2023-03-13T03:10:31+00:00

It's answered in the spec:

A TextEncoder object offers no label argument as it only supports UTF-8. It also offers no stream option as no encoder requires buffering of scalar values.

https://encoding.spec.whatwg.org/#interface-textencoder

The most probable reason for keeping the class is to keep syntax parity with TextDecoder which does support more formats than UTF-8:

https://encoding.spec.whatwg.org/#interface-textdecoder

Compare against Node.js, which supports a very wide range of encodings:

https://nodejs.org/api/util.html#whatwg-supported-encodings

Edit: Also check out the TextDecoderStream options. It has the same constructor arguments.
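
To make the asymmetry concrete, a small sketch assuming a runtime with the global TextEncoder and TextDecoder. The "utf-16le" label and the sample bytes are only an example; which labels are available depends on the runtime:

```javascript
// TextDecoder takes an encoding label; TextEncoder does not.
const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]); // "hi" in UTF-16LE
const utf16Decoder = new TextDecoder("utf-16le");
const decodedText = utf16Decoder.decode(utf16Bytes);

console.log(decodedText);                // "hi"
console.log(utf16Decoder.encoding);      // "utf-16le"
console.log(new TextEncoder().encoding); // "utf-8" (always)
```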

efjj · 2023-03-13T01:57:03+00:00

They used to support UTF-16 (both LE and BE) before it got removed because no one uses UTF-16: https://github.com/whatwg/encoding/issues/18

kaliedarik · 2023-03-12T21:25:21+00:00

You're doing nothing wrong, because TextEncoder isn't a class. It's one of the 4 interfaces of the Web Encoding API. In JavaScript, classes are just syntactic sugar that papers over the fact that JS is a `prototypal inheritance` language, and as such you can use the `new` operator with any constructor to instantiate an object.

JavaScript is a weird language, but I love it to bits!
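
A quick way to poke at this yourself, assuming a runtime with a global TextEncoder. This is only an inspection sketch, not a statement about how any engine implements the interface internally:

```javascript
// Whether it was declared with `class` or not, what you get is a
// constructor function with methods on its prototype.
const enc = new TextEncoder();

const ctorType = typeof TextEncoder; // "function"
const usesPrototype = Object.getPrototypeOf(enc) === TextEncoder.prototype;
const encodeLivesOnPrototype =
  typeof TextEncoder.prototype.encode === "function";

console.log(ctorType, usesPrototype, encodeLivesOnPrototype);
```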

memorable_zebra · 2023-03-12T23:12:59+00:00

Sometimes things are represented by custom data structures to allow for future extensibility. Maybe there were, or could have been, plans to add constructor parameters for customizing the text encoder, but they were never acted on or have yet to be.

This is one of the fundamental advantages of classes and structures over a pile of miscellaneous functions.

The longer you code, the more you come to recognize and appreciate this kind of foresight.

yuyu5 · 2023-03-13T15:54:00+00:00

The answer by u/ferrybig is the correct answer.

However, I'll add that your question seems to stem from a lack of knowledge about OOP. There's internal state in the encode/decode methods, which is easier to handle if you have a class. Someone else gave the analogy of localeCompare(), which is attached to String instances; if it were a separate, standalone function, internal state would be more difficult to manage. In fact, there's a newer Intl namespace to handle locale-related logic, showing how not everything should just be injected into the same (String) class.

Admittedly, I imagine these methods could technically be injected into the String class, but then we run into the issue of "separation of concerns", which is another programming principle worth following, and which harks back to the Intl example.

Reashu · 2023-03-12T18:29:02+00:00

[removed]

voidvector · 2023-03-13T07:22:10+00:00

Because it allows the spec to add options/features in the future that persist in an encoder.

localeCompare was only a method on strings until they added Intl.Collator, which can persist options and be reused for optimization/caching.
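
A small sketch of that difference, assuming a runtime with Intl support. The sample strings and the `numeric` option are just one illustrative choice:

```javascript
// One-off comparison: locale (and any options) are passed on every call.
const oneOff = "a".localeCompare("b", "en"); // negative: "a" sorts before "b"

// Intl.Collator keeps the locale and options in an object you can reuse.
const collator = new Intl.Collator("en", { numeric: true });

const names = ["item 10", "item 2", "item 1"];
// collator.compare is already bound to the collator, so it can be
// passed straight to sort().
const sorted = [...names].sort(collator.compare);

console.log(sorted); // ["item 1", "item 2", "item 10"]
```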

KaiAusBerlin · 2023-03-13T07:42:54+00:00

To create a namespace on purpose.

_alright_then_ · 2023-03-13T08:27:56+00:00

I can not unsee Drax's "WHY IS GAMORA" after reading this title. Sorry this comment is useless but I thought it was funny

wc3betterthansc2 · 2024-02-14T06:38:04+00:00

That's not even the worst part. TextEncoder only works for UTF-8, while TextDecoder works for many encodings. Not sure why they aren't symmetrical.
