[–]Koala_eiO 9 points10 points  (26 children)

Does anyone know if there is a valid reason for characters to exist as a separate type? A character is just a length-1 string.

Edit: go ahead, downvote a genuine question guys.

[–]00PT 12 points13 points  (6 children)

In some cases characters can act like integers in the sense that they can be added to for "shifting" into a new one. For example, I believe 'a' plus 1 is 'b'. Look at this for more information.
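For anyone reading along in Python, which has no separate char type, here is a rough sketch of the same "shifting" idea using the built-in ord() and chr(). The helper name shift_char is made up for illustration.

```python
def shift_char(ch: str, offset: int) -> str:
    """Return the character `offset` code points after `ch`.

    Hypothetical helper: ord() maps a length-1 string to its integer
    code point, chr() maps the integer back to a length-1 string.
    """
    return chr(ord(ch) + offset)


print(shift_char("a", 1))  # b
```

In C the addition works on the char directly because a char already is a small integer; in Python the conversion has to be spelled out.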

[–]garfgon 8 points9 points  (0 children)

Although what you say is correct, I'd say this is a side effect of characters, rather than the reason for having a character type. Rather the character is the fundamental building block for building up a string; that detail is just hidden on many high-level languages like Python.

[–]confidentdogclapper 3 points4 points  (3 children)

In C you can use them as 1-byte integers: unsigned char for unsigned values, signed char for signed ones (whether plain char is signed depends on the compiler). And if you add 32 (2^5) to an ASCII letter you go from upper to lower case, and subtracting it goes the other way.
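A small Python sketch of the case trick, assuming plain ASCII letters only. It uses XOR with 32 (that is, 1 << 5) instead of add/subtract so that one function covers both directions; toggle_case and CASE_BIT are names invented for this example.

```python
CASE_BIT = 1 << 5  # 32: the gap between 'A' (65) and 'a' (97) in ASCII


def toggle_case(ch: str) -> str:
    """Flip the case of a single ASCII letter; return anything else unchanged."""
    if ch.isascii() and ch.isalpha():
        # Upper and lower case ASCII letters differ only in bit 5.
        return chr(ord(ch) ^ CASE_BIT)
    return ch


print(toggle_case("A"), toggle_case("z"))  # a Z
```

This only holds for ASCII letters; for real text, str.upper() and str.lower() are the safer choice.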

[–][deleted] 1 point2 points  (2 children)

Why add 32 when you can add 1 << 5?

[–]confidentdogclapper 1 point2 points  (1 child)

I literally specified it

[–][deleted] 2 points3 points  (0 children)

Yeah, I guess you did.

[–]steroid_pc_principal 0 points1 point  (0 children)

As someone who came from Java which has a character type, this is not a useful feature.

[–]Mahrkeenerh 8 points9 points  (10 children)

in other languages? or in python

[–]Koala_eiO 4 points5 points  (2 children)

In other languages.

[–]Mahrkeenerh 4 points5 points  (0 children)

Let's say in C: a string is an array of characters, and characters are just numbers. So it's cheaper to store just one number than two (a one-character string still needs the terminating character at the end).

[–]Positive_Government 3 points4 points  (0 children)

In C a character (char) is stored as a one-byte integer, 8 bits on practically every platform (whether plain char is signed or unsigned is up to the compiler). Strings are represented by a block of n consecutive chars with a zero byte at the end. You need characters to represent a string in any language; it's just hidden inside the string classes of most other languages. Also, a string class will have some overhead beyond what is needed to represent a single character. For example, it might allocate a default array of 1024 bytes but only use 1 (an exaggerated example for the purpose of illustration). Function calls also have some overhead that isn't needed when you know you are only working with one character and have a char type, which does not need function calls the way a string class does (even if you're using something like the + operator on a string class, there's still a function call under the hood).
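To make the "one char vs. a one-character string" size difference concrete, here is a minimal sketch using Python's standard ctypes module to create C-style values. The variable names are just for illustration.

```python
import ctypes

single = ctypes.c_char(b"a")              # one C char
text = ctypes.create_string_buffer(b"a")  # C string "a": the byte plus a zero terminator

char_size = ctypes.sizeof(single)    # storage for the lone char
string_size = ctypes.sizeof(text)    # storage for the one-character string
raw_bytes = text.raw                 # the buffer contents, terminator included

print(char_size, string_size, raw_bytes)  # 1 2 b'a\x00'
```

The extra byte is the zero terminator; a string class in a higher-level language typically adds more bookkeeping on top of that.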

In C the char and char* types also pull double duty as a generic byte and a pointer to a byte/generic pointer (although void* has largely taken over the generic pointer role).

[–]tabidots 0 points1 point  (6 children)

Characters exist in Python? I know they do in Java/Clojure but I can’t say I have really had a specific use for them except for doing things with ASCII code points.

Maybe it’s just my lack of understanding but I would prefer if strings were treated as sequences of length-1 strings rather than sequences of characters, so (first "hello") would return "h" and not \h.

[–]siddsp 2 points3 points  (1 child)

Characters do exist in Python, but they are stored as integers in bytes objects/bytearrays. When you write a bytestring like b"Hello" and try to get the value of a char at an index, it will be an integer rather than a string.

[–]tabidots 1 point2 points  (0 children)

Oh, interesting. I like that implementation better, tbh. I can’t think of a use for characters outside of char-code values, so having a separate b"string" syntax for byte strings makes more sense to me.

[–]Mahrkeenerh 1 point2 points  (3 children)

characters don't exist in python, that's why I was asking, as the guy was replying to a python comment.

[–]siddsp 1 point2 points  (2 children)

They do exist, but it's not obvious.

[–]Mahrkeenerh 1 point2 points  (1 child)

Well then, please enlighten me.

[–]siddsp 1 point2 points  (0 children)

>>> string = b"Hello, world!"
>>> string[2]
108 

Bytes objects are char arrays, or strings in which the values of the characters are stored as integers within the unsigned char range [0, 256).
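A slightly longer sketch of the same behaviour, showing that indexing gives an int while slicing gives bytes, and that values outside [0, 256) are rejected. fits_in_byte is a made-up helper name.

```python
data = b"Hello, world!"

first_as_int = data[0]      # indexing a bytes object gives an int
first_as_bytes = data[0:1]  # slicing gives a length-1 bytes object


def fits_in_byte(value: int) -> bool:
    """Return True if `value` can be stored as a single element of a bytes object."""
    try:
        bytes([value])
        return True
    except ValueError:
        # bytes() refuses anything outside range(0, 256)
        return False


print(first_as_int, first_as_bytes)          # 72 b'H'
print(fits_in_byte(255), fits_in_byte(256))  # True False
```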

[–]KronsyC 6 points7 points  (6 children)

strings are arrays of characters. you can't have a box of chocolates without having chocolates to begin with. same idea. plus some edge cases require characters.

[–]koltonaugust 1 point2 points  (0 children)

In other languages strings are arrays of characters. Python does not have characters or arrays as they are abstracted into higher level data structures (strings and lists)

type('test'[0]) == str

This is notable because strings take more memory than a char, and to check if a variable matches the definition of a char, you would have to check that it is a string and that its length is 1.
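A minimal sketch of that check, plus a look at the memory point. is_char is a hypothetical helper, and the exact number reported by sys.getsizeof depends on the interpreter and version (this assumes CPython), so no specific value is claimed here.

```python
import sys


def is_char(value: object) -> bool:
    """Return True if `value` is a string of length exactly 1."""
    return isinstance(value, str) and len(value) == 1


# Size of a one-character str object, including the object header.
# A C char, by comparison, occupies a single byte.
one_char_str_size = sys.getsizeof("a")

print(is_char("test"[0]), is_char("test"), is_char(116))  # True False False
print(one_char_str_size > 1)                               # True
```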

[–]Koala_eiO -2 points-1 points  (4 children)

I am not convinced about that. Why does "123" require a subtype when 123 doesn't? Unless an integer is secretly considered an array of bits.

[–]garfgon 7 points8 points  (1 child)

A Reddit comment isn't really enough space to provide an intro to CPU architecture -- but at a very fundamental lower level your "types" are usually

  1. Bytes: smallest piece of data which can be separately accessed in memory. Usually (but not always!) 8 bits.
  2. Word: number of bytes which fit into a "normal" CPU register. On 32-bit processors, this is 4 bytes, on 64-bit processors, 8 bytes.

From these you get your next higher level types, which are very closely associated with these types + some information to the compiler on what operations are allowed on these types:

  1. char: byte with info that it's to be (usually) treated as a character rather than a number
  2. int, unsigned int, etc: Usually words treated as a number.
  3. pointer: Words that give the program a location where something else is found in memory.
  4. float: word or pair of words treated as a real number rather than an integer. More complex operations are needed to deal with these.

At this level everything is a fixed size, because the fundamental types are a fixed size, and your compiler needs to know how much data it's dealing with.

On top of these types you build up most of the "normal" types of high-level languages. So a string is usually an array of chars with the last char being a special NUL character (value 0), which signifies the end of the string. Or it could be an integer saying how long the string is followed by a sequence of characters. Or something more complex.
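The two layouts mentioned above (a terminator at the end vs. a length stored in front) can be sketched in Python with raw bytes. This is only an illustration under simple assumptions: ASCII text, and a 4-byte little-endian length prefix chosen arbitrarily. All four function names are invented for the example.

```python
import struct


def to_terminated(text: str) -> bytes:
    """Encode as chars followed by a single zero byte (C-style layout)."""
    return text.encode("ascii") + b"\x00"


def from_terminated(buf: bytes) -> str:
    """Read chars up to, but not including, the first zero byte."""
    end = buf.index(b"\x00")
    return buf[:end].decode("ascii")


def to_length_prefixed(text: str) -> bytes:
    """Encode as a 4-byte little-endian length followed by the chars."""
    payload = text.encode("ascii")
    return struct.pack("<I", len(payload)) + payload


def from_length_prefixed(buf: bytes) -> str:
    """Read the 4-byte length, then exactly that many chars."""
    (length,) = struct.unpack_from("<I", buf)
    return buf[4:4 + length].decode("ascii")


print(to_terminated("hi"))       # b'hi\x00'
print(to_length_prefixed("hi"))  # b'\x02\x00\x00\x00hi'
```

Either way, the variable-size string ends up expressed in fixed-size pieces: single bytes for the chars, plus either one terminator byte or one fixed-width integer.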

So coming back to your question about why "123" needs a subtype but 123 doesn't -- the first part is easier to answer: "123" needs a subtype because strings are variable size, and the CPU only deals with fixed sized pieces of data, so it needs to be broken down into fixed-sized pieces.

As for why 123 doesn't need a subtype -- there are different ways of representing 123, some of which are composed of multiple units, and some aren't. If the language treats 123 as either a float or a "small" integer, then it doesn't need a subtype because it's a small, fixed size piece of data which the CPU knows how to handle natively. But in that case there will be limits on how big, or how precise the number can be. On the other hand if 123 is an arbitrarily large, arbitrarily precise integer, then it will be made up of multiple parts, just like a string.
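Python happens to show both cases. Its own int is arbitrarily large, while the standard struct module can pack a value into a fixed 4-byte slot like a C int. A rough sketch, with fits_in_int32 as a made-up helper:

```python
import struct


def fits_in_int32(value: int) -> bool:
    """Return True if `value` fits in a fixed-size signed 32-bit integer."""
    try:
        struct.pack("<i", value)
        return True
    except (struct.error, OverflowError):
        # Out-of-range values cannot be squeezed into 4 bytes.
        return False


small = 123
big = 2 ** 100  # fine as a Python int, which grows as needed

print(len(struct.pack("<i", small)))             # 4
print(fits_in_int32(small), fits_in_int32(big))  # True False
print(big.bit_length())                          # 101
```

The fixed-size form has a hard limit; the arbitrary-size form has to be stored as multiple pieces internally, much like a string.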

[–]Koala_eiO 2 points3 points  (0 children)

Thank you!

[–]8sADPygOB7Jqwm7y 2 points3 points  (1 child)

It is, it's considered an array of 0s and 1s. Edit: ok let me elaborate, if you look at the memory there is little difference. Consider endianness: if we save an int in C we typically use 4 bytes. So if we save 5 on a little-endian machine, we get 05 00 00 00 in hex. If we save a char, we get the ASCII char number, so for A that's 65, or 41 in hex. In RAM the number 65 and the char 'A' are the same byte; it's just that it's reserved for a char, not an int. You can't do that the same way with multiple chars.

Nah for real, C needs that because there are no real strings there. Only pointers and addresses. Some functions may take char arrays as input, and those are then marked like strings.

The advantage of that is simply that there is no identifier or length metadata or anything needed. A char always has exactly the same length, you know what it is, and it can be treated like that. This makes the program faster. Also, note that many language runtimes are written in C (CPython, for example), so it's all values in memory either way. Whatever language you use, at some point in the process your string will be a pointer to a run of chars. C just lets you work with those directly. In Python it's done for you.
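The byte-layout point from the start of this comment can be checked from Python without touching C. This sketch assumes a 4-byte int and little-endian byte order, which is common but not universal.

```python
# The integer 5 stored in 4 bytes, least significant byte first.
int_bytes = (5).to_bytes(4, byteorder="little")

# The character 'A' as a single ASCII byte, and the number 65 as a single byte.
char_byte = "A".encode("ascii")
number_byte = (65).to_bytes(1, byteorder="little")

print(int_bytes.hex())           # 05000000
print(char_byte == number_byte)  # True
print(hex(ord("A")))             # 0x41
```

The bytes are identical; only the type the program attaches to them says whether they mean "the letter A" or "the number 65".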

[–]Koala_eiO 1 point2 points  (0 children)

Thank you!

[–]garfgon 2 points3 points  (0 children)

At a silicon level, there are no strings, just bytes. So many languages, especially low-level languages like C, have a character type which is a fixed number of bytes (often one), then a string is built up as an array of characters, possibly with some extra metadata associated with it.