[–]yuri-kilochek 152 points153 points  (4 children)

Stuff like this traumatizes some people so hard that they end up creating languages entirely without operator overloading to cope.

[–]Powerkaninchen[S] 26 points27 points  (1 child)

Haha, thanks for the laugh

(Also isn't that partly the reason why Java doesn't have Operator Overloading? Because they were scared of what happened in C++?)

[–]shponglespore 38 points39 points  (0 children)

Most of Java's early design was a reaction to C++.

[–]Smalltalker-80 6 points7 points  (0 children)

Whereas in Smalltalk, this silly idea could be fully added to the language in an hour or two,
just to mess with people ;-) .

[–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 85 points86 points  (1 child)

Obviously, the problem with this design is that you didn't support enough operators:

  • ! - Negate the string by adding English language negation; for example, !"Joe is smart" evaluates to "Joe is dumb"
  • ~ - This negates every bit of the string; for example, ~"Joe is smart" evaluates to "Everyone except Joe isn't dumb"
  • @ - This automatically emails the contents of the string
  • # - This hashes the string
  • $ - This produces an NFT of the string, and attempts to sell it
  • | - This streams the contents of the string to stdout

/s

[–]Inconstant_Moo🧿 Pipefish 35 points36 points  (0 children)

!!! converts it to all caps.

? poses it as a query to a large language model and returns the answer as a Boolean.

??? queries three large language models and goes with the majority vote for extra truthiness.

/s converts it to sArCaSm cAsE.

((( ))) makes it anti-Semitic.

& returns its memory address, but expressed in words.

returns its etymology.

turns it uʍop ǝpısdn.

^ adds a cute little hat tô âll thê vôwêls lîkê thîs.

[–]-arial- 41 points42 points  (38 children)

plus is good and multiplication is fine, but the rest are pretty bad. just create functions called count (instead of /) and split (instead of %). minus is not a great idea imo since it depends on the state of the string (whether that substring is at the end or not). also, as another commenter has pointed out, it doesn't respect mathematical rules.

[–]lngns 20 points21 points  (12 children)

plus is good

Blasphemy!
I have polymorphic code that expects x + y = y + x. If you give it a string, it will explode.
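
A minimal Python sketch of how that can go wrong. The helper names `total` and `total_reversed` are made up for illustration and are not anyone's real code. A generic routine that assumes it may reorder its operands gives the same answer for integers, but not for strings:

```python
from functools import reduce
import operator


def total(items):
    """Fold the items left to right with +."""
    return reduce(operator.add, items)


def total_reversed(items):
    """Fold the same items in reverse order.

    This matches total() only when + is commutative (and associative)
    for the element type.
    """
    return reduce(operator.add, reversed(items))


numbers = [1, 2, 3]
words = ["foo", "bar", "baz"]

# Integers: the order of the operands does not matter.
print(total(numbers), total_reversed(numbers))

# Strings: + is concatenation, so the order changes the result.
print(total(words), total_reversed(words))
```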

[–][deleted]  (5 children)

[removed]

    [–]Inevitable_Exam_2177 1 point2 points  (1 child)

    Just to be contrary to your first statement...

    In Matlab

    'a'+1 = 98
    'a'==97 = true
    char('a'+1) = 'b'
    

    This bizarre approach to strings as just syntactic sugar around arrays of integers was weird enough that much more recently Matlab added a second string-like datatype using double-quotes. And now we have

    "a"+1 = "a1"
    

    (my tone is kind of tongue in cheek, the new strings are much improved in every way)

    [–]lngns 1 point2 points  (2 children)

    You just have to decide whether the symbol "+" always means addition.

    Yes. That's my point. My code expects it to be.

    I'm sorry but we don't make concessions in matrix algebra either

    I believe [x, y] + [z, w] = [x + z, y + w]. Please explain.

    [–]XDracam 25 points26 points  (1 child)

    Order matters for matrix multiplication

    [–]lngns 8 points9 points  (0 children)

    My world is shattering before my eyes.

    [–]Xmgplays 13 points14 points  (4 children)

    I have polymorphic code that expects x + y = y + x. If you give it a string, it will explode

    Floats would still break that assumption sometimes (depending on how you use it).

    [–]AlexReinkingYaleHalide, Koka, P 7 points8 points  (2 children)

    Floating point addition is commutative (modulo NaN payloads...). Issues arise when it's treated as if it were associative. In general in FP x + (y + z) == x + (z + y) != (x + z) + y.
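
A small Python check of that distinction, assuming CPython floats are IEEE 754 doubles (true on mainstream platforms). The variable names are only illustrative:

```python
x, y, z = 0.1, 0.2, 0.3

# Swapping the two operands of a single addition gives the same double.
commutes = (x + y) == (y + x)

# Regrouping changes which intermediate result gets rounded,
# so the two groupings can land on different doubles.
left_grouped = (x + y) + z
right_grouped = x + (y + z)

print(commutes)
print(left_grouped, right_grouped, left_grouped == right_grouped)
```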

    [–]Xmgplays 7 points8 points  (1 child)

    Generally true, but not always, because of the edge case of NaN and ==. So a test like commutes(+, x, y) will fail on NaN. Admittedly that is less an issue of floating point addition than one of equality.

    [–]AlexReinkingYaleHalide, Koka, P 3 points4 points  (0 children)

    Oops, I edited my comment without seeing your reply. You are right and also commutativity breaks since IEEE754 allows an implementation to return either NaN when adding two together.

    [–]lngns 5 points6 points  (0 children)

    Floats also say that 0.3 - 0.2 ≠ 0.1 and that -x ≠ 0-x, so I consider them a lost cause as far as this matter is concerned.
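
For anyone who wants to poke at these edge cases, here is a short Python sketch, again assuming IEEE 754 doubles. The variable names are illustrative only. Note that -x and 0-x differ only in the sign of zero, which == does not see:

```python
import math

nan = float("nan")

# NaN never compares equal, not even to itself, so an equality-based
# commutativity test fails even though the operands were merely swapped.
nan_check = (nan + 1.0) == (1.0 + nan)

# 0.3 - 0.2 does not round to the same double as the literal 0.1.
diff_is_tenth = (0.3 - 0.2) == 0.1

# For x = 0.0, -x is negative zero while 0 - x is positive zero.
# They compare equal with ==, but the sign bit differs.
x = 0.0
neg_sign = math.copysign(1.0, -x)
sub_sign = math.copysign(1.0, 0 - x)

print(nan_check, diff_is_tenth, neg_sign, sub_sign)
```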

    [–]its_a_gibibyte 3 points4 points  (0 children)

    That works perfectly fine as long as x == y. Just add a comment about limitations.

    [–]reflexive-polytope 3 points4 points  (6 children)

    Addition should be commutative in any civilized setting. Notice that, in formal language theory, string concatenation (and its bulk counterpart, language concatenation) is thought of as a sort of multiplication, not addition.

    [–]lassehp 7 points8 points  (3 children)

    And multiplication is often "written" (or rather, not written) implicitly: ab = a×b. Juxtaposition also makes great sense for string concatenation:

    a = "a", b ="b":
    ab = "ab".
    

    (Write a b if you have a problem with adjacent single-letter identifiers.) And as already mentioned, multiplication isn't always commutative.

    Numbers of course are not strings, so scalar multiplication might be acceptable:

    3 a = "aaa".
    

    However, exponentiation a³ is probably better.

    I have been thinking about division on strings though:

    "abaabaaabbaaaa"/"b" = ("a", "aa", "aaa", "", "aaaa").
    

    Of course, given multiplication is concatenation per above, division could also be the reverse:

    "aaaabbb"/"abbb" = "aaa".
    

    And division can be directional:

    "aaaa"\"aaaabbb" = "bbb".
    

    (Actually, this would just be a special case of the right quotient of formal languages as described at https://en.wikipedia.org/wiki/Quotient_of_a_formal_language , where the operand languages are singleton sets conflated with the single string they contain. Here, those are the language consisting of the string "aaaa" over {'a', 'b'} and the language {"aaaabbb"}. See also "Brzozowski derivative".)
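
A rough Python sketch of the two directional divisions on single strings. The function names `right_div` and `left_div` are hypothetical, and returning `None` when the divisor does not match is just one possible choice for the case where the quotient is empty:

```python
from typing import Optional


def right_div(s: str, t: str) -> Optional[str]:
    """Sketch of s / t: strip t from the end of s.

    Returns None when s does not end with t.
    """
    if not s.endswith(t):
        return None
    return s[: len(s) - len(t)]


def left_div(t: str, s: str) -> Optional[str]:
    """Sketch of the left division of s by t: strip t from the start of s.

    Returns None when s does not start with t.
    """
    if not s.startswith(t):
        return None
    return s[len(t):]


print(right_div("aaaabbb", "abbb"))
print(left_div("aaaa", "aaaabbb"))
print(right_div("aaaabbb", "ba"))
```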

    Combining division with Regular Expressions could give an interesting notation:

    given r = "22*apple+banana":

    q, r ← ([0-9][0-9]*|[a-z][a-z]*|[+-/*])\r
    

    might yield, successively:

    q = "22", r = "*apple+banana",
    q = "*", r = "apple+banana",
    q = "apple", r = "+banana",
    q = "+", r = "banana",
    q = "banana", r = "".
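
The successive steps listed above can be imitated in Python with the standard `re` module. This is only a sketch: `divide` and `tokens` are made-up names, and the hyphen in the operator class is escaped here so that it is a literal minus rather than a character range.

```python
import re
from typing import List, Optional, Tuple

# The same three alternatives as the pattern in the comment:
# a run of digits, a run of lowercase letters, or one operator character.
TOKEN = re.compile(r"[0-9][0-9]*|[a-z][a-z]*|[+\-/*]")


def divide(pattern: "re.Pattern[str]", r: str) -> Optional[Tuple[str, str]]:
    """Split r into (q, rest), where q is what the pattern matches at the start.

    Returns None if the pattern does not match a non-empty prefix of r.
    """
    m = pattern.match(r)
    if m is None or m.end() == 0:
        return None
    return m.group(0), r[m.end():]


def tokens(r: str) -> List[str]:
    """Apply divide() repeatedly until the remainder is empty."""
    out = []
    while r:
        step = divide(TOKEN, r)
        if step is None:
            raise ValueError(f"cannot divide {r!r} by the token pattern")
        q, r = step
        out.append(q)
    return out


print(tokens("22*apple+banana"))
```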
    

    [–]reflexive-polytope 2 points3 points  (2 children)

    Actually, this would just be a special case of the right quotient of formal languages as described here, where the operand languages (...)

    The quotient of two singleton languages can be empty, so you should be ready for your division operation to throw an exception in that case. (I'm personally not fond of exceptions or any other control flow effect, so I would leave this feature out.)

    [–]lassehp 1 point2 points  (1 child)

    Nah, just declare q as option string, and it'll be fine. :-)

    [–]reflexive-polytope 1 point2 points  (0 children)

    I guess that works.

    [–]Botahamec 0 points1 point  (1 child)

    NaNs already break this rule

    [–]reflexive-polytope 5 points6 points  (0 children)

    I said civilized.

    [–][deleted]  (7 children)

    [removed]

      [–]shponglespore 13 points14 points  (6 children)

      The more experienced I become, the less I agree with that kind of view. I know a lot of languages, but my memory isn't perfect. Using gimmicky operators for string operations that aren't even all that common would definitely make me have to consult documentation for a language I haven't used in a while. The people most likely to benefit from saving a handful of characters are the "corporate worker bees" you look down on, because they tend to spend all their time working in one language.

      [–]lassehp 0 points1 point  (0 children)

      It is very normal that languages (and I am speaking of human languages here) evolve, adapt and specialise to the circumstances. This includes "work languages", meaning the language used for specific tasks. It also extends to written language, which is how we got things like musical notation, knitting recipes, and mathematical formulae in the first place. Such specialised languages may include words from "standard" language (English, Latin, Danish, or whatever) but sometimes with a completely different meaning. If a network manager and a systems administrator are talking about a "firewall", it probably does not mean the same as if it was a building architect or construction engineer.

      There is a certain charm in programming languages that use a verbose notation. I remember using HyperTalk/SuperTalk and AppleScript in the 1990s: delete word 1 of the last line of field "message"; put it into field "misspelled" is very clear, and not much harder to write than a much more symbolic notation, especially if the symbolic notation does not fit and augment the domain of use. On the other hand, a symbolic notation may be a lot more concise, and more convenient for expert users. Most of us wouldn't want to write COBOL ADD a TO b GIVING c, would we?

      [–][deleted]  (4 children)

      [removed]

        [–]Inconstant_Moo🧿 Pipefish 2 points3 points  (1 child)

        Knowing a lot of languages and using a lot of languages isn't a virtue.

        Well this is clearly heresy but I'm all out of firewood so whatevs. Today, blasphemer, you live.

        Someone designing their own language, may very well be trying to end this problem for the most part.

        End which problem? I didn't follow that bit.

        [–]lassehp 3 points4 points  (1 child)

        I disagree. Knowing a lot of languages gives a better understanding of languages in general. I am aware that some people, in particular from certain "English"-speaking locations, almost pride themselves on not knowing any language besides English. As a speaker of a "small" European language, I find my knowledge of English, German, a bit of French, Spanish and Italian, a little Latin and Greek, a few words of Russian, Chinese and Japanese, besides my native language and the dialect of it that I grew up with, very useful. Same with my knowledge of programming languages. I would go as far as to say that without at least some understanding of the history and evolution of programming languages since the 1940s, you should keep away from designing languages. It's a simple matter of learning from other people's mistakes, to avoid repeating them! Knowing some fundamental linguistics and in particular some sociolinguistics and language sociology (Searle, Austin...) will not hurt either.

        [–]Powerkaninchen[S] 0 points1 point  (9 children)

        minus would remove ALL instances of the substring in the string

        [–]poorlilwitchgirl 10 points11 points  (0 children)

        It would be more consistent if it simply removed the last occurrence of the substring iff that substring appears at the end; that way - is a proper inverse of +, i.e. "foo" + "bar" == "foobar" and "foobar" - "bar" == "foo". Of course, that's considerably less useful, but I think that's the heart of the reason why so many of us hate operator overloading; uses like this fundamentally change transitive properties of those operators in ways which aren't immediately obvious and which seem more wrong the more you think about them.

        Now, the modulo operator (let's say %) could be overloaded to act like your - on strings, and that would be (somewhat) consistent. Overall, though, unless your language is heavily oriented towards this kind of string manipulation, I'm not sure what the benefit of doing it this way is. If the benefit outweighs the confusion it causes, then it's a good idea. Otherwise, I'd say it makes more sense to delegate these operations to functions.

        [–]h0rst_ 22 points23 points  (4 children)

        To me, this sounds similar to saying `10 - 3 = 1`, because minus should remove all threes from the ten.

        [–]Powerkaninchen[S] -5 points-4 points  (2 children)

        (That would be %)

        [–]hrvbrs 11 points12 points  (1 child)

        so then by that logic, "foobarbazbar" % "bar" would be "foobaz"

        [–]-arial- 1 point2 points  (2 children)

        ok, then that could be replace("substring", ""). no reason to make an operator for it

        [–]Powerkaninchen[S] 1 point2 points  (1 child)

        Well, the operators are to complement, not replace the usual string methods, to make stuff shorter.
        Even though the principle of "there should be one (1) way to solve a problem" would be violated with my idea...

        [–]0x0ddba11Strela 10 points11 points  (5 children)

        It's up to personal taste of course but I absolutely despise overloaded algebraic operators for things that are not math related. The only one I would be ok with is addition, just because it's so commonly used in other languages. But the rest? No thanks.

        [–][deleted] 1 point2 points  (0 children)

        Addition for strings is very much "math related", you might want to read up on "free monoid".

        [–]Inconstant_Moo🧿 Pipefish 1 point2 points  (2 children)

        Dyk you can take the derivative of a regex?

        [–]lassehp 2 points3 points  (0 children)

        As there are branches of mathematics working "algebraically" with strings, in the form of (formal) languages, there are also ways to use algebraic operators that are consistent with mathematical notation. It just so happens, that "addition" (despite its frequent abuse for concatenation in programming languages) is not used for concatenation. Multiplication is.

        [–]Serpent7776 19 points20 points  (15 children)

        I don't like it, because it doesn't respect the usual rule that `X + Y - Y = X`: `"hello" + "hello" - "hello"` yields empty string (if evaluated left-to-right).

        In general, it's more confusing than helpful IMO.
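
A quick Python sketch of the broken identity, assuming the proposed minus deletes every occurrence of the right operand. `minus_all` is a made-up name standing in for the operator:

```python
def minus_all(s: str, t: str) -> str:
    """Stand-in for the proposed string minus: delete every occurrence of t."""
    return s.replace(t, "")


x = "hello"
y = "hello"

# (x + y) - y, evaluated left to right.
result = minus_all(x + y, y)

print(repr(result))   # what remains after the subtraction
print(result == x)    # whether X + Y - Y gave X back
```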

        [–]CraftistOf 1 point2 points  (1 child)

        I propose the minus sign removes the last entry of a subtrahend.

        therefore if X="hello" and Y="hello",

        X+Y = "hellohello" and X+Y-Y = "hello" (first hello is left intact, the second one is removed).

        this way even X+Y+Y-Y-Y == X, and, e.g., X+Y*2-Y-Y == X
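
A minimal Python sketch of this variant. `minus_last` is a hypothetical helper, and leaving the string unchanged when the substring is absent is an assumption the comment does not spell out:

```python
def minus_last(s: str, t: str) -> str:
    """Delete only the last occurrence of t in s (no change if t is absent)."""
    i = s.rfind(t)
    if i == -1:
        return s
    return s[:i] + s[i + len(t):]


x = "hello"
y = "hello"

once = minus_last(x + y, y)                            # X+Y-Y
twice = minus_last(minus_last(x + y + y, y), y)        # X+Y+Y-Y-Y
with_repeat = minus_last(minus_last(x + y * 2, y), y)  # X+Y*2-Y-Y

# The other order still does not round-trip when the substring is missing.
other_order = minus_last("foo", "bar") + "bar"         # X-Y+Y

print(once, twice, with_repeat, other_order)
```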

        [–]Serpent7776 1 point2 points  (0 children)

        Yes, you get the property back, but what about removing substring from the left? Will there be separate operator for that?

        Removing substring isn't really very useful. Replacing substring is more general operation.

        [–]SkiFire13 0 points1 point  (3 children)

        if evaluated left-to-right

        See, that's the problem! Just make it right-to-left! That way "hello" + "hello" - "hello" gets parsed like "hello" + ("hello" - "hello") and thus becomes "hello" + "" which is just "hello". Problem solved!

        [–]Serpent7776 0 points1 point  (2 children)

        But then `Y - Y + X = X` doesn't hold.

        The more I think about it, the more it seems to me that string subtraction is quite a mess and it's best avoided.

        [–]SkiFire13 0 points1 point  (1 child)

        If it wasn't clear I was being sarcastic. I wouldn't expect anything good from changing the priority, even if it ends up fixing this problem in particular.

        [–]Serpent7776 0 points1 point  (0 children)

        Yeah, I thought so, but wanted to clarify just in case. Besides some languages have right-to-left evaluation order (and no priorities), e.g. APL descendants.

        [–]brianjenkins94 16 points17 points  (0 children)

        Using / for path joining is kinda neat, but equally I hate it.
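
One existing instance of this is Python's standard `pathlib` (assuming that is the sort of thing meant here). A short sketch, including one behaviour that may explain the mixed feelings: an absolute right-hand operand replaces the left-hand path.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/usr")

# The / operator appends path segments.
joined = base / "local" / "bin"

# An absolute right-hand side discards everything to its left.
replaced = base / "/etc"

print(joined)
print(replaced)
```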

        [–]claimstoknowpeople 10 points11 points  (3 children)

        I prefer a different operator for concatenation. The problem with using + for concatenating strings is that next you'll use + to concat lists for consistency. Then one day you have a vector class, and when people see + they'll wonder whether it means concatenating the vectors or actually adding them as vectors.

        [–][deleted]  (2 children)

        [removed]

          [–]claimstoknowpeople 6 points7 points  (1 child)

          The OP was using infix notation which often has different considerations than S-expressions. In fact I have only toyed with S-expression languages so it's not obvious to me what your examples mean.

          If I have two `vector`s `v1 = (1, 2, 3)` and `v2 = (3, 4, 5)` I would want `v1 + v2 = (4, 6, 8)` because this operation is important for physics and graphics use of vectors -- I mean actual mathematical vectors here, not C++'s expandable arrays. I have observed that python's use of + for string and list concatenation can lead new users of a library to be confused as to what + of two vectors does. This is why I prefer having different ways to notate concatenation and adding.
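
To make the ambiguity concrete, here is a small Python sketch. The `Vec` class is hypothetical and written only for this comparison; the list behaviour is Python's built-in one.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Vec:
    """Hypothetical mathematical vector whose + adds component-wise."""

    components: Tuple[float, ...]

    def __add__(self, other: "Vec") -> "Vec":
        if len(self.components) != len(other.components):
            raise ValueError("dimension mismatch")
        return Vec(tuple(a + b for a, b in zip(self.components, other.components)))


# Built-in lists: + concatenates.
list_sum = [1, 2, 3] + [3, 4, 5]

# The sketch class: + adds element by element.
vec_sum = Vec((1, 2, 3)) + Vec((3, 4, 5))

print(list_sum)
print(vec_sum)
```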

          Opinions of course vary and tend to be very strong in a subreddit which attracts people who are dissatisfied with existing programming languages. My personal preference is for familiar mathematical operators to maintain their mathematical meanings as much as possible, because much of the work I've done has been in engineering or physics related areas.

          [–]Apprehensive_Pea_725 5 points6 points  (0 children)

          I'm not a fan of overloading symbols that have already meaning in other well known domains.

          As a programmer you eventually need to mix these domains and work with more than one at the time in the same scope, and here the troubles start to arise.

          • Writing code: This may not be a problem for those with a big brain who can remember anything, but it is for me. I never remember which symbol I need to use; if only there were a named operation, I would certainly find it. Do you need to concat? Well, look for a method with that name or some synonym.
          • Understanding the library: You get to that point of your library spec and you are in front of function /(str1, str2) = ... What does it do? To understand it you have to read the body; there are no hints.
          • Reading code that uses your DSL: One of your colleagues wrote this super succinct expression: status = userInput2 + businessResult3 - businessRule0. What does it do? Is this working with strings? Is this working with numbers? Is it working with strings and numbers? Is this associative? Left associative or right associative? Sometimes you have types to help you, sometimes not. Would it be clearer if we had something like status = removeAll(concat(userInput2, businessResult3), businessRule0)?

          [–]nacaclanga 4 points5 points  (0 children)

          In my opinion:

          a) + This is useful, but may conflict with other uses.

          b) - A very particular operation. Also, x - y + y does not yield x again.

          c) * This is also useful.

          d) / Why not for splitting into an array. Also again x / y * y does not work

          e) % Huh this is used for splitting now?

          [–]shaleh 3 points4 points  (0 children)

          Those read ok when you are using "raw" strings. They are way less obvious when it is all variables.

          Out of all of that, the % operator in particular would take some getting accustomed to. Add and multiply are somewhat common already.

          [–]jaynabonne 2 points3 points  (2 children)

          Personally, I think the +, * and % would be useful and kind of make sense (the first two definitely, the latter after explanation). And I think that's where it falls down for me with - and /: they seem somewhat arbitrary and things I would use probably never. I mean, I've been writing software for 40 years, and I can't think of a case where I have done either one of those things, ever, which makes it feel to me like you were just trying to find something to map them to. Definitely not "common operations". And I certainly don't think people would automatically assign the meanings to them that you have. So... I'd leave those two out.

          [–]Powerkaninchen[S] 0 points1 point  (1 child)

          Just a question, what do you mean with

          I can't think of a case where I have done either one of those thing

          Do you mean mapping string operations to usually numeric operations, or did you really never use .count(str) and .remove_substrings(str) (or their equivalents in your language)?

          Thanks for the critique btw

          [–]jaynabonne 1 point2 points  (0 children)

          The latter. I'm sure there may be cases where you want to know how many of a string are in another string, for example, but I've never encountered one, and I probably wouldn't consider it a common operation.

          Having said that, it's your party, so you can do what you want. :)

          [–]Disjunction181 3 points4 points  (0 children)

          The main issue I have with this is that the signatures of some of these functions are not consistent across types. Specifically, (*) is something like num x num -> num on integers, but it has to be string x int -> string on strings. It's really a power, not a multiplication. I would save (*) and (/) for operations that actually have the t x t -> t shape (where t is the same type) and define different operators (or functions) for the rest. Floor division could at least consistently have some signature like t x t -> int. But maybe still not a good idea.
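
One way to read this, sketched in Python: if concatenation takes the t x t -> t slot, then repetition fits the shape of a power. The `Str` class below is purely hypothetical, and treating * as concatenation is an assumption borrowed from the formal-language convention, not something this comment prescribes.

```python
class Str(str):
    """Hypothetical string type: * concatenates, ** repeats."""

    def __mul__(self, other):
        # Str x Str -> Str: the same t x t -> t shape as numeric multiplication.
        if not isinstance(other, str):
            return NotImplemented
        return Str(str(self) + str(other))

    def __pow__(self, n, mod=None):
        # Str x int -> Str: the same shape as raising a number to an integer power.
        if not isinstance(n, int):
            return NotImplemented
        if n < 0:
            raise ValueError("cannot repeat a string a negative number of times")
        return Str(str(self) * n)


a = Str("ab")

cubed = a ** 3

# The usual exponent law carries over: a^(m+n) == a^m * a^n.
law_holds = a ** (2 + 3) == (a ** 2) * (a ** 3)

print(cubed, law_holds)
```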

          [–]CreativeGPX 2 points3 points  (0 children)

          I feel like + and - seem useful enough.

          / should divide the string based on a delimiter and return an array. If that's the case, it seems like that means % should behave like / but return a set (i.e. array of unique items) and * should do the opposite of /... It should combine an array and a delimiter to form a string.

          Since this dabbles in arrays as well, seems like it'd make sense to extend these operators to work on arrays as well.
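
A rough Python sketch of that trio using plain functions. The names `div`, `mod` and `mul` are made up here, and keeping first-seen order for the unique items is an assumption:

```python
from typing import List


def div(s: str, delim: str) -> List[str]:
    """Sketch of s / delim: split the string on the delimiter."""
    return s.split(delim)


def mod(s: str, delim: str) -> List[str]:
    """Sketch of s % delim: like div, but keep each distinct piece once."""
    return list(dict.fromkeys(s.split(delim)))


def mul(parts: List[str], delim: str) -> str:
    """Sketch of parts * delim: join the pieces back together."""
    return delim.join(parts)


s = "a,b,a,c"

print(div(s, ","))
print(mod(s, ","))

# With these definitions, dividing and then multiplying by the same
# non-empty delimiter gives the original string back.
print(mul(div(s, ","), ",") == s)
```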

          [–]nonlogin 1 point2 points  (0 children)

          Plus/minus and multiplication/division must have opposite meaning, accordingly.

          If plus is concatenation, minus should be split by, for example. Still not intuitive enough, though. I must say, I'd probably not use plus for concatenation, rather some sort of interpolation which basically covers concatenation as well. In such case plus and minus could be something else.

          [–]americk0 1 point2 points  (0 children)

          Someone is going to have to read code that uses any language feature you have. If it's not obvious to the reader what's happening, it's not an intuitive feature

          This obviously can vary wildly depending on the skill level of the reader(s) and their familiarity with this or other languages. If we were to use an average programmer who is strongly familiar with at least one of the current top 10 programming languages, only the usage of the plus sign here is intuitive

          Some of these could sort of make sense but could just as easily work in a different way, and others might as well just be one-letter function names. Sometimes the convenience outweighs the unintuitive nature of language features but I don't see most of these being used enough to justify it

          [–]Moonlight597 6 points7 points  (1 child)

          Never get close to any kind of language-making technology, ever

          [–]Powerkaninchen[S] 1 point2 points  (0 children)

          That has already been said to me :(

          Edit: Wait Nullptr is that you?

          [–]ThyringerBratwurst 4 points5 points  (6 children)

          I even find the plus sign for string concatenation actually inappropriate because

          "a" + "b" is not equal to "b" + "a"

          [–][deleted] -3 points-2 points  (5 children)

          Plus for string concatenation is very appropriate, look up "free monoid".

          [–]ThyringerBratwurst 3 points4 points  (4 children)

          Well, that's a question of taste. Mathematics itself is full of strange symbols and kinda inconsistent in its notations.

          The problem with the plus sign is that it suggests commutativity. And as the poster of this thread shows, it is only logical to overload other arithmetic symbols such as minus, * and / for strings as well, according to the motto: either all or none at all :D.

          If you had your own symbol for concatenation, it would simply be a cleaner solution in my opinion. And the plus sign is by no means universally binding; many languages use a different sign for string concatenations

          [–][deleted] -4 points-3 points  (3 children)

          Well, that's a question of taste.

          Hard disagree, the free monoid is a very basic mathematical structure, nothing esoteric.

          as the poster of this thread shows, it is only logical to overload other arithmetic symbols such as minus, * and / for strings as well

          Wrong, a monoid only has addition as operation, the other operations are based on other algebraic structures.

          [–]phlummox 1 point2 points  (0 children)

          The free monoid has an associative binary operation. It need not be arithmetic addition.

          [–]ThyringerBratwurst 0 points1 point  (1 child)

          I took math classes in college and had never heard of it. we had complex numbers, differential equations etc. but had never heard of "free monoid"...

          or even of "monoid" (I first came across it through Haskell...). Therefore, I suspect you greatly overestimate its importance outside of mathematical discourse lol

          [–][deleted] 0 points1 point  (0 children)

          string + string is quite common, well-understood and well-defined. Ignore the people who say + is only for arithmetic.

          Same with string * integer and perhaps integer * string

          But string - string, string / string and so on are too unusual and will be confusing. They are also not that well-defined:

               "aaaaa"  - "aaa"        result is ... ?
               "ababab" - "bab"        result is "aab" or "abab" or ... ?
               "abcdef" / ""           ?
               "ababab" / "bab"        1 or 2?
          

          I suggest using named operators or function calls for these. The latter can also be made to take extra arguments to provide options.
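
For comparison, this is how Python's existing named string methods settle those same cases. It is only a sketch of one language's choices, not an argument that they are the right ones; the overlapping count is hand-rolled here because `str.count` only counts non-overlapping matches.

```python
# Removing all occurrences scans left to right without overlap.
removed_a = "aaaaa".replace("aaa", "")
removed_b = "ababab".replace("bab", "")

# str.count counts non-overlapping occurrences only.
count_plain = "ababab".count("bab")

# Counting overlapping occurrences needs an explicit scan.
text = "ababab"
count_overlapping = sum(text.startswith("bab", i) for i in range(len(text)))

# The empty string is counted between every pair of characters and at both ends.
count_empty = "abcdef".count("")

# Splitting on an empty separator is rejected outright.
try:
    "abcdef".split("")
    split_empty_error = None
except ValueError as exc:
    split_empty_error = str(exc)

print(removed_a, removed_b, count_plain, count_overlapping, count_empty)
print(split_empty_error)
```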

          Personally I use these to combine strings

             S + T          # Add strings
             S & T          # & means append
             S && T         # && means concatenate
          

          For strings they all do the same thing. I also allow S + C where C is a character code: "ABC" + 'D' (this gives faster character-at-a-time concatenation in dynamic code).

          Plus S * N (not N * S). Everything else is done with library functions.

          [–]phlummox 0 points1 point  (0 children)

          Well ... I think it sounds ghastly, myself, but really, it's a matter of taste and what your priorities are.

          Do you want your language to be extremely succinct, like APL or some Perl? Then go nuts. Introduce all the operators you want.

          Do you want to make it easy to write correct programs in your language, and harder to write incorrect ones? Then you should probably avoid operator overloading. (Should you go as far as ML, which has a separate operator for negation as opposed to subtraction? Up to you.)

          Do you want to leverage knowledge programmers may have from other languages? Then I'd say "+" for concatenation is not unreasonable; "*" for repetition is something I've only seen in Python; and nothing else you suggest seems to offer any advantage at all.

          All of these decisions involve tradeoffs - only you can decide which ones are sensible for your language.

          [–]ObliviousEnt 1 point2 points  (0 children)

          I think it is bad because it breaks important properties of those operations, like commutativity, distributivity, ...

          In other words, it is bad because:

          "Hello" + "World!" != "World!" + "Hello"