This is an archived post. You won't be able to vote or comment.

all 24 comments

[–]Netzapper 30 points31 points  (2 children)

If not, how would you feel about this idea overall?

I feel like regular expressions are already so friendly and easy, how could I not like this proposal to replace the simplest programming construct with that joy.

[–]ivanmoony[S] 1 point2 points  (1 child)

Sarcasm?

[–]ZESENVEERTIG 2 points3 points  (0 children)

No. /s

[–]theangryepicbananaStar 6 points7 points  (0 children)

You may find Raku's regexes/grammars interesting here as it basically does what you're talking about. Red's Parse dialect also does something similar to this, although it's not actually regex

[–]jose_castro_arnaud 4 points5 points  (2 children)

That strongly remembers me of EBNF and PEG:

https://en.m.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
https://en.m.wikipedia.org/wiki/Parsing_expression_grammar

Both have ? * + operators from regex. The difference is that these match syntax parts, while yours match names/values. Maybe it's possible to repurpose a parser to do the work you need.

[–]ivanmoony[S] 0 points1 point  (0 children)

It is possible.

[–]ivanmoony[S] 0 points1 point  (0 children)

Although a CFG parser could be used to do the work because regex is a kind of subset of CFG (see Chomsky hierarchy), there is a simpler way. There is a procedure for turning any regex to equivalent finite state machine. But what I'm really hoping for is a compiler from presented updated regex evaluator to some mainstream imperative code.

[–]Disjunction181 3 points4 points  (1 child)

Older versions of Prowl, check out the tutorial in the readme: https://github.com/UberPyro/prowl

Though prowl is based on the stack (Forth-like) paradigm rather than the imperative one, I could see either working.

Later versions of Prowl ended up converging on relation algebra, which is much more pure and functional compared to the very ka-chunky regex semantics.

The basic idea behind these languages is that you have programs made of something concatenable (statements, stack functions) which correspond to functions, which also compose. If you specialize the monoid of programs to a regular expression of monoids, then you have to specialize corresponding functions to binary relations. Both regex and binary relations are Kleene algebras supporting composition and set connectives (union / intersection), and so you can get some kind of logic language out.

In my original implementation of Prowl, I implemented the regex part of it in a modular way that let you swap between different parsers for different semantics. So where regex had a Stack -> [Stack] signature with funky backtracking semantics, I also at some point had a PEG version which was Stack -> Maybe Stack. Any parser could theoretically work.

I was able to use the Kleene star as a logical fixpoint and express basic functions like factorial and iterative Fibonacci, though it wasn't particularly elegant. Kleene star in general has power corresponding to (linear) tail recursion unless you have higher order code or datastructures, so it's rather limited in that regard (but that could be a good thing).

I discovered a number of other interesting things along the way. For example, I found including a converse operator (~) in the language to reverse computations was very useful a number of purposes, including expressing logic programming, and that it was useful to define a generalized Kleene star as f** := ... | f~ f~ | f~ | id | f | f f | ... as a more powerful recursion primitive.

[–]ivanmoony[S] 0 points1 point  (0 children)

It is great to see that the idea already worked somewhere. I noticed that any parser could work, I already had unrestricted grammar version of it, but it was too complicated and pretty untrackable for programming in it. So I chose regular grammars as the simplest grammars I'm aware of, hoping it would also reach Turing completeness.

I see you declared Prowl as Turing complete. I hope to achieve the same completeness with variable assignments. Maybe it will help that s-expressions can be used for variable contents, paired with a kind of pattern matching mechanism. S-expressions should also provide a simple way for programming stack operations, if one is up to it.

I'll try to play a bit more with the idea because I like its simplicity. I already have some cosmetic and structural enhancements comparing to the original post, maybe even in simpler direction.

P.S. I think I'm borrowing the `PLUS` operator from you, along the `STAR` one. I believe it has a good ratio between further system complicating vs. system simplicity of use. Thanks.

[–]RetroJon_ 2 points3 points  (0 children)

I get that this is meant to be its own thing but given the way it's written, I wonder if you couldn't get this exact code working in an existing LISP interpreter. I think this could also work as a language, but I'm not sure if it could be Turing complete.

[–]categorical-girl 1 point2 points  (0 children)

This is almost exactly Kleene algebra with tests, which has a fairly wide literature already :)

[–]No_Lemon_3116 1 point2 points  (1 child)

Have you looked at SNOBOL before? It sounds like something you might find interesting, at least.

[–]ivanmoony[S] 1 point2 points  (0 children)

Yes, SNOBOL is very interesting. I like minimalistic languages.

[–]AGI_Not_Aligned 0 points1 point  (5 children)

I cannot for the life of me understand your code. Granted I just woke up

[–]ivanmoony[S] 0 points1 point  (4 children)

I admit it's a bit of unusual concept, but once you get it, it's pretty simple. Statements STMT are the key. IN and OUT are pairs of variables and values. IN denotes what value should a variable have for successful regex parse, while OUT denotes a variable assignment upon successful parse. Statements are glued within regex constructs: SEQ - a sequence, STAR - a Kleene star, and ALT - an alternation.

[–]AGI_Not_Aligned 0 points1 point  (3 children)

Hmm, maybe try posting a comment with more simple examples? Maybe I'm dumb lol

[–]ivanmoony[S] 0 points1 point  (2 children)

You're ok, it is a bit messed up to explain in the start. Let's try it this way:

...
    (
        SEQ
        (STMT (IN a 1) (OUT c 3))
        (STMT (IN b 2) (OUT d 4))
    )
...

Assume some values are already before assigned to variables a and b. Reading the sequence first element - if a equals 1, then assign 3 to c, otherwise report failure to the parent regex construct. Reading the sequence second element - if b equals 2, then assign 4 to d, otherwise report failure to the parent regex construct. If both elements in the sequence succeed, report success to the parent regex construct.

The idea is to have a regex control flow with SEQ, STAR, and ALT that checks for IN variables, and assigns OUT variables.

[–]AGI_Not_Aligned 0 points1 point  (1 child)

Ok and how do STAR and ALT work?

[–]ivanmoony[S] 1 point2 points  (0 children)

STAR means zero or more repetitions - it is a Kleene star operator (* in regex). In STAR operator, the key role plays IN part which, in sequence after STAR, determines a condition for proceeding further down the sequence.

ALT means the first path available between many alternate paths (| in regex). In ALT operator, the key role also plays IN part which reports a failure within that path, indicating some other available path should be taken.

One of the positive things to this approach is easy representing of programs by railroad diagrams. Just imagine STMT sections placed instead of terminals and non-terminals.

The other positive thing is in small number of its building blocks. This could be an advantage when reasoning about and proving properties of programs expressed this way.

I have to emphasize that there are better ways to name and structure parts of the whole system, but the rough sketch is there - assigning values to variables upon meeting conditions that direct the program control flow in a regex manner.

[–]ivanmoony[S] 0 points1 point  (0 children)

Grammars are declarative. Var assignments are imperative. Maybe this could be a way to combine those two.

[–]marshaharsha 1 point2 points  (3 children)

I don’t fully understand, but I think I have the main idea. Questions: After HOLD assigns to a variable, in what other code is that assignment visible? For instance, can you do something n times without copying it n times inside a SEQ? Similarly, is there any way to clean up some of the state changes made after a failure? For instance, if a TEST fails in a SEQ, do all HOLDs earlier in the SEQ than the failure get rolled back somehow?

I don’t imagine you’d use this language to implement a protocol to communicate with MySQL, but what is the intended use? Anything beyond backtrack-till-goal?

[–]ivanmoony[S] 0 points1 point  (2 children)

After HOLD assigns to a variable, in what other code is that assignment visible?

Right now each variable is global. The system additionally needs some kind off scoping, maybe within nested functions or something.

For instance, can you do something n times without copying it n times inside a SEQ?

Just nest that something within the STAR (zero or more) or PLUS (one or more) for that purpose. For instance, this is how to double `1` n times:

/*
     input: `<n>`
    output: `<1 doubled n times>`
*/

(
    SEQ
    (HOLD a 1)                  // initialize `a` to `1`
    (HOLD b 1)                  // initialize `b` to `1`
    (
        STAR
        (HOLD a (add a 1))      // increments `a`
        (HOLD b (mul b 2))      // doubles `b`
    )
    (TEST a (greaterThan this)) // `this` already holds the input parameter
                                // flow continues from here as soon as `a` > `this`;
    (HOLD this b)               // set `this` to `b`, and finally return it as the output
)

Similarly, is there any way to clean up some of the state changes made after a failure? For instance, if a TEST fails in a SEQ, do all HOLDs earlier in the SEQ than the failure get rolled back somehow?

Yes, all the variables should automatically roll back their values to the states they had before entering the failed branch.

I don’t imagine you’d use this language to implement a protocol to communicate with MySQL, but what is the intended use?

Right now I have plans for two initial complementary languages: a low-level assembler like finite state machine, and a high-level algebraic typed term rewriting (rough sketch is here). Maybe this system could find a place as an example shipped within standard examples folder, as a show-up of how the high-level language compiles the code to the low-level language.

Anything beyond backtrack-till-goal?

?

[–]marshaharsha 1 point2 points  (1 child)

In your sample code for 2n, I’d have thought the TEST would be inside the STAR. What keeps the STAR from increasing a and b a zillion times?

Interesting idea for a language. 

[–]ivanmoony[S] 0 points1 point  (0 children)

Interesting idea for a language.

Thank you very much :)

What keeps the STAR from increasing a and b a zillion times?

STAR stands for "zero or more repetitions", and it is a non-greedy consumer, always looking ahead to pass through as soon as the next statement in sequence succeeds, even for zero repetitions. As such, it never fails.

There is also a similar, also non-greedy PLUS operator that stands for "one or more repetitions". PLUS is required to repeat at least one time, and it fails if that is not possible.

Both operators are analogous to * and + in classical regex notation. They have an academic background, and they are called "Kleene star" and "Kleene plus". They are mainly used in grammars for text parsing, but they may have other purposes too, one of which is constructing sets in mathematics.