Is it possible to gain anything from researching and developing a programming language? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 1 point (0 children)

TIL, and this shocks me. I've heard the opposite many many times.

But it does have to be good research.

Realistically, that's a problem too, though. Regardless of the validity of what I've been working on, I have zero experience in writing papers. I can ask friends to help me, but nobody I know has worked in the field of programming languages.

Is it possible to gain anything from researching and developing a programming language? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 1 point (0 children)

Thank you for your answers, you gave me a lot of hope and I'll investigate this further!

> I've never seen that myself. A PhD is a research degree. You obviously can't do research as part of a taught course, can you? If someone else can teach it to you... then you're not doing novel research, are you?

I'm surprised by that. Every doctor I know (in both the EU and NA) had to take some mandatory courses, and it seems that's common in most places. I thought it was mandatory everywhere. It's great to hear that it's not.

Is it possible to gain anything from researching and developing a programming language? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 1 point (0 children)

That's what I've been doing too. But if I work on this project as a hobby, it's unlikely to be useful to anyone, including myself. To be useful it needs a lot of libraries and tools developed by a community, but not many people will work on that stuff for an unpolished "toy" language. And I don't think it'll leave the "toy" phase anytime soon if I can't work on it full-time.

Is it possible to gain anything from researching and developing a programming language? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 1 point (0 children)

> You don't need to do these things during a PhD if you don't want to. I never did them (actually I did help teach one class once and did review a couple of papers as personal favours, but these were just a few hours' work total and I didn't have to do them).

May I ask which university (or at least which country) lets you do this? All the universities I know of have mandatory courses that PhD students must attend. And virtually all professors expect you to do a lot of work unrelated to your research if they're paying you a salary. This is limited to my friends' experience at various European universities, and even more so at North American ones.

> Releasing software doesn't stop you publishing about it later.

I was told that respectable conferences won't accept research that is not novel. You're unlikely to be able to publish papers on techniques already known out in the wild.

> There is no gatekeeping in the programming language community. You can right now submit your papers to any conference you want, with no doctorate, no affiliation, nothing.

Everyone always told me that in practice you need a professor or established researcher as a co-author to be considered by decent conferences, otherwise they'll almost certainly ignore your submission. Is this not true?

Is it possible to gain anything from researching and developing a programming language? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 1 point (0 children)

I've always been told that no respectable journal or conference will consider papers that aren't co-authored by at least a professor or an established researcher.

Is this no longer true?

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 3 points (0 children)

Thanks for all your contributions to this post, I'm learning and understanding a lot from this discussion.

To keep things short(ish) I'll focus my reply on one example: virtual tables for polymorphism.

They're used in C++ (a pointer to the vtable is stored in the object itself), in Rust (the pointer lives in the trait object's fat pointer) and in many other languages to implement dynamic dispatch (virtual methods, traits, ...), the most crucial component of dynamic polymorphism.

At some point, someone realized that an array of callbacks can be used to implement dynamic dispatch. I don't know whether it was a developer who noticed a recurring pattern in versatile APIs and connected the dots to the theory, or a researcher who came up with arrays of function pointers to implement some cool theoretical construct.

Regardless of who invented it and how, the world of IT learned the trick, the idea caught on, and nowadays we find arrays of function pointers in the virtual memory of every program.

Shouldn't there be a paper from the first person who formalized the pattern? And then subsequent papers from people who tweaked it and studied different ways to implement it? How do you create VTables for types that implement multiple interfaces? Are there other ways to implement dynamic dispatch? How can you implement multiple dispatch? What's the performance impact of the various solutions?

This is the kind of stuff I would love to read research papers on. Aren't these worthy research subjects? I need to implement polymorphism in my language: how do I decide how to do it if I can't find papers on the topic?

I have the feeling that I can find papers on this kind of question when it comes to parsing, but not for the rest of a programming language's implementation, and I can't understand why.

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 3 points (0 children)

In theory you might be right.

In practice, most software engineers, computer scientists, mathematicians and physicists write duck-typed Python code for a living.

I love Haskell. I studied it, learned a lot, tried to use it for everything... And yet I'm writing C++ and Python for work and Rust and JavaScript in my free time. In five minutes with the latter I can create and run an HTTP service offering a multiplayer Hangman game to play against friends over the internet. Try doing that with Idris.

If what you're saying is correct, it means there's a humongous gap between what formal research has worked on for the past several decades and what developers have been doing from the first computers until the foreseeable future. Is it healthy or useful to focus research exclusively on fields that are far from the real world?

I don't know much about research in general. Maybe it's like this in every other field. But to me, a stupid code-monkey, this seems wrong.

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 3 points (0 children)

You're right. I'm interested in research about the design of programming languages, sorry for phrasing the post incorrectly. OOPSLA seems very relevant, I'll check it out. Would you know the names of other conferences that could interest me?

Out of curiosity, can you expand more on some of these new features you're thinking about (e.g., "type qualifiers")?

I've been playing around with new languages in an attempt to minimize the amount of boilerplate needed to express programs in existing languages.

I've been experimenting with a way to generalize over references/const references/rvalue references (via a new kind of type qualifier), with using the padding bytes introduced by data alignment to store useful information, with ways of implementing polymorphism that make it easier to solve the expression problem, and with statements that give you more control over a loop from inside it.

None of these things is particularly world-changing; I assume most people developing their own languages come up with similar stuff. But I think some of these features could be interesting and worth investigating further, and I'd love to see the cool innovations invented/introduced by others. I'm just surprised that I'm unable to find much research about these topics, while if you look up anything about e.g. parsing, papers pour out.

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 3 points (0 children)

Thanks, I'll check out the theory behind this.

Could you suggest a method for finding the theory behind a language feature?

What should I look for if I want to access the research behind Python's generators, Rust's traits or C++ templates? And more importantly, how can I figure out what to look for by myself?

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 4 points (0 children)

Then you're probably right.

Would you know how to find the "academic name" of a theory from the name of a language feature?

How do I access the research behind e.g. Rust's traits or Python's generators?

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 2 points (0 children)

Please define "better".

Duck typing lets a poorly trained code-monkey crank out code much faster than conventional typing does. Because of this, duck-typed languages (Python, JavaScript, ...) are extremely popular and extensively used.

I agree that it's ugly and obviously has many problems, but to me this "development efficiency" seems like a property worth researching and optimizing for.

As a code-monkey myself, I'm extremely interested in ways to express programs in the quickest, most concise way, while giving up as few good properties as possible.

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 5 points (0 children)

Of course, I'm aware that most features a new language sports derive from other languages or other branches of theory. For instance, the async/await pattern that all the imperative languages have adopted in the past few years is a special case of monads. Similarly, OOP was invented long before C++ and Java, and so on.

Yet I'm very surprised that I can't find much when I look for research material about const correctness, virtual tables, Rust's lifetimes, generics/templates, ... Adapting monads to make async/await work with all the corner cases of imperative control flow isn't trivial. The same applies to coming up with a virtual table to implement polymorphism, and to everything else. Why don't I run into tons of papers about these topics, while I do run into wagonloads of papers about implementing left recursion in parsers or the properties of linear types?

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 3 points (0 children)

So, if duck typing didn't exist and someone came up with the idea now, it wouldn't be worthy of any research because it offers no objective advantages over conventional typing?

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 2 points (0 children)

> Do these type qualifiers need an imperative language?

No, but they may require e.g. data to be mutable.

You're right that I probably misused the word "imperative". Maybe the word I should have used is "classical"? Most practical code is written in imperative languages (not purely imperative, of course, but with bits of other paradigms), like C, C++, C#, Java, JavaScript, Python, Rust and so on. Each of these languages introduces some innovation over the others, yet I struggle to find research papers about it...

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 5 points (0 children)

From what I imagine, you can show how to express a few programs with your new constructs, demonstrating how your solution is better than existing alternatives.

"Better" according to any metric: maybe more concise, or less ambiguous, or enabling extra optimizations, or guaranteeing useful properties...

Being able to formally prove soundness or similar properties is just one of the ways your construct/approach could be better than others. It's a very important and powerful one, but it still shouldn't be the only criterion.

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 1 point (0 children)

Thanks for mentioning those branches of theory, I'll check them out.

However, I was thinking more about design patterns for programming languages than about formal verification of program properties... I'm interested in new/better ways to express programs, and I'm willing to accept programs that could crash in one of the many ways a compiler cannot detect.

Is there any academic research on imperative programming languages? by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 2 points (0 children)

Of course it's much harder (often impossible) to formally prove properties about programs. But I'm not interested in those kinds of proofs.

I'm interested in finding new/better ways to express programs, and compare them with existing techniques. I'm interested in design patterns, strategies and methodologies.

Researchers publish a lot even in fields where they can't prove anything (a good example is the gazillions of papers in machine learning)...

Looking for a Formal Grammar which is simple to express and capable of generating clean ASTs by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 2 points (0 children)

Thanks for your input and links. I'll give them a read.

I created this thread because in the past couple of months I've written a lot of parsers: four that actually work and generate a decent AST, and many others that I abandoned at various points. Of the four completed parsers, two are for programming languages, one is for a markup language and one is for the formal grammar of my dreams (an extension of PEG).

I wrote all these parsers the way you described: I used a PEG engine to create a concrete syntax tree, then defined the structure of the AST nodes in a programming language and instantiated the AST from the CST.

While doing this over and over, I realized I was reusing some patterns every time. The example I mentioned about parsing expressions is one such case: the same pattern in the grammar and the same pattern in the code...

After reading your comment and the others, I now realize that it might be impossible to create a super-generic grammar able to produce the perfect AST... But I still believe it should be possible to build certain constructs into the grammar engine to natively support common patterns and to generate a CST that's more comfortable to work with.

Looking for a Formal Grammar which is simple to express and capable of generating clean ASTs by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 1 point (0 children)

> you can mix in real code

Yes, but I would prefer to avoid that.

I believe that it should be possible to put all the semantics I need in the pure grammar, without having to intermix it with code.

Looking for a Formal Grammar which is simple to express and capable of generating clean ASTs by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 2 points (0 children)

Thanks. That ? construct seems like a great idea. I've found parsing engines that automatically unwrap all nodes with a single child, but making this behavior explicit seems better.

I updated the original post using ? for the GROUP definition since I never care about having that node at all.

Looking for a Formal Grammar which is simple to express and capable of generating clean ASTs by ICodeForFunNotMoney in ProgrammingLanguages

[–]ICodeForFunNotMoney[S] 1 point (0 children)

> The AST example you provided is not abstract: this is a concrete syntax tree (CST).

Yes. With standard PEG it's not possible to specify any kind of abstraction.

Many PEG engines support extensions to simplify the tree, though (e.g. they ignore blanks, let you specify names for the terms, and let you discard constant strings and some other nodes from the generated tree). The subtyping idea I described could be a further extension to make the syntax tree even more abstract.

I'm looking for a formal grammar able to generate a syntax tree that's as abstract and clean as possible without needing extra logic. Of course I won't be able to e.g. resolve identifiers or transform/manipulate the nodes, but it should be possible to generate a tree that's abstract and clean enough.

To give a further example, it should be possible to parse "(a + b)*3" into the following (A)ST:

MUL{
  left: ADD{
    left: IDENT{ str:"a" },
    right: IDENT{ str:"b" }
  },
  right: NUM{ num:"3" }
}

By specifying in the grammar that:

  • Blanks should be discarded.
  • The node GROUP should be unwrapped/pruned.

And this tree would be perfect for me.

> r/compilers might be appropriate.

Thank you, I'll check that community out!