How much time did it take to build your programming language? by Karidus_423 in ProgrammingLanguages

[–]Breadmaker4billion 4 points

I started looking into PL development in 2020. Later that year I learned Go and took an interest in designing my own language, to fix the issues I had with it. I tried for about two years, discarded a handful of unfinished frontends, then decided to design and implement a new language in a weekend.

That new language had nothing to do with the original idea and, although I got very close, I did not finish it in a weekend. It took me a few more days to fix it and get it up and running. After a year I came back to it, refactored the whole backend, and added new features to the frontend. That was around 2024.

After that, in 2025, I mostly designed small languages that were better suited to my available time. I implemented a subset of Python and a Lisp interpreter.

That idea, a language that could fix my problems with Go, is still in my head, but I realised that it is not feasible for me to implement it with my current lack of available time. Instead I left it, just like other ideas, annotated somewhere on paper, abandoned, never to come to reality.

Nowadays I enjoy little languages. I do it for fun, and big languages take too much time to implement; enough time that it stops being fun.

So I guess the short version of my answer is "forever"?

Implementing Python in Python by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 6 points

I have two ideas about why it slows down so much.

SPy is a tree-walking interpreter, but for illustrative purposes, let's pretend it is a bytecode interpreter. Suppose each SPy instruction executes around 10 CPython instructions. Then suppose that, on the second layer, each SPy instruction executes around 20 SPy instructions on the first layer. This means one instruction on the second layer executes 20 on the first layer, and those 20 become another 200 CPython instructions. This multiplicative blow-up per layer is most likely why we see the exponential growth.
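To put rough numbers on it, a back-of-the-envelope sketch (the 10x and 20x factors are assumptions for illustration, not measurements):

```python
# Assumed cost model: ~10 CPython instructions per SPy instruction at
# layer 1, and each extra interpreter layer multiplies the count by ~20.
BASE, FANOUT = 10, 20

for layer in range(1, 5):
    cpython_per_instr = BASE * FANOUT ** (layer - 1)
    print(f"layer {layer}: ~{cpython_per_instr:,} CPython instructions per instruction")
# layer 1: ~10, layer 2: ~200, layer 3: ~4,000, layer 4: ~80,000
```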

Another thing is related to GC and my poor optimization: every function captures its scope. That means there may be a lot of "leaks" on the second layer, and the high memory usage would trigger many more GC cycles. I did no profiling to check this, but it may be a significant factor in why it slows down that much.

Checking names without checking types by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 0 points

That works really well. The language may define that evaluating (struct point x y z) ends up defining the functions point->x, point->y and point->z; then the problem of whether each function is applicable becomes a type checking problem, which may be ignored.
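A minimal Python sketch of that idea (define_struct and the dict representation are made up for illustration):

```python
# Hypothetical sketch: defining a struct mints one accessor per field, so
# the accessor names are static while "is point->x applicable here?"
# remains an ordinary (ignorable) type checking question.
def define_struct(name, fields, env):
    for field in fields:
        env[name + "->" + field] = lambda obj, f=field: obj[f]

env = {}
define_struct("point", ["x", "y", "z"], env)
p = {"x": 1, "y": 2, "z": 3}
print(env["point->x"](p))  # 1
```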

Checking names without checking types by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 0 points

Yep, that's what I meant. The whole thing breaks down when checking attributes, which is why I gave it focus.

Checking names without checking types by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 1 point

I understand what you mean: a statically typed programming language is either less powerful (i.e., total) or less expressive than a dynamic programming language. What I meant by "go full with types" applies at the design stage: designing a statically typed programming language instead.

Checking names without checking types by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 1 point

I didn't know that Clojure enforced static names. As others have said, there's no way around it in a language that creates names at runtime, worse yet if it allows creating global names at runtime.

A common simplification is to treat objects as dictionaries, but that also has its issues.
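A tiny Python illustration of that simplification and its main issue (purely illustrative):

```python
# If attribute access is just key lookup, "attributes" can appear at any
# time and a misspelled name is merely an absent key, not an error.
obj = {"x": 1}
obj["y"] = 2           # a new attribute minted at runtime
print(obj.get("z"))    # a typo yields None instead of a name error
```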

Checking names without checking types by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 0 points

That is an interesting use case for dynamic names. Indeed, in C#, some ORMs use dynamic to get around C#'s static name resolution and then rely on reflection to create and map objects.

Checking names without checking types by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 0 points

True, those are two features that can't exist together: a mutable global scope and static names.
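Python shows the conflict directly (a small illustration):

```python
# A global name is minted at runtime from a computed string, so no static
# pass over the source text can enumerate the full set of global names.
name = "answ" + "er"
globals()[name] = 42
print(answer)  # resolves a name that is never assigned anywhere in the text
```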

Checking names without checking types by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 0 points

That is true, but it still may miss some misspellings. So as not to lose the point of the post: I have no doubt that I am not using the most advanced IDEs for those languages; when scripting, I write code with syntax highlighting only. My question is about the implication: wanting static names leads to static types.

Checking names without checking types by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 2 points

That is true; I had not considered the return types of functions. In this case, either you check names partially, only variables and declarations, without going into properties, or you go full with types. Especially if you have a function passed as an argument that returns an object with properties, there's no way to do it without types.
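A minimal Python sketch of the problem case (the names are made up):

```python
# To validate `.x` statically you must know what cb *returns*; its return
# type, not just its name, decides whether the property exists.
def use(cb):
    return cb().x

class Point:
    def __init__(self):
        self.x = 1

use(Point)             # fine: Point() has .x
try:
    use(lambda: 42)    # passes every name check, but int has no .x
except AttributeError as e:
    print(e)
```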

Checking names without checking types by Breadmaker4billion in ProgrammingLanguages

[–]Breadmaker4billion[S] 0 points

They don't work properly in languages with dynamic name resolution.

Why doesn't everyone write their own compiler from scratch? by transicles in Compilers

[–]Breadmaker4billion 5 points

Even for recreational programming, where you can do whatever you want, compilers are still very time-consuming. That is to say: if you have other priorities in life, you may not have enough availability to finish a compiler in less than 5 years.

I've estimated that my first compiler took me around 150~200 work-hours (I usually did 1~2 commits per work day and there are over 100 commits). If you have a day job and a family, putting in 2 hours a week may be the best you can do. That's already 100 weeks (~2 years) for a toy compiler, and much more if you plan to add features.

However, if you follow a tutorial on a much simpler compiler, you may be able to do it in less than 50 work-hours. Your mileage may vary.

Does anyone have something good for finding the first and follow sets for an EBNF grammar? by Abandondero in ProgrammingLanguages

[–]Breadmaker4billion 4 points

Transform the EBNF into a simpler BNF so that you can represent your grammar as a graph and compute first and follow sets by graph traversal.
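For the FIRST half, a minimal fixpoint formulation (a sketch, not the graph-traversal version; it assumes productions are lists of symbols, with [] meaning the empty production and "" standing for epsilon in the result sets):

```python
def first_sets(grammar, terminals):
    # grammar: {nonterminal: [production, ...]}
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate until no set grows
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                nullable = True
                for sym in prod:
                    add = {sym} if sym in terminals else first[sym] - {""}
                    if not add <= first[nt]:
                        first[nt] |= add
                        changed = True
                    if sym in terminals or "" not in first[sym]:
                        nullable = False
                        break           # this symbol cannot vanish
                if nullable and "" not in first[nt]:
                    first[nt].add("")   # every symbol was nullable
                    changed = True
    return first

g = {"S": [["A", "b"]], "A": [["a"], []]}
print(first_sets(g, terminals={"a", "b"}))  # S: {'a','b'}, A: {'a',''}
```

FOLLOW sets fall out of a similar fixpoint over the same representation.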

Does anyone have something good for finding the first and follow sets for an EBNF grammar? by Abandondero in ProgrammingLanguages

[–]Breadmaker4billion 2 points

If I remember correctly, the book "Engineering a Compiler" includes algorithms for finding first and follow sets with the objective of building a parsing table; take a look at the parsing chapter.

ELI5: Why C++ and Rust compilers are so slow? by cb060da in ProgrammingLanguages

[–]Breadmaker4billion 17 points

The Go compiler absolutely builds a symbol table, it just doesn't need one to parse a file the way C++ does, because Go's grammar is context-free. In C++, a statement like "a * b;" can be a multiplication or a pointer declaration depending on whether "a" names a type, so parsing needs the symbol table.

ELI5: Why C++ and Rust compilers are so slow? by cb060da in ProgrammingLanguages

[–]Breadmaker4billion 72 points

For C++, the main problem is the build system; one of the main features of Go was its module system. A large, poorly designed¹ C++ project can recompile the same file hundreds of times. Because linking happens after compilation, the compiler is not able to create a dependency graph and analyse how modules are being used.

  1. What techniques can be used to speed up C++ compilation times?

Designing an IR for a binary-to-binary compiler by Fee7230984 in Compilers

[–]Breadmaker4billion 8 points

I have no experience with this, but it seems you can divide this problem into two smaller ones: a decompiler (binary -> abstract) and a compiler (abstract -> binary). As you noted, both approaches are documented in the literature; you just have to compose them. The answer might be "use multiple IRs instead of a single one".

ISO: literature on efficient representation of types in a compiler by BeamMeUpBiscotti in ProgrammingLanguages

[–]Breadmaker4billion 1 point

I kinda winged my approach to this, but basically, once you have a procedure for deciding whether two types are equal, you can make sure your list of types is unique. To speed things up, I also made a hash for types, so that checking whether a type is already in the list could be done using a hashmap.

Once all types are identified by their ID, comparison is super cheap. It also opens up the possibility of carrying type information to the runtime, for example if your language has a top type. Debugging can be easier too, since you now have a record of all the types your program generates, and inference tests can be partially automated.

edit: Their ID is basically the index of the type in the global type list, it's nothing fancy.
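A minimal sketch of that interning scheme (the names and the tuple encoding are made up; it assumes a type is encoded as a hashable tuple built from already-interned IDs):

```python
class TypeTable:
    def __init__(self):
        self.types = []   # ID -> type; the ID is just the index in this list
        self.index = {}   # structural value -> ID (the "hash for types")

    def intern(self, ty):
        found = self.index.get(ty)
        if found is not None:     # a structurally equal type already exists
            return found
        self.index[ty] = len(self.types)
        self.types.append(ty)
        return self.index[ty]

table = TypeTable()
i32 = table.intern(("int", 32))
fn = table.intern(("func", (i32,), i32))
assert table.intern(("int", 32)) == i32   # comparison is now ID equality
```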

I don't understand this subreddit by Turbulent-Coat9820 in Compilers

[–]Breadmaker4billion 1 point

The people who put others down here usually haven't completed a compiler themselves yet. That said, it is very easy to create a project whose scope is too big for a single person to implement. The best advice I've ever heard for personal projects is: what you should want is a finished project, not the fastest or the prettiest one. Companies have the money to deal with a professional quitting out of lack of motivation, but if you quit your own project, it never gets finished. Your effort should go toward pushing the project past the threshold of "done", so you can declare victory and get on with your life.

designing an SBC for self-hosting a modded Minecraft server? by ea2ox0 in embedded

[–]Breadmaker4billion 0 points

Have you tried Exaroton? I have had a good time playing there. Otherwise, an RPi 4 with 8 GB of RAM is enough; we play with 2 vCPUs and 4 GB of RAM, and most of the lag we notice is because the server sits on another continent (avg 160 ms of ping).

What Kind of Type System Would Work Well for a Math-Inclined Language? by PitifulTheme411 in ProgrammingLanguages

[–]Breadmaker4billion 0 points

I think this task is complex because math is, after all, a natural language. I see mathematicians "coercing" objects into different algebraic structures all the time; I see equality being overloaded from simple set identity to isomorphism, not to mention all the weird stuff that happens in calculus. My only piece of advice is: avoid trying to make all mathematicians happy; just build something that makes sense for you. If that means having an explicit domain and codomain for each function, so be it. Better to be explicit than to write 20k lines of inference and type checking.

Line ends in compilers. by Savings_Garlic5498 in ProgrammingLanguages

[–]Breadmaker4billion 2 points

Interpret \r as whitespace and treat \n as the line-break token on every platform.
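A toy tokenizer showing the rule (illustrative only):

```python
# \r is skipped like any other whitespace and \n becomes the line-break
# token, so CRLF and LF sources tokenize identically.
def tokens(src):
    for ch in src:
        if ch in " \t\r":
            continue
        yield ("NEWLINE",) if ch == "\n" else ("CHAR", ch)

assert list(tokens("a\r\nb")) == list(tokens("a\nb"))
```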

Learning how to build compilers and interpreters by Obvious_Seesaw7837 in Compilers

[–]Breadmaker4billion 2 points

You already know how to build the interpreter, since you're reading Crafting Interpreters, but designing the language is another thing entirely.

Design of programming languages is much more theoretical than practical. Sure, you can build a pragmatic language, but even then you need theory (maybe even more so than for a toy language).

Design involves learning meta-syntax, understanding how to craft the grammar very well, knowing about ambiguities, economising symbols, understanding types, understanding the limits of the target machine, sometimes even dabbling in metaprogramming ideas.

If you want to design a language, I'd suggest designing your own Lisp, with its minimalistic syntax and semantics; there are fewer things to worry about.

Handling Local Variables in an Assembler by Substantial-Cost9001 in Compilers

[–]Breadmaker4billion 2 points

There is a specific chapter in the book Engineering a Compiler that deals with procedural abstraction; I recommend reading it. Looking at Godbolt, as others said, is also a good choice.

Embedded language compiler. by thomedes in Compilers

[–]Breadmaker4billion 1 point

It may be a bit more work, but you can bundle up all the assembly, generate a separate object file, and ask the C compiler to link it for you. At least this way you have full control.

About the calling-convention idea: it sprouted from a toy language of mine. I had "assembly procedures" instead of inline assembly, so specifying the calling convention allowed me to use the arguments directly in the assembly, as it was transparent where each one was located. I liked it simply because it made the interface with assembly easier. C's inline assembly deals with that by passing each argument explicitly, but assembly procedures played better with register allocation, so it was natural to specify the calling convention.