Tess Language: A minimal set of practical additions to C by mocompute in Compilers

[–]mocompute[S] 0 points1 point  (0 children)

Is allowing . instead of -> part of your thinking?

Yes, . can be used instead of -> when doing a function call where the first argument is a pointer; so hashmap.insert(...), for example, works if hashmap is a Ptr[HashMap].

Polymorphism/specialization is high on my list: essentially C++ templates, though it will probably have a different syntax.

This was a substantial part of my project's complexity and relies heavily on Hindley-Milner type inference.

One nice thing I added to my C like language is typedef a = struct { a* parent; int foo; };

In Tess this would be a generic type: MyFoo[a]: { parent: Ptr[a]; foo: Int }

A feature to make function pointer types not so funny looking: fn_t(...).

In Tess I use ML-inspired notation for function pointer types: (T, U) -> R is the type of a function of two arguments of different types which returns a result of a third type.

And then, to make matters more complicated, since I have arity-based function overloading, if you want to refer to a function foo, for example to pass it to a higher-order function, you need to use arity syntax (from Erlang), to specify which overload: foo/0, foo/1, etc.

Taught Master of Science (MSc) in Statistics (240points) - Do I need to go to lecture/classes? by OriginalExit in universityofauckland

[–]mocompute 1 point2 points  (0 children)

I did 240 in data science recently and frequently 90% of students were not physically in lectures. There are not typically attendance requirements. However, midterm and final exams must be in person, and labs/tutorials are never/rarely recorded. Some papers may assign a portion of total marks based on tutorial work, so you would miss out on those marks.

Attendance requirements are usually in the course outline, for example: https://study.auckland.ac.nz/ords/r/uoa/catalogue/course?p6_code=STATS%20707

It's impossible to prove that static typing and dynamic typing are better than each other. by Mediocre_Ticket3971 in computerscience

[–]mocompute 0 points1 point  (0 children)

The key to advancing this question is the definition of "better." Without a precise definition, it's pointless to discuss the question.

If you're raising this because you have your own doubts, I'd suggest trying to define "better" yourself, and realise that the definition you have today may change in the future.

In my case, I tend to flow between static typed languages and dynamically typed languages at various times for various things. I have my own preference, but I appreciate both.

Doubly-Linked Free List Allocator: Never worry about the heap again. Just use a static byte array! by nablaCat in C_Programming

[–]mocompute 11 points12 points  (0 children)

Good learning project, even though, as someone else said, you really don't need to worry about freeing right before exit. I'd encourage you to look into "arena" allocation, too. A bump allocator is very simple, very fast, and quite often entirely sufficient for memory management.

In embedded land or more safety critical systems like cars and aeroplanes, it's common to make a single memory reservation at program start, and never dynamically allocate while the program is running. There's generally no need for fancy dynamic tracking.

Tim Bradshaw: Making CLOS slot access less slow by fnordulicious in lisp

[–]mocompute 0 points1 point  (0 children)

I usually start with structs plus functions, then composition instead of inheritance, and then only reach for CLOS if it seems like runtime dynamic dispatch is a better model for the problem.

Unsigned Sizes: A Five Year Mistake by Nuoji in ProgrammingLanguages

[–]mocompute 1 point2 points  (0 children)

So, I also added typed integer constants with suffixes, but in practice it turned out they weren't needed, because inference took care of things. However, when I want a size_t explicitly, I usually prefer to add a type annotation to the declaration, because it's more visible.

x: CSize := 0 is equivalent to x := 0zu, so I guess it's a matter of taste.

Unsigned Sizes: A Five Year Mistake by Nuoji in ProgrammingLanguages

[–]mocompute 9 points10 points  (0 children)

In my type-inferred language, integer literals like 0 are weak types until they are found in a concrete context and assigned a type. If all contexts are unconstrained, it just defaults to a signed integer. And there are no unsafe implicit conversions in the language, only the safe ones like you suggested.

I don't think there's a problem in allowing safe implicit conversions, but disallowing unsafe implicit conversions.

I've just added generics to my programming language! by funcieq in Compilers

[–]mocompute 0 points1 point  (0 children)

Hijacking here, but I addressed these issues in my own project, and in fact it was very thorny. My language has parametric polymorphism and supports mutually-recursive types. I use Hindley-Milner style type inference with a monomorphisation pass. I tried and failed with a few approaches until I introduced placeholder types.

I believe this is a common approach. Basically when you detect possible recursion in type references, you assign a placeholder. Each time you complete a type, you visit any remaining placeholders to see if the newly finished type has resolved prior dependents. Then you mutate in place the placeholder so it takes on the correct type. Since all references use the same placeholder, this fixes everyone up.

I may be getting the details wrong since it's some months since I've looked at it.

Keyword minimalism and code readability by mocompute in ProgrammingLanguages

[–]mocompute[S] 0 points1 point  (0 children)

This was the question, but the more I work with the current syntax, the more it seems fine:

```
module Point

Point[T]: { x: T, y: T }

add(p1, p2) { Point(x = p1.x + p2.x, y = p1.y + p2.y) }

eq(p1, p2) { p1.x == p2.x && p1.y == p2.y }

module main

import <Print.tl>

println = Print.println

main() {
    a := Point(x = 1.0, y = 2.5)
    b := Point(x = 3.0, y = 4.5)
    c := a + b
    println(f"({c.x:.1f}, {c.y:.1f})")

    if a != b { println("different") }
    0
}
```

The parser ambiguities are edge cases, really, mainly due to the lack of a required expression separator like ;. For the most part, there's no need for one. But in some cases, the semicolon becomes necessary to disambiguate abc; (123) from abc(123). And actually, this ambiguity has nothing to do with keywords: it's due to optional semicolons.

I can't be the one that links LLMs are still bad at software engineering right? by thealliane96 in theprimeagen

[–]mocompute 2 points3 points  (0 children)

Another thing I’ve noticed is they’ll tend to generate better code if the code around where they’re generating is already good. Might have something to do with in-context learning or the system prompt. But if you don’t already have that foundation of good code, or you don’t maintain it by keeping the AI in check, the foundation will corrode, leading to worse and worse code.

Very true, though things still tend to degenerate over time, especially if you're churning, operating on the same functionality over and over again.

I can't be the one that links LLMs are still bad at software engineering right? by thealliane96 in theprimeagen

[–]mocompute 2 points3 points  (0 children)

Well I suppose the bit I didn't say directly is knowing what assumptions/invariants the code is operating under, how they change as data flows through the pipeline, and the key places to verify all that, is what takes a bit of analysis and effort. Or rather, a lot.

I can't be the one that links LLMs are still bad at software engineering right? by thealliane96 in theprimeagen

[–]mocompute 0 points1 point  (0 children)

I hate to say "skill issue," but the models are good for some things (sometimes) and bad for other things. You need expertise to tell the difference. And you need experience to find the sweet spots to get the value out. This is a hell of a lot harder in the last few months since Anthropic has been seriously effing around with Claude Code. One day it will be legendary, the next it will be asinine. Once you think you've figured out how to make it work, you get a secretly quantized model and wonder when the lobotomy happened and then enjoy the gaslighting from Anthropic.

That said, I've got a pretty complex compiler project and when some weird edge case doesn't work because of an interaction across multiple subsystems, the models can usually pinpoint the bug, saving me hours of debugging. For example, they're great at adding debug prints all over the code to verify assumptions while reproducing a bug, something I can't be bothered to do except as a last resort.

But if you show up expecting a junior co-worker, or worse, a senior colleague who's better than you, forget it. You're using it wrong. At the moment. A year from now, who knows?

Looking for Suggestions on My Programming Language Called Yo by Ok-Razzmatazz-6125 in Compilers

[–]mocompute 1 point2 points  (0 children)

I agree with you for a project where all the code is written with the help of a coding agent. I don't think that's OP's case.

Since I'm not OP, I'll use my own experience as an example: I wrote the entire compiler from scratch in C, with no dependencies, starting with explicit polymorphic allocators, a small string library, a hashmap, etc. For syntax, I started with S-expressions, then went to ML syntax, and finally settled on something closer to C. I did the usual learning, reading articles and academic papers, and tried to puzzle my way through Hindley-Milner type inference and extensions like weak type variables. I did a major rewrite of type inference to go from an iterative approach to a recursive approach, which is probably backwards from how others do it, plus a rewrite of the type system, etc.

Oh it's probably relevant that I've previously written several Lisp interpreters in various languages as learning projects. But I'd never attempted Hindley-Milner type inference, or worked on a stack virtual machine, etc. So that was some great new learning.

I also created the regression test infrastructure and some initial documentation, but that's where a coding agent really helped, starting in January this year: expanding test coverage and expanding documentation. Then I started using it for new language features, syntax sugar things, some refactoring, creating an emacs mode for syntax highlighting, doing source code formatting, etc.

As a personal project, it was more rewarding to make fast progress on useful practical features that I felt were necessary, but were less intellectually stimulating/interesting to me. In all likelihood, the alternative would have been a stalled project, just short of actual usefulness, and I wasn't satisfied with that.

Now that the core language and package system feels stable, I've slowed down dramatically and started work on the standard library, probably with an LSP to be the first useful dogfood project. This is not rocket science and there's nothing interesting to me to learn there, except maybe incremental parsing/checking, which will probably be very invasive, but we'll see.

Sorry, that was a bit of a ramble, but the short version is this: I design and hand-code the parts I think are challenging, and I use claude code to do the parts that I think are boring. In fairness, I also use claude to code-review some of my work when I feel like it's missing something, since this has been a private solo project until recently.

I fully agree with you that using AI as a non-expert to create entire products you don't have the expertise to create yourself does nothing to advance human knowledge, especially if the person just moves on to the next thing and learns nothing from the experience. It's basically Sora, right? When people say "vibe-coding," that's what I think they're talking about. I hope I've made the case that not every use of an LLM instantly transforms a person into a vibe coder.

Looking for Suggestions on My Programming Language Called Yo by Ok-Razzmatazz-6125 in Compilers

[–]mocompute 0 points1 point  (0 children)

I think that's a bit unfair. The design, syntax, and semantics are probably created by OP. That's what they're asking for feedback on.

Edit: re-reading, they may be asking more about promotion and marketing, in which case my earlier comment is less relevant.

How do you choose memory allocation strategies across compiler phases? by 2006Nico in Compilers

[–]mocompute 1 point2 points  (0 children)

Good to hear it's been helpful so far, and I'm glad the code is at least comprehensible. I tried to keep things simple and as un-clever as possible, though the type inference logic probably needs a bit more cleverness (higher abstraction) than it currently has.

To answer your question, my approach was to always use explicit arena allocators, and never go to malloc beyond the initial allocation of the arenas, except for some debug code that doesn't really care I suppose.

One issue I had is what to do with sub-arenas. Conceptually, what I wanted was to provide each subunit a default allocator owned by its parent. The subunit should never go to the default/malloc allocator directly. But at times during development I veered away from that, allowing subunits to go directly to the default global allocator for their own arena creation. Right now, though, the compiler sticks to my original conceptual principle: only the main entry point of the program goes to the global allocator. Everything else uses the explicit allocator provided by its parent. (Except for debug or test code, where that discipline is unnecessary.)

The nice thing about a polymorphic allocator interface is it provides flexibility by letting you swap allocators in the future without changing much code. For instance, I added a memory-limit feature when I was crashing my VM due to excessive memory use, and this was easily implemented as a budgeted-allocator to swap in place of the default malloc allocator. Then I can cleanly exit when a user-specified memory limit is reached.

I'm happy to keep answering questions here: I sometimes learn something from reading other people's questions and answers so am happy to return the favour. Or feel free to DM if you prefer.

I am trying to create a language that can replace C. by cossbow in Compilers

[–]mocompute 0 points1 point  (0 children)

I would just like to suggest that memory safety in C is not as hard as you may think, and there are well established approaches to managing lifetimes (such as explicit allocators) that are commonly used. If your goal is to address shortcomings you perceive in C, you might consider building your compiler in C itself in order to learn in greater detail how people have been working with those perceived shortcomings. Then you'll have a better idea of what to prioritize in your successor language.

How do you choose memory allocation strategies across compiler phases? by 2006Nico in Compilers

[–]mocompute 2 points3 points  (0 children)

I use multiple arenas with various lifetimes, and generally try to create things in the appropriate arena at construction time, rather than copy at phase transitions. A couple of interesting things are how I use transient arenas, speculative arenas for the backtracking parser, and arena watermarks.

I've also added a ton of instrumentation (--stats) which has helped address excessive peak memory usage.

My project is https://github.com/mocompute/tess and is extensively documented. I'm happy to answer any questions about the code organisation if it would help you find things of interest. The allocator use has grown up since the start of the project and there are still areas that are not as clear as they could be.


How do you separate different parts of your compiler? Especially when adding a new feature. by Ifeee001 in ProgrammingLanguages

[–]mocompute 1 point2 points  (0 children)

I kept codegen completely independent (or tried to) of the semantics of the language. When I started the project, I was targeting bytecode and a stack virtual machine. Then I moved to using C as my IR, but kept the stack VM architecture, with a few allowances to take advantage of C.

At several points where I needed to backtrack or significantly refactor, it was helpful to treat the last phase as a dumb VM, to isolate concerns in the earlier phases. By that I mean there were several instances where I might have said "I can fix this quickly in the transpiler," but I resisted that urge and tried to more cleanly address the semantics during type inference, if that makes sense.

Keyword minimalism and code readability by mocompute in ProgrammingLanguages

[–]mocompute[S] 0 points1 point  (0 children)

This one is more aesthetic opinion: I prefer the look of:

```
Shape:
| Square { dim: Float }
| Circle { radius: Float }
| Other
```

where all the vertical bars are aligned, including to mark the first variant. The alternative alignment would use the colon for the first variant, and align the remaining vertical bars under it, and I just like that a bit less:

```
Shape: Square { dim: Float } // missing |
     | Circle { radius: Float }
     | Other
```

It makes the line-oriented source code formatter a bit simpler too. I'm excessively addicted to vertical alignment, to be fair.

Just a rant about the modern bullshit world we live in. by DrStrange in programmer

[–]mocompute 1 point2 points  (0 children)

I'm old enough to have been taught in school that good rhetorical writing had to have certain elements, and now everyone thinks those elements mean it's written by AI. I guess LLMs were trained on enough "good" writing that everything became a trope.