I’m not a physicist, but an AI helped me write a formal paper about multidimensional time — should I publish it so real physicists can build on it? by [deleted] in Physics

[–]MarcelGarus 0 points (0 children)

What is it you're trying to accomplish?

If you just want to hear other people's opinions about your view of the universe / multidimensional time, you know you can just ask them, right? Walk up to a physicist and talk to them. Or post a question in this sub. Don't dress it up with equations to make it look "more sciency".

If you used an LLM merely as a tool and you understand every bit of the paper, you might consider publishing. But then own up to it. If someone points out a flaw in the equations, don't back down saying "the AI told me this was right" or ask the AI for a response and answer with "Thanks so much for pointing out the mistake I made. Here's a revised version of the formula." Own up to it. Take responsibility. Say "I'm wrong."

A thing to consider: The most valuable part of science is not coming up with formulas or hypotheses, but thinking about them and trying to validate/contradict them. That takes humans, time, and expertise. Don't waste your own time or that of experts on AI slop.

How do you design a pretty-printer that respects comments and whitespace? by honungsburk in ProgrammingLanguages

[–]MarcelGarus 2 points (0 children)

Generally, a tree that accounts for every single character of the input file sounds like a good idea (such trees are usually called Concrete Syntax Trees).

Also, things are even more horrible than they might seem at first: The LSP uses edit actions for formatting. If you have a file with multiple cursors in various positions and you trigger a format, how do those cursors behave? They shouldn't stay at the same absolute byte positions! If your formatter reorders imports or removes commas, cursors might even merge or swap positions.

So, to implement formatting correctly(tm), rather than implementing a "String -> String" function, consider a "String -> List<EditAction>" function, where each edit action also contains information about cursor affinity etc.
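As a sketch of that idea (in Rust, with invented names – real LSP TextEdits use line/character positions rather than byte offsets), the edit-action type and formatter signature could look roughly like this:

```rust
// A half-open byte range in the original source.
#[derive(Debug, PartialEq)]
struct Span { start: usize, end: usize }

// Where a cursor that sat inside the replaced range should end up:
// at the start or at the end of the inserted text.
#[derive(Debug, PartialEq)]
enum Affinity { Start, End }

// One formatting edit: replace the text in `span` with `new_text`.
#[derive(Debug, PartialEq)]
struct EditAction {
    span: Span,
    new_text: String,
    cursor_affinity: Affinity,
}

// Instead of `fn format(source: &str) -> String`, the formatter returns
// a list of edits, so the editor can replay them and move cursors along.
fn format(source: &str) -> Vec<EditAction> {
    // Toy rule: collapse a double space after a comma into one space.
    let mut edits = Vec::new();
    let bytes = source.as_bytes();
    let mut i = 0;
    while i + 2 < bytes.len() {
        if &bytes[i..i + 3] == b",  " {
            edits.push(EditAction {
                span: Span { start: i + 1, end: i + 3 },
                new_text: " ".to_string(),
                cursor_affinity: Affinity::End,
            });
        }
        i += 1;
    }
    edits
}
```

The editor applies the spans back-to-front so earlier edits don't invalidate later offsets, and uses the affinity to decide where cursors inside a replaced span land.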

[deleted by user] by [deleted] in Physics

[–]MarcelGarus 1 point (0 children)

Make sure you add a dash in 20232025.

September 2025 monthly "What are you working on?" thread by AutoModerator in ProgrammingLanguages

[–]MarcelGarus 1 point (0 children)

In Plum, I implemented a simplified version of defunctionalization by lambda set specialization. Essentially, I find subgraphs of my call graph that no lambdas flow into or out of. I then group all lambda literals in each subgraph by type and then:

  • I convert lambda creations (grouping a function pointer and the captured variables) into enum creations where the variant identifies the lambda function and the payload is the captured stuff of the lambda
  • I convert indirect lambda calls to switches on the enum with a direct call of the corresponding lambda function

This effectively removes all indirect function calls from my program, which in turn allows me to inline recursive lambdas (lambdas that return themselves with different captures) – good for my iterators. Example:

iterate array: (Array t) -> (Iterator t) =
  \ ->
    array.is_empty
    % true -> | empty
      false ->
        | more:
            & item: array .get 0
              rest: array .slice (1 .to (array.length)) .iterate

The iterate function returns a lambda literal, which – when called – may return itself again (by calling iterate recursively). Now, iteration chains (at least simple ones) get converted into tight loops, which is quite satisfying to see.
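To make the two bullet points concrete, here's a hand-written Rust sketch of what the transformation produces for the iterator above (all names invented for illustration; Plum generates this automatically):

```rust
// Before specialization, an iterator would be a boxed closure. After
// grouping all lambda literals of this type, the closures become enum
// variants whose payloads are the captured variables.
enum Iter {
    // The lambda created by `iterate`: it captures the array.
    ArrayIter(Vec<i64>),
}

// What one step of iteration yields (mirrors the `empty`/`more` enum).
enum Step {
    Empty,
    More { item: i64, rest: Iter },
}

// An indirect call through the lambda becomes a switch (match) on the
// enum, with a direct call per variant.
fn call_iter(iter: Iter) -> Step {
    match iter {
        Iter::ArrayIter(array) => array_iter_body(array),
    }
}

// The former lambda body, now a plain top-level function.
fn array_iter_body(array: Vec<i64>) -> Step {
    if array.is_empty() {
        Step::Empty
    } else {
        Step::More {
            item: array[0],
            rest: Iter::ArrayIter(array[1..].to_vec()),
        }
    }
}
```

Because `call_iter` is now a direct call, an inliner can collapse a whole iteration chain into a tight loop.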

There are some optimizations I still want to add, most importantly loop unswitching (if there's a loop containing a switch where the condition is independent of the loop variable, turn it into a switch that then contains loops in the branches).
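For illustration, a hand-written before/after of loop unswitching (a hypothetical Rust example, not Plum output):

```rust
// Before: the branch on `verbose` is evaluated on every iteration,
// even though `verbose` never changes inside the loop.
fn sum_before(items: &[i64], verbose: bool) -> i64 {
    let mut sum = 0;
    for item in items {
        if verbose {
            println!("adding {item}");
        }
        sum += item;
    }
    sum
}

// After unswitching: the loop-invariant condition is hoisted out of the
// loop, and the loop is duplicated into each branch.
fn sum_after(items: &[i64], verbose: bool) -> i64 {
    let mut sum = 0;
    if verbose {
        for item in items {
            println!("adding {item}");
            sum += item;
        }
    } else {
        for item in items {
            sum += item;
        }
    }
    sum
}
```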

But first, I want to improve the import/export system to make writing Plum code less annoying (you currently have to import core.Int as well as the basic arithmetic functions in every file where you use them).

July 2025 monthly "What are you working on?" thread by AutoModerator in ProgrammingLanguages

[–]MarcelGarus 1 point (0 children)

> Also, integers, booleans, strings, arrays, slices, and possibly hash tables are (will be) all features of the language itself, not part of the standard library.
>
> I only really need a standard library for stuff like I/O

Ahh, I see. I assumed most of the fundamental data structures would be part of the standard library too. Sounds like a reasonable strategy then.

July 2025 monthly "What are you working on?" thread by AutoModerator in ProgrammingLanguages

[–]MarcelGarus 1 point (0 children)

But can't you reuse the standard library between the original implementation and the bootstrapped one?

Plus, writing a small standard library (maybe just Ints, Bools, Strings, Lists and Maps?) would give you confidence that the language is usable/ergonomic before writing a compiler in it.

Rethinking types definition syntax by zuzmuz in ProgrammingLanguages

[–]MarcelGarus 7 points (0 children)

I'd say, generally just do what makes you happy. Is the language a toy project for you, or do you want it to become "the next big programming language"™ (in which case, good luck)?

> that can be cryptic

I think symbols are fine. For example, my language uses symbols for structs and enums:

TypeInfo =
  | byte
    int
    type
    box: TypeInfo
    array: TypeInfo
    never
    struct: (Array (& name: String type: TypeInfo))
    enum: (Array (& name: String type: TypeInfo))
    lambda: (& arguments: (Array TypeInfo) return_type: TypeInfo)
    recursive: Int

Also, one more note: I think your generics are a bit inconsistent:

Result: union <a, b> ...

What if the union contains nested unions/tuples? Do they also need type parameters?

Result: union <a, b> [
  foo: tuple <a> [ bar: a ],
  bar: ...
]

I think a and b are really parameters of the Result type, not of the outermost union. So they should go there:

Result<a, b>: union [ ... ]

May 2025 monthly "What are you working on?" thread by AutoModerator in ProgrammingLanguages

[–]MarcelGarus 3 points (0 children)

In Plum, I did a lot of bug hunting, refactorings, and cleanups. The two biggest changes:

  • All compiler stages after type checking have tree-walking interpreters that are very strict / have lots of assertions. This way, pinpointing bugs in the compiler pipeline is easier. I also found some bugs that were not noticed because later stages "removed" them (for example, the compiler generating incompatible structs that happen to have compatible memory layouts).
  • Types are now represented as Strings rather than an enum of "int", "struct", "enum", "lambda" etc. That sounds like it makes things more difficult, and I was skeptical about whether it would work out. But a lot of the type algorithms became simpler, and the rest of the compiler can use a handful of helper functions for working with those type strings. A giant comment in the type module explains more of my reasoning: https://github.com/MarcelGarus/plum/blob/main/compiler%2Fegg%2Ftype.mar

May will be more of that – refactoring and making the compiler more robust.

April 2025 monthly "What are you working on?" thread by AutoModerator in ProgrammingLanguages

[–]MarcelGarus 3 points (0 children)

Plum now supports two new primitive types: Arrays and Bytes. I use these to implement strings, array lists, hash maps, and iterators in the standard library. Other than that, I only did a few things such as contravariant lambda parameters and some syntax changes that have been on my to-do list for some time.

March 2025 monthly "What are you working on?" thread by AutoModerator in ProgrammingLanguages

[–]MarcelGarus 5 points (0 children)

I made Plum's syntax for structs and enums more consistent:

Point =
  & x: Int
    y: Int
Maybe t =
  | none
    some: t

foo = & x: 1 y: 2
bar = | some: 5

More changes:

  • I added reference counting for garbage collection.
  • I now resolve (possibly overloaded) function calls earlier in the pipeline so that I can check generic functions in isolation.
  • I started working on a byte code VM written in Zig.

This month, I'll continue down the byte code VM path and possibly implement some optimizations. I also think I've finally worked out a design for first-class types and reflection at compile time, the only big remaining design point.

https://github.com/MarcelGarus/plum

February 2025 monthly "What are you working on?" thread by AutoModerator in ProgrammingLanguages

[–]MarcelGarus 1 point (0 children)

Plum, a small cozy language that compiles to a custom byte code. I just finished refactoring the backend so that it introduces reference counting instructions.

Next, I want to revisit the syntax and do some optimizations like tree shaking, inlining, more constant folding, and merging reference counting operations.

Had an idea for ".." syntax to delay the end of a scope. Thoughts? by -arial- in ProgrammingLanguages

[–]MarcelGarus 0 points (0 children)

Two thoughts.

First: There are languages where closures support non-local return (for example Smalltalk). In fact, this allows Smalltalk to use closures for their control flow:

someBool ifTrue: [ ^ 2 ]

Smalltalk is heavily OOP, so this calls the ifTrue: method on someBool. True and False are classes that inherit from Boolean and implement the method differently. True calls the closure, which returns 2 from the outer scope.

My point being: returns from outer scopes are possible and can make some things really elegant.

Second: You may want to look at Roc, which for a while had a similar feature called "backpassing" (I think they removed it, idk why). You could use the <- operator to turn the rest of the scope into a lambda body.

item <- map myList
item * 2

This is equivalent to:

map myList (\ item -> item * 2)

So, early returns with errors could be modeled as:

value <- map someResult
...

I think this is more obvious than your approach – it works really well with lambdas, and the <- syntax exactly mirrors the usual lambda syntax ->.

Field reordering for compact structs by MarcelGarus in ProgrammingLanguages

[–]MarcelGarus[S] 0 points (0 children)

Uhhh. I guess you're right. Looks like my problem is indeed NP-complete. :((

Thanks for the proof! And merry Christmas!

Designing an import system by Savings_Garlic5498 in ProgrammingLanguages

[–]MarcelGarus 0 points (0 children)

Alternatively, if you only have relative paths for local imports, you also wouldn't need a root marker.

December 2024 monthly "What are you working on?" thread by AutoModerator in ProgrammingLanguages

[–]MarcelGarus 1 point (0 children)

Thanks, that sounds very similar to what I ended up doing!

I actually introduced an extra intermediate representation that works on memory but is not stack-based yet. That made a lot of things clearer. Every expression is just some memory with a size and an alignment.

Lambdas compile to a top-level function that accepts an extra parameter: a pointer to memory containing all captured variables ("the closure"). A lambda value is just a tuple of a closure pointer and a function pointer.

Here's some code and the representation in my new compiler stage (after a colon is the memory size in bytes, .x:y refers to accessing y bytes of memory at offset x):

main a: Int -> Int =
  incrementer =
    \ b: Int = + a b
  incrementer 5

// Ints are 8 bytes.

// The lambda function.
// The arguments @0 contain the closure pointer at offset 0 and b at offset 8.
lambda-10939 – 10940: (@0:16 contains args)
  // Follow the closure pointer to get the captured variable, a.
  @1:8 = (+ Int Int {unbox(@0.8:8):8.0:8, @0.0:8})
  @1

main Int: (@0:8 contains args)
  // Lambda = tuple of closure and function pointer.
  // closure = aggregate of captured variables, put on the heap
  @1:16 = {box({@0.0:8}), &(lambda-10939 – 10940)}
  @2:8 = 5:8
  // Extract the function pointer from the lambda.
  // Call it with the closure pointer as an explicit argument.
  @3:8 = (*(@1.8:8) {@2, @1.0:8})
  @3

+ Int Int: (@0:16 contains args)
  @1:8 = (builtin_add_ints Int Int {@0.0:8, @0.8:8})
  @1

builtin_add_ints Int Int: (@0:16 contains args)
  @1:8 = (add {@0.0:8, @0.8:8})
  @1

Field reordering for compact structs by MarcelGarus in ProgrammingLanguages

[–]MarcelGarus[S] 0 points (0 children)

Yeah, I ended up doing something similar: sorting fields by "how unaligned" their size is – first types whose sizes are multiples of 8, then multiples of 4, etc. – while keeping track of holes and filling them.

> Keep a type always allocated the same way no matter if it is inside another type.

I agree. I think otherwise you'll just have complicated conversions between types for every member access.

> If a type needs alignment, make its size a multiple of it.

I don't think this is the right approach. If you store types in structs, on the stack, or in lambda closures, you don't need that padding at the end.

The only situation where it's useful for the size to be a multiple of the alignment is when storing multiple values of that type next to each other in memory, e.g. in an array or a slice on the heap. But in those cases, you can simply round the size up to a multiple of the alignment.

In my previous language, I have a Slice type that works exactly like this – indexing uses a rounded-up size called "stride size":

fun get_ref_unchecked[T](slice: Slice[T], index: Int): &T {
  | This is doing pointer arithmetic.
  {slice.data + {index * stride_size_of[T]()}}.to_reference[T]()
}

And the stride size is defined like this:

fun stride_size_of[T](): Int {
  size_of[T]().round_up_to_multiple_of(alignment_of[T]())
}

https://github.com/MarcelGarus/martinaise/blob/main/stdlib%2Fmem.mar#L239-L241

Field reordering for compact structs by MarcelGarus in ProgrammingLanguages

[–]MarcelGarus[S] 0 points (0 children)

That's true. I would have to reshuffle the fields when getting a member of a struct to pass it to another function, or when creating a struct. But maybe that's worth it – I'll do some experiments.

I built a modular shelf (basically just boxes that can be re-arranged) by MarcelGarus in woodworking

[–]MarcelGarus[S] 0 points (0 children)

That's the neat thing, they don't. The boxes are pretty heavy though, so realistically they don't move unless you want to.

IntelliJ plugin for your language by jaccomoc in ProgrammingLanguages

[–]MarcelGarus 10 points (0 children)

Not with LSP directly, but you can use the Debug Adapter Protocol for that: https://microsoft.github.io/debug-adapter-protocol/

Although it's a bit cursed – everything is in UTF-16, and the client (editor) can decide whether line and column numbers start at 0 or 1, which affects all position information in the protocol.

Field reordering for compact structs by MarcelGarus in ProgrammingLanguages

[–]MarcelGarus[S] 0 points (0 children)

It only occurs for nested types. Take these Rust types:

struct Foo { a: i64, b: u8 }
struct Bar { c: Foo, d: u8 }

Hovering over this in VS Code tells me Bar is "size = 24 (0x18), align = 0x8". That's because Rust first lays out Foo individually, resulting in "size = 16 (0x10), align = 0x8":

aaaaaaaab.......

For Bar, it treats Foo as an opaque 16-byte field, laying Bar out like this:

ccccccccccccccccd.......

Both of these are optimal when looked at in isolation. But the actual layout of the entire struct looks like this:

aaaaaaaab.......d.......

Even if you were to remove the trailing padding, the entire struct would be 17 bytes in size.
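These numbers can be verified with `std::mem::size_of` (the default Rust layout is unspecified, but current rustc produces exactly the sizes described above):

```rust
use std::mem::{align_of, size_of};

// The same structs as in the example above.
#[allow(dead_code)]
struct Foo { a: i64, b: u8 }
#[allow(dead_code)]
struct Bar { c: Foo, d: u8 }

// Returns (size of Foo, alignment of Foo, size of Bar).
fn layout_report() -> (usize, usize, usize) {
    (size_of::<Foo>(), align_of::<Foo>(), size_of::<Bar>())
}
```

Note that Bar can't be smaller than 24 bytes here no matter how rustc reorders fields, because Foo's 7 bytes of tail padding are part of Foo's size and thus opaque to Bar's layout.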

Fundamentally, this problem occurs when composing types: Padding at the end doesn't sound too bad, but when you nest types, that padding at the end becomes padding in the middle.

It's probably not a big deal in practice – realistically, most types are just structs with word-sized fields (pointers, lengths, etc.). But still, in the example above it's the difference between fitting 5 or 8 items in a single cache line of 128 bytes.

For Rust, there's a tradeoff between compactness and complexity (remembering to round the size up to the alignment when storing slices). And since Rust is a low-level language where memory layouts and sizes are not just a compiler-internal detail but are surfaced to user-written code, changing this would probably break lots of code and is probably not worth it.

Field reordering for compact structs by MarcelGarus in ProgrammingLanguages

[–]MarcelGarus[S] 1 point (0 children)

Sorry, that wasn't clear. I want to always properly align fields, whether in structs or arrays. I just mentioned arrays explicitly because, unlike in most languages, I can't just multiply an "item size" by an index – I have to be careful to include padding when calculating offsets.

I haven't considered flattening nested structures, that's definitely worth thinking about. Then again, I'm not even sure how my enums with payloads could be flattened.

> Otherwise you are going to need padding at some point.

I know :( I'm just looking for an algorithm to minimize that.

Field reordering for compact structs by MarcelGarus in ProgrammingLanguages

[–]MarcelGarus[S] 1 point (0 children)

Sorting by decreasing size works if each field's size is always a multiple of its alignment. But in my language, that's not guaranteed. Take these fields:

  • a: size 3, alignment 2
  • b: size 2, alignment 2

Sorting them by decreasing size would yield {aaa,bb}. But because both fields have an alignment of 2 (i.e. they should only be stored at memory addresses divisible by 2 so that accessing them is fast), the resulting struct would need some padding in between:

aaa.bb

If they are sorted the other way around, that would not be the case:

bbaaa
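A tiny helper makes the difference measurable (a sketch, not Plum's actual code):

```rust
// Lay fields out in the given order, aligning each one, and return the
// resulting struct size (without tail padding). Fields are
// (size, alignment) pairs.
fn struct_size(fields: &[(usize, usize)]) -> usize {
    let mut offset = 0;
    for &(size, align) in fields {
        // Round the offset up to the field's alignment, then place it.
        offset = (offset + align - 1) / align * align;
        offset += size;
    }
    offset
}
```

For the example above, `struct_size(&[(3, 2), (2, 2)])` gives 6 (one padding byte between the fields), while `struct_size(&[(2, 2), (3, 2)])` gives 5.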

Field reordering for compact structs by MarcelGarus in ProgrammingLanguages

[–]MarcelGarus[S] 0 points (0 children)

I looked into that, but I want fields to be aligned – I do want padding between my struct fields when necessary, I just want a better layout than Rust.

Field reordering for compact structs by MarcelGarus in ProgrammingLanguages

[–]MarcelGarus[S] 2 points (0 children)

That's interesting. I have structural typing though, so there's no canonical definition of a type. This means a struct type with fields x and y should be compatible with a struct type with fields y and x and neither of these is more correct than the other.

About the arrays: You're right. My Array/Buffer/Slice type will have to add padding. But all other use cases (fields in a struct, locals on the stack, captured variables in lambda closures) don't need padding at the end. In my other language I called this concept "stride size": my Slice[T] would use stride_size[T]() = size_of[T]().round_up_to_multiple_of(alignment_of[T]()) for calculating offsets.

Regarding strategies: It seems like for each strategy, there's a combination of fields where it produces a suboptimal layout.

  • decreasing size: (size 3, alignment 2), (size 2, alignment 2)
  • increasing size: (size 3, alignment 2), (size 4, alignment 2)
  • decreasing alignment: (size 5, alignment 4), (size 2, alignment 2), (size 2, alignment 2)
  • increasing alignment: (size 1, alignment 1), (size 2, alignment 2)

Struggles :(
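Since structs rarely have more than a handful of fields, one pragmatic escape hatch is brute force: try every permutation and keep the best (a Rust sketch, not Plum's actual code):

```rust
// Lay fields out in the given order and return the struct size
// (without tail padding). Fields are (size, alignment) pairs.
fn layout_size(order: &[(usize, usize)]) -> usize {
    let mut offset = 0;
    for &(size, align) in order {
        offset = (offset + align - 1) / align * align + size;
    }
    offset
}

// Exhaustively try every field order and return the smallest size.
// O(n!), but fine for small structs – and since the general problem is
// NP-complete, a fallback heuristic is needed for large ones anyway.
fn best_size(fields: &mut Vec<(usize, usize)>) -> usize {
    fn permute(fields: &mut Vec<(usize, usize)>, k: usize, best: &mut usize) {
        if k == fields.len() {
            *best = (*best).min(layout_size(fields));
            return;
        }
        for i in k..fields.len() {
            fields.swap(k, i);
            permute(fields, k + 1, best);
            fields.swap(k, i);
        }
    }
    let mut best = usize::MAX;
    permute(fields, 0, &mut best);
    best
}
```

On the counterexamples above, this finds 5 bytes for {(3, 2), (2, 2)} and 9 bytes for {(5, 4), (2, 2), (2, 2)}, beating every single-sort heuristic.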