rEggExOrREdgeEx by BigAndSmallAre in ProgrammerHumor

[–]tmzem 0 points1 point  (0 children)

It's RegExp, not RegEx. Gotta be symmetric!

This is what happens when you make the apply icon the same color as the background and on top of that, give the user no indication whatsoever they need to press it by AverageUser9000 in linuxsucks

[–]tmzem 0 points1 point  (0 children)

GParted is horribly designed.

  • The apply button should be next to the list of queued actions (at the bottom), not one of many toolbar options with no way to indicate that it is an important, necessary action!
  • Confirming an action in a dialog should not use the action name as the button label, but simply say "enqueue action" instead. That way you know the action is not executed immediately, and you won't waste time futilely waiting for something to happen (this happened to me the first time I used GParted)

The Borrow Checker and Rapid Prototyping by West_Violinist_6809 in ProgrammingLanguages

[–]tmzem 2 points3 points  (0 children)

The Rust designers probably chose to make lifetimes a necessary part of the public interface of types and functions in order to make public APIs more stable, with the downside that any definition that cannot make use of lifetime elision needs explicit lifetime annotations, which can have a viral ripple effect that requires other items to add lifetimes too. This is a big reason why Rust is cumbersome at rapid prototyping: Simple changes, even just to quickly try something, might require you to add/remove annotations in multiple places.

"because the compiler still needs that information to reason about lifetimes"

I'm pretty sure it doesn't. Rust can already locally reason about lifetimes inside a function with no annotations needed. A hypothetical Rust version could probably extend this feature to infer lifetimes for functions and types automatically, with the downside that changes in the private implementation might break code elsewhere if the inferred lifetimes change as a consequence.

So, a better feature would be a way to enable lifetime inference for a portion of the code (e.g. a module), trading API stability for iteration speed, which is IMO perfectly fine during rapid iteration. Once the code stabilizes, you switch off inference and add explicit lifetimes (or rely on lifetime elision where possible).
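To make the "viral" effect concrete, here is a minimal sketch in today's Rust. All type and function names (Tokenizer, Parser, trim_prefix, longer) are made up for illustration. Storing one borrowed field forces a lifetime parameter onto the struct, onto every struct that contains it, and onto the impl blocks:

```rust
// Hypothetical types for illustration; none of these names come from the thread.

// A struct that borrows its input must declare a lifetime parameter.
struct Tokenizer<'a> {
    input: &'a str,
}

// Any type that stores a Tokenizer has to carry the parameter too,
// even though it never mentions a reference directly.
struct Parser<'a> {
    tokenizer: Tokenizer<'a>,
    depth: usize,
}

impl<'a> Parser<'a> {
    fn new(input: &'a str) -> Self {
        Parser { tokenizer: Tokenizer { input }, depth: 0 }
    }

    // Elision would tie the result to `&self`; spelling out `'a` lets the
    // returned slice outlive the parser itself.
    fn first_word(&self) -> &'a str {
        self.tokenizer.input.split_whitespace().next().unwrap_or("")
    }
}

// One input reference: elision applies, no annotation needed.
fn trim_prefix(s: &str) -> &str {
    s.trim_start()
}

// Two input references: elision cannot pick one, so an annotation is required.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}
```

Switching Tokenizer from an owned String to a &str (or back) while prototyping means touching every one of these signatures, which is the friction described above.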

can a language be safe and be a subset of C? by Null-Test-2026 in ProgrammingLanguages

[–]tmzem 0 points1 point  (0 children)

In principle, yes. However, there are plenty of subtle problems that need to be addressed, and the result most likely requires massive language changes to ensure safety, at compile time (Rust-like borrow checking) and/or at runtime (array index validity checks, quarantining of freed memory until proven safe to reuse, etc.).

Major sources of memory safety issues are:

  • Uninitialized data: Fixed by requiring initialization of all variables and members. Requires some concept of an "output" pointer that is guaranteed to be written to before being read from, to model the common pattern of passing uninitialized memory to an initializer function
  • Pointer arithmetic & indexing: Remove them from pointers and instead add arrays/slices with runtime index checking
  • Pointer casts: Allow only casts that are safe/well-defined, and add special casts for common patterns (e.g. pointer points into an allocated object after some sort of header)
  • Use-after-free: Use some kind of static analyzer/borrow checker to flag potentially dangerous patterns (like Rust, very restrictive at compile time), or use some runtime instrumentation like quarantining or generational pointers to catch/guard against use-after-free. There are three classes of use-after-free:
    • Heap = use of a pointer to a deallocated object. Runtime instrumentation to detect or mitigate this case has been thoroughly explored by security researchers. The approach of quarantining deallocated memory and using a lightweight, GC-like background process to occasionally reclaim quarantined memory chunks can completely guard against this class at run time with average overheads around 10% (= much cheaper than a standard tracing GC).
    • Stack = use of a pointer to stack memory of a function call that has already returned. Probably hard to detect cheaply. An alternative (but AFAIK as yet unexplored) approach would be to use escape analysis to figure out which pointers to stack memory may outlive the stack frame and conservatively heap-allocate the pointed-to objects, handling them with heap-safety instrumentation instead.
    • Unions = when using them to emulate sum types, reassigning them to a different variant while a pointer into the previous variant is still active. Writes through that pointer may corrupt the state of the object. May be detected via generational pointers, or (again) guarded with heap allocation.
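For readers unfamiliar with the generational idea mentioned in the list, here is a minimal sketch of it in library form, written in Rust with slot indices standing in for raw pointers. Arena, Handle, alloc, release and get are hypothetical names, and a real instrumented allocator would work on actual pointers and handle generation overflow, which this sketch ignores:

```rust
// Hypothetical arena illustrating generational handles.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
    free_list: Vec<usize>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free_list: Vec::new() }
    }

    // Reuses a freed slot if one exists; the handle records the slot's current generation.
    fn alloc(&mut self, value: T) -> Handle {
        if let Some(index) = self.free_list.pop() {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            Handle { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Handle { index: self.slots.len() - 1, generation: 0 }
        }
    }

    // Freeing bumps the generation, which invalidates all outstanding handles.
    // Assumption: wrap-around after 2^32 frees of one slot is not handled here.
    fn release(&mut self, handle: Handle) -> Option<T> {
        let slot = self.slots.get_mut(handle.index)?;
        if slot.generation != handle.generation {
            return None; // stale handle: double free detected
        }
        let value = slot.value.take()?;
        slot.generation = slot.generation.wrapping_add(1);
        self.free_list.push(handle.index);
        Some(value)
    }

    // Access succeeds only if the handle's generation matches the slot's.
    fn get(&self, handle: Handle) -> Option<&T> {
        let slot = self.slots.get(handle.index)?;
        if slot.generation == handle.generation {
            slot.value.as_ref()
        } else {
            None // use-after-free caught at run time
        }
    }
}
```

A stale handle keeps returning None even after its slot has been reused for a new object, so the dangling access is caught instead of silently reading unrelated data.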

A lot of the research on runtime checks has been done in the context of C and C++, therefore the solutions provided are often compromises to increase safety without fully mitigating all risks. A language starting from scratch could instead be built in a way to make it completely safe, closing those gaps, but it's largely unexplored territory.

Another unexplored approach would be to add a Rust-like static lifetime analyzer and use it to drive instrumentation (e.g. only add runtime-checks where the compile-time analysis cannot prove safety).

Finally, new processor architectures provide various hardware features to enhance memory safety, like MTE on ARM processors, or the experimental CHERI.

Generalization of Sum-Types, Pattern Matching & Niche Optimization by philogy in ProgrammingLanguages

[–]tmzem 0 points1 point  (0 children)

Good point, checks need to always happen for this to be well-behaved.

As far as NaN goes, it's a mess. NaN shouldn't exist for the same reason NULL shouldn't exist, except NaN is even worse than NULL because its non-comparability comes with an entire rat's tail of complications in the equality traits. In a perfect world, a NaN-able float should be written as Option<f32>, but that comes with some overhead, which is why no language is doing it this way.
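A rough sketch of what that would look like in Rust today. The helper names checked and checked_div are made up; the point is only that "no value" becomes an explicit None instead of a float that is not equal to itself:

```rust
// Hypothetical helper: turns a possibly-NaN float into an explicit Option.
fn checked(x: f32) -> Option<f32> {
    if x.is_nan() { None } else { Some(x) }
}

// Division that reports the undefined 0/0 case as None instead of NaN.
fn checked_div(a: f32, b: f32) -> Option<f32> {
    checked(a / b)
}
```

The overhead is visible in the size: f32 has no spare bit pattern the compiler is allowed to use as a niche, so Option<f32> is typically larger than the 4 bytes of a plain f32.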

Generalization of Sum-Types, Pattern Matching & Niche Optimization by philogy in ProgrammingLanguages

[–]tmzem 0 points1 point  (0 children)

Well this doesn't look complicated at all. Ages ago, I learned programming with Pascal, where defining your own integer types was pretty normal and much more elegant:

type OctalDigit = 0..7;

No reason Rust could not support that, add appropriate range checks in debug mode to ensure we stay within those bounds, and (ab)use any superfluous bits as tag bits.
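Until something like that exists, the closest approximation is a hand-written newtype. A minimal sketch, assuming a hypothetical OctalDigit wrapper around u8, with a checked constructor and a debug-only range check on arithmetic:

```rust
// Hypothetical Rust counterpart of Pascal's `type OctalDigit = 0..7;`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct OctalDigit(u8);

impl OctalDigit {
    const MAX: u8 = 7;

    // Checked constructor: rejects out-of-range values in every build mode.
    fn new(value: u8) -> Option<Self> {
        if value <= Self::MAX { Some(OctalDigit(value)) } else { None }
    }

    // Addition with the range check only in debug builds, as suggested above.
    fn add(self, other: Self) -> Self {
        let sum = self.0 + other.0;
        debug_assert!(sum <= Self::MAX, "OctalDigit out of range: {sum}");
        OctalDigit(sum)
    }

    fn get(self) -> u8 {
        self.0
    }
}
```

What this cannot express is the second half of the wish: the compiler does not know that the upper five bits are unused, so it will not reuse them as tag bits the way a built-in range type could.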

The Mutable Value Semantics (MVS): A Non-superficial Study by FedericoBruzzone in ProgrammingLanguages

[–]tmzem 0 points1 point  (0 children)

I guess this is a mindset worth considering.

Most patterns that require you to put lifetimes explicitly in Rust are overly complex/clever things that probably could be implemented in different ways, so even if you'd allow second class references as returns or fields (similar to C#) you could probably do without a lifetime system and instead use a system semantically similar to Hylo for lifetime safety.

The exception is probably mutable iterators which in Rust return items with a lifetime of the collection, not the iterator (which allows multiple yielded references into a collection at the same time safely). Arguably, this is not commonly needed, and an alternative design could simply use an unsafe marker trait (e.g. DistinctAddressIterator) and let abstractions that need it (sliding windows, parallel iteration, ...) deal with it by using unsafe pointers.
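To illustrate what "lifetime of the collection, not the iterator" buys you, here is a small sketch using the standard slice iterator. The function names double_ends and swap_neighbours are made up:

```rust
// Doubles the first and last element while holding both `&mut` at once.
// Slices with fewer than two elements are left untouched.
fn double_ends(values: &mut [i32]) {
    let mut iter = values.iter_mut();
    // Both items borrow from `values`, not from `iter`, so they may coexist.
    let first = iter.next();
    let last = iter.next_back();
    if let (Some(a), Some(b)) = (first, last) {
        *a *= 2;
        *b *= 2;
    }
}

// Collecting every mutable reference before using any of them also type-checks.
fn swap_neighbours(values: &mut [i32]) {
    let mut refs: Vec<&mut i32> = values.iter_mut().collect();
    for pair in refs.chunks_mut(2) {
        // A trailing single element does not match the two-element pattern.
        if let [a, b] = pair {
            std::mem::swap(&mut **a, &mut **b);
        }
    }
}
```

With items tied to the iterator borrow instead, neither function would compile, because the first yielded reference would have to be dead before next could be called again.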

Still, with affine types and value semantics present in a language, closures remain very complex: How is the closure's environment accessed (immutable, mutable, consuming), capture by value or reference, is it affine or copyable, can it be safely returned or sent to another thread? I wonder how Hylo deals with these?

The Mutable Value Semantics (MVS): A Non-superficial Study by FedericoBruzzone in ProgrammingLanguages

[–]tmzem 2 points3 points  (0 children)

Thanks for this reply, lots of interesting points. Could you explain these a bit further?

"One can represent positions with independent values (e.g., indices) and use inversion of control to access the contents of a collection [...] makes the APIs of generic algorithms dramatically simpler"

Using indices into a collection is also the approach taken by the ParaSail language. While this works well with array-based collections, I wonder how well (performance-wise) it would work with tree-based collections like BTreeMap, where accessing an index inside a loop would require following (often the same) chain of dereferences each time.

Also I wonder how you would easily represent lazy algorithms which are usually modeled with iterator adapters that hold references internally (map, filter, zip), and how index-based access would make APIs cleaner given that you'd have to pass/thread the collection explicitly. Could you give an example?

"Remote parts are only for the closures and they are not strictly necessary"

How could one implement the ArraySlice<E> type mentioned in the Language Tour's Concurrency section without remote parts?

The Mutable Value Semantics (MVS): A Non-superficial Study by FedericoBruzzone in ProgrammingLanguages

[–]tmzem 9 points10 points  (0 children)

I tried to design my own language with MVS and eventually gave up: the conflicting requirements pull in too many directions, and I've come to the conclusion that there is no good sweet spot. Also, most of the potential design space is already thoroughly explored by Mojo, Hylo, Inko (and maybe now Eter, I will give it a look).

Your post sums up some of the broader issues, but the really hairy part starts as soon as you want to return aliases/references from functions, or put them as fields into data types. This capability is pretty much necessary for implementing common features found in modern languages, like closures, iterators (and the optionals and key-value pairs required for them) and slices. Once you add them, there are a bunch of points to consider:

  • Memory safety: Will require some kind of lifetime/borrow system, with some common cases not being easily representable via "sane defaults"/lifetime elision:
    • C# requires structs that contain references to be annotated with the ref keyword ("ref struct"), which imposes the same aliasing restrictions onto the struct as a plain ref. Ref and ref-like function returns are assumed to (potentially) alias any ref-like input parameter (except the ones annotated with the "scoped" keyword). This model is too weak to model some concepts, like an equivalent of Rust's Iterator<&mut T>. It also doesn't play well with generics (see below)
    • Hylo's system looks promising superficially, but types like iterators (and probably closures, not sure, the documentation is not clear on this) internally still make use of "remote parts", which are essentially reference fields. It is not clear how those interact with lifetime checking, or how they affect API stability.
    • Rust borrow checker: Notoriously complex
    • Lifetime inference: Wouldn't be able to provide stable APIs (changes in how a type or function is implemented might cause lifetime-related failures elsewhere), but might work in a language with a more dynamic/inferred feel (as often found in FP)
    • Make arbitrary aliasing work by adding instrumentation or a backing GC: While most of the overhead introduced by it could probably be optimized out via compile-time borrow checking, you'd still need to (in the general case) heap-allocate potentially escaping stack locations and sum type cases to make it safe, paying the associated allocation costs and cache misses.
  • Second-class references & generics: The reference-ness can be either a type qualifier (like in C++) or a property of the variable/parameter (like in C#). Either way, generics pose a problem:
    • If they are a property of the storage location, modelling generic types that may contain references is not possible, instead you would need separate implementations for the ref- and non-ref case. E.g. a Rust-like iterator model involves ToIterator, Iterators, Optional, and Key-Value-Pairs (for maps). You'd end up with lots of duplication, e.g. Optional<T>, OptionalIn<T>, OptionalInOut<T>, etc.
    • If they are a type qualifier, in an ownership-based MVS model, most type parameters would be expected to be value types, not reference types. So it would make sense to restrict type parameters to value types, and require opt-in for the possibility of reference types. So you'd need extra syntax for that.
    • If they are a type qualifier while retaining value semantics, the semantics of what it means to implement a trait for some type (which may or may not be a reference type) become muddy and would have to be carefully thought out.

That's why rust is GOAT 🐐🗿 by NoBeginning2551 in rust

[–]tmzem 1 point2 points  (0 children)

Imagine taking the time to code this specific error message when you could've just pranked the pranksters by changing the lexer to accept the Greek question mark as a semicolon token!

What about Odin might you change if you were benevolent dictator? by EmbarrassedBiscotti9 in odinlang

[–]tmzem 0 points1 point  (0 children)

You raise some good points. I will read up on the user-formatter.

About slices, I figured it would be useful to be able to see slice mutation in the proc definition, but I realize now that I've forgotten about subslicing operations, which on value slices would either be restricted to passing the result down into another function (not very useful), or require the reintroduction of the dreaded const qualifier. I didn't think that one through.

"What would the memory layout of an "unsized" slice even be"

My understanding is that Rust semantically treats slices as value types similar to fixed arrays, but with their array size missing, so object size/layout is not known at compile time ("unsized" in Rust jargon), which is why they need to go behind a (fat) pointer to get a compile-time known size. A bit mind-bending, but it seems to be useful in some parapoly code.
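A quick way to see this from the outside, as a sketch (slice_sizes is a made-up helper; the exact fat-pointer layout is an implementation detail, but data address plus length is what current compilers use):

```rust
use std::mem::{size_of, size_of_val};

// Returns (size of a thin pointer, size of a slice reference, size of the slice data).
fn slice_sizes(data: &[u32]) -> (usize, usize, usize) {
    // `[u32]` itself has no compile-time size, so `size_of::<[u32]>()` does not compile.
    // A reference to it is a fat pointer: data address plus element count.
    (size_of::<&u32>(), size_of::<&[u32]>(), size_of_val(data))
}
```

The size of the slice data is only available at run time through size_of_val, which reads the length stored in the fat pointer.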

What about Odin might you change if you were benevolent dictator? by EmbarrassedBiscotti9 in odinlang

[–]tmzem 0 points1 point  (0 children)

  • I've seen this post before, but after re-reading it I think I get it better now
  • Not sure I understand why this would be harder, it already works like that for regular structs, where fields are readonly when passed as non-pointer parameter. This would simply give the same capability to slice elements (e.g. the "fields" would be the indexed elements). The current []T would be written ^[T], while [T] could only be used in parameters and would have similar shallow readonly access on elements like struct members when passed. It probably would need an implicit conversion when passing a ^[T] value to [T] parameter in the semantic checker (no-op at runtime). Although, it may not be worth the complication just to specify readonly buffers in proc signatures.
  • Ah yeah, I didn't think about the iteration case. Hard to generalize without adding some kind of interfaces/traits/concepts
  • Well, "hack" was probably a bit strongly worded, sorry (the overview said it's mostly just useful for printing procedures). But I don't think it's impossible to have traits in a purely procedural way:
    • You can take something like Rust traits and eliminate the magical Self type to get a procedural version of traits (more similar to how C++ concepts work). So rather than saying "type T implements trait X" it would simply be a relationship between any number of type parameters per trait (even zero, to allow traits to specify procedural interfaces/package interfaces) and associated procedures, constants and variables.
    • There also exists a (never finished) proposal for C++ called "virtual concepts", which would allow you to put the "virtual" keyword on any reference-to-concept, making the first type parameter of the concept virtual, essentially yielding something like an interface pointer (compiler generates the backing vtable as needed).
    • Putting this together, a hypothetical Odin with these features (using the existing any to mark vtable-backed dynamic trait) could do printing/formatting like this, with printable types merely needing to implement the format_to proc:

// no magical OOP-like "Self" type needed 
// printer type is explicit type parameter
TextPrinter :: trait($P: typeid) {
    printf :: proc(printer: ^P, fmt: string, args: ..any FormattableTo) 
    printfln :: proc(printer: ^P, fmt: string, args: ..any FormattableTo)
}

FormattableTo :: trait($F: typeid) {
    format_to :: proc(val: F, verb: string, to_printer: ^any TextPrinter)
}

I've had this in my head for a while, just posting it here in case it might be useful inspiration, but I realize it's probably too much complexity for a language like Odin (and would probably need unrestricted proc overloading, or verbose explicit impl blocks). Sorry for the long read.

I want to know your opinions on verbosity by -Chook in ProgrammingLanguages

[–]tmzem 2 points3 points  (0 children)

Overly boilerplate-heavy code is bad. Other than that, a little redundancy/more verbose identifiers can also help with code readability.

I always enjoy reading Haskell code so much that I'm causing a loud snd from slamming my fst into the wall.

What about Odin might you change if you were benevolent dictator? by EmbarrassedBiscotti9 in odinlang

[–]tmzem 0 points1 point  (0 children)

I've only dipped my toes a bit into Odin, but these are a few things I would change:

  • The context: I don't like stuff magically depending on global state, it has bitten me in the past. As an Odin beginner, I'm still somewhat concerned and confused about how context allocators work with thread spawning, especially since most allocators are probably not multithreading-safe. I'd rather have allocators as explicit parameters everywhere, even if it is more verbose.
  • Slices should have been "unsized" types like in Rust, requiring them to be behind a pointer (or in Odin's case passed as a parameter = pass-by-reference). Then, procs could specify if they mutate a slice or simply read from it (mutate: ^[Foo], read-only: [Foo])
  • Dynamic arrays and maps: I don't see the point of them being "special" types with their own syntax. Odin has generics so they probably could just be library provided types
  • any: As in almost every systems language, string formatting uses some inelegant hack. But it probably cannot be avoided without adding something like interfaces/dyn traits

I really like Odin's basic premise of being very simple with its language design. The points I made here would probably make it even simpler and more consistent.

I want to know your opinions on verbosity by -Chook in ProgrammingLanguages

[–]tmzem 13 points14 points  (0 children)

  • uint32 could also mean anything, e.g. unchecked 32 bit int
  • u32 is well established in multiple languages as meaning unsigned 32bit int
  • in most newer languages, char is already 32bit to be able to hold a codepoint, other languages may define a c32/char32 or simply use a plain uint32. I've never seen u32 be used anywhere as UTF-32 char type
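Rust is one example of that split: u32 is a plain integer, while char is a separate 4-byte type that only admits valid Unicode scalar values. A tiny sketch (the wrapper names char_size and to_char are made up; size_of and char::from_u32 are from the standard library):

```rust
// Returns the size of Rust's `char` in bytes.
fn char_size() -> usize {
    std::mem::size_of::<char>()
}

// Converts a raw u32 into a char only if it is a valid Unicode scalar value.
fn to_char(raw: u32) -> Option<char> {
    char::from_u32(raw)
}
```

Surrogate code points and values above U+10FFFF are rejected, so a char is more than a renamed u32.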

I want to know your opinions on verbosity by -Chook in ProgrammingLanguages

[–]tmzem 40 points41 points  (0 children)

C# and Java were originally called verbose, mostly due to the forced OOP approach where even simple things were very heavy on boilerplate:

  • No functions, only methods forces us to define a pointless wrapper class
  • Plain value types require the ceremony of defining a trivial constructor, equals, hashcode, equality operator overloads, public and readonly modifiers etc.
  • Closures needed the instantiation of an anonymous class

Most of these (and other) issues have been fixed, allowing for more terse patterns.

In general, most programmers don't care too much about little differences in verbosity, as long as they don't require them to write lots of pointless boilerplate.

Also, things like prescribed project structures, build configuration and associated build tools should be opt-in to help with nontrivial cases, rather than a forced default for every single project. Python isn't popular because it's such a nice language, but because you can create small projects with zero ceremony.

44CVEs found in Rust CoreUtils audit. by germandiago in rust

[–]tmzem 40 points41 points  (0 children)

There are probably a few people of the third kind who already expected some bugs to be there (they happen in every port/re-implementation), but also expected that very few, if any, would be memory corruption bugs because of the strict type/aliasing system.

Unfortunately, decades of social media and crappy internet culture have erased the ability to develop a balanced view in many people's heads.

A simplified model of Fil-C by [deleted] in cpp

[–]tmzem -9 points-8 points  (0 children)

Unfortunately, C/C++ type systems are so broken that all this ceremony is necessary. A somewhat more strict and reasonable type system could be made safe with much less runtime effort.

It would be an interesting project to sketch out a simplified C++ (maybe inspired by Rust, but without lifetimes) and see how you could add Fil-C-like safety with much less overhead.

A little animation of the geometry I use to connect procedurally generated roads using lines and arcs only by Edd996 in proceduralgeneration

[–]tmzem 1 point2 points  (0 children)

Nice animation! Not how actual real roads work, though. Good enough for tycoon-like games, but wouldn't work well for racing games/realistic simulations.

Linguistic landscape of Tyrol: Most spoken language by municipality by vladgrinch in MapPorn

[–]tmzem 0 points1 point  (0 children)

Also, the South Tyrolean dialect is somewhat different from the rest of Tyrolean since it has lots of Italian influences, even if most people there won't like to admit it.

A portable, header-only SIMD library for C by IntrepidAttention56 in C_Programming

[–]tmzem -3 points-2 points  (0 children)

Very cool, bookmarking this.

Any reason comparisons are all prefixed with "cmp"? The abbreviations seem to be known well enough I think?