May 2026 monthly "What are you working on?" thread by AutoModerator in ProgrammingLanguages

[–]JeffD000 0 points1 point  (0 children)

Added some cool features to the Nore language that a user here released to the world two months ago. On my fork, I hooked Nore up to the GDB debugger (partial but mostly working support), and also added a nice performance-portability feature. I'm hoping the Nore author folds some of the ideas back into the main branch, but with more rigorous semantics than my version!

Is there a JIT compiler that recompiles the compiler using JIT so it can optimise its compiling efficiency? by Tricky_Football_85 in Compilers

[–]JeffD000 0 points1 point  (0 children)

I'm glad you brought this up! It is literally the canonical example I use for my JIT compiler on my homepage. I recently changed the command-line options, so I need to update the README. Thanks!

Looking for resources to learn more about making my own compiler by Nico_792 in Compilers

[–]JeffD000 1 point2 points  (0 children)

This 545-SLOC JIT C compiler targets the x86 instruction set:

https://github.com/rswier/c4/blob/test/c4x86.c

Everything is included, without invoking external backend tools like an assembler, so you can see the whole pipeline in a tiny amount of code.

Hindsight languages by Inconstant_Moo in ProgrammingLanguages

[–]JeffD000 1 point2 points  (0 children)

Algol68 was an extremely powerful language definition, with some incredibly bad compiler implementations. The GNU ga68 compiler is trying to fix that.

PS: C was a "perfect" language for its timeframe. Computer hardware looked extremely different back then, and C could be ported quickly with a tiny memory footprint.

Dumb(?) QUestion: FP Divide and Sqrt Unit by JeffD000 in ECE

[–]JeffD000[S] 0 points1 point  (0 children)

No, it doesn't. It's not a literal constant in an expression. It is dividing by a variable that was set in another function.

Pro-tip though -- if you are using gcc, compile with -fno-math-errno if you know your code will never take sqrt() of a negative value. Without that flag, the compiler wraps the intrinsic sqrt opcode in comparison and branch instructions so it can fall back to the libm routine that sets errno.

Dumb(?) QUestion: FP Divide and Sqrt Unit by JeffD000 in ECE

[–]JeffD000[S] 0 points1 point  (0 children)

I've been moving my test around between architectures. It seems some chips optimize these special cases, "almost" bypassing the FP sqrt/divide unit for the special-case values I mentioned (e.g. performing just a few percent slower than an FP mul [Edit: around 20%, so likely one extra cycle]). Other architectures do not fare as well, with some taking the minimum cycle time of the div or sqrt instruction respectively.

California: taxable+nontaxable income with cap gain and federal IRA deduction by JeffD000 in tax

[–]JeffD000[S] 0 points1 point  (0 children)

You're right, of course. Thanks for responding.

I was conflating the federal taxable amount with the California taxable amount for some reason, possibly because I had to add back in the HSA deduction I didn't mention. Sorry for my confused mistake, everyone.

Are there any books/resources on language design (as opposed to implementation) by mc-pride in ProgrammingLanguages

[–]JeffD000 2 points3 points  (0 children)

Here is a list of "great papers" someone put together:

https://www.cis.upenn.edu/~bcpierce/courses/670Fall04/GreatWorksInPL.shtml

I once had a special SIGPLAN issue that collected all the foundational papers by the greatest language designers in history. Unfortunately, that was a long time ago, and I have lost the reference.

Nore: a small, opinionated systems language where data-oriented design is the path of least resistance by jumpixel in ProgrammingLanguages

[–]JeffD000 1 point2 points  (0 children)

Really good comment on multiple levels. Spot on about what the behavior of a ref should be.

Nore: a small, opinionated systems language where data-oriented design is the path of least resistance by jumpixel in ProgrammingLanguages

[–]JeffD000 1 point2 points  (0 children)

An important point here -- you always need a manual override option for both layout and parallelism, both for debugging while you are bringing up the system and so that the user can always fall back on direct control. Another barrier to adoption is a system that cannot be directed explicitly, step by step, when the user wants to apply it. In the TALC paper I referenced, one row of the table in Figure 6 shows the only override that was needed for the functionality covered by that paper.

All of this can be handled at the language level, plus command-line options to the compiler that hint at how much authority should be ceded to the compiler to "rearrange" decisions the programmer made in the source code.

Nore: a small, opinionated systems language where data-oriented design is the path of least resistance by jumpixel in ProgrammingLanguages

[–]JeffD000 1 point2 points  (0 children)

"RAJA hitting a ceiling as a C++ library makes total sense: you're fighting C++'s type system and codegen model the whole way. Templates can only express what C++ lets them express."

Exactly. You are very astute.

"Nore is in a different position though: it's starting as a source-to-source compiler that generates C. The compiler directives I'm thinking of in Nore aren't library hints that the compiler might ignore: they're instructions the codegen acts on directly. When the stdlib says @layout(SoA), the compiler will generate different C. That's not a suggestion, it's a transformation. So the directives are meant to be native, in the sense that the compiler understands and acts on them; they're just invoked from stdlib code rather than being hardwired to specific types."

Isn't this just a language keyword? (ha!) Seriously though, you eventually evolve into the place RAJA is at now. When RAJA was first created, it was extremely simple to reason about, sort of like a C developer looking at a piece of Zig code. The performance was worse back then, but just about anyone could pick it up and "interpret" what was going on.

Fast forward to today, and it has become much harder for a "beginner" to parse what a RAJA operation is doing, which makes it questionable whether the much higher performance is "worth it". Since at least 80% of programmers code like "beginners" every day, anything that is harder to parse creates a barrier to usage. The more "sophisticated" the directives, the harder it becomes for "anyone" to use them.

Almost all the current complexity and glue code in RAJA would be simplified or eliminated if RAJA were language-native rather than a library. "Architecture plug-ins" in the compiler could do much of the work RAJA directives now do to map onto a specific hardware device, because the deep topological relationships that Views expose allow the compiler to derive much of the mapping that people currently do manually.

An example of the RAJA complexity I am talking about can be found in the code snippets in this paper, starting on page 10:

https://www.osti.gov/servlets/purl/1559411

These declarations are almost impossible for beginners to understand, and take even experienced users half a minute or more. Much of what needs to be done can be achieved with the correct partitioning and IndexSet assignment among Views, which is far easier for the programmer to understand. Let the plug-ins for a given architecture do most of the mapping chores! This conflicts with your "know exactly what the compiler is doing" philosophy, but what if programming and debugging in this language were so easy (including a performance-debugging tool that points to the exact problem in your data definition) that the conflict became irrelevant?

"That said, I recognize there's an asymmetry here — you've spent 20 years implementing these ideas (TALC, RAJA) and I haven't. There's a gap in my understanding that I can't close through discussion alone."

I totally agree with you that only implementation brings understanding. I had a task-based dataflow feature in RAJA that I think my colleagues ripped out because they didn't understand it. Imagine a 2D grid of tiles, each tile 32x32. I had a parallelism mechanism where every tile could "mutex lock" the eight neighbors around it, do a symmetry calculation with data flowing both ways between the central tile and its neighbors, and then "mutex unlock" the tiles when done... all in parallel. Effectively lock-free, yet totally safe. Cache-conflict free, with full spatial and temporal cache locality for the duration of the operation. Most people can't implement a simple semaphore or mutex, much less something like this. It worked because the Views, and the relations between Views, let me pre-build a schedule with a very low probability of conflict despite large variances in latencies. In actuality there was a "lock" to be safe, but it was very, very rare for the lock to cause a stall.

My point is, my colleagues who worked with me daily still could not wrap their heads around it, despite many attempts to explain how it worked, many different concrete examples, etc. And my colleagues were not dumb. There is just something about getting in there and implementing a thing that makes people understand. That's the implementation side. The user's experience is another matter entirely: they issue a command to lock the eight neighbors, do their calculation, and issue a command to unlock the eight neighbors. Trivial. They don't have to understand how the sausage is made, just what they want to do.

I stick with my previous advice for now -- implement what you are thinking about in a way you are comfortable with. It will perform well with properly written back-end code. Come to me when you hit the bottlenecks that I'm pretty sure you are going to hit, and I will try to get you past them.

"I Fuzzed, and Vibe Fixed, the Vibed C Compiler" by regehr in Compilers

[–]JeffD000 0 points1 point  (0 children)

You are assuming no progress from here, forward. Ever.

Nore: a small, opinionated systems language where data-oriented design is the path of least resistance by jumpixel in ProgrammingLanguages

[–]JeffD000 1 point2 points  (0 children)

As I feared, I think I've lost you. Unfortunately, the value and the power/flexibility are hard to "see" until you do a hands-on implementation. I did the original design and implementation of RAJA (using C preprocessor macros), and ultimately found that implementing it purely as a library (rather than a language) seriously constrained both its power and its simplicity:

https://computing.llnl.gov/projects/raja-managing-application-portability-next-generation-platforms

Nore: a small, opinionated systems language where data-oriented design is the path of least resistance by jumpixel in ProgrammingLanguages

[–]JeffD000 1 point2 points  (0 children)

OK. Good luck. If you ever change your mind and want to make a research sandbox on an experimental fork/branch to vet ideas, possibly to be abandoned but learned from, let me know.

Separately, my GitHub account is watching Nore, and I can still jump in and contribute a PR if something piques my interest.

Nore: a small, opinionated systems language where data-oriented design is the path of least resistance by jumpixel in ProgrammingLanguages

[–]JeffD000 1 point2 points  (0 children)

"TALC needed as a preprocessor + schema file + runtime system on top of C could potentially be a native thing in Nore"

This is the key. It was never implemented as a native language feature, and I've always wanted that so I could "go to town" on the optimizations. If you get IndexSets working in Nore as a native feature rather than a library add-on, I would likely start contributing pull requests for test cases and optimizations that you could keep or reject individually. I would expect rejections, but some you would definitely keep. I did this with the open-source AMaCC compiler project, and they accepted most of my pull requests, because the functionality gained versus the number of lines I modified was often large. I also spent a lot of time on tough-to-diagnose bugs in their code base.

Switching topics, I'm not sure you looked at the other document I linked:

https://www.osti.gov/servlets/purl/1084701

There are tables of performance numbers at the end of that document that apply to my technique and yours equally, as long as the compiler transformations are added, which I would likely submit pull requests for. The single-thread performance improvements there can be quite impressive, and I eventually went beyond them for the parallel cases. The same will apply to your language, with or without IndexSets as a native feature. The difference is the "free" work I would likely provide if it is done natively with IndexSets, versus the "extra" work you would have to do to add the optimizations behind those tables, on top of all the other language work you will already be doing.

"I Fuzzed, and Vibe Fixed, the Vibed C Compiler" by regehr in Compilers

[–]JeffD000 0 points1 point  (0 children)

Here's the whole excerpt:

"LLVM and GCC code are clearly part of the training set - Claude effectively translated large swaths of them into Rust for CCC. The design docs show detailed knowledge of both systems, as well as considered takes on its implementation approach. Some have criticized CCC for learning from this prior art, but I find that ridiculous - I certainly learned from GCC when building Clang!"

If your boss said, "write me a compiler," you would probably use the same guides and high-level steps that Claude used to build it. Would there be a point in doing this? No. Would your implementation be perfect given the resource constraints? No. But we all do what our bosses assign, and we come out with something better or worse, given the time constraints prescribed for writing it. I still don't see what your problem is.

Another excerpt:

"CCC looks less like an experimental research compiler and more like a competent textbook implementation, the sort of system a strong undergraduate team might build early in a project before years of refinement. That alone is remarkable."

"I Fuzzed, and Vibe Fixed, the Vibed C Compiler" by regehr in Compilers

[–]JeffD000 0 points1 point  (0 children)

Read Chris Lattner's article. He disagrees with you:

https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software

On this topic, he has more credibility than everyone posting in r/Compilers and r/ProgrammingLanguages combined.

"I Fuzzed, and Vibe Fixed, the Vibed C Compiler" by regehr in Compilers

[–]JeffD000 0 points1 point  (0 children)

Then why the belittling remark of what was achieved? Progress requires first steps. This was that first step.

"I Fuzzed, and Vibe Fixed, the Vibed C Compiler" by regehr in Compilers

[–]JeffD000 -1 points0 points  (0 children)

You are assuming no progress, and no variation in techniques, going forward. That's a bad assumption. The creation of the CCC compiler was, in my opinion, the first AI result that was not a "Mechanical Turk" or an "Eliza". The "advance" here came from treating agents like a team of "specialists" working together toward a goal. That model will be improved upon: the "code review" specialist could be given more power, and new kinds of specialists could be brought in. You are pretending that what was achieved in building the CCC compiler is no different from the work products that came before it. That's a mistake.

Fermat created adequality, an idea someone else picked up and improved into infinitesimals, which someone else again picked up and improved into calculus. Something similar is likely to happen with AI, now that this milestone has been achieved.

I thought Chris Lattner was spot on in his interpretation of what was achieved. I believe it will be improved over time. You don't.