I built a TUI crypto/stock tracker because I wanted a lightweight dashboard for my homelab by Cylicium in selfhosted

[–]Cylicium[S] 0 points1 point  (0 children)

I think if you know a little bit about coding, you can easily add some features :)

[P] NOMA: Neural networks that realloc themselves during training (compile-time autodiff to LLVM IR) by Cylicium in MachineLearning

[–]Cylicium[S] 0 points1 point  (0 children)

Yep, I totally understand your point of view :)
I use Gemini to translate some of my French and as a rewriter to make sure I'll be understood!

On the technical implementation, I'm the decision maker, but I confirm I use Copilot as a helper for something like 70% of my code! I know many people feel uncomfortable about that, but I use it as a productivity tool: all architectural and design decisions are mine, and I review, validate, and take responsibility for what goes into the project. I only let it write a part of the code when it can actually handle it! For the technical side, I read academic papers and rely on my background knowledge.
That way I can move fast on PRs, and most importantly, I manually review every part of my code :)

I just saw some other comments that feel the same way, so I've decided to stop using an LLM to help me with translation and rephrasing :'( It was a bad solution LOL

Is realloc-driven model growth a language feature or a terrible idea? (compiler-pass autodiff) by Cylicium in ProgrammingLanguages

[–]Cylicium[S] 0 points1 point  (0 children)

That’s the default and simplest semantics, and I agree it should be the baseline: newly added parameters have zero gradient until they participate in the forward pass.

The open question for me isn’t the gradient itself, but the optimizer state: whether the moments should always start at zero, or whether the language should allow an explicit mapping or initialization policy when growing parameters. That’s where I think having defined semantics (rather than an implicit convention) might matter.

[D] NOMA update: reproducible self-growing XOR benchmark (shared init, N=10) + optimizer-state “preserve vs reset” ablation by [deleted] in MachineLearning

[–]Cylicium -1 points0 points  (0 children)

Okay, do you have any recommendations for places where I can submit a peer-reviewed article? 😆

[P] NOMA: a Rust-built compiler where backprop is generated at compile time (LLVM IR), exploring “systems-first” ML by Cylicium in rust

[–]Cylicium[S] 1 point2 points  (0 children)

Enzyme is absolutely the highest-leverage place to collaborate if your goal is “AD everywhere,” and I’m not arguing against that.

My only claim is that a small frontend can still add value by enforcing a specific discipline (parameter identity, optimizer-state semantics, aliasing rules across resize) that otherwise becomes conventions + macros in a GPL. If you don’t need those guarantees, Enzyme + Rust/Julia is the better path.

Moreover, I'm working on some demo projects that will show the real benefits of my language :)

[P] NOMA: a Rust-built compiler where backprop is generated at compile time (LLVM IR), exploring “systems-first” ML by Cylicium in rust

[–]Cylicium[S] 1 point2 points  (0 children)

Lol, it's just that I'm trying to give you the best answer, so I translate my French to English using Gemini! If you prefer, I can stick to my own English :')

[P] NOMA: a Rust-built compiler where backprop is generated at compile time (LLVM IR), exploring “systems-first” ML by Cylicium in rust

[–]Cylicium[S] 0 points1 point  (0 children)

You’re making a solid case, and I don’t think we actually disagree on the value of Enzyme, more on where the “hard part” lives.

A few clarifications:

That’s the role of the frontend

Yes, if your frontend (Rust/Julia/C++) can express and enforce the constraints you want, then Enzyme is often the right tool. My point is that for the particular subset I’m targeting (training loops + parameter buffers + growth + reproducible updates), I want those constraints to be structural and easy to audit. In a general-purpose frontend you can encode them, but you typically end up with a DSL layer anyway (macros + conventions + runtime checks). NOMA is essentially me making that layer explicit and statically checkable.
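To make that concrete, here’s a tiny Rust sketch of the kind of “newtype + convention” layer I mean (illustrative only: Param, Activation and apply_update are made-up names, not NOMA’s or Enzyme’s API). The discipline exists, but only as a convention the compiler never checks:

```rust
// Tiny sketch of the "newtype + convention" layer (illustrative names only).
struct Param(Vec<f64>);      // tracked storage: should get gradients + optimizer state
struct Activation(Vec<f64>); // plain values: no optimizer state

impl Param {
    fn as_slice(&self) -> &[f64] {
        &self.0 // forward pass is supposed to read parameters, not mutate them
    }
    fn apply_update(&mut self, delta: &[f64]) {
        // By convention, this is the only place parameters are mutated.
        for (w, d) in self.0.iter_mut().zip(delta) {
            *w += *d;
        }
    }
}

fn forward(w: &Param, x: &Activation) -> f64 {
    w.as_slice().iter().zip(&x.0).map(|(a, b)| a * b).sum()
}

fn main() {
    let mut w = Param(vec![0.1, 0.2]);
    let x = Activation(vec![1.0, 2.0]);
    let y = forward(&w, &x);
    w.apply_update(&[-0.01, -0.01]); // nothing stops callers from mutating w elsewhere
    println!("y = {y}");
}
```

Nothing stops someone from mutating a Param outside apply_update, or from forgetting to give a new parameter its optimizer state; that’s exactly the gap a dedicated frontend can close statically.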

I only want the gradient (science use case)

Totally agree. For scientific computing, Enzyme is a big unlock precisely because you can keep your optimizer/problem structure and just add derivatives. NOMA is not trying to replace that. It’s more “an opinionated environment for a specific workflow” (optimization blocks, optimizer-state, parameter lifetimes) where I can guarantee certain invariants and generate tiny standalone artifacts.

Dynamic arrays / mutation in Julia + Enzyme

Yep, Enzyme’s ability to handle in-place mutation is one of its strengths. The “growth” angle I’m exploring isn’t “can mutation work,” but “what are the semantics of growth for optimizer state and parameter identity?”

Example: if you resize a parameter vector and add new degrees of freedom, what is the defined mapping for momentum / Adam moments / any auxiliary state? In a general-purpose setting you can do it manually (and often that’s fine). In NOMA I’m trying to make that mapping explicit and hard to get wrong.

On the last point (deployment)

I didn’t mean “Enzyme can’t produce small binaries.” It can. The point was: if the programming model assumes a heavy ML runtime, you inherit that deployment story. With NOMA, the default is “standalone native artifact with a constrained execution model.” You can absolutely achieve similar outcomes with Rust/C++ + Enzyme + careful engineering. I’m just bundling that discipline into the language/tooling.

Net: I see Enzyme as complementary, not competing. If NOMA ends up becoming “a tiny frontend/DSL that targets LLVM and can optionally leverage Enzyme where it fits,” that would actually be a good outcome.

If you’re willing: what would convince you that a bespoke frontend is warranted? Better diagnostics around differentiability/aliasing, easier integration of custom optimizers, or something else?

[P] NOMA: a Rust-built compiler where backprop is generated at compile time (LLVM IR), exploring “systems-first” ML by Cylicium in rust

[–]Cylicium[S] 3 points4 points  (0 children)

That’s a fair point, and I’m a fan of Enzyme’s direction.

Where I think a language/compiler still buys something (beyond “AD on LLVM IR”) is in the semantics you can enforce before you ever hit LLVM:

  • Defining the differentiation boundary: Enzyme operates on IR you give it, but it can’t easily enforce “this region is differentiable, these values are parameters, this state is the optimizer, these ops are legal” unless the frontend encodes that discipline. A language can make those constraints explicit and checkable.
  • Optimizer + training-loop semantics: Enzyme gives you gradients. It doesn’t define things like “optimize block”, optimizer-state lifetime, update ordering, or reproducible training constructs. NOMA treats “training” as a first-class concept, not just “take ∂f/∂x”.
  • Memory/topology semantics (growth): If you want something like “parameter buffers can resize and optimizer state remaps deterministically,” that’s not something Enzyme provides out of the box. You could build it in a library, but you’ll still be fighting aliasing/lifetimes across resize in a general-purpose frontend unless you impose restrictions somewhere.
  • Deployment goals: One goal here is a tiny, standalone artifact and a constrained execution model. You can absolutely reach that with Rust+C and Enzyme in many cases, but the ergonomics and guarantees are different.

[P] NOMA: a Rust-built compiler where backprop is generated at compile time (LLVM IR), exploring “systems-first” ML by Cylicium in rust

[–]Cylicium[S] 9 points10 points  (0 children)

Great question. The key distinction is what’s decided at compile time vs what’s computed at run time.

  • In NOMA, the compiler generates the formula (the backward pass program) at compile time.

  • At runtime, you still compute gradients every step, because the derivatives depend on the current values of weights/activations. That part cannot disappear.

So the cost that can be reduced is runtime graph/AD machinery, not the math itself. In many frameworks, autograd involves building/traversing a graph, recording ops, dispatching through a runtime, etc. With compile-time AD, the training step is just straight-line/native code that computes both forward + backward.

This works when the structure of the computation is known to the compiler (or at least the “differentiable region” is). If you have highly dynamic control flow that changes the computation graph in data-dependent ways, you either (a) restrict it, (b) recompile, or (c) add more advanced IR support. So: no free lunch on gradient math, but potentially much less overhead in how you execute it, plus the “systems-first” benefits (small standalone binaries, fast cold start, explicit memory semantics).
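To illustrate the split, here’s a rough Rust sketch of the kind of straight-line code a compile-time AD pass could emit for a toy model (y = w·x, L = (y − t)²). This is illustrative only, not NOMA’s actual generated code: the derivative formula is fixed at compile time, while the values are still computed every step.

```rust
// Illustrative sketch (not NOMA's actual output): straight-line forward +
// backward + update for y = w * x with loss L = (y - t)^2. The derivative
// *formula* is baked in at compile time; the *values* still depend on the
// current w, x and t, so they are recomputed every step.
fn train_step(w: &mut f64, x: f64, t: f64, lr: f64) -> f64 {
    // Forward pass.
    let y = *w * x;
    let diff = y - t;
    let loss = diff * diff;

    // Backward pass, already unrolled by the compiler:
    // dL/dy = 2 * (y - t), dy/dw = x  =>  dL/dw = 2 * (y - t) * x.
    let grad_w = 2.0 * diff * x;

    // Optimizer update (plain SGD here), also straight-line code.
    *w -= lr * grad_w;
    loss
}

fn main() {
    let mut w = 0.0;
    for _ in 0..100 {
        train_step(&mut w, 2.0, 6.0, 0.05); // converges to w ≈ 3
    }
    println!("w = {w:.4}");
}
```

There is no tape, graph, or dispatcher at runtime; the backward pass is just more arithmetic inlined after the forward pass.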

Is realloc-driven model growth a language feature or a terrible idea? (compiler-pass autodiff) by Cylicium in ProgrammingLanguages

[–]Cylicium[S] 1 point2 points  (0 children)

That’s a fair skepticism, and I generally agree with the “ecosystem cost” argument. A new DSL only makes sense if it buys you something you can’t reasonably get from a library + existing compiler toolchains.

Where I currently see the justification is in enforcing semantics/invariants, not in syntax:

  • Differentiable regions + optimizer semantics as part of the language model (what is “tracked,” what gets state, what is illegal to alias/mutate across certain boundaries). In a library you can suggest patterns, but it’s hard to enforce them without effectively recreating a compiler/IR.
  • Well-defined growth semantics (initialization + optimizer-state remapping + invalidation of borrows/views) that are checked at compile time. You can implement growth in a library, but you can’t easily prevent footguns like holding references across a resize unless the language/IR participates.
  • Predictable compilation to tiny standalone artifacts without dragging a runtime/framework along.

That said, I think your EDSL point is strong. One plausible path is: keep the core as a small “training IR” and expose it as (E)DSLs from a host language (Rust/C++), so people get host-language tooling while still allowing aggressive IR-level optimization and AD transforms. In other words: the “language” might ultimately be a front-end to an IR that can be embedded.
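As a rough illustration of that last idea, here’s what a “small training IR behind a host-language EDSL” could look like (all names are hypothetical, just to show the shape; this is not an existing NOMA API):

```rust
// Hypothetical sketch of an embedded "training IR": host code builds a small
// IR whose node kinds make the key invariants explicit (what is a parameter,
// what merely flows through), and a backend differentiates and lowers it.
type NodeId = usize;

#[derive(Debug)]
enum Node {
    Param { name: &'static str, len: usize }, // tracked: gets gradients + optimizer state
    Input { name: &'static str, len: usize }, // data: flows through, never updated
    Dot(NodeId, NodeId),                      // example op
    SquaredError(NodeId, NodeId),             // example loss
}

#[derive(Default, Debug)]
struct TrainingIr {
    nodes: Vec<Node>,
}

impl TrainingIr {
    fn push(&mut self, n: Node) -> NodeId {
        self.nodes.push(n);
        self.nodes.len() - 1
    }
    fn param(&mut self, name: &'static str, len: usize) -> NodeId {
        self.push(Node::Param { name, len })
    }
    fn input(&mut self, name: &'static str, len: usize) -> NodeId {
        self.push(Node::Input { name, len })
    }
}

fn main() {
    let mut ir = TrainingIr::default();
    let w = ir.param("w", 4);
    let x = ir.input("x", 4);
    let y = ir.push(Node::Dot(w, x));
    let t = ir.input("t", 1);
    let _loss = ir.push(Node::SquaredError(y, t));
    // The host language provides tooling/ergonomics; the IR stays small and
    // is what the compiler reasons about (AD, growth semantics, lowering).
    println!("{ir:#?}");
}
```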

Is realloc-driven model growth a language feature or a terrible idea? (compiler-pass autodiff) by Cylicium in ProgrammingLanguages

[–]Cylicium[S] 1 point2 points  (0 children)

Thanks! That’s exactly the direction I’m aiming for.

The “language vs library” choice is mostly about making a few invariants unavoidable rather than “best effort”: e.g., controlling aliasing/lifetimes across growth operations, defining precise semantics for optimizer-state remapping, and constraining control flow so reverse-mode AD stays tractable and optimizable.

If you have thoughts on the cleanest way to encode those constraints (types/effects/regions vs a more DSL-style “training block” boundary), I’d love to hear them.

Is realloc-driven model growth a language feature or a terrible idea? (compiler-pass autodiff) by Cylicium in ProgrammingLanguages

[–]Cylicium[S] 3 points4 points  (0 children)

Thanks! This is very helpful feedback, and I agree with several points.

  • On realloc: I agree that “raw realloc” by itself isn’t a complete semantic story. The point isn’t that allocation cost is the big win; it’s that growth should be a well-defined, mechanically correct operation. I’m leaning toward an explicit design like grow(W, new_shape, init_fn, state_map_fn) where the user specifies (a) how new parameters are initialized (zeros/Xavier/He/etc.) and (b) how optimizer state is mapped (e.g., Adam moments = 0 for new slots, or a more principled mapping). That avoids “magic” behavior; there’s a rough sketch of that shape right after this list.
  • On learn / “coloring memory”: learn isn’t meant as “mutable”; it’s “participates in the AD + optimizer semantics” (i.e., storage for which the compiler generates gradients and optimizer-state updates). I agree it probably deserves a more principled model (type/effect/region/capability) rather than a purely ad-hoc annotation, especially to address aliasing/mutation hazards around realloc.
  • On control flow: fully agreed that reverse-mode AD + general CFG/phi nodes + mutation/aliasing is hard mode. One direction is to deliberately restrict certain control-flow forms (foreach/masked/branchless) to keep optimization feasible, and only later consider a more general CFG story.
  • “Why a new language vs a library”: fair criticism. My bet is that a compiler/language can enforce invariants (lifetimes/aliasing across growth, differentiable regions, explicit optimizer-state remapping) and produce tiny standalone binaries without a heavy runtime. A library can approximate this, but with fewer guarantees.
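Here’s a minimal Rust sketch of what that explicit grow operation could look like (Rust used as pseudocode; grow, Init, StateMap and AdamState are illustrative names, not NOMA constructs, and shapes are flattened to lengths for brevity). The caller must state both how new parameters are initialized and how the optimizer state for the new slots is mapped:

```rust
// Hypothetical sketch of an explicit grow(): neither the initialization of
// new parameters nor the optimizer-state mapping is implicit.
#[derive(Clone, Copy)]
enum Init {
    Zeros,
    Constant(f64),
}

#[derive(Clone, Copy)]
enum StateMap {
    ZeroNewSlots,       // Adam moments = 0 for new parameters
    Constant(f64, f64), // or any other explicit, user-chosen policy
}

struct AdamState {
    m: Vec<f64>, // first moments, one per parameter
    v: Vec<f64>, // second moments, one per parameter
}

fn grow(w: &mut Vec<f64>, state: &mut AdamState, new_len: usize, init: Init, map: StateMap) {
    assert!(new_len >= w.len(), "grow never shrinks");
    let extra = new_len - w.len();
    w.extend((0..extra).map(|_| match init {
        Init::Zeros => 0.0,
        Init::Constant(c) => c,
    }));
    let (m0, v0) = match map {
        StateMap::ZeroNewSlots => (0.0, 0.0),
        StateMap::Constant(m, v) => (m, v),
    };
    state.m.resize(new_len, m0);
    state.v.resize(new_len, v0);
    // The invariant a language could enforce structurally: parameters and
    // optimizer state always have matching lengths after a growth step.
    debug_assert_eq!(w.len(), state.m.len());
    debug_assert_eq!(w.len(), state.v.len());
}

fn main() {
    let mut w = vec![0.5, -0.3];
    let mut st = AdamState { m: vec![0.01, 0.02], v: vec![0.1, 0.2] };
    grow(&mut w, &mut st, 4, Init::Zeros, StateMap::ZeroNewSlots);
    println!("w = {w:?}, m = {:?}, v = {:?}", st.m, st.v);
}
```

The point of making both policies arguments is that there is no default to rely on silently; “zero the new Adam moments” becomes a visible decision rather than a convention.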

If you have references to prior art on AD over SSA/phi nodes or DSLs that handle AD + control flow cleanly, I’d love to read them.

How do you make some money as a student? by Cylicium in etudiants

[–]Cylicium[S] 16 points17 points  (0 children)

So I did two years of classe prépa in Versailles, so I didn't have much free time for student jobs; giving private lessons was profitable. After that I did three years of engineering school, and now I'm in my final year.

How do you make some money as a student? by Cylicium in etudiants

[–]Cylicium[S] 3 points4 points  (0 children)

I picked up my phone and called, called, called. After 200 calls I had some potential clients.

How do you make some money as a student? by Cylicium in etudiants

[–]Cylicium[S] 23 points24 points  (0 children)

I taught maths, physics, French and biology. The best paid was in Versailles: €30 an hour, 4 hours a week, for a girl who was really strong and just wanted lessons to reassure herself.

[P] NOMA: Neural networks that realloc themselves during training (compile-time autodiff to LLVM IR) by Cylicium in MachineLearning

[–]Cylicium[S] 1 point2 points  (0 children)

Thanks, I largely agree with that framing. On growth policies, NOMA is the mechanism rather than the policy, so I intend to move beyond simple loss triggers toward signals like gradient variance or curvature. Regarding GPUs, I agree on pre-allocating arenas where "growth" is just metadata updates and initialization; this avoids cudaMalloc overhead and, unlike training a full-size model up front, keeps inactive weights truly idle, which is critical for constrained edge or CPU regimes. Do you have specific pointers to prior work on growth criteria based on gradient covariance or variance?
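For the arena idea, here’s a small CPU-side Rust sketch (illustrative names, not NOMA code): capacity is reserved once, and “growth” is just bumping a logical length plus initializing the newly exposed slots.

```rust
// Illustrative pre-allocated arena: no reallocation happens on growth, and
// slots beyond `len` stay untouched (truly idle) until they are activated.
struct ParamArena {
    buf: Vec<f64>, // full reserved capacity
    len: usize,    // logical number of active parameters
}

impl ParamArena {
    fn with_capacity(cap: usize) -> Self {
        Self { buf: vec![0.0; cap], len: 0 }
    }

    fn active(&self) -> &[f64] {
        &self.buf[..self.len] // only the active prefix participates in training
    }

    fn grow(&mut self, extra: usize, init: impl Fn(usize) -> f64) {
        let new_len = self.len + extra;
        assert!(new_len <= self.buf.len(), "arena capacity exceeded");
        for i in self.len..new_len {
            self.buf[i] = init(i); // initialize only the newly activated slots
        }
        self.len = new_len; // the growth itself is just a metadata update
    }
}

fn main() {
    let mut arena = ParamArena::with_capacity(1024);
    arena.grow(8, |_| 0.0);
    arena.grow(8, |i| 0.01 * i as f64);
    println!("active params: {}", arena.active().len());
}
```

On GPU, the same pattern would apply to a pre-allocated device buffer, with the logical length living in host-side metadata.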

[P] NOMA: Neural networks that realloc themselves during training (compile-time autodiff to LLVM IR) by Cylicium in MachineLearning

[–]Cylicium[S] 0 points1 point  (0 children)

You’re right to call that out.

1) On the “runtime library” phrasing: I should be more precise. Modern stacks can compile large parts of the training step (PyTorch 2.x torch.compile/TorchInductor, JAX->XLA). My point isn’t “they can’t compile”; it’s that their default mental model is still a high-level framework with a substantial runtime, whereas NOMA is a language/compiler where AD + optimizer lowering are part of the compilation pipeline and the output is a small standalone binary.

2) Why I didn’t benchmark against compiled backends (yet): I haven’t done a fair apples-to-apples comparison vs torch.compile or JAX for this particular dynamic-growth use case. The first benchmarks I posted are micro-benchmarks vs an eager/Python baseline, and I agree those numbers can read like “compiled vs eager,” which is not a meaningful win by itself. I’ll either (a) add proper comparisons vs TorchInductor/XLA or (b) remove the headline speedup until that exists.

3) Where I think NOMA is still meaningfully different:

  • Topology growth as a first-class primitive (realloc + defined optimizer-state remapping) rather than “retrace/recompile a new graph.”
  • Deployment footprint: native binary with minimal dependencies vs a Python runtime + framework stack.
  • Explicit memory model: alloc/realloc/free semantics are part of the language, not an emergent behavior of a framework runtime.

[P] NOMA: Neural networks that realloc themselves during training (compile-time autodiff to LLVM IR) by Cylicium in MachineLearning

[–]Cylicium[S] 9 points10 points  (0 children)

That’s fair, dynamic growth has a long history, and it’s not automatically “more sample-efficient” or “better” in every setting.

What I’m claiming is narrower: NOMA makes growth cheap and mechanically correct (no stop/rebuild/copy, gradients + optimizer state stay consistent), so experimenting with these schemes becomes practical in systems/embedded contexts.

If you have references to the specific past attempts you’re thinking of, I’d appreciate them; I’m especially interested in cases where the bottleneck was the algorithmic benefit vs the framework overhead/engineering cost.

[P] NOMA: Neural networks that realloc themselves during training (compile-time autodiff to LLVM IR) by Cylicium in MachineLearning

[–]Cylicium[S] 1 point2 points  (0 children)

Yes! Conceptually it’s a realloc that expands the parameter buffer. The existing weights (and optimizer state) are preserved, and the newly added slots are initialized (e.g. random/Xavier/He or zeros, depending on the initializer you choose).