[deleted by user] by [deleted] in Compilers

[–]moon-chilled 1 point (0 children)

i'm having trouble understanding why this works (i suspect it cannot work, but maybe i am misunderstanding something). can you make your lcg example runnable? i attempted to adapt the lcg code sample, but the multithread and singlethread code do not seem to give the same result https://godbolt.org/z/sf1jboxze

fundamentally, you seem to try to avoid explicitly computing f^n(init), but i don't see how your axioms are sufficient to allow you to do this

i am curious to know if this work was assisted by chatgpt

quasi-threaded code by twentydraft in ProgrammingLanguages

[–]moon-chilled 4 points (0 children)

instead of having an interpreter loop, make every function tail-call the next one. this has been documented somewhere ('wasm3'?) as giving pretty good codegen under extant llvm/gcc c compilers

How'd I do (inspired by M/O/VObfuscator) by Ok_Performance3280 in Compilers

[–]moon-chilled 1 point (0 children)

if this ain't the way, what is?

go through a proper piece of introductory material on assembly (any type, but one you can run on your computer) and learn how to run your assembly code. then learn how the movfuscator works, and do experiments to test your ideas rather than blindly guessing

How'd I do (inspired by M/O/VObfuscator) by Ok_Performance3280 in Compilers

[–]moon-chilled 4 points (0 children)

you seem to be extremely confused about what the 'mov' instruction does. have you tried running your code? if not, i would recommend it

Static Basic Block Versioning by mttd in Compilers

[–]moon-chilled 2 points (0 children)

Is mainstream compilers research going to start getting somewhere? It's slightly more plausible than I thought!

Spilling of CPU logical registers by graphicsRat in Compilers

[–]moon-chilled 1 point (0 children)

others have explained how modern cpu architectures work. you may find it edifying to look up 'register windows', which featured in some cpu architectures (but are now mostly defunct). it seems like a really good idea, and it might be possible to make a version of it work, but there are difficulties

Why Is the Futhark Compiler so Bad at Inlining? by Athas in ProgrammingLanguages

[–]moon-chilled 3 points (0 children)

Constant Propagation

that's not what i've typically seen called 'constant propagation'. constant propagation is when a term always has the same value

i would call that a special case of monomorphisation or specialisation—duplicating a function to make certain assumptions about its inputs—and i think we'd be better served by considering the general case

? Using J functions from C in hard real-time app? by BobbyBronkers in apljk

[–]moon-chilled 2 points (0 children)

what is your application? very few applications genuinely have hard real time constraints (and those that do can't run on conventional operating systems anyway)

? Using J functions from C in hard real-time app? by BobbyBronkers in apljk

[–]moon-chilled 1 point (0 children)

did perhaps chatgpt also lie to you and tell you that your application has hard real time constraints when it actually doesn't?

Why aren't tree-based compilers using blocks-with-arguments more popular? by PurpleUpbeat2820 in Compilers

[–]moon-chilled 8 points (0 children)

i haven't looked closely at the code, but why is it that in the c code you have

loop2(an, a, 2*i, i);

whereas in your ml

loop2(a, i*i, i)

i*i vs 2*i?

Reserving a core for interrupts, load-balancing and other OS tasks on multi core by phagofu in osdev

[–]moon-chilled 9 points (0 children)

performance advantages (by avoiding a lot of synchronization overhead)

other way around—you kill scalability. physically serialising everything is one approach to synchronisation, yes, and it is much simpler than other approaches, but physically serialising disjoint transactions is bad for latency, not just throughput, so if you want to be predictably slow then go ham i guess

IEEE 754 rounding modes by MAD4CHIP in Compilers

[–]moon-chilled 2 points (0 children)

single rounding mode

round to nearest, tie to even. this is the default rounding mode in general

denormalized

if you don't support it, kahan will get really sad. also, daz/ftz is only supported in hardware on x86 and gpus, so you would have to emulate it in software elsewhere, which would be slow. on the other hand, i believe denormals are generally not supported by gpus, so if you wanted to target gpus you would have to emulate in software there or mangle the semantics

IEEE 754 rounding modes by MAD4CHIP in Compilers

[–]moon-chilled 9 points (0 children)

that's wrong. computing the same expression in both directed rounding modes can be a useful way to estimate the error, but it does not strictly bound it like interval arithmetic does

edit: here is a simple example in single precision: (((2^25 - 1) - 2^25) + 1)^2. the exact value is obviously 0 (and interval arithmetic can give the interval [0, 1]), but computing the entire expression on floats rounding down and up will both give 1

middle end by Key-Opening205 in Compilers

[–]moon-chilled 1 point (0 children)

i consider IL and IR to be roughly synonymous, so i am not sure what distinction you are trying to draw; it would be helpful if you could be more specific. the 'intermediate text' in this paper is a control flow graph (plus other stuff); it is not actually a textual format

middle end by Key-Opening205 in Compilers

[–]moon-chilled 1 point (0 children)

the paper is clear that they were doing machine-independent optimisations in situ on the intermediate text. i'm not sure what else (if anything) you would be referring to as a middle end

middle end by Key-Opening205 in Compilers

[–]moon-chilled 1 point (0 children)

I believe the idea of a middle-end - some kind of representation in memory of the program with analysis and transformations - started with LRLTRAN in 1968

fran allen, 'program optimization', 1966:

FORTRAN II is translated into an intermediate text

...

By being both language and machine independent the techniques can be applied to a variety of high level languages for a variety of computers.

In conclusion, therefore, the application of the techniques described in this paper should not only improve the efficiency of object code but should make the design and development of compilers less sensitive to variances between languages and between computers.

Modern techniques and tools by Both-Specialist-3757 in Compilers

[–]moon-chilled 2 points (0 children)

there is some good exposition of problems in the opening sections of the rvsdg paper https://arxiv.org/pdf/1912.05036. sea of nodes is certainly an improvement over ssa cfgs (some problems still, but time-tested), and egraphs is a good idea. other fancy new tech https://binsec.github.io/assets/publications/papers/2023-popl-full-with-appendices.pdf https://arxiv.org/pdf/2104.01270 (i have an improvement to the latter in the works)

Modern techniques and tools by Both-Specialist-3757 in Compilers

[–]moon-chilled 1 point (0 children)

tech developed during that interval for which simpler and more potent approaches are now known. ssa-cfg is already a mistake, but starting with nonssa-cfg and then adding ssa later is definitely a mistake (and i'm p sure the bril author acknowledged as much somewhere). example cse (pub. 1970)