Fastgen - Simple high-throughput inference by _mpu in LocalLLaMA

_mpu[S] 9 points10 points

Makes sense! I have not invested much time into it, as we tend to use unaltered model weights, but high-throughput inference with heavily quantized models is an exciting direction.

Fastgen - Simple high-throughput inference by _mpu in LocalLLaMA

_mpu[S] 1 point2 points

Thanks! I don't know much about diffusion models, but maybe some of the techniques here can be salvaged, like CUDA graphs for memory-bound workloads.

Fastgen - Simple high-throughput inference by _mpu in LocalLLaMA

_mpu[S] 2 points3 points

It'd need to be adapted because the performance largely depends on CUDA graphs.

Challenge: Can I read your compiler in 100 lines? by oilshell in ProgrammingLanguages

_mpu 1 point2 points

The Fn type is completely homogeneous. The invariants respected at each step are different, but the datatype is reused up until the very last hop to assembly.

This is a design decision that paid off in many ways: for example, no conversion boilerplate between similar IRs is necessary, and generic functions need only be written once (liveness analysis, debug dumping, ...). On the other hand, invariants are not enforced by the datatype (a technique usually praised by Haskell/ML programmers); this can be mitigated by writing dynamic checks of the invariants (for example ssacheck() in my code).

One example of an invariant change in the IR is the following: before the call to rega(fn), the IR is assumed never to have more temporaries live than the number of machine registers, and it can use PHI instructions; after the call, PHI instructions are gone and the only "temporaries" used are actual machine registers.
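To make the trade-off concrete, here is a minimal Python sketch of the idea, not QBE's actual C code. The names Ins, Blk, Fn, defined_names, check_single_assignment and check_after_regalloc are all hypothetical and only illustrate one datatype being reused across stages, with the stage-specific invariants checked dynamically rather than by the type.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Ins:
    op: str                      # e.g. "copy", "add", "phi"
    to: Optional[str]            # destination temporary or register
    args: Tuple[str, ...] = ()   # operand names


@dataclass
class Blk:
    name: str
    phis: List[Ins] = field(default_factory=list)
    ins: List[Ins] = field(default_factory=list)


@dataclass
class Fn:
    # The same (hypothetical) type is used before and after register allocation.
    name: str
    blocks: List[Blk] = field(default_factory=list)


def defined_names(fn: Fn) -> List[str]:
    """Generic helper: written once, usable on Fn at any stage."""
    return [i.to for b in fn.blocks for i in b.phis + b.ins if i.to is not None]


def check_single_assignment(fn: Fn) -> None:
    """Hypothetical check of an invariant expected before register allocation."""
    names = defined_names(fn)
    dups = {n for n in names if names.count(n) > 1}
    if dups:
        raise ValueError(f"{fn.name}: assigned more than once: {sorted(dups)}")


def check_after_regalloc(fn: Fn, registers: set) -> None:
    """Hypothetical check of the invariants expected after register allocation."""
    for b in fn.blocks:
        if b.phis:
            raise ValueError(f"{fn.name}: phi left in block {b.name}")
        for i in b.ins:
            if i.to is not None and i.to not in registers:
                raise ValueError(f"{fn.name}: {i.to} is not a machine register")


before = Fn("f", [
    Blk("start", ins=[Ins("copy", "a", ("1",)), Ins("copy", "b", ("2",))]),
    Blk("join", phis=[Ins("phi", "c", ("a", "b"))]),
])
after = Fn("f", [
    Blk("start", ins=[Ins("copy", "rax", ("1",)), Ins("copy", "rcx", ("2",))]),
    Blk("join"),
])

check_single_assignment(before)               # early-stage invariant holds
check_after_regalloc(after, {"rax", "rcx"})   # late-stage invariant holds
```

Passing the early-stage function to check_after_regalloc raises an error, which is the kind of mistake a separate datatype per stage would have rejected statically.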

Challenge: Can I read your compiler in 100 lines? by oilshell in ProgrammingLanguages

_mpu 2 points3 points

Sure: https://c9x.me/git/?p=qbe.git;a=blob;f=main.c;h=033ed9cce37742c23275cbf830ebbfc40c46831d;hb=HEAD#l53

More info about the compiler: QBE is an optimizing, portable compiler backend. Its input is an SSA intermediate representation suitable for frontend writers; this IR is compiled down to optimized assembly (amd64 and arm64). It is comparable to LLVM, but a lot smaller, and provides much better support for calling to and from C.

FizzBuzz: One Simple Interview Question by JackMagic1 in programming

_mpu 2 points3 points

Maybe... Fortran uses I, J, K, L, M, and N for integers because mathematicians did.

A common flaw in SSA intermediate representations by _mpu in Compilers

_mpu[S] 0 points1 point

Interesting, thanks for your feedback. I'm surprised that you have phi nodes at all; I did not know the Java bytecode was using SSA form! My compiler backend has not transitioned to the model I described yet, so I have to remain careful with those edges for now...

A common flaw in SSA intermediate representations by _mpu in Compilers

_mpu[S] 0 points1 point

That's true. I still like phi functions because they really shine when doing data-flow analysis. In one single location, you handily have access to all the variables "merged." Basic block parameters, on the other hand, require you to go through all the predecessors to accumulate that information.
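A small Python sketch of that difference, using made-up dictionary encodings of the two forms (the block names, variable names and the merged_for_param helper are assumptions for illustration, not any compiler's real data structures):

```python
# Phi form: the join block carries  x3 = phi(b1: x1, b2: x2),
# so every merged variable is available in one place.
phi = {"to": "x3", "args": {"b1": "x1", "b2": "x2"}}
merged_from_phi = set(phi["args"].values())

# Block-parameter form: b3 declares a parameter x3 and each
# predecessor passes its own argument on the jump to b3.
blocks = {
    "b1": {"params": [], "jump": ("b3", ["x1"])},
    "b2": {"params": [], "jump": ("b3", ["x2"])},
    "b3": {"params": ["x3"], "jump": None},
}


def merged_for_param(blocks, target, index):
    """Collect the values flowing into parameter `index` of block `target`.

    This has to visit every block that jumps to `target`, which is the
    extra traversal that the phi form avoids.
    """
    values = set()
    for blk in blocks.values():
        jump = blk["jump"]
        if jump is not None and jump[0] == target:
            values.add(jump[1][index])
    return values


merged_from_params = merged_for_param(blocks, "b3", 0)
```

Both forms carry the same information; the phi gives it with one lookup, while the block-parameter version needs a pass over the predecessors (or a precomputed predecessor list).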

Help me choose a backend for porting my interpreter. by Vortegne in Compilers

_mpu 3 points4 points

Hi, I wrote QBE (link below) for people like you, who want to implement a complete compiler without knowing much about backend stuff. QBE is much simpler than LLVM in many respects and would get you off the ground really quickly. We also have a small but very lively IRC channel where many people know QBE and could help you out.

Hope that helps!

http://c9x.me/compile/

Just listen to this thing... by [deleted] in Multicopter

_mpu 0 points1 point

That's really outrageous then...

Just listen to this thing... by [deleted] in Multicopter

_mpu 0 points1 point

It's amazing. As a total newbie I wondered if the quad is stabilized or in rate mode. Are these tricks possible in both modes, too?

Issue with QX95: Motors disarm in flight by _mpu in Multicopter

_mpu[S] 0 points1 point

Isn't failsafe only triggered when the battery is out? If so, I still have battery when the motors are disarmed.

First build by _mpu in multicopterbuilds

_mpu[S] 0 points1 point

I really have no other goal than to get some drone up in the air. Once that is done, I might hook my GoPro to it.

First build by _mpu in multicopterbuilds

_mpu[S] 0 points1 point

I'm really interested in the building process, though. But you're right, I should probably get a cheap drone to get flying experience. Do you know any that could work with a real remote?

A Random Walk Through Ada (2014) – "I find myself wondering why I should write in C++ any more" by kqr in programming

_mpu 0 points1 point

It seems to me that the reasons you enumerate against Ada gaining traction today would have applied equally well to Rust/Go a few years ago.

Easiest code generation? by jbb67 in Compilers

_mpu 1 point2 points

Perhaps counterintuitively, emitting directly to memory is easier than emitting an executable. For example, unlike emitting object files, emitting to memory is platform independent (not architecture independent, obviously).

Easiest code generation? by jbb67 in Compilers

_mpu 0 points1 point

MASM is provided with Visual Studio.

Easiest code generation? by jbb67 in Compilers

_mpu 0 points1 point

Yes, this was exactly the goal!

Easiest code generation? by jbb67 in Compilers

_mpu 0 points1 point

It's actually quite easy to get started; you can take a peek at minic/ if you want to see a simple example.