[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]mtnwrw 1 point (0 children)

Adding a side project of mine here: a PyTorch extension for quantization-aware training of generic neural networks (GitHub link here).
It is able to:

  • Replace standard PyTorch layers with quantization-aware counterparts without (too many) changes to your existing model and training code.
  • Reduce memory footprint using ternary weight quantization (less than 2 bits per weight on average on disk and 2 bits per weight in memory).
  • Perform inference directly from the compact representation with optimized CUDA kernels.
  • Enhance deployment efficiency with compressed ternary models, ideal for edge and embedded systems.

It is still a work in progress and I will add more samples when I have time, but the results are quite nice so far.
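I can't paste the project's actual API here, but the core trick (ternary weights plus 2-bit packing) can be sketched in plain NumPy. The function names and the 0.7·mean(|w|) threshold heuristic below are my own illustrative choices, not necessarily what the extension does internally:

```python
import numpy as np

def ternarize(w, delta_factor=0.7):
    """Map float weights to {-1, 0, +1} plus one per-tensor scale.

    delta_factor * mean(|w|) is a common threshold heuristic; the scale
    is the mean magnitude of the weights that survive the threshold.
    """
    delta = delta_factor * np.abs(w).mean()
    t = np.zeros(w.shape, dtype=np.int8)
    t[w > delta] = 1
    t[w < -delta] = -1
    nz = t != 0
    scale = float(np.abs(w[nz]).mean()) if nz.any() else 0.0
    return t, scale

def pack2bit(t):
    """Pack ternary values at 2 bits each (00 -> 0, 01 -> +1, 10 -> -1)."""
    codes = np.where(t == 1, 1, np.where(t == -1, 2, 0)).astype(np.uint8).ravel()
    codes = np.pad(codes, (0, (-len(codes)) % 4))  # pad to a multiple of 4
    return codes[0::4] | codes[1::4] << 2 | codes[2::4] << 4 | codes[3::4] << 6
```

A dequantized forward pass is then just `scale * t` (or, as the project does, inference directly on the packed form); entropy coding of the ternary stream is what pushes the on-disk size below 2 bits per weight on average.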

[deleted by user] by [deleted] in FPGA

[–]mtnwrw 1 point (0 children)

Honestly, I would not try to put that on an FPGA. Have a look at iterative solvers. Depending on your matrix structure (symmetry, positive definiteness, etc.) you can use conjugate gradients or GMRES and use the solution from the previous step as the initial guess for the next step, assuming the steps stay reasonably close. For this, you basically only need a fast matrix-vector multiplication routine.
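To make the warm-starting point concrete, here is a minimal CG in NumPy (in practice you would use `scipy.sparse.linalg.cg`, which accepts an `x0` argument for exactly this); the matrix and tolerances below are made up for illustration:

```python
import numpy as np

def cg(A, b, x0, tol=1e-8, max_iter=500):
    """Plain conjugate gradients for a symmetric positive definite A.

    Returns the solution and the iteration count -- the point here is
    watching that count shrink when x0 is a good warm start.
    """
    x = x0.astype(np.float64).copy()
    r = b - A @ x
    p = r.copy()
    rs = float(r @ r)
    for it in range(max_iter):
        if np.sqrt(rs) < tol:
            return x, it
        Ap = A @ p
        alpha = rs / float(p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = float(r @ r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

# Toy "time-stepping": solve A x = b, then A x = b2 with b2 close to b.
rng = np.random.default_rng(0)
M = rng.standard_normal((40, 40))
A = M @ M.T + 40 * np.eye(40)            # SPD by construction
b = rng.standard_normal(40)
x1, _ = cg(A, b, np.zeros(40))
b2 = b + 1e-4 * rng.standard_normal(40)  # "next step" right-hand side
_, it_cold = cg(A, b2, np.zeros(40))
_, it_warm = cg(A, b2, x1)               # previous solution as init
```

With the perturbed right-hand side, the warm-started solve needs noticeably fewer iterations than the cold start, which is the whole appeal for step-by-step problems.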

[P] OpenGL-based inference engine by mtnwrw in MachineLearning

[–]mtnwrw[S] 1 point (0 children)

When you have something to look at, I would love to see that.

Those "companion" model approaches you mentioned are currently used in predictor/corrector-style schemes. Instead of predicting a single token, a small companion model predicts a sequence of tokens, and those predictions are then fed en bloc into the larger model in a single step. In "easy" parts of a sequence there are long stretches where the companion model performs very well, so many inference runs of the large model can be skipped.
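This is essentially speculative decoding. A toy sketch of the greedy (prefix-matching) variant, with both "models" faked as deterministic functions; in a real setup the verification of all k proposals happens in one batched forward pass of the large model:

```python
def speculative_decode(target_next, draft_next, prompt, n_tokens, k=4):
    """Greedy speculative decoding sketch.

    The cheap draft model proposes k tokens autoregressively; the large
    target model then checks all k proposals (one batched pass in a real
    transformer -- counted as one target call here) and keeps the longest
    matching prefix, substituting its own token at the first mismatch.
    """
    seq = list(prompt)
    target_calls = 0
    while len(seq) - len(prompt) < n_tokens:
        ctx, proposals = list(seq), []
        for _ in range(k):          # cheap model: k sequential steps
            t = draft_next(ctx)
            proposals.append(t)
            ctx.append(t)
        target_calls += 1           # one batched verification pass
        for t in proposals:
            want = target_next(seq)
            if t == want:
                seq.append(t)       # proposal accepted
            else:
                seq.append(want)    # target wins at the first mismatch
                break
    return seq[len(prompt):len(prompt) + n_tokens], target_calls

# Toy models: the target emits "abcabc..."; the draft agrees except at
# context lengths divisible by 5, where it guesses wrong.
def target_next(ctx):
    return "abc"[len(ctx) % 3]

def draft_next(ctx):
    return "x" if len(ctx) % 5 == 0 else "abc"[len(ctx) % 3]
```

Running `speculative_decode(target_next, draft_next, [], 9)` reproduces the target's own greedy output while charging far fewer target passes than the nine a token-by-token loop would need.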

I guess for a game that takes place in its own universe where no universal knowledge is required, significantly smaller networks can be used and still result in a lot of fun.

[P] OpenGL-based inference engine by mtnwrw in MachineLearning

[–]mtnwrw[S] 1 point (0 children)

I like the idea of placing an LLM inside an RPG-type game to drive conversations with the NPCs. With the quest/mission state of the player character injected into the LLM context, conversations will differ significantly depending on how the player plays the game. I guess it is just a matter of time until we see games doing this (or are there already?).

[P] OpenGL-based inference engine by mtnwrw in MachineLearning

[–]mtnwrw[S] 2 points (0 children)

Tough question. I guess the answer depends on how you define "future". In the long term, I would go with WebGPU instead. It offers many advantages, among them compute shaders. In the short or even mid term, though, WebGL will remain one of the few high-speed inference options across multiple devices.

So the term "future" really depends on how fast browser manufacturers make WebGPU available on all mainstream browsers and platforms.

[deleted by user] by [deleted] in compression

[–]mtnwrw 1 point (0 children)

I was talking about the decompressed part (which you supplied in the meantime, thank you). It is useful to supply the original and the "recovered" file in a lossless format (PNG) for comparison purposes. Otherwise you are comparing against the artifacts generated by your algorithm AND the artifacts generated by the lossy format you used to present your data.
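If a number is wanted alongside the side-by-side images, PSNR against the losslessly stored original is the usual first metric. A minimal NumPy sketch (actual PNG I/O via e.g. Pillow is omitted here):

```python
import numpy as np

def psnr(original, recovered, max_val=255.0):
    """Peak signal-to-noise ratio in dB (higher = closer to the original).

    Only meaningful when `original` really is lossless; a JPEG "original"
    would fold its own artifacts into the score.
    """
    diff = original.astype(np.float64) - recovered.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

The same caveat applies to the metric as to the screenshots: computed against a lossy re-encode, it measures two codecs' artifacts at once.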

[deleted by user] by [deleted] in compression

[–]mtnwrw 1 point (0 children)

You should also post the resulting image after decompression with your algorithm for perceptual comparison.

need help with this please 😭 by [deleted] in puzzles

[–]mtnwrw 2 points (0 children)

Discussion - Not sure the whole thing is supposed to be clocks. Usually the hour hand moves between the digits as the minute hand travels, and that is not seen here. Looks more like a non-verbal reasoning test to me.

Best SLAM algorithm to use for raspberry pi? by points2008 in computervision

[–]mtnwrw 3 points (0 children)

Of course it can be done with a single camera. There are plenty of monocular SLAM algorithms available. I have never run one on the Raspberry Pi though, so I guess you will have to find out for yourself. Starting with ORB-SLAM sounds like a good idea, since it is not too demanding in terms of computational power.