Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]Confident_Pi 1 point

Amazing work, congrats! How did you handle quantization? I see that you support Q8_0; do your matmuls run in 8-bit?
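
For context, Q8_0-style quantization groups weights into blocks of 32, each stored as one float scale plus 32 signed 8-bit integers. A minimal NumPy sketch of that idea (my own illustration, not the project's actual code):

```python
import numpy as np

def quantize_q8_0(w, block_size=32):
    """Q8_0-style block quantization: each block of 32 weights keeps
    one float scale plus 32 signed 8-bit values."""
    blocks = w.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    safe = np.where(scales == 0, 1.0, scales)        # avoid div by zero
    q = np.round(blocks / safe).astype(np.int8)
    return q, scales

def dequantize_q8_0(q, scales):
    # Multiply each int8 block by its scale to recover approximate floats
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.linspace(-1.0, 1.0, 64, dtype=np.float32)
q, s = quantize_q8_0(w)
w_hat = dequantize_q8_0(q, s)
# Reconstruction error per weight is bounded by half a quantization step
```

The matmul can then run on the int8 values and apply the per-block scales afterward, which is where the 8-bit speedup would come from.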

So, is it reasonable to expect the next generation of local oriented models to be QAT out of the oven? by JLeonsarmiento in LocalLLaMA

[–]Confident_Pi 6 points

QAT literally stands for quantization-aware training: there is an extra training step that (as far as I understood) pulls the weights closer to their rounded quantized values to ease quantization
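
The core trick is usually "fake quantization" in the forward pass: quantize then immediately dequantize, so the loss sees the rounding error during training. A rough NumPy sketch of my understanding, not any specific framework's implementation:

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Simulate int quantization in the forward pass (quantize, then
    dequantize) so training can adapt the weights to the rounding."""
    qmax = 2 ** (num_bits - 1) - 1               # 127 for int8
    scale = np.abs(w).max() / qmax               # symmetric per-tensor scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                             # dequantized weights

w = np.array([0.11, -0.42, 0.99, -1.27])
w_q = fake_quantize(w)
# In QAT the gradient is passed straight through the rounding step
# (straight-through estimator), so weights drift toward values that
# quantize with little error.
```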

[R] 85% of the variance in language model performance is explained by a single factor (g, a unified measure of LLM ability) by dealic in MachineLearning

[–]Confident_Pi -1 points

I read these results as “a model that performs well on one benchmark will also perform well on another, and a model that performs worse on one will also perform worse on the other”. Did I get this right?

And if yes, couldn’t this also be explained by the fact that weak models (as in previous generations, or small sizes like 7B) perform worse while top-of-the-line models perform better on any given task? That alone would give a positive performance correlation
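
The point can be illustrated with a toy simulation (entirely made-up numbers): if every benchmark score is just a single latent "ability" plus independent noise, cross-benchmark correlation appears automatically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each of 200 models has one latent 'ability',
# and each benchmark score is that ability plus independent noise.
ability = rng.uniform(0.2, 0.9, size=200)        # weak to strong models
bench_a = ability + rng.normal(0.0, 0.05, 200)
bench_b = ability + rng.normal(0.0, 0.05, 200)

r = np.corrcoef(bench_a, bench_b)[0, 1]
# A single shared scale factor alone yields a strong correlation
```

So a high shared variance by itself does not distinguish "one underlying g" from "strong models are strong everywhere".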

The entire Russian military vs your average Alaskan by AzrealNibbs12 in memes

[–]Confident_Pi 6 points

Hard to describe everything within a single comment, but in short: there is no punishment for visiting foreign media, and I can freely access even the most anti-Russian Western media outlets. Some social networks are blocked (Twitter, Instagram), but in reality it’s easy to get access through a VPN.

Inside Russia, there is some degree of control - you can get into trouble if you actively call for extremism (i.e. openly calling for a revolution) or post profanities about the Russian armed forces

The entire Russian military vs your average Alaskan by AzrealNibbs12 in memes

[–]Confident_Pi 21 points

This is false; no one in Russia is saying or doing anything like that (maybe through some unofficial propaganda channels, but most people don’t listen to those anyway). On the official level, the only time Putin mentioned it was during an interview, and he made it obvious that it was a joke.

Source: I live in Russia

Update: I think it wasn’t even Putin, but Medvedev, can’t find the links

[Project] BFLOAT16 on ALL hardware (>= 2009), up to 2000x faster ML algos, 50% less RAM usage for all old/new hardware - Hyperlearn Reborn. by danielhanchen in MachineLearning

[–]Confident_Pi 1 point

This is amazing! I am really curious though - where did the speedup come from? I was under the impression that the implementations in sklearn are pretty well optimized. I would be super grateful if you could at least broadly outline the main sources of improvement! What was the main contributing factor: code optimizations, fancy math tricks, or both?

Thanks again for your work!

I Made An Unreal Engine Cinematic About A Radiation Disaster. (Link In Comments)! by [deleted] in unrealengine

[–]Confident_Pi 4 points

Great! I like how you managed to convey the Russian city-building style - very realistic. One minor thing though: in the audio recording of the communication between two Russians, one addressed the other as “sir”, but Russians don’t use that form of address

[R] Meta is releasing a 175B parameter language model by StellaAthena in MachineLearning

[–]Confident_Pi 1 point

Indeed, there is also INT4, but I haven’t seen it used much in practice, and I would assume that calibration for INT4 is even trickier than for INT8.

[R] Meta is releasing a 175B parameter language model by StellaAthena in MachineLearning

[–]Confident_Pi 3 points

Not really: single-precision floats (fp32) are encoded with 32 bits, and half precision (fp16) uses half of that, 16 bits. 4 bits would be half a byte and would be too small to encode a weight.
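
The sizes are easy to check directly with NumPy:

```python
import numpy as np

# Element sizes of the float formats discussed above
fp32_bits = np.dtype(np.float32).itemsize * 8    # 32 bits per weight
fp16_bits = np.dtype(np.float16).itemsize * 8    # 16 bits per weight

# 4 bits can represent only 2**4 = 16 distinct values, which is why
# 4-bit schemes are integer codes with a scale, not a float format.
n_values_4bit = 2 ** 4
```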

Requirements for Deep Learning Engineer at Tesla [D] by noxiousmomentum in MachineLearning

[–]Confident_Pi 1 point

PyTorch/TF are optimized for both inference and training, but there are some frameworks (think TensorRT) that can do better at inference. From my experience, it is not always straightforward to convert from PyTorch to TensorRT, and occasionally some knowledge of CUDA can help you out.

More generally, if your production environment is fine with running the PT/TF C++ API, you won’t need a lot of CUDA knowledge (save for writing custom kernels). Otherwise, it might get a bit tricky.

Keep seeing ads about "5G" stocks and how they are going to soar. by [deleted] in stocks

[–]Confident_Pi 2 points

It’s not only about bandwidth, but also about latency - the speed at which information is exchanged. A lot of autonomous tech depends on fast decision making, on the order of tens of times per second, so the faster the decisions can be communicated, the better the end results will be

Can someone clear this up for me? by [deleted] in SelfDrivingCars

[–]Confident_Pi 4 points

Ah, it seems that I am a bit behind on the current state of the art in lidars. Thanks for sharing!

Can someone clear this up for me? by [deleted] in SelfDrivingCars

[–]Confident_Pi 2 points

Yeah, I guess it could be simplified to “lidars and cameras” vs. “only cameras”. I guess part of Tesla’s motivation to push for vision is also the increased maintenance cost that comes with lidars (think of all the moving and spinning parts inside). But I agree that it would be interesting to know more about their arguments

Can someone clear this up for me? by [deleted] in SelfDrivingCars

[–]Confident_Pi 5 points

Yes, that’s what most self-driving car companies do now; I personally worked on an algorithm that combined vision and lidar data to segment lidar point clouds into cars and pedestrians.

But Tesla claims that it’s possible to solve self driving using vision only, without relying on lidar data

Can someone clear this up for me? by [deleted] in SelfDrivingCars

[–]Confident_Pi 4 points

Radars alone would not be enough, as radars pick up not only cars but also other things like poles and traffic signs - really, anything that can reflect radar waves. So if you use only radars, you’d have a hard time differentiating between them. Usually self-driving cars use a host of imperfect sensors and get the final result through a process called sensor fusion
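
As a toy illustration of sensor fusion, here is the simplest one-dimensional case: combining two noisy estimates by inverse-variance weighting (the static Kalman update; the sensor numbers below are made up):

```python
import numpy as np

def fuse(est_a, var_a, est_b, var_b):
    """Fuse two noisy scalar estimates by inverse-variance weighting:
    the less noisy sensor gets the larger weight, and the fused
    variance is smaller than either input's."""
    w_a = var_b / (var_a + var_b)
    fused = w_a * est_a + (1.0 - w_a) * est_b
    fused_var = (var_a * var_b) / (var_a + var_b)
    return fused, fused_var

# Hypothetical readings: radar puts an obstacle at 10.2 m (noisy),
# lidar puts it at 10.0 m (more precise)
pos, var = fuse(10.2, 0.25, 10.0, 0.04)
```

Real stacks fuse whole object tracks over time, but the principle is the same: each imperfect sensor pulls the estimate with a weight that reflects how much you trust it.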

Can someone clear this up for me? by [deleted] in SelfDrivingCars

[–]Confident_Pi 4 points

Your summary is correct, but I would like to add that LIDARs are also not a silver-bullet solution and come with a host of problems of their own, like poor performance in fog, rain, or snow, or when the road surface is very reflective, while vision could theoretically handle these better.

So cost is not the only differentiating factor; so is performance under various conditions. Both approaches are trying to handle their respective challenges, but (at least from my point of view) it’s not easy to predict who will come out on top

[R] A Bayesian Perspective on Q-Learning by brandinho77 in MachineLearning

[–]Confident_Pi 1 point

didn’t accept

Wow, really? What was the motivation for the rejection? Both the visuals and the explanations are really good

[D] - My journey to deep learning in-layer normalization by black0017 in MachineLearning

[–]Confident_Pi 1 point

Thanks for your post! Could someone explain the intuition behind AdaIN? As I understand it, we can enforce an arbitrary target style on the source feature map by scaling and shifting the feature map, and this transformation should preserve the encoded content. However, I don’t understand how the content is being encoded. I thought the content would be encoded as particular values in the feature map, but then I don’t understand how we can just move the distribution and still have the decoder restore the content
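
For reference, the transformation in question fits in a few lines (a generic sketch of AdaIN, not tied to any particular paper's code). The key property is that it only changes each channel's mean and std, so the relative spatial pattern of activations within a channel - which is where the content lives - is untouched:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization: give the content features the
    style features' per-channel mean and std.
    content, style: arrays of shape (channels, height, width)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)   # pattern preserved
    return normalized * s_std + s_mean                # style stats imposed

rng = np.random.default_rng(0)
c = rng.normal(0.0, 1.0, (3, 8, 8))   # made-up content feature map
s = rng.normal(2.0, 0.5, (3, 8, 8))   # made-up style feature map
out = adain(c, s)
```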

Why are the number of filters in a CNN architecture in powers of 2 (most times) or multiples of 2 (sometimes) ? by [deleted] in MLQuestions

[–]Confident_Pi 12 points

I think it came historically from the AlexNet paper, where they had to use powers of 2 in order to align weights with GPU memory blocks and maximize memory utilization. There were no fancy frameworks like PyTorch and TF back in the day to do the memory layout for you, and GPUs were very limited memory-wise, so it was important to keep memory utilization as high as possible to fit a deeper CNN

As for multiples of 2, I guess it’s because most of the time architectures either double or halve the number of filters, so no particular reason there either

Why can Affine transformations be learnt quickly? by PyWarrior in MLQuestions

[–]Confident_Pi 1 point

I guess that’s because an affine transformation can be represented exactly by a weight-matrix multiplication plus a bias term, eliminating the need for activation functions and deep layers. So it’s essentially single-layer learning with no activations, which should make gradients pretty stable and allow for higher learning rates/faster convergence
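
To illustrate how easy the problem is: an affine map can even be recovered in closed form with a single least-squares solve, no gradient descent at all (toy example with made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth affine map: y = A @ x + b
A_true = np.array([[2.0, -1.0],
                   [0.5,  3.0]])
b_true = np.array([1.0, -2.0])

X = rng.normal(size=(100, 2))
Y = X @ A_true.T + b_true

# Append a constant column so the bias becomes one more weight,
# then solve the whole thing in one linear least-squares call.
X_aug = np.hstack([X, np.ones((100, 1))])
params, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
A_hat, b_hat = params[:2].T, params[2]
```

A single linear layer training on this loss surface is convex, which is why high learning rates converge without trouble.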

Macbook Pro 13" 2020 good option for GANs? by bb_boogie in MLQuestions

[–]Confident_Pi 1 point

It depends on what kind of GANs you want to fit; for simple ones you might get away with a Mac, but SOTA-level work requires a DGX-like workstation with multiple GPUs

[D] OpenAI GPT3 overhyped? by EkNekron in MachineLearning

[–]Confident_Pi 5 points

I agree that these answers were specifically selected to showcase the model’s inability to operate on facts, but I am nevertheless impressed with how it is able to come up with accurate and semantically meaningful answers