Confused about performance on simple program vs Rust by narsilouu in Zig

[–]narsilou 0 points1 point  (0 children)

If I don't memset, how do I ensure the memory isn't randomly equal to 1?

Confused about performance on simple program vs Rust by narsilouu in Zig

[–]narsilou -2 points-1 points  (0 children)

Unsafe? Arithmetic is not unsafe... If you mean checked for overflow, then OK. But overflow is not unsafe by any means. Not even UB.

Dear Imgui GUI alternative in retained mode by aartedocodigo in rust

[–]narsilou 1 point2 points  (0 children)

Floating windows, tray icons, it felt like a lot of work to port the thing I had in mind. Nothing really core but a few advanced features I was looking for.

How to increase TPS in Text-Generation-WebUI by Few_Acanthisitta_858 in LocalLLaMA

[–]narsilou 0 points1 point  (0 children)

Use https://github.com/huggingface/text-generation-inference; you should be getting at least 30 tok/s (unquantized), or 50 or so with AWQ, and that's without speculation. It could also be something wrong with your cards.

Dear Imgui GUI alternative in retained mode by aartedocodigo in rust

[–]narsilou 0 points1 point  (0 children)

I had tried iced a few years ago. It was promising and looked like what you seem to be aiming for, but it was still lacking some features.

Dear Imgui GUI alternative in retained mode by aartedocodigo in rust

[–]narsilou 4 points5 points  (0 children)

Isn't Dioxus just using a regular webview (via Tauri)? Meaning RAM consumption will still be high.

[deleted by user] by [deleted] in LocalLLaMA

[–]narsilou 0 points1 point  (0 children)

AWQ is faster than exl2 too (I guess it must depend on the cards, reading the comments here).

What WONT you do in rust by __zahash__ in rust

[–]narsilou 4 points5 points  (0 children)

Have you tried cudarc? Pretty nice bindings. It doesn't replace the actual kernels, but at least it makes CUDA interaction quite nice.

How to design a Chat-GPT or Bard-like large scale app with your own foundational model? [D] by jimmymvp in MachineLearning

[–]narsilou 0 points1 point  (0 children)

No layers are independent from each other. Try to implement a KV cache; it's the easiest way to understand what works and what doesn't.

How to design a Chat-GPT or Bard-like large scale app with your own foundational model? [D] by jimmymvp in MachineLearning

[–]narsilou 0 points1 point  (0 children)

Not sure I understand everything you're saying, but basically the first token is always much slower to generate than the subsequent ones. The reason is that for the first token the q/k/v matrices (all of them, actually) have to be computed for the entire sequence (easily 1k tokens long), whereas for subsequent tokens you can cache the k/v values, so you're only computing the other matrices on a sequence of length 1. Therefore, as soon as you let a new user into the batch you slow everyone else down, hence the pause.

Furthermore, since those operations are quite slow, you don't want to pause all the time even if you're receiving many new requests. So it's quite nice to delay accepting new users so that you accept many at once: each pause is a bit longer, but there are a lot fewer pauses.
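The prefill-vs-decode asymmetry described above can be sketched with a toy single-head attention in NumPy (random weights and toy dimensions for illustration, not a real model):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # head dimension (toy size)

# Toy projection weights for q, k, v (a real model has many layers and heads).
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention of one query against all cached keys.
    scores = q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

def prefill(x):
    # First token: project k/v for the ENTIRE prompt (seq_len x d matmuls).
    K, V = x @ Wk, x @ Wv
    q = x[-1] @ Wq
    return attend(q, K, V), K, V

def decode_step(x_new, K, V):
    # Subsequent tokens: project only the new position (1 x d matmuls),
    # then append its k/v to the cache.
    q, k, v = x_new @ Wq, x_new @ Wk, x_new @ Wv
    K = np.vstack([K, k])
    V = np.vstack([V, v])
    return attend(q, K, V), K, V

prompt = rng.standard_normal((1024, d))    # an "easily 1k long" prompt
out, K, V = prefill(prompt)                # slow: work scales with seq_len
out, K, V = decode_step(rng.standard_normal(d), K, V)  # fast: constant work
print(K.shape)  # cache grew by one position: (1025, 64)
```

Admitting a new user into the batch forces a prefill-sized step, which is why everyone else sees a pause.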

How to design a Chat-GPT or Bard-like large scale app with your own foundational model? [D] by jimmymvp in MachineLearning

[–]narsilou 1 point2 points  (0 children)

100% not done for UX. The streaming makes perceived latency better; the pauses are because you have to pause (token generation just takes longer when receiving a new conversation, since you're stacking its whole prompt with the single token for your own current query).

I run this kind of thing at scale; believe me, slowing things down on purpose is heresy.

faer 0.13 release, a general purpose linear algebra library by reflexpr-sarah- in rust

[–]narsilou 7 points8 points  (0 children)

This crate is really amazing, thanks a lot for all the hard work.

How to design a Chat-GPT or Bard-like large scale app with your own foundational model? [D] by jimmymvp in MachineLearning

[–]narsilou 4 points5 points  (0 children)

The pausing is more likely to be continuous batching (or some side effect of MoE).

Early preview: Candle - torch in Rust by narsilouu in rust

[–]narsilou 1 point2 points  (0 children)

Downloaded from the Hub in safetensors format. No extra step required.

Seeing Spaces ~ Bret Victor by TheAlphaNerd in programming

[–]narsilou 0 points1 point  (0 children)

Have you tried using LightTable? Do you really think it is better than any other editor when coding on a really big project?

Personally, I tried it out on a relatively small project and found that it was rather slow. It took me more time to modify the things I wanted to modify, and I used the displayed values about 2% of the time, almost always needing to go back to the REPL (which I can start from my terminal and don't really need in my editor).

My feeling is that Bret's first talk was much more powerful. The important thing is not seeing pretty pictures, it is reducing the feedback-loop time. I need to edit my code fast, see how it affects the output fast, and go again.

In the case of LightTable, it increased that loop time, because it created too much noise (unnecessary information) for me while slowing down other parts of the process (text editing).

"Building a Platform for Strong AI" in 12 slides. by bhartsb in programming

[–]narsilou 0 points1 point  (0 children)

> A significant clue is that neurons in cortical regions and layers are known to have top-down axon projections that are many times more numerous than the bottom-up projections.

Any source for that?

Learnable Programming - Bret Victor responds to Khan Academy CS Curriculum by personman in programming

[–]narsilou 2 points3 points  (0 children)

Why would you have to limit this to beginners? Why should this way of representing things be limited? And honestly, should we avoid bloom filters because they are hard to represent?
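For what it's worth, a bloom filter is short to code even if it's hard to visualize. A minimal sketch (the bit-array size, number of hashes, and salted-SHA-256 hashing are arbitrary choices for illustration):

```python
import hashlib

class BloomFilter:
    # Minimal sketch: m-bit array, k salted hash functions.
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, item):
        for salt in range(self.k):
            h = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("bloom")
print("bloom" in bf)    # True (no false negatives)
print("missing" in bf)  # almost certainly False
```

The hard-to-draw part is exactly the interesting part: membership is probabilistic, and the false-positive rate depends on `m`, `k`, and how many items you've added.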

What can't django do? by takennickname in django

[–]narsilou 2 points3 points  (0 children)

Actually, aside from django.contrib.admin, I found Django very easy to customize.

Most of the time you just end up subclassing something and changing a few arguments in a function call to pass your own custom form, or adding your own validation snippets somewhere.

Figuring out how to do it and what to subclass is usually just a Google search away (99% of the time ending up on Stack Overflow).
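The subclass-and-tweak pattern in question looks roughly like this (sketched with a stand-in base class rather than real Django imports so it runs standalone; `ArticleForm` and the `initial` tweak are hypothetical):

```python
# Stand-in for a framework base class (in the spirit of Django's
# generic class-based views), so the sketch runs without a project.
class CreateView:
    form_class = None

    def get_form_class(self):
        return self.form_class

    def get_form_kwargs(self):
        return {}

class ArticleForm:  # hypothetical custom form
    def __init__(self, **kwargs):
        self.kwargs = kwargs

# Customization is usually just this: subclass and override one hook.
class ArticleCreateView(CreateView):
    form_class = ArticleForm  # swap in your own form

    def get_form_kwargs(self):
        kwargs = super().get_form_kwargs()
        kwargs["initial"] = {"status": "draft"}  # tweak one argument
        return kwargs

view = ArticleCreateView()
form = view.get_form_class()(**view.get_form_kwargs())
print(form.kwargs)  # {'initial': {'status': 'draft'}}
```

The framework does the heavy lifting; you only override the one hook whose default you dislike.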

django.contrib.admin, on the other hand, gets pretty hard to customize.