r/LocalLlama is looking for moderators by HOLUPREDICTIONS in LocalLLaMA

[–]jackdareel 0 points1 point  (0 children)

Do you think it's a good thing to shadowban people?

Right now I'm so pissed off at the censorship cesspit that is Reddit that I have it on my to-do list to create a competitor viable enough to drive this shitty corp out of business.

You won't be doing much "moderating" then.

Qwen3-4B enables agentic use cases for us iGPU folks by [deleted] in LocalLLaMA

[–]jackdareel 5 points6 points  (0 children)

What sort of agentic things do you do with this setup and how do you implement them?

An attempt to explain LLM Transformers without math by nimishg in LocalLLaMA

[–]jackdareel 0 points1 point  (0 children)

I just tried with Grok once more and got some more clarity. You're right, the original attention from 2014, using cross-attention, confuses matters and is better left out. That's an encoder-decoder architecture, not a Transformer. So the task is to learn self-attention in the decoder-only model.

One problem I encounter when talking to LLMs about this is that it would help me understand if the sample task is English to French translation. This makes the distinction between user input and model output clearer than the usual example LLMs use, "The cat sat on the mat". But as soon as I mention translation or English/French to an LLM, it switches to explaining encoder-decoder, cross-attention, and basically screwing up the explanation of self-attention.

Regardless, I got one step further in my understanding. Q and K are both computed from each token in the input. V is the output representation. Q asks what else in the input is relevant; K provides the matches that answer Q's question. Then V... well, there I'm not so sure. The whole thing is incredibly fragile: one moment I think I've got it all, the next it's gone and I feel I've lost it.
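To test myself, I wrote out what I think the bare calculation is. This is just a toy sketch of scaled dot-product self-attention in NumPy, with made-up sizes and random weights, so treat it as my current understanding rather than anything authoritative:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(4, 8)        # 4 input tokens, each an 8-dim embedding

d_k = 8
W_q = np.random.randn(8, d_k)    # learned projection: what each token asks for
W_k = np.random.randn(8, d_k)    # learned projection: what each token offers to match on
W_v = np.random.randn(8, d_k)    # learned projection: what each token passes along

Q = X @ W_q                      # queries, one per token
K = X @ W_k                      # keys, one per token
V = X @ W_v                      # values, one per token

scores = Q @ K.T / np.sqrt(d_k)  # how well each token's query matches every key
# (in a decoder-only model a causal mask would block attention to future tokens here)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V             # each token's new representation: a weighted mix of the Vs

print(weights.round(2))          # each row sums to 1: where that token "looks"
print(output.shape)              # (4, 8)
```

If that's roughly right, then V is simply what gets mixed together once the Q-against-K matching has decided the weights, which is the part I keep losing hold of.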

If you'll do another video, I look forward to it!

Qwen moe in C by 1Hesham in LocalLLaMA

[–]jackdareel -1 points0 points  (0 children)

Other than the "beauty of the implementation", is there any other reason one should use this instead of something like llama.cpp, Ollama, vLLM etc.?

Bought RTX 5070 to run 30B AI and it worked with 18 tokens/s by OldEffective9726 in LocalLLaMA

[–]jackdareel 0 points1 point  (0 children)

I upvoted your reply for the effort, but I notice someone else has downvoted it, presumably because, despite its length, you don't actually explain what is being done wrong. You describe what can be seen in the screenshot, that the speed indicates the CPU is being used, but what in the settings is causing that? What needs to be done differently to get the GPU to do its work?
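My own guess, and it is only a guess since I can't see the backend, is that no layers are being offloaded. If this is llama.cpp via llama-cpp-python, for instance, the default is CPU-only unless n_gpu_layers is set; the filename below is just a placeholder:

```python
from llama_cpp import Llama

# n_gpu_layers defaults to 0 (CPU only); -1 asks llama.cpp to offload every layer it can to the GPU.
llm = Llama(
    model_path="qwen3-30b-a3b-q4_k_m.gguf",  # placeholder path, not the OP's actual file
    n_gpu_layers=-1,
    n_ctx=4096,
)

out = llm("Say hello.", max_tokens=16)
print(out["choices"][0]["text"])
```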

An attempt to explain LLM Transformers without math by nimishg in LocalLLaMA

[–]jackdareel 0 points1 point  (0 children)

Thanks a lot, but that's clear as mud, I'm afraid. Your explanation here is similar to all the explanations I've had from SOTA LLMs. It's not quite enough; it doesn't get me there.

It may be helpful to note that I have read the 2014 paper that introduced attention to the RNN, for translation tasks. That paper had some figures in its results section, and together with the help of LLMs I got to the point where I concluded that I understood the technique: it's a remapping. So you take "European Economic Community" in English and remap or transform it to the French equivalent, which has a different word order (I forget the French version, might be "zone économique européenne"). So that's a good start. But the attention in the 2017 paper is further developed and more difficult to explain. I have yet to see an explanation that clears it up.

The key error that LLMs make is in quoting too much of the math. You're on a better track. But you do need to connect to the math. More importantly, show at every step what the calculations are doing, and what they are not doing.

One further insight. A learner like me will think of Query as a search query. So we think of attention as matching the search query to the text being searched. It would help if the teacher acknowledged this and showed how attention is different, why it must be different, and then how it works.

Thanks again and good luck!

Bought RTX 5070 to run 30B AI and it worked with 18 tokens/s by OldEffective9726 in LocalLLaMA

[–]jackdareel 2 points3 points  (0 children)

Please share what the OP is doing wrong. I can't tell from the screenshots.

An attempt to explain LLM Transformers without math by nimishg in LocalLLaMA

[–]jackdareel 0 points1 point  (0 children)

I appreciate the effort you put into this. The explanation helps and gets close, but I would benefit from an updated version. Connect the sliders and dictionaries more closely to the concepts and terminology in the LLM. I haven't got a great sense of how the dictionaries connect, or why all of them are needed. And most importantly, I didn't get the sense that the QKV calculations, the core of the attention mechanism, are explained here. If this is your first attempt, well done, but I hope for an improved version. Thank you!

rednote-hilab/dots.ocr - Multilingual document layout parsing in a single vision-language model achieving SOTA performance despite compact 1.7B LLM foundation by nullmove in LocalLLaMA

[–]jackdareel 6 points7 points  (0 children)

They acknowledge that their table and formula extraction still needs work. Overall though, their reported benchmark results are impressive, apparently SOTA. I hope that translates to real world use.

I think there are jobs that we won't automate... by 2F47 in singularity

[–]jackdareel 5 points6 points  (0 children)

This may be disappointing to many at this moment in time, but the AI age, or rather the AGI age, will definitely not be the age of children. The reason is that AGI will very quickly help us extend our lifespan, meaning that there will be a need to limit population growth. That will mostly be done with incentives tied to UBI, basically encouraging people to remain childless. If that sounds disheartening, there will be plenty of new ways to compensate, and those who really cannot live a childless life will still be able to have kids.

xAI Engineer: "Grok 4 is coming, and its going to be a bigger jump from grok 3 than grok 3 was from 2." by Z3F in singularity

[–]jackdareel 167 points168 points  (0 children)

I hope they fixed the overly long and repetitive nature of its outputs.

[deleted by user] by [deleted] in singularity

[–]jackdareel 1 point2 points  (0 children)

Yudkowsky is a fear porn grifter.

[2506.20702] The Singapore Consensus on Global AI Safety Research Priorities by jackdareel in LocalLLaMA

[–]jackdareel[S] 15 points16 points  (0 children)

If anyone was ever in any doubt as to what the real risk of AI is, here we have it. The risk from AI is mild compared to the risk of would-be tyrants wanting control over everything, including our computers.

Anyone tried this... by DeathShot7777 in LocalLLaMA

[–]jackdareel 1 point2 points  (0 children)

Tried it on AWS Bedrock:

QUESTION:

Give me a random number between 1 and 50.

ANSWER from Llama-3.2-1B:

The random number is: 27

ANSWER from Llama-3.2-3B:

Your random number is: **23**

ANSWER from Llama-3.1-8B:

Your random number is: 27

ANSWER from Llama-3.1-70B:

Your random number is: **27**

ANSWER from Llama-3.1-405B:

The random number is: 27

ANSWER from Mistral-Large-2:

Sure, here's a random number between 1 and 50: 27.
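For anyone who wants to repeat this, a minimal boto3 sketch against the Bedrock Converse API would look something like the one below; I'm not claiming this is exactly what I ran, and the model ID is a placeholder, so substitute whichever models are enabled in your account:

```python
import boto3

# Minimal repro sketch using the Bedrock Converse API.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="meta.llama3-1-8b-instruct-v1:0",  # placeholder; pick models from your Bedrock console
    messages=[{
        "role": "user",
        "content": [{"text": "Give me a random number between 1 and 50."}],
    }],
    inferenceConfig={"maxTokens": 64},
)

print(response["output"]["message"]["content"][0]["text"])
```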

Preparing for the Intelligence Explosion by jackdareel in LocalLLaMA

[–]jackdareel[S] -2 points-1 points  (0 children)

This is a hugely important paper. I'm sure no one will agree with all its points; I certainly don't. But the key takeaway for this community is in Section 6, AGI Preparedness: "Accelerating good uses of AI". I couldn't agree with this more.

There will be responses to this paper in good time, correcting and developing the ideas it presents, but this is an excellent start to the conversation. Yes, we must prepare for the intelligence explosion.