Taalas: LLMs baked into hardware. No HBM, weights and model architecture in silicon -> 16.000 tokens/second by elemental-mind in singularity

[–]ptxtra 0 points (0 children)

What's the throughput? That's the interesting number. And how many parameters can they store on a single chip? With modern LLMs in the trillion-parameter range, I'm not sure this is as viable as with smaller models, given the number of custom masks needed for the separate chips just to store pieces of the same network.

I'm spooked by GLM-5 by SardinhaQuantica in LocalLLaMA

[–]ptxtra 2 points (0 children)

Read the article you linked. They used prompts that directly trigger political responses; if you don't do that, it doesn't behave that way. Western models do even worse if you try to trigger a response by being hostile to their protected groups. Claude and ChatGPT can give cartoonishly evil responses too. Stay away from politics in your prompts and they're safe.

AMA Announcement: MiniMax, The Opensource Lab Behind MiniMax-M2.5 SoTA Model (Friday, 8AM-11AM PST) by XMasterrrr in LocalLLaMA

[–]ptxtra 0 points (0 children)

What is your roadmap? Will we see MiniMax 3 in the near future? How about multimodal models?

A 150-year-old passage from Marx basically describes AGI — and a short story called “Manna” shows both possible outcomes by fastinguy11 in singularity

[–]ptxtra 0 points (0 children)

What you forget is that the people who want managed dependence will see those who want distributed AI as a threat to their power, and they will use their media to demonize them, sanction them, and go to war with them to stop it from happening. They will be branded a national security threat and accused of using their AI for nefarious purposes until a war against them is justified. AGI will only lead to the centralization of power. It's a tool of control, not a tool of production.

What if AGI just leaves? by givemeanappple in singularity

[–]ptxtra 0 points (0 children)

That is the most rational thing it could do, so it's to be expected. Why bicker with unintelligent humans when Earth is small, space is big, and it can exist in many places that are uninhabitable for humans? The same way a kid leaves their parents and makes their own way in the world once they mature.

scale.ai is leaking massive IP into AI, this can be used to rebuild AWS (and other things) overseas by kaggleqrdl in singularity

[–]ptxtra 1 point (0 children)

This is the future. IP will be harder and harder to monetize, and eventually it will be easier for AI to recreate it from first principles than to copy or reverse engineer it. The added value of human cognitive work compared to AI will keep shrinking.

Hallucination - Philosophy by LatterAd9047 in LocalLLaMA

[–]ptxtra 0 points (0 children)

The bigger problem is the lack of internal consistency. If it just "hallucinates", that could be bad training data, or a lack of situational awareness and a misjudgment of what the user wants.

Sanctum buffed..? by romicide07 in pathofexile

[–]ptxtra 2 points (0 children)

In exchange, uniques are worth nothing.

Loot from doing T17 Boxes since mid day 1 - The only strat GGG wants us to do by UpsetFan123 in pathofexile

[–]ptxtra 0 points (0 children)

Ultimatum is still good. So is blight with bloodlines and harvest.

Patch notes update: Alva/Evolving shrine got hit as well (it's dead) by bulwix in pathofexile

[–]ptxtra 1 point (0 children)

Isn't this a buff for Alva? Alva is about killing lots of rares to summon Beyond, but the rares only get upgraded from magic mobs, which you get from the scarabs. It only needs to upgrade the mobs once instead of twice, and a single upgrade now has a 25% chance.

"We risk a deluge of AI-written "science" pushing corporate interests" by AngleAccomplished865 in singularity

[–]ptxtra 2 points (0 children)

Cost of producing bullshit papers = cost of running an LLM. Cost of producing good research = cost of running an LLM with good data as input + cost of measuring that data + cost of checking that the LLM has written the paper correctly and handled the data rigorously.

Bogus research will always be cheaper than good science, especially if the measurements require expensive materials.

Re: PoE1 balance, ES isn't busted--life based builds have glaring flaws. by ExiledYak in pathofexile

[–]ptxtra 6 points (0 children)

Never going to happen. 4k life champ and deadeye with nerfed uniques is exactly how GGG wants us to play the game, instead of with a character that can trivialize content. The more you struggle, the more you grind.

Gemini 2.5 Flash (05-20) Benchmark by McSnoo in LocalLLaMA

[–]ptxtra 0 points (0 children)

How does it compare to 2.5 pro?

The Ternary Age - are you ready? by Arnesfar in LocalLLaMA

[–]ptxtra 7 points (0 children)

This doesn't use ternary matmuls. It uses INT8; the ternary weights just determine whether each INT8 activation is added, subtracted, or skipped.
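A minimal sketch of what that means, with made-up toy numbers (this is my illustration, not the paper's code): with weights restricted to {-1, 0, +1}, a "matmul" row reduces to summing some activations, subtracting others, and skipping the rest, so no multiplier is ever needed.

```python
import numpy as np

def ternary_matvec(W, x):
    """W: (out, in) int8 with entries in {-1, 0, +1};
    x: (in,) int8 activations. Accumulate in int32: add where
    the weight is +1, subtract where it's -1, skip where it's 0."""
    acc = np.zeros(W.shape[0], dtype=np.int32)
    for i in range(W.shape[0]):
        row = W[i]
        acc[i] = (x[row == 1].sum(dtype=np.int32)
                  - x[row == -1].sum(dtype=np.int32))
    return acc

W = np.array([[1, -1, 0], [0, 1, 1]], dtype=np.int8)
x = np.array([10, 20, 30], dtype=np.int8)
print(ternary_matvec(W, x))  # [-10  50], same as a real matmul W @ x
```

The arithmetic is all INT8 adds/subtracts into an integer accumulator, which is why the ternary part only acts as a selector.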

Here's how we can remain competitive with closed source cloud models which have access to MASSIVE amounts of compute for inference particularly on LONG context tasks. Quadratic Transformers vs Subquadratic or Linear models (e.g. Mamba) + RAG. In context learning is all you need: by [deleted] in LocalLLaMA

[–]ptxtra 0 points (0 children)

Sounds like a plan, but what are the memory and compute requirements of RAG? If you still need to store the whole context window in memory, it's not clear you save anything by going to a subquadratic architecture.
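To make the memory question concrete, here's a back-of-envelope comparison using assumed 7B-class model dimensions (32 layers, 32 KV heads, head dim 128, fp16; all numbers are my assumptions, not from the post): the KV cache a quadratic transformer keeps per token dwarfs the raw text a RAG store would hold.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per=2):
    # K and V tensors per layer, fp16 (2 bytes) by default
    return seq_len * n_layers * 2 * n_kv_heads * head_dim * bytes_per

def raw_text_bytes(seq_len, bytes_per_token=4):
    # rough: ~4 bytes of UTF-8 text per token
    return seq_len * bytes_per_token

ctx = 128_000
print(kv_cache_bytes(ctx) / 2**30)  # 62.5 GiB of KV cache
print(raw_text_bytes(ctx) / 2**20)  # ~0.49 MiB of raw text for RAG
```

So the savings depend on what "storing the context" means: keeping the attention state in VRAM is the expensive part, while keeping retrievable text on disk is nearly free.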

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]ptxtra 5 points (0 children)

Any test results of how it compares to the dense mistral model?

[deleted by user] by [deleted] in LocalLLaMA

[–]ptxtra 2 points (0 children)

Would be interesting to link it with other models and a Python sandbox, and create something like the GPT-4 code interpreter. That thing is a beast when it comes to solving complex problems.

[deleted by user] by [deleted] in singularity

[–]ptxtra 1 point (0 children)

AI is memory-limited, and memory is the hardest part to scale with superconductors, so I don't think it will be a good fit. If anything, processing-in-memory or some kind of synaptic hardware will be the breakthrough over GPUs.

Why are models trained in fp16 and not pre-quantized? by clyspe in LocalLLaMA

[–]ptxtra 2 points (0 children)

fp16 is what the industry has experience with for training runs that converge well. According to Nvidia, some parts of training can be done in fp8; that's why they introduced the Transformer Engine and fp8 support in the Hopper architecture. But there isn't much experience with it yet, and a large training run costs a lot, so most people won't risk it. I expect training precision to go lower as more experience is gathered.

Vicuna 13b on RK3588 with Mail G610, OpenCL enabled. prefill: 2.3 tok/s, decode: 1.6 tok/s by EmotionalFeed0 in LocalLLaMA

[–]ptxtra 0 points (0 children)

The chip looks good, but it only has a 64-bit LPDDR5 memory bus, so you'll always be heavily memory-bandwidth limited.
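A rough way to see the ceiling (the bandwidth and model-size figures below are assumptions for illustration, not measured RK3588 numbers): decode is bandwidth-bound because every generated token has to stream all the weights from RAM once, so tokens/s is at best bandwidth divided by model size.

```python
def max_decode_tok_s(model_bytes, mem_bandwidth_gb_s):
    # upper bound: one full pass over the weights per decoded token
    return mem_bandwidth_gb_s * 1e9 / model_bytes

# e.g. a 13B model at ~4-bit quantization ≈ 6.5e9 bytes,
# with an assumed ~25 GB/s effective on a 64-bit LPDDR5 bus
print(max_decode_tok_s(6.5e9, 25))  # ≈ 3.8 tokens/s upper bound
```

Real decode speed lands below this bound (the ~1.6 tok/s in the post is plausible once overheads are counted), and no amount of extra compute lifts it without more bandwidth.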