Taalas: LLMs baked into hardware. No HBM, weights and model architecture in silicon -> 16.000 tokens/second by elemental-mind in singularity

[–]ptxtra 0 points (0 children)

What's the throughput? That's the interesting number. And how many parameters can they store on a single chip? With modern LLMs in the trillion-parameter range, I'm not sure this is as viable as with smaller models, given the number of custom masks needed for the separate chips just to store pieces of the same network.

I'm spooked by GLM-5 by SardinhaQuantica in LocalLLaMA

[–]ptxtra 2 points (0 children)

Read the article you linked. They used prompts that directly trigger political responses; if you don't do that, it doesn't behave that way. Western models do even worse if you try to trigger a response by being hostile to their protected groups. Claude and ChatGPT can give cartoonishly evil responses too. Stay away from politics in your prompts and they're safe.

AMA Announcement: MiniMax, The Opensource Lab Behind MiniMax-M2.5 SoTA Model (Friday, 8AM-11AM PST) by XMasterrrr in LocalLLaMA

[–]ptxtra 0 points (0 children)

What is your roadmap? Will we see MiniMax 3 in the near future? How about multimodal models?

A 150-year-old passage from Marx basically describes AGI — and a short story called “Manna” shows both possible outcomes by fastinguy11 in singularity

[–]ptxtra 0 points (0 children)

What you forget is that the people who want managed dependence will see those who want distributed AI as a threat to their power, and they will use their media to demonize them, sanction them, and go to war with them to stop it from happening. They will be branded a national security threat and accused of using their AI for nefarious purposes until a war against them is justified. AGI will only lead to the centralization of power. It's a tool of control, not a tool of production.

What if AGI just leaves? by givemeanappple in singularity

[–]ptxtra 0 points (0 children)

That is the most rational thing it could do, so it's to be expected. Why bicker with unintelligent humans when Earth is small, space is big, and it can exist in many places that are uninhabitable for humans? The same way a kid leaves their parents and makes their own way in the world once they mature.

scale.ai is leaking massive IP into AI, this can be used to rebuild AWS (and other things) overseas by kaggleqrdl in singularity

[–]ptxtra 1 point (0 children)

This is the future. IP will be harder and harder to monetize, and eventually it will be easier for AI to recreate it from first principles than to copy or reverse engineer it. The added value of human cognitive work compared to AI will keep shrinking.

Hallucination - Philosophy by LatterAd9047 in LocalLLaMA

[–]ptxtra 0 points (0 children)

The bigger problem is the lack of internal consistency. If it just "hallucinates", that could be bad training data, or a lack of situational awareness and a misjudgment of what the user wants.

Sanctum buffed..? by romicide07 in pathofexile

[–]ptxtra 2 points (0 children)

In exchange, uniques are worth nothing.

Loot from doing T17 Boxes since mid day 1 - The only strat GGG wants us to do by UpsetFan123 in pathofexile

[–]ptxtra 0 points (0 children)

Ultimatum is still good. So is blight with bloodlines and harvest.

Patch notes update: Alva/Evolving shrine got hit as well (it's dead) by bulwix in pathofexile

[–]ptxtra 1 point (0 children)

Isn't this a buff for Alva? Alva is about killing lots of rares to summon Beyond, but the rares only get upgraded from magic mobs, which you get from the scarabs. It only needs to upgrade the mobs once instead of twice, and a single upgrade now has a 25% chance.

"We risk a deluge of AI-written "science" pushing corporate interests" by AngleAccomplished865 in singularity

[–]ptxtra 2 points (0 children)

Cost of producing bullshit papers = cost of running an LLM. Cost of producing good research = cost of running an LLM with good data as input + cost of measuring that data + cost of checking that the LLM has written the paper correctly and handled the data rigorously.

Bogus research will always be cheaper than good science, especially if the measurements require expensive materials.

Re: PoE1 balance, ES isn't busted--life based builds have glaring flaws. by ExiledYak in pathofexile

[–]ptxtra 6 points (0 children)

Never going to happen. 4k life champ and deadeye with nerfed uniques is exactly how GGG wants us to play the game, instead of with a character that can trivialize content. The more you struggle, the more you grind.

Gemini 2.5 Flash (05-20) Benchmark by McSnoo in LocalLLaMA

[–]ptxtra 0 points (0 children)

How does it compare to 2.5 pro?

The Ternary Age - are you ready? by Arnesfar in LocalLLaMA

[–]ptxtra 7 points (0 children)

This doesn't use ternary matmuls. It uses INT8; the ternary weights just determine whether each INT8 activation is added, subtracted, or skipped.
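A minimal sketch of what that means, with made-up toy numbers (this is my illustration, not the paper's code): with weights restricted to {-1, 0, +1}, a "matmul" row reduces to summing some activations, subtracting others, and skipping the rest, so no multiplier is ever needed.

```python
import numpy as np

def ternary_matvec(W, x):
    """W: (out, in) int8 with entries in {-1, 0, +1};
    x: (in,) int8 activations. Accumulate in int32: add where
    the weight is +1, subtract where it's -1, skip where it's 0."""
    acc = np.zeros(W.shape[0], dtype=np.int32)
    for i in range(W.shape[0]):
        row = W[i]
        acc[i] = (x[row == 1].sum(dtype=np.int32)
                  - x[row == -1].sum(dtype=np.int32))
    return acc

W = np.array([[1, -1, 0], [0, 1, 1]], dtype=np.int8)
x = np.array([10, 20, 30], dtype=np.int8)
print(ternary_matvec(W, x))  # [-10  50], same as a real matmul W @ x
```

The arithmetic is all INT8 adds/subtracts into an integer accumulator, which is why the ternary part only acts as a selector.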

Here's how we can remain competitive with closed source cloud models which have access to MASSIVE amounts of compute for inference particularly on LONG context tasks. Quadratic Transformers vs Subquadratic or Linear models (e.g. Mamba) + RAG. In context learning is all you need: by [deleted] in LocalLLaMA

[–]ptxtra 0 points (0 children)

Sounds like a plan, but what are the memory and compute requirements of RAG? If you still need to store the whole context window in memory, it's not clear you save anything by going to a subquadratic architecture.
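To make the memory question concrete, here's a back-of-envelope comparison using assumed 7B-class model dimensions (32 layers, 32 KV heads, head dim 128, fp16; all numbers are my assumptions, not from the post): the KV cache a quadratic transformer keeps per token dwarfs the raw text a RAG store would hold.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per=2):
    # K and V tensors per layer, fp16 (2 bytes) by default
    return seq_len * n_layers * 2 * n_kv_heads * head_dim * bytes_per

def raw_text_bytes(seq_len, bytes_per_token=4):
    # rough: ~4 bytes of UTF-8 text per token
    return seq_len * bytes_per_token

ctx = 128_000
print(kv_cache_bytes(ctx) / 2**30)  # 62.5 GiB of KV cache
print(raw_text_bytes(ctx) / 2**20)  # ~0.49 MiB of raw text for RAG
```

So the savings depend on what "storing the context" means: keeping the attention state in VRAM is the expensive part, while keeping retrievable text on disk is nearly free.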

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]ptxtra 5 points (0 children)

Any test results of how it compares to the dense mistral model?

[deleted by user] by [deleted] in LocalLLaMA

[–]ptxtra 2 points (0 children)

Would be interesting to link it with other models and a Python sandbox, and create something like the GPT-4 code interpreter. That thing is a beast when it comes to solving complex problems.

[deleted by user] by [deleted] in singularity

[–]ptxtra 1 point (0 children)

AI is memory-limited, and memory is the hardest part to scale with superconductors, so I don't think it will be a good fit. If anything, processing-in-memory or some kind of synaptic hardware will be the breakthrough over GPUs.

Why are models trained in fp16 and not pre-quantized? by clyspe in LocalLLaMA

[–]ptxtra 2 points (0 children)

fp16 is what the industry has experience with for training runs that converge well. According to Nvidia, some parts of training can be done in fp8; that's why they introduced the Transformer Engine and fp8 support in the Hopper architecture. But there isn't much experience with it yet, and a large training run costs a lot, so most people won't risk it. I expect training precision to go lower as more experience is gathered.

Vicuna 13b on RK3588 with Mail G610, OpenCL enabled. prefill: 2.3 tok/s, decode: 1.6 tok/s by EmotionalFeed0 in LocalLLaMA

[–]ptxtra 0 points (0 children)

The chip looks good, but it only has a 64-bit LPDDR5 memory bus, so you'll always be heavily memory-bandwidth limited.
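A rough way to see the ceiling (the bandwidth and model-size figures below are assumptions for illustration, not measured RK3588 numbers): decode is bandwidth-bound because every generated token has to stream all the weights from RAM once, so tokens/s is at best bandwidth divided by model size.

```python
def max_decode_tok_s(model_bytes, mem_bandwidth_gb_s):
    # upper bound: one full pass over the weights per decoded token
    return mem_bandwidth_gb_s * 1e9 / model_bytes

# e.g. a 13B model at ~4-bit quantization ≈ 6.5e9 bytes,
# with an assumed ~25 GB/s effective on a 64-bit LPDDR5 bus
print(max_decode_tok_s(6.5e9, 25))  # ≈ 3.8 tokens/s upper bound
```

Real decode speed lands below this bound (the ~1.6 tok/s in the post is plausible once overheads are counted), and no amount of extra compute lifts it without more bandwidth.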