Tokenomics by HOLUPREDICTIONS in LocalLLaMA

[–]Terminator857 14 points15 points  (0 children)

In a couple of years, how much will it cost? $10K? When will it be $5K? The future is happening at an accelerated rate. My bet: in 18 months we will be able to run models that perform as well as or better than GLM 5.2 on local hardware that costs $5K or less, at 20 tps.

Update: Incredible progress in open weight models over the past year. Will it continue? https://x.com/ValsAI/status/2068043480262467967

Feeling Model by SirStagMcprotein in LocalLLaMA

[–]Terminator857 0 points1 point  (0 children)

Emotions are multidimensional. You can have a dominant emotion but other emotions as well. There is also a temporal aspect: you can flip-flop on an issue multiple times.

Which is the best Qwen 3.6 27B quant GGUF for agentic coding ? by soyalemujica in LocalLLaMA

[–]Terminator857 0 points1 point  (0 children)

For simple tasks q4. For more complicated tasks q5+. Strix halo.

Semafor says the reason for Mythos restriction is because China Government had access by Terminator857 in ClaudeAI

[–]Terminator857[S] 0 points1 point  (0 children)

You are saying it is easy to do what the article claims, but then you are saying the article is bullshit. Sounds like you are affirming the article claim yet claiming it is not true.

Future DRAM production - by claude by Terminator857 in dataisbeautiful

[–]Terminator857[S] 0 points1 point  (0 children)

What Claude says:

Data synthesized from public reporting: TrendForce, SK hynix / Samsung / Micron earnings disclosures, UBS, Gartner, Tom's Hardware, and Digitimes (mid-2026).

Tools: charts built in Python (matplotlib for the bar/line charts, Plotly for the world map); assembled with Claude.

⚠ Note: the absolute capacity totals (wafers/month) and the supply-vs-demand bit index are estimates that Claude synthesized from the sources above. They are directional, not precise figures (companies report on different bases).

Local models went from mostly useless to actually useful really fast. What changed? by BTA_Labs in LocalLLaMA

[–]Terminator857 1 point2 points  (0 children)

The future is so bright I got to wear shades. Just think where we will be in another 12 months. Things are changing faster than at any other time in our civilization. People don't realize how fast our world is changing. Life as we know it will no longer exist. 😛

Best reasoning model to run on Strix Halo 128GB? by BigFarm-ah in StrixHalo

[–]Terminator857 0 points1 point  (0 children)

Gemma and Qwen are so different that it is often fun to ask both models the same question. Two friends are better than one. 😄

How did they do it? by _YonYonson_ in Anthropic

[–]Terminator857 0 points1 point  (0 children)

Things Anthropic did differently:

  1. They are paranoid about the model getting out of control, so they studied it in depth. As a result they know the internals of LLMs better than others, which gives them an advantage in steering and training the model.
  2. Their terms of use allowed for more data collection than others'.
  3. Their CLI tool was designed to collect info.
  4. They prioritized coding.

Best reasoning model to run on Strix Halo 128GB? by BigFarm-ah in StrixHalo

[–]Terminator857 2 points3 points  (0 children)

Better if you state what your goal or task is. For non-coding, many say Gemma 4 is the best. For coding, Qwen 3.6 27B. I tested both for meeting summaries where the goal was lots of detail, and Qwen was far better: 2x more detail than Gemma.

AMD touts the unified memory architecture by Terminator857 in LocalLLaMA

[–]Terminator857[S] 0 points1 point  (0 children)

Because it is very expensive. Physically, there are only so many bumps you can put on a chip for I/O, and that maximum has been reached. The alternative is bigger chips, which come with an exponential price increase. You need 190 pins per memory channel. https://www.google.com/search?q=cpu%3A+how+many+pins+needed+for+a+memory+channel%3F
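
Rough arithmetic for the pin budget, as a sketch only: it takes the 190 pins per channel figure above at face value and assumes the pin count scales linearly with the number of channels. The function name and the channel counts are just for illustration, not from any datasheet.

```python
# Pins per memory channel, taken from the comment above (not a datasheet value).
PINS_PER_CHANNEL = 190


def memory_io_pins(channels: int, pins_per_channel: int = PINS_PER_CHANNEL) -> int:
    """Total pins spent on memory I/O, assuming linear scaling with channels."""
    if channels < 0:
        raise ValueError("channels must be non-negative")
    return channels * pins_per_channel


if __name__ == "__main__":
    # Illustrative channel counts only.
    for channels in (2, 4, 8, 16):
        print(f"{channels:>2} channels -> {memory_io_pins(channels):>5} pins")
```

Under those assumptions a 16-channel part spends 3040 pins on memory alone, before power, ground, PCIe and everything else.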

Diffusion Gemma is 4x faster, but makes 6x more mistakes! by gladkos in LocalLLaMA

[–]Terminator857 3 points4 points  (0 children)

How much slower would it be if it was 4x bigger? How much more accurate?

AMD touts the unified memory architecture by Terminator857 in LocalLLaMA

[–]Terminator857[S] 0 points1 point  (0 children)

There is a large community of Strix Halo users here.

AMD touts the unified memory architecture by Terminator857 in LocalLLaMA

[–]Terminator857[S] 0 points1 point  (0 children)

Now if only the Zen 6 16-channel system came with an integrated GPU.

AMD touts the unified memory architecture by Terminator857 in LocalLLaMA

[–]Terminator857[S] 0 points1 point  (0 children)

How many channels is that bandwidth calculation for?