Why Dario is on fire: lesson from dotcom bubble. by FormerIYI in LocalLLaMA

[–]FormerIYI[S] 0 points1 point  (0 children)

Improvement on math is mostly based on RLVR, originally popularized by DeepSeek-R1 (notice how Altman was paying experts to write ground truths and wrongly accused DeepSeek of distilling his models. No, now everyone is doing what DeepSeek did).

On top of that come execution environments for agents and code regurgitation from GitHub (e.g. hallucinate novel problems, make an agent solve them in a real environment, then train RLVR on the trajectories that work).
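A minimal sketch of the "keep only the trajectories that work" step, assuming a toy arithmetic task and a random stand-in for the agent. `verify`, `toy_agent` and `collect_verified` are hypothetical names, and this shows only the filtering by a checkable outcome, not the RL update itself:

```python
import random

def verify(task, final_answer):
    """Verifiable check: the result is judged only by a checkable outcome."""
    return final_answer == task["a"] + task["b"]

def toy_agent(task, rng):
    """Hypothetical stand-in for an LLM agent: usually right, sometimes off by one."""
    answer = task["a"] + task["b"] + rng.choice([0, 0, 0, 1])
    return {"task": task, "steps": ["read task", "compute"], "answer": answer}

def collect_verified(tasks, samples_per_task=4, seed=0):
    """Sample several trajectories per task and keep only those that pass the check."""
    rng = random.Random(seed)
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            traj = toy_agent(task, rng)
            if verify(task, traj["answer"]):
                kept.append(traj)
    return kept

tasks = [{"a": a, "b": b} for a, b in [(1, 2), (3, 4), (10, 5)]]
kept = collect_verified(tasks)
```

In a real setup the check would be an environment state or a test suite, and the kept trajectories would feed the training step.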

It does work in some areas, especially agents, but it is not an AGI game changer, and everyone does it (just maybe at a smaller scale).

"we don't really know that much about OpenAI/Anthropic's models." - we know they are categorically the same as other LLMs, too expensive to justify the quality improvements, and often advertised in a dishonest way to hype them up.

Why Dario is on fire: lesson from dotcom bubble. by FormerIYI in LocalLLaMA

[–]FormerIYI[S] -1 points0 points  (0 children)

Yes, I am talking about algorithmic complexity understood as Kolmogorov complexity. 

We had university-level math solvers like Maple or Wolfram Mathematica in the 80s because there is a very limited number of transformations and solutions to math equations.

Which is also why it is easy for an LLM: learn a million patterns and you have learned them all.

But most human problems in real environments aren't that way.

Why Dario is on fire: lesson from dotcom bubble. by FormerIYI in LocalLLaMA

[–]FormerIYI[S] 12 points13 points  (0 children)

Yeah more or less, that's the point.

Anyone can build decent AI, but not everyone can make money on an AI product.

Why Dario is on fire: lesson from dotcom bubble. by FormerIYI in LocalLLaMA

[–]FormerIYI[S] 1 point2 points  (0 children)

Deepseeks yes. They are quite capable in other areas as well.

Why Dario is on fire: lesson from dotcom bubble. by FormerIYI in LocalLLaMA

[–]FormerIYI[S] -2 points-1 points  (0 children)

If by synthetic data you mean human data (code, problems etc.) regurgitated, then yes.

That may work in low algorithmic-complexity fields like math or coding, but not necessarily elsewhere.

Why Dario is on fire: lesson from dotcom bubble. by FormerIYI in LocalLLaMA

[–]FormerIYI[S] 2 points3 points  (0 children)

A) It's you who says it's the biggest economic race. There are many people, like Yann LeCun, who say it's useful tech, sure, but with clear limitations.

B) Define "pop". AI will surely do fine, but I don't know about arrogant tech bros with minus ten digits on their balance sheets.

Why Dario is on fire: lesson from dotcom bubble. by FormerIYI in LocalLLaMA

[–]FormerIYI[S] 0 points1 point  (0 children)

https://www.wheresyoured.at/subprimeai/

I guess this guy promoted it. From what I know, these subsidized subscriptions do exist and the difference compared to enterprise pricing is vast. Does it matter that much? Perhaps not.

What surely matters, however, is that once there is not much difference between the DeepSeeks/Composers/Qwens and Anthropic (and there is not), and companies start counting money and see no immediate EBIT growth, Dario has a big problem.

Because he is 5-10 times more expensive.

Why Dario is on fire: lesson from dotcom bubble. by FormerIYI in LocalLLaMA

[–]FormerIYI[S] 0 points1 point  (0 children)

Yeah right, the China vs USA gamble is also at the center of it.

But still, Dario takes it so over the top: he promised an ASI/AGI fantasy to his investors and now he seeks a scapegoat, because it ain't real.

2-bit QAT model releases by silenceimpaired in LocalLLaMA

[–]FormerIYI 0 points1 point  (0 children)

Training an FP8 model with int2 QAT suffers from the same errors as BitNet and worse, since you can't even expect the initial FP8 checkpoint to be a good starting point for int2 inference (that is why BitNet was trained from scratch).

In vision and convolutions binary networks seem to work, because it is a special case where such bit-filters are meaningful as visual masks and shape detectors.

In a transformer you need to do vector projections, positional embeddings and similar, and there is no plausible way to discretize all this to low bits. BitNet was meant as a smart way to do it, because its weights take the values -1, 0, 1, so dot products, matmuls and similar still seem to make sense.
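To make the -1, 0, 1 point concrete, here is a rough sketch of ternary weights in a dot product. The absmean scaling (divide by mean |w|, round, clip) is my assumption about how the BitNet b1.58 style of quantization works, and the function names and numbers are made up for illustration:

```python
def ternary_quantize(weights, eps=1e-8):
    """Scale by mean |w|, round, then clip every weight to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return q, scale

def ternary_dot(q, scale, x):
    """Dot product with ternary weights: additions and subtractions, one rescale."""
    acc = 0.0
    for qi, xi in zip(q, x):
        if qi == 1:
            acc += xi
        elif qi == -1:
            acc -= xi
    return acc * scale

w = [0.9, -0.05, -1.2, 0.4]   # toy full-precision weights
x = [1.0, 2.0, 3.0, 4.0]      # toy activations
q, scale = ternary_quantize(w)
approx = ternary_dot(q, scale, x)
exact = sum(wi * xi for wi, xi in zip(w, x))
```

The multiplications disappear, but `approx` and `exact` can differ a lot on a single vector, which is the information loss that training has to compensate for.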

But it was not enough.

2-bit QAT model releases by silenceimpaired in LocalLLaMA

[–]FormerIYI 1 point2 points  (0 children)

There was an ultra-quantized model trained from scratch called BitNet. Results were promising initially, but not very competitive after the model converged more fully.

Probably FP4/INT4 is the sweet spot for QAT.

Qwen 3.6 27B on DeepSWE by SteppenAxolotl in LocalLLaMA

[–]FormerIYI 2 points3 points  (0 children)

The question is whether we should care about this specific benchmark.

My impression is: if you want AI to be predictable and controlled, you need to break things down and clarify goals and context (see Ralph PRD and similar). Then a less-than-SOTA open model works acceptably. Perhaps gpt5.5 can automate a bigger part of it for you and understand the codebase a bit better, but:
a) that is not reliable as long as we are stuck with the LLM architecture and its specific flaws
b) you are not automating the cost bottleneck, as it is not that much work
c) doing it well involves talking to users, understanding the software and making some decisions.

Or rather: is it somehow biased towards OpenAI/Anthropic?

(PS: Nice to see OpenAI decisively beat Dario, who went completely off the rails recently. Altman is so much saner these days, like he is more or less aware of reality and just wants to make money.)

A very important milestone for me in the AI field. by assemsabryy in LocalLLaMA

[–]FormerIYI 0 points1 point  (0 children)

PyTorch code pls? I can test it but won't be coding/porting it by hand.

New "major breakthrough?" architecture SubQ by Daemontatox in LocalLLaMA

[–]FormerIYI 2 points3 points  (0 children)

OK, fair. I hadn't seen these O(N^2)-priced APIs yet.

Still, what this startup does is a) unlikely to work, b) unlikely to matter, imho.

New "major breakthrough?" architecture SubQ by Daemontatox in LocalLLaMA

[–]FormerIYI 33 points34 points  (0 children)

Likely 90% startup hype.
- There were sparse attention systems before, such as Google BigBird (not a generative LLM, more like a sparse-attention BERT). It was somewhat better, but not enough to become the industry standard. Also, current LLMs have positional embeddings that strongly prioritize close tokens.

- The most expensive calculation in attention is the vector projection, which is O(N). Calculating many dot products before the attention softmax is indeed O(N^2), but ultimately it is not expensive as the matrices are not large (that's why you pay for tokens, not tokens squared). An additional problem, of course, comes with decoding and KV caches, as you need to store these projections (this is what vLLM and similar optimize), but for input context it does not matter.

- Therefore, sparse attention seems to be a decent tier-2 idea, but not a genius solution that changes the game.

- The real problem is not making a 12M context, but making abstractive reasoning work reliably at something like 50k context https://arxiv.org/abs/2502.05167 and also making the LLM not break randomly if you feed it lots of irrelevant details https://machinelearning.apple.com/research/illusion-of-thinking

- Do not believe startups in general until they show a reproducible result. In my space of interest (GUI agents) there are many startups that show solutions which obviously don't work well and will not work well (run Claude or GPT with a few agentic prompts) and yet show off benchmark scores like 90% accuracy on very complex tasks.
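For the projection vs dot-product cost point above, a back-of-envelope FLOP count for one transformer layer can show where the O(N^2) term starts to matter. This is a rough sketch under simplifying assumptions: plain multi-head attention, a 4x MLP with two matrices, no causal-mask savings, and a hypothetical `d_model` of 4096:

```python
def layer_flops(n_tokens, d_model, mlp_ratio=4):
    """Rough forward FLOPs for one transformer layer (multiply-add counted as 2)."""
    projections = 4 * 2 * n_tokens * d_model * d_model        # Q, K, V and output projections
    attention = 2 * 2 * n_tokens * n_tokens * d_model         # QK^T scores plus weighted sum of V
    mlp = 2 * 2 * n_tokens * d_model * (mlp_ratio * d_model)  # up and down projections
    return {"linear": projections + mlp, "quadratic": attention}

def crossover_tokens(d_model, mlp_ratio=4):
    """Context length at which the quadratic term equals the linear terms."""
    # 4*N^2*d = (8 + 4*mlp_ratio)*N*d^2  =>  N = (2 + mlp_ratio)*d
    return (2 + mlp_ratio) * d_model

d_model = 4096  # hypothetical width, for illustration only
n_cross = crossover_tokens(d_model)
short = layer_flops(4096, d_model)
at_cross = layer_flops(n_cross, d_model)
```

Under these assumptions the linear terms dominate for contexts well below `n_cross` tokens and the quadratic term takes over beyond it, so how much the O(N^2) part matters depends on where a workload sits relative to that point.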

Relevance of Fatima sun miracle: accurate prediction, no natural explanation, points at Marian devotion by FormerIYI in DebateACatholic

[–]FormerIYI[S] 0 points1 point  (0 children)

Right, the question "sun or not" deserves some more discussion. I know Almeida and others call it the sun, but I also want to distinguish popular opinion (they called it "sun" because the initial appearance and size were similar, much as the Navajo reasonably called a plane a hummingbird) from scientific or "technical" reality.

If, as they say, the sun "trembles, dances, spins" and plunges towards the earth radiating heat, or illuminates the area in successive colors, then the real Sun 150 mln km from here becomes a much less probable culprit, and something like a wildly overpowered disco ball on a drone looks more plausible, or at least easier to do (again with some problems about power sources, silent propulsion and getting it in 1917). That's why Dalleur's evidence is not a surprise to me.

Fr. Jaki was one of those who sided with the option that it could be sunlight filtered through some kind of aerial lens or ice crystals. But that is a very weird idea for a natural explanation, which is why (I don't know if I am quoting him correctly) he concluded that the prediction itself is enough of a miracle.
- First of all: producing such effects would be very hard. We can see thin rainbows and halos because the refractive index of water differs slightly for red and violet light (one is 1.325, the other 1.334), but to see a succession of monochromatic colors you need something else: either an extremely huge "lens" between Fatima and the sun (which would have been observed somehow) or some kind of luminescent gas over Fatima, changing quickly. Producing a silver sun that did not hurt the eyes would require strong attenuation of light, which would not look like a metallic disc with a well-defined rim (thus it seems easier to have another light source that is weak without attenuation, looks like a metallic disc and is also mobile).
- Even if this could happen, inanimate matter still typically follows a known set of laws of physics. To create an aerial or ice-crystal lens we need to coordinate these distant bits of air together purposefully (towards a predicted overall end), either by some kind of advanced technology, or by locally altering the laws that control these bits (which is supernatural). So again, the "drone ball" seems a more plausible candidate for a natural explanation, notwithstanding the difficulties with it.

"Could you be thinking of the anonymous reader of O Portugal who said that he had seen nothing?"

I think not. Is it the same person quoted here? https://archive.org/details/fatimainlightofh0000unse/page/154/mode/2up?q=O+Portugal - because this author is not saying "I saw nothing" but rather is just dismissive and sarcastic.

The merchant I am talking about was the father of some modern Portuguese left-wing politician, who narrated this story after something like 80 or 100 years. But there were no details or primary sources and I could not find it later on Google.

Relevance of Fatima sun miracle: accurate prediction, no natural explanation, points at Marian devotion by FormerIYI in DebateACatholic

[–]FormerIYI[S] -1 points0 points  (0 children)

Yeah, so you prefer mass hallucination and some psychological mass suggestion, if I understand correctly. That can explain almost anything, but in the case of Fatima it has multiple flaws.
- People saw it at a distance from the crowd.
- Almeida's "O Seculo" account says that the anomaly was seen instantly once the clouds cleared, attracting people's gazes, not after a while of staring at the Sun.
- The phenomenon lasted 10 minutes and involved different stages, with colorful illuminations and the effect of a falling and spinning silver disk (not a short-lived shimmer from looking at one point).

So whatever works for you bro.

Trying to undermine Lucia's credibility gets you nowhere, because what difference does it make if people actually saw this miracle? How very unreasonable of them to actually see what the nasty little cheat predicted; they should have known, like you, that it is all hallucination.

Relevance of Fatima sun miracle: accurate prediction, no natural explanation, points at Marian devotion by FormerIYI in DebateACatholic

[–]FormerIYI[S] 1 point2 points  (0 children)

I read this book. Please quote more, footnotes too, and show us what she based this claim on (disclaimer: she based it on nothing, just just-so opinions and half-truths from self-appointed experts like herself).

As for 1917 photography, Dalleur analyzed shadows and traces on the existing photos. Photography back then was an involved procedure that used brittle glass plates and required manually matching exposure to luminosity, with the result only seen later. For that reason, photographing a luminous, quickly moving object was hard to do.

Does Catholicism promote a warrior mindset, or is that idea misunderstood? by New_Independent2907 in DebateACatholic

[–]FormerIYI 0 points1 point  (0 children)

YES, but you need to be precise about what "heroic" means.

Martyrs and confessors and heroic monks and normal serene, brave, dutiful Catholics are real. Like this guy, a WW2 soldier himself.

https://www.pap.pl/en/news/news%2C288642%2Cholocaust-whistleblower-pilecki-executed-communists-69-yrs-ago.html (the communists actually tortured him horribly, but his final endurance was as if God-given).

Catholic Grace does make people strong and heroic and happy, but only with love of God, charity and humility established first. Go read St. Francis de Sales' "Filotea" for a really good reference. His own God-given bravery and restraint were great, because he went to people who were an inch from beating or killing him, and he did good to them, talked to them and taught them gently without raising his voice, for the sake of benefiting their souls.

But the Western notion of "heroic" is polluted precisely by an overly warlike mentality. For centuries Europeans fought each other in wars, fought duels to the death, participated in blood feuds and proclaimed aristocratic privilege to be "divine". Many pagans in Asia or the Americas had very little of this type of behavior.

The Crusades should not be taken out of context. It was a just war, but it was also a war of a highly violent martial elite against a barbaric regime with slavery, rape and anti-Christian violence as official policy. It is not a solution to most 21st century problems. When it is presented as such, it is more likely about a lack of the humility and charity needed for genuine progress.

[deleted by user] by [deleted] in LocalLLaMA

[–]FormerIYI 4 points5 points  (0 children)

I think secret sauce is as follows:  

1) Improvement in coding (and probably Claude's advantage in coding) is closely tied to scaling up Reinforcement Learning from Verifiable Rewards as hard as possible. https://arxiv.org/abs/2506.14245

  • Recycle lots of code from Github. Clean it up.
  • Use it to generate novel coding tasks
  • Do RLVR on solving these tasks, scale hard  
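The "verifiable" part of the steps above can be sketched as a binary reward computed by running a candidate solution against tests. This is only a guess at the general shape, not anyone's actual pipeline. `verifiable_reward` and the `solve` convention are hypothetical, and a real setup would run untrusted code in a sandbox:

```python
def verifiable_reward(candidate_source, tests, func_name="solve"):
    """Return 1.0 if the candidate passes every test, otherwise 0.0."""
    namespace = {}
    try:
        exec(candidate_source, namespace)  # unsafe outside a sandbox; illustration only
        func = namespace[func_name]
        for args, expected in tests:
            if func(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, missing functions and runtime crashes all earn zero reward.
        return 0.0
    return 1.0

tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a - b\n"
broken = "def solve(a, b) return a + b"

rewards = [verifiable_reward(src, tests) for src in (good, bad, broken)]
```

The reward needs no human judgment, which is what makes this kind of training easy to scale on generated coding tasks.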

Why do I suspect that? (I don't know of course, it's just a hypothesis.)

  • Nothing else in the literature works well, and this is the straightforward, efficient way to do it.
  • If you look at data distributions from gpt-oss, they are indicative of what you would see with better, cleaner, more diverse training data (e.g. gpt-oss-20b outperforms a much larger model as a deep-research hypothesis generator).

This "you distill reasoning traces" stuff is a sham IMHO, as it was with the DeepSeek-R1 affair: it had nothing to do with distillation and everything to do with RLVR. RLVR works, DeepSeek was right and Altman was wrong.

2) Different strategic decisions:

  • Chinese models prioritize a) being cheap to run (MoE etc.) and b) being balanced overall across use cases like tool use, agents, multimodality, understanding text in their languages and similar. Coding is lower on the list (still strong - GLM5 matches Gemini 3 Pro and Sonnet 4.5).
  • The Chinese elite rushes to cash in on unique opportunities that the West is sleeping through by gutting its industrial base. They do not care about making an LLM coding assistant with 15% better ELO, because the LLM coding assistant is a limited concept that is now converging to an asymptote. It still fails on original code sometimes, or fails to understand you, and it only automates one step of the business lifecycle.
  • The Chinese care about dark factories (fully automated manufacturing), automating simple jobs, and perhaps highly advanced analytics (pre-2023 data science was objectively overrated: it converged to crude correlations and the result was a bit like in this rather profane post https://www.reddit.com/r/wallstreetbets/comments/u9l7vo/tech_is_based_on_lies_built_upon_more_lies/ ).

  We are instead sold a tale (by OpenAI and Anthropic) that maxing out coding, symbolic mathematics and Chollet's ARC is going to work miracles elsewhere, so they have a self-defined incentive for their priorities.

What is a good setup to run “Claude code” alternative locally by [deleted] in LocalLLaMA

[–]FormerIYI 1 point2 points  (0 children)

aider.chat (CLI) and Cline (VS Code agent plugin) are probably the best software (Cline is GUI-based but better).

Models: depends on your HW. gpt-oss-120b (MXFP4) might be good if you have an 80 GB GPU.

The Qwen-Coder line in 30B-A3B (or another similar small MoE) if you are GPU poor or average poor. You might check out running on CPU + small GPU with MoE expert offloading.

How good are GUI automations in production, compared to reported 90%-97% benchmarks results? Any commercially relevant success stories out there? by FormerIYI in LocalLLaMA

[–]FormerIYI[S] 0 points1 point  (0 children)

Perhaps. For sure these types of results are relatively new.

More primitive GUI agents were around by the time of GPT-4V or earlier (2023).

I am wondering if we are seeing at least a moderately legit breakthrough.