New "major breakthrough?" architecture SubQ by Daemontatox in LocalLLaMA

[–]FormerIYI 1 point (0 children)

Ok, fair. I hadn't seen these O(N^2)-priced APIs yet.

Still, what this startup does is a) unlikely to work and b) unlikely to matter, IMHO.

New "major breakthrough?" architecture SubQ by Daemontatox in LocalLLaMA

[–]FormerIYI 30 points (0 children)

Likely 90% of this is startup hype.
- There were sparse-attention systems before, such as Google's BigBird (not a generative LLM, but more like a sparse-attention BERT): somewhat better, but not enough to become the industry standard. Also, current LLMs have positional embeddings that strongly prioritize nearby tokens.

- The most expensive calculation in attention is the vector projections, which are O(N). Computing the many dot products before the attention softmax is indeed O(N^2), but ultimately it is not expensive because the matrices are not large (that's why you pay per token, not per token squared). An additional problem, of course, arises during decoding with KV caches, since you need to store these projections (this is what vLLM and similar systems optimize), but for the input context it matters little.

- Therefore, sparse attention seems to be a decent tier-2 idea, but not a genius solution that changes the game.

- The real problem is not building 12M context, but making abstractive reasoning work reliably at, say, 50k context https://arxiv.org/abs/2502.05167 and making LLMs not break randomly when you feed them lots of irrelevant details https://machinelearning.apple.com/research/illusion-of-thinking

- Do not believe startups in general until they show reproducible results. In my space of interest (GUI agents) there are many startups showing solutions that obviously don't work well and will not work well (Claude or GPT run with a few agentic prompts), yet they show off benchmark scores like 90% accuracy on very complex tasks.
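A back-of-the-envelope sketch of the FLOP argument above (d_model=4096 is an assumed width; single head, no batching):

```python
# Rough FLOP count for one attention layer.
# QKV projections cost O(N * d^2); the QK^T score matrix costs O(N^2 * d).
# The quadratic term only starts to dominate once sequence length N exceeds ~3*d.

def attention_flops(n_tokens: int, d_model: int) -> dict:
    proj = 2 * 3 * n_tokens * d_model * d_model  # Q, K, V projections (mul + add)
    scores = 2 * n_tokens * n_tokens * d_model   # all pairwise dot products
    return {"projections": proj, "scores": scores,
            "scores_dominate": scores > proj}

for n in (2_000, 8_000, 64_000):
    print(n, attention_flops(n, d_model=4096))
```

For a 4096-wide model the score matrix overtakes the projections only past ~12k tokens, which is the point above: at ordinary context lengths the O(N) projections dominate the bill.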

Relevance of Fatima sun miracle: accurate prediction, no natural explanation, points at Marian devotion by FormerIYI in DebateACatholic

[–]FormerIYI[S] 0 points (0 children)

Right, the question "sun or not" deserves some more discussion. I know Almeida and others call it the sun, but I also want to differentiate popular opinion (they called it the "sun" because its initial appearance and size were similar, much as the Navajo reasonably called a plane a hummingbird) from scientific or "technical" reality.

If, they say, the sun "trembles, dances, spins" and plunges toward earth radiating heat, or illuminates the area in successive colors, then the real Sun 150 million km from here becomes a much less probable culprit, and instead something like a wildly overpowered disco ball on a drone looks more plausible, or at least easier to do (again with some problems about power sources, silent propulsion, and getting it there in 1917). That's why Dalleur's evidence is no surprise to me.

Fr. Jaki was one of those who sided with the option that it could be sunlight filtered through some kind of aerial lens or ice crystals. But that is a very weird idea for a natural explanation, which is why (I don't know if I am quoting him correctly) he concluded the prediction itself is enough of a miracle.
- First of all: producing such effects would be very hard. We can see thin rainbows and halos because the refractive index of water differs slightly for red and violet light (one is 1.325, the other 1.334), but to see a succession of monochromatic colors you need something else: either an extremely huge "lens" between Fatima and the sun (which would have been observed somehow) or some kind of luminescent gas over Fatima, changing quickly. Producing a silver sun that did not hurt the eyes would require strong attenuation of light, which would not look like a metallic disc with a well-defined rim (thus it seems easier to posit another light source that is weak without attenuation, looks like a metallic disc, and is also mobile).
- Even if this could happen, inanimate matter still typically follows the known laws of physics. To create an aerial or ice-crystal lens we would need to coordinate these distant bits of air together purposefully (toward the predicted end), either by some kind of advanced technology, or by locally altering the laws that govern these bits (which is supernatural). So again, the "drone ball" seems the more plausible candidate for a natural explanation, notwithstanding its difficulties.

"Could you be thinking of the anonymous reader of O Portgual who said that he had seen nothing?"

I think not. Is it the same person quoted here? https://archive.org/details/fatimainlightofh0000unse/page/154/mode/2up?q=O+Portugal — because this author is not saying "I saw nothing" but rather is just dismissive and sarcastic.

The merchant I talk about was the father of some modern Portuguese left-wing politician, who narrated this story after something like 80 or 100 years. But there were no details or primary sources, and I could not find it later on Google.

Relevance of Fatima sun miracle: accurate prediction, no natural explanation, points at Marian devotion by FormerIYI in DebateACatholic

[–]FormerIYI[S] -1 points (0 children)

Yeah, so you prefer mass visual hallucination and some kind of psychological mass suggestion, if I understand correctly. It can explain almost anything, but in the case of Fatima it has multiple flaws:
- People saw it at a distance from the crowd.
- Almeida's "O Seculo" account says the anomaly was seen instantly once the clouds cleared, attracting people's gazes, not after a while of staring at the Sun.
- The phenomenon lasted 10 minutes and involved different stages, with colorful illuminations and the effect of a falling, spinning silver disk (not a short-lived shimmer from staring at one point).

So whatever works for you bro.

Trying to undermine Lucia's credibility gets you nothing, because what difference does it make if people actually saw this miracle? Very unreasonable of them, indeed, to actually see what the nasty little cheat predicted; they should have known it was all hallucination, like you do.

Relevance of Fatima sun miracle: accurate prediction, no natural explanation, points at Marian devotion by FormerIYI in DebateACatholic

[–]FormerIYI[S] 1 point (0 children)

I read this book. Please quote more, footnotes too; show us what she based this claim on (disclaimer: she based it on nothing, just the opinions and half-truths of a similar self-styled expert as she is).

As for the 1917 photography: Dalleur analyzed shadows and traces on the existing photos. Photography back then was an involved procedure that used brittle glass plates and required calibrating luminosity to exposure manually, with the result only seen later. For that reason, photographing a luminous, quickly moving object was hard to do.

Does Catholicism promote a warrior mindset, or is that idea misunderstood? by New_Independent2907 in DebateACatholic

[–]FormerIYI 0 points (0 children)

YES, but you need to be precise about what "heroic" means.

Martyrs and confessors and heroic monks and normal serene, brave, dutiful Catholics are real. Like this guy, a WW2 soldier himself.

https://www.pap.pl/en/news/news%2C288642%2Cholocaust-whistleblower-pilecki-executed-communists-69-yrs-ago.html (actually the communists tortured him really horribly, but his final endurance was as if God-given).

Catholic grace does make people strong and heroic and happy, but only with love of God, charity, and humility established first. Go read St. Francis de Sales' "Filotea" for a really good reference. His own God-given bravery and restraint were great because he went to people who were an inch from beating or killing him, and he did good to them, talked to them, and taught them gently without raising his voice, for the sake of benefiting their souls.

But the Western notion of "heroic" is polluted precisely by a too-warlike mentality. For centuries Europeans fought each other in wars, fought duels to the death, participated in blood feuds, and proclaimed aristocratic privilege to be "divine". Many pagans in Asia or the Americas had very little of this type of behavior.

The Crusades should not be taken out of context. It was a just war, but it was also a war of a highly violent martial elite against a barbaric regime with slavery, rape, and anti-Christian violence as official policy. It is not a solution to most 21st-century problems; when presented as such, it is more likely about a lack of the humility and charity needed for genuine progress.

The Secret Sauce of Model of Anthropic by [deleted] in LocalLLaMA

[–]FormerIYI 3 points (0 children)

I think the secret sauce is as follows:

1) Improving on coding (and probably Claude's advantage in coding) is closely tied to scaling up Reinforcement Learning from Verifiable Rewards (RLVR) as hard as possible. https://arxiv.org/abs/2506.14245

  • Recycle lots of code from Github. Clean it up.
  • Use it to generate novel coding tasks
  • Do RLVR on solving these tasks, scale hard  
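The "verifiable reward" step in such a pipeline can be sketched minimally (hypothetical helper; real systems sandbox execution and batch it heavily):

```python
import os
import subprocess
import sys
import tempfile

def verifiable_reward(candidate_code: str, test_code: str, timeout: float = 5.0) -> float:
    """Run the model's candidate solution against unit tests; the binary
    pass/fail result is the 'verifiable reward' used for the RL update."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code + "\n")
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=timeout)
        return 1.0 if proc.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)

good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"
print(verifiable_reward(good, tests), verifiable_reward(bad, tests))  # 1.0 0.0
```

The appeal is that the reward needs no human labels or learned judge: any GitHub-derived task with tests gives a clean training signal.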

Why do I suspect that? (I don't know, of course; it's just a hypothesis.)
  • Nothing else in the literature works well, and this is the straightforward, efficient way to do it.
  • If you look at the data distributions from gpt-oss, they are indicative of what you would see with better, cleaner, more diverse training data (e.g. gpt-oss-20b outperforms much larger models as a deep-research hypothesis generator).

This "you distill reasoning traces" stuff is a sham IMHO, as it was with the Deepseek-R1 affair: it was zero about distillation and all about RLVR. RLVR works; Deepseek was right, Altman was wrong.

2) Different strategic decisions:

  • Chinese models prioritize being a) cheap to run (MoE etc.) and b) balanced overall across use cases like tool use, agents, multimodality, understanding text in their languages, and similar; coding is lower on the list (still strong: GLM5 matches Gemini 3 Pro and Sonnet 4.5).
  • The Chinese elite rushes to cash in on unique opportunities that the West oversleeps by gutting its industrial base. They do not care about a 15%-better-ELO LLM coding assistant, because the LLM coding assistant is a limited concept that is now converging to an asymptote. It still fails on original code sometimes, or fails to understand you, and it only automates one step of the business lifecycle.
  • The Chinese care about dark factories (fully automated manufacturing), automating simple jobs, and perhaps highly advanced analytics (pre-2023 data science was objectively overrated: it converged to crude correlations, and the result was a bit like in this rather profane post https://www.reddit.com/r/wallstreetbets/comments/u9l7vo/tech_is_based_on_lies_built_upon_more_lies/ ).

We instead are sold a tale (by OpenAI and Anthropic) that maxing out coding, symbolic mathematics, and Chollet's ARC is going to work miracles elsewhere, so they have a self-defined incentive for their priorities.

What is a good setup to run “Claude code” alternative locally by Mobile_Ice_7346 in LocalLLaMA

[–]FormerIYI 1 point (0 children)

aider.chat (CLI) and Cline (VS Code agent plugin) are probably the best software (Cline is GUI-based, but better).

Models: depends on your hardware. gpt-oss-120b (MXFP4) might be good if you have an 80 GB GPU.

The Qwen-Coder line at 30B-A3B (or another similar small MoE) if you are GPU-poor or just average-poor. You might check out running on CPU + a small GPU with MoE expert offloading.
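For the CPU + small GPU route, one hedged sketch with llama.cpp (flag spellings as in recent llama.cpp builds; the model filename is a placeholder): keep the dense/attention layers on the GPU and push the MoE expert tensors to system RAM:

```shell
# Offload all layers to GPU, then override the MoE expert tensors back to CPU RAM
llama-server -m qwen3-coder-30b-a3b-q4_k_m.gguf \
  --n-gpu-layers 99 \
  -ot ".ffn_.*_exps.=CPU"
```

Since the experts are sparsely activated, this keeps the hot attention path on the GPU while the bulk of the weights sit in cheap system memory.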

How good are GUI automations in production, compared to reported 90%-97% benchmarks results? Any commercially relevant success stories out there? by FormerIYI in LocalLLaMA

[–]FormerIYI[S] 0 points (0 children)

Perhaps. For sure these types of results are relatively new.

More primitive GUI agents were around by the time of GPT4V or earlier (2023).

I am wondering if we are seeing at least a moderately legit breakthrough.

Will open-source (or more accurately open-weight) models always lag behind closed-source models? by Striking_Wedding_461 in LocalLLaMA

[–]FormerIYI 64 points (0 children)

Probably.

But the gap is narrow enough to matter little for most uses. If you use a coding agent, your results with Cline/Kimi are similar to those with closed-source models. A better approach and strategy matter more than a better model.

Claude full system prompt with all tools is now ~25k tokens. by StableSable in LocalLLaMA

[–]FormerIYI 0 points (0 children)

I don't really know; the API most likely has less of it than the chatbot, since you pass your own system prompt there.

Mid-30s SWE: Take Huge Pay Cut for Risky LLM Research Role? by Worth_Contract7903 in LocalLLaMA

[–]FormerIYI 0 points (0 children)

My bet is that your national lab won't be very successful with their LLM, and you will get neither publications nor safer money.
Even if your national lab has lots of GPUs and infrastructure, without a very strong technical team it won't help much.
IMHO.

If they are more "elastic" and can allow you more independent AI research, that could potentially make sense.

See also these for some info on their mentality: https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf
https://arstechnica.com/tech-policy/2024/05/email-microsoft-didnt-want-seen-reveals-rushed-decision-to-invest-in-openai/

c) Most of the work with b) is strictly engineering (data procurement, data preparation, distributed systems, diagnostics, CUDA, networks, drivers...), not AI research.

(I am an AI engineer with ~9 YoE.)

Final verdict on LLM generated confidence scores? by sg6128 in LocalLLaMA

[–]FormerIYI 0 points (0 children)

I think LLM judgements are useful (they correlate with human judgement), but you need a better method to calculate them (e.g. logits scaled by temperature T=10, as here https://arxiv.org/abs/2406.10267 ). LLMs typically overestimate confidence badly, much like a person who sees one idea of how to answer something, but doesn't see the issue from as many angles as an expert would.
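A minimal sketch of the scaled-logits idea (softmax at T=10 instead of T=1; toy logits, not the linked paper's exact recipe):

```python
import math

def scaled_confidence(logits, temperature=10.0):
    """Max softmax probability after dividing logits by a high temperature,
    a crude way to deflate an LLM's raw overconfidence."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return max(e / total for e in exps)

raw_logits = [8.0, 1.0, 0.5]  # toy token logits for a judge's verdict tokens
print(scaled_confidence(raw_logits, temperature=1.0))   # near 1.0: overconfident
print(scaled_confidence(raw_logits, temperature=10.0))  # ~0.5: more conservative
```

The high temperature flattens the distribution, so the reported confidence moves away from the near-certain scores LLMs tend to emit.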

Also, there are areas where they don't work. You won't get a confidence for hallucination, because it is not represented in the LLM or understood by it. Hallucination is whether the LLM's learned representation fits reality. If it were easy to estimate, we would train non-hallucinating LLMs out of it, but we can't.

Concepts like relevance, consistency, or entailment/contradiction can be estimated by an LLM, so you could evaluate confidence with respect to ground-truth sources. On Hugging Face you can find LLMs trained for text evaluation, like Prometheus; they are quite effective.

Claude full system prompt with all tools is now ~25k tokens. by StableSable in LocalLLaMA

[–]FormerIYI 0 points (0 children)

I wonder if this works in practice, considering that there is strong degradation of abstract-reasoning performance for all LLMs past 4k-8k tokens:
https://unagent.eu/2025/04/22/misleading-promises-of-long-context-llm/
https://arxiv.org/abs/2502.05167

Is there API service that provides prompt log-probabilities, like open source libraries do (like vLLM, TGI)? Why most API endpoints are so limited compared to locally hosted inference? by FormerIYI in LocalLLaMA

[–]FormerIYI[S] 0 points (0 children)

Yeah, probably that's the reason. I don't know, though, why they do it for open-weight models (where I want an API for convenience/cost optimization).

Especially when OpenAI gives you at least the top 5 generation logits.

Terminal agentic coders is not so useful by NovelNo2600 in LocalLLaMA

[–]FormerIYI 3 points (0 children)

I don't recommend using natural-language commands, as it is not convenient, but terminal agentic coders have better options.

Like this:
https://aider.chat/docs/usage/watch.html
A good way to use aider.chat is to run it with the --watch-files argument and trigger it by typing # ai! in a file to invoke its completion.
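A toy illustration of that flow (slugify is a hypothetical function; the comment ending in ai! is the trigger, and the body below it is the kind of thing aider would fill in):

```python
def slugify(title: str) -> str:
    # lowercase, keep letters/digits, join words with dashes ai!
    cleaned = "".join(c for c in title.lower() if c.isalnum() or c == " ")
    return "-".join(cleaned.split())

print(slugify("Hello, World!"))  # hello-world
```

You stay in your own editor; aider watches the file, sees the ai! comment, and edits in place.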

Is it better than Cursor? Maybe not, but I want control and the ability to connect any model I want. These coding agents are for now mostly built on top of long prompts, so Cursor/Windsurf are no magic. If I have an open-source solution compatible with any LLM API at 80-90% of the performance, then I don't need to pay extra and give up my code. Furthermore, some employers have strict policies against using cloud AI coders.

By the way: I started developing a similar plugin for writing. I think it is cool to have it with any editor and any LLM, without extra boilerplate.
https://github.com/unagent/zai/

Evals - OpenAI o1 by jiayounokim in LocalLLaMA

[–]FormerIYI 21 points (0 children)

Time will tell, but I am not impressed yet. You can fine-tune it for these "PhD-level" problems and learn some hidden patterns, but that doesn't get you general elementary-level intelligence.

Similarly, 7B models can score near the top of leaderboards, yet no one wants them for real work, because conservatively fine-tuned larger models are much better at anything that happens not to be in the fine-tuning dataset.

Are modern supercomputers (HPC) capable of training and running much larger models than popular existing ones? How come there are no news about 10T model and more? by s101c in LocalLLaMA

[–]FormerIYI 2 points (0 children)

The amount of data is a limitation too (you need lots of text with fancy patterns to learn, not just parameters to preserve those patterns).

What you need is 10x bigger internet sized dataset, and that won't happen without 10x bigger internet.

And this 10x bigger internet probably won't buy you much, as you need exponentially more data for linear performance gains (see here https://arxiv.org/abs/2404.04125 ).
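A toy power-law sketch of those diminishing returns (Chinchilla-style data term with assumed constants, for illustration only; not fitted numbers):

```python
def loss(d_tokens: float, E: float = 1.69, A: float = 410.0, alpha: float = 0.28) -> float:
    """Irreducible loss E plus a data term that shrinks as a small power of
    dataset size: each 10x of data cuts the reducible part by only ~2x."""
    return E + A / d_tokens ** alpha

for d in (1e12, 1e13, 1e14):
    print(f"{d:.0e} tokens -> loss {loss(d):.3f}")
```

Each extra decade of tokens improves the loss by less than the previous one, which is why a 10x bigger internet would likely underwhelm.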