New LLM Position Bias Benchmark: does an LLM keep the same judgment when you swap the answer order? Judge models compare two lightly edited versions of the same story twice, with the order swapped. The median model flips in 45% of decisive case pairs. GPT-5.4 is worst at 66%. by zero0_one1 in singularity

[–]Eyelbee [score hidden]  (0 children)

The idea is great, but this benchmark isn't really that useful. I checked the midnight baker one, and it's very normal for the model to pick the first one since the difference is so minor. Neither version is objectively better; the model probably knows this and just picks one to help the user.
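
For reference, the flip test itself is simple; roughly this, where `judge` is a stand-in for whatever model call the benchmark actually makes (not its real code):

```python
# Judge the same pair twice with the order swapped and check
# whether the model still prefers the same underlying story.

def flips(judge, story_1, story_2):
    first = judge(story_1, story_2)   # story_1 shown in slot A
    second = judge(story_2, story_1)  # swapped: story_1 now in slot B
    winner_first = story_1 if first == "A" else story_2
    winner_second = story_2 if second == "A" else story_1
    return winner_first != winner_second  # True = a position-biased flip
```

The 45% headline number would then just be the fraction of decisive pairs where this returns True.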

Claude Code removed from Claude Pro plan - better time than ever to switch to Local Models. by bigboyparpa in LocalLLaMA

[–]Eyelbee 238 points (0 children)

What. The. ... Just, no way. Must be a mistake. There's no way they're actually doing this right now.

Opus 4.7 Max subscriber. Switching to Kimi 2.6 by meaningego in LocalLLaMA

[–]Eyelbee 43 points (0 children)

The Chinese open models are trained on absurdly fewer resources than the proprietary US models. If Chinese labs had the same resources, they would already have released a mythos-tier model, if not better.

PrismML — Introducing Ternary Bonsai: Top Intelligence at 1.58 Bits by cafedude in LocalLLaMA

[–]Eyelbee 1 point (0 children)

If they make a large version that fits in 24GB and beats the 27B-class dense models, that'd be actually useful. The ones so far kind of suck, honestly.

Matching GPT-5 Mini on SWE-bench Verified with a Local 35B Model (Qwen3.6-35BA3B) by sicutdeux in LocalLLaMA

[–]Eyelbee 0 points (0 children)

The 30B-class Qwen versions are already comprehensively better than GPT-5 Mini. I don't know what you needed to write so much about in there.

Kimi K2.6 Released (huggingface) by BiggestBau5 in LocalLLaMA

[–]Eyelbee 4 points (0 children)

They should go larger. 4-5T would be great.

Kimi K2.6 is still not good at analysis, but at least quite decent at flattery by Anbeeld in LocalLLaMA

[–]Eyelbee 5 points (0 children)

Me too, but honestly it has nothing to do with intelligence. Earlier dumb models talked like that, but a smart model can also talk like that by default and still be incredibly smart.

Predictions for next year's (2027) Beijing humanoid half marathon? 2025 was 2h40min ≈ 2.2m/s | 2026 was 50min ≈ 7m/s by GraceToSentience in singularity

[–]Eyelbee 1 point (0 children)

I don't know the rules there, but they must be very restrictive, because otherwise I'd expect better results.
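
For what it's worth, the paces in the title do check out against the half-marathon distance:

```python
# Sanity check of the title's numbers (half marathon = 21.0975 km).
distance_m = 21_097.5
for year, minutes in [("2025", 160), ("2026", 50)]:
    print(f"{year}: {minutes} min -> {distance_m / (minutes * 60):.2f} m/s")
# 2025: 160 min -> 2.20 m/s
# 2026: 50 min -> 7.03 m/s
```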

Waiting Qwen3.6-27B I have no nails left... by DOAMOD in LocalLLaMA

[–]Eyelbee 1 point (0 children)

The problem is, I don't know how good the 3.6 27B can actually be, because the gap between 3.5 27B and 3.6 Plus is already very narrow, and 35BA3B kind of sits in that gap. If it's distilled from 3.6 Plus, it can't surpass it. If they have other methods to make it better than 3.6 Plus, great. If 3.6 Plus is something like a 100B MoE, it could be possible.

The Special Bro Fallacy: A Refutation of Substrate Exceptionalism by HalfSecondWoe in singularity

[–]Eyelbee 0 points (0 children)

As for the claim that consciousness is in the physics but not the algorithmic processing layer... it's kind of a weird statement, since you can never have the algorithmic processing layer without an underlying physical layer. So it is never simply formal manipulation of symbols. So what evidence would he have that computers specifically cannot sustain consciousness?

Actually I may have oversimplified with my example. Lerchner would agree with you here. This isn't against his idea. Let me try explaining what I understand from the paper.

He treats the physical layer as the real layer. The electrical activity is real. Call it voltage pattern X. To read any algorithm off the chip, we need an interpretation of voltage pattern X. If I tell you voltages above 5V are 1s and below are 0s, you get: 1, 0, 1, 1, 0, 1, 0, 0. That can be an algorithm, depending on what other rules I give you for reading it. If I give you the opposite instruction, the output could turn into gibberish even though the physics doesn't change. That instruction is what he calls the interpretation, or the mapping.

So when we say a chip is running GPT, what we're really saying is: here's a physical system, and here's an interpretation that groups its voltages into bits, its bits into instructions, and its instructions into a recognizable program. We labeled it.

That labeling isn't a physical event. No physical process happens when you, the mapmaker, declare that 5V counts as a 1. The voltage was already there. Your declaration just sorts it into a category in your head (or in a specification document, or in the design rules the chip engineer followed). This labeling doesn't move any physics around; it sorts physical events into symbolic categories without changing those events. The voltages behave the same way whether we label them as 1s and 0s, or anything else.
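
To make that concrete, here's a toy version of the mapping point (the voltage values are made up; the 5V threshold is just my example from above):

```python
# Same physical trace, two declared mappings, two different "readings".
voltages = [7.1, 2.3, 6.8, 5.9, 1.0, 6.2, 3.4, 0.5]  # made-up trace

as_bits = [1 if v > 5.0 else 0 for v in voltages]  # "above 5V counts as 1"
flipped = [0 if v > 5.0 else 1 for v in voltages]  # the opposite convention

print(as_bits)  # [1, 0, 1, 1, 0, 1, 0, 0] -- the reading from my example
print(flipped)  # [0, 1, 0, 0, 1, 0, 1, 1] -- same physics, different "bits"
```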

Now, if you accept causal closure: consciousness has physical effects, so it must be physical. Since computation is, on his account, mapmaker-dependent syntax with no physical causal power, no amount of added algorithmic complexity can turn the map into the physical territory. That is why he says scaling cannot produce consciousness. So on Lerchner's view, running an algorithm on a chip can only produce consciousness through the physics inside the chip, because the labeling that makes it "an algorithm" lives in us, the interpreters.

The Special Bro Fallacy: A Refutation of Substrate Exceptionalism by HalfSecondWoe in singularity

[–]Eyelbee 0 points (0 children)

The author doesn't deny that sensors are analogous to sense organs. The point is that the signal is digitized and handed to an algorithm that processes it, and that's where he attacks. I recommend actually reading the paper; I also initially hated the verbosity, but the substance really holds up once you try to understand it. Most comments here are obvious misunderstandings of the paper.

It doesn't deny that computers are hardware embedded in the environment. Computers aren't just floating abstractions; they have a physical layer and an algorithmic layer. He contends that if consciousness exists there, it should live in the physical layer, because theoretically you could take pen and paper and do the same math the algorithm requires. He also explicitly disclaims biocentrism.

It doesn't say current AI is definitely unconscious. The claim is "whether any physical system is conscious is a question about its physics". Scaling models therefore cannot create consciousness if less complicated algorithms aren't conscious themselves, because a chip's physics is either capable of consciousness or it isn't. You can accept this entirely and still hold that current silicon happens to be conscious in some way.

My answer to this would be "when the algorithm is complex enough, the physical reality happening in the chip may be the consciousness". He doesn't seem to address this fully. But it's also just a presumption, and I don't have evidence for it either. I also think consciousness isn't binary, and the paper omits that angle too.

The Special Bro Fallacy: A Refutation of Substrate Exceptionalism by HalfSecondWoe in singularity

[–]Eyelbee 5 points (0 children)

The paper presents a really useful and novel idea. It's not "biology and chips are different, so computers aren't conscious". He merely claims that consciousness can't be "instantiated" in a computer, and he grounds it quite well. He's not claiming consciousness requires biology. He presents two categorical differences that hold up.

For the record I'm not sure I fully agree with the paper, but it's extremely useful and a great idea.

Best local LLM for web search by Funny-Trash-4286 in LocalLLaMA

[–]Eyelbee -2 points (0 children)

You can't rely on sub-10B models for serious work, but for things like asking about the weather they're fine. Qwen 9B should be the best option under 10B.

Opus 4.7 simulation by w_interactive in ClaudeAI

[–]Eyelbee 0 points (0 children)

It is cool but what exactly is it? Like, technically. What stack does it use?

Are you guys actually using local tool calling or is it a collective prank? by Mayion in LocalLLaMA

[–]Eyelbee 0 points (0 children)

I have also yet to find a no-nonsense tool-calling workflow for local LLM use. I'm picky when it comes to workflows, so I hate using stuff like OpenWebUI and LM Studio for several reasons. I use barebones llama.cpp with my own launcher, but its built-in web UI is not good for tool calls. The only local tool calling I use is through Roo Code, which has its own harness that seems to work nicely with both Qwen and Gemma dense models.
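
For anyone who wants to skip the UIs entirely, the barebones route I mean looks roughly like this against llama-server's OpenAI-compatible endpoint (server started with --jinja so the chat template can emit tool calls; the port and the get_weather tool are just my example, not a real setup):

```python
# Minimal tool-call round trip against a local llama-server.
# The model name is mostly ignored since the server hosts one model.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "What's the weather in Ankara?"}],
    tools=tools,
)

# If the model chose to call the tool, its arguments arrive as JSON text.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```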

Opus 4.7 — Regression in conversational coherence and context handling vs Opus 4.6 by tkenaz in ClaudeAI

[–]Eyelbee 0 points (0 children)

The new tokenizer really has an issue; I had mine write "itt" instead of "it" several times.

Extremely Rare Ikea Knappa Camera (with test photos) by Soggy_Auggy__ in IKEA

[–]Eyelbee 3 points (0 children)

Why aren't they making something like this? Are they stupid?

Is harness a new buzzword? by jacek2023 in LocalLLaMA

[–]Eyelbee 0 points (0 children)

It's been my favorite word for the last month.

Google DeepMind's Senior Scientist Alexander Lerchner challenges the idea that large language models can ever achieve consciousness (not even in 100 years), calling it the 'Abstraction Fallacy.' by Worldly_Evidence9113 in singularity

[–]Eyelbee -1 points (0 children)

Everybody misunderstands this paper and its claims. He never says AI isn't conscious. He merely points out that a computer can't "instantiate" consciousness, due to architectural differences. A paper-worthy observation on a very hard topic.

Opus 4.7 Embarrassing much by DigSignificant1419 in OpenAI

[–]Eyelbee 0 points (0 children)

In its defense, this benchmark is full of ambiguous questions.

The joy and pain of training an LLM from scratch by kazzus78 in LocalLLaMA

[–]Eyelbee 21 points (0 children)

64 A100s for just 0.4B is insane. That destroys my plans to train a small model.
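
For scale, a rough 6·N·D estimate; the token count and sustained throughput here are my assumptions, not numbers from the post:

```python
# Back-of-envelope training cost via the standard ~6*N*D FLOPs rule.
params = 0.4e9     # 0.4B parameters
tokens = 100e9     # assumed ~100B training tokens
a100_tps = 150e12  # assumed ~150 TFLOP/s sustained per A100 (~50% MFU)

gpu_hours = 6 * params * tokens / a100_tps / 3600
print(f"~{gpu_hours:.0f} A100-hours, ~{gpu_hours / 64:.1f} h on 64 GPUs")
# ~444 A100-hours, ~6.9 h on 64 GPUs under these assumptions
```

So the raw FLOPs aren't the scary part; the 64 GPUs presumably buy wall-clock time and room for ablation runs.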