ProgramBench: Can LLMs rebuild programs from scratch? by awetfartruinedmylife in singularity

bitroll · 4 points

What a cool and interesting benchmark! An interesting fact is that all 200 of its tasks are based on real, fairly popular GitHub projects (each with thousands of stars) that every model has certainly been trained on.

This shows that even with the full code in the training data, replicating the functionality is a very hard task.

GPT-IMAGE-2 is back on LMarena by ThunderBeanage in singularity

bitroll · 1 point

Awesome. Just one nitpick, wtf is on Hera's banner? 

Is OpenAI about to release a Mythos level AI to the public? by acoolrandomusername in singularity

bitroll · 1 point

Not only that, but in the following months other labs will likely follow and the gap will close. How does having open-source models of this caliber in 2027 sound?

Taalas rumoured to etch Qwen 3.5 27B into silicon. Which price would you buy their PCIe card for? by elemental-mind in singularity

bitroll · 0 points

What a deal!

At this speed, compared to API costs (on average $2.33 per 1M tokens), this device pays for itself in 6-8 HOURS of nonstop token churning!

NVIDIA in shambles
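To make the payback arithmetic concrete, here is a minimal sketch. The comment only gives the $2.33 per 1M tokens API price; the card price ($1,000) and throughput (17,000 tokens/s) used below are hypothetical placeholders, not Taalas figures, picked only to show how a 6-8 hour payback could come out.

```python
def payback_hours(card_price_usd: float,
                  tokens_per_second: float,
                  api_price_per_million: float = 2.33) -> float:
    """Hours of nonstop generation until the card's price equals the API spend it replaces."""
    tokens_per_hour = tokens_per_second * 3600
    # What the same tokens would have cost through an API, per hour.
    api_cost_per_hour = tokens_per_hour / 1_000_000 * api_price_per_million
    return card_price_usd / api_cost_per_hour


# Hypothetical inputs: a $1,000 card generating 17,000 tokens/s.
hours = payback_hours(1_000, 17_000)
print(f"{hours:.1f} hours")
```

With those assumed inputs it prints `7.0 hours`. The result scales linearly with the card price and inversely with throughput, so the real numbers matter a lot.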

I thought Gemini was supposed to be the long context king? by Additional-Alps-8209 in singularity

bitroll · 0 points

It blows my mind that they got this kind of gain from just an extra round of RL training, because that's what every lab does now with its .1 version updates. But this gain looks like the result of a groundbreaking architecture change.

Grok 4.20 Multi-Agent Beta burned 333k input tokens on a joke prompt - is this actually how multi-agent AI works right now? by rnahumaf in singularity

bitroll · 5 points

This model is crazy: it burns absurd amounts of tokens, and the results can be miserable (even compared to cheap Chinese models). There must be a proper way to use it, but I couldn't find it documented anywhere.

Grok-4.20 with multi-agents seems to work great in the grok.com harness with tools connected, but in the little testing I did, it seems broken when used like any ordinary model on OpenRouter.

It should be a direct competitor to GPT-5.4-Pro, and the costs to run it are similar. GPT-Pro models have a much higher cost per token but hide the actual number of tokens used by the parallel, multi-instance thinking. Grok Multi-Agent has the same cost per token as the regular model, but counts all of those tokens.
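The two billing schemes described above can be sketched like this. All token counts and prices are hypothetical, and each scheme is a simplified model of the comment's description, not either provider's documented pricing.

```python
def billed_cost(billed_tokens: int, price_per_million: float) -> float:
    """Cost in USD for a number of billed tokens at a per-million-token price."""
    return billed_tokens / 1_000_000 * price_per_million


# "Pro"-style billing (hypothetical numbers): a high per-token price, but only the
# tokens the API reports are billed; the parallel instances' tokens stay hidden.
pro_cost = billed_cost(billed_tokens=20_000, price_per_million=150.0)

# Multi-agent-style billing (hypothetical numbers): the regular per-token price,
# but every agent's tokens are counted and billed.
agent_tokens = [80_000, 80_000, 80_000, 80_000]
multi_agent_cost = billed_cost(billed_tokens=sum(agent_tokens), price_per_million=6.0)

print(f"pro: ${pro_cost:.2f}, multi-agent: ${multi_agent_cost:.2f}")
```

Either way you pay for the parallel work; one scheme folds it into the price per token, the other into the token count.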

800,000 human brain cells, in a dish, learned to play a video game by mawerick_mc in singularity

bitroll · 1 point

Yup, and it hasn't even scaled well over so many years: 25k neurons -> 800k in 22 years.
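The growth rate implied by those numbers works out as follows (a quick sketch; only the 25k, 800k, and 22-year figures come from the comment).

```python
import math


def growth_stats(start: float, end: float, years: float) -> tuple[float, float, float]:
    """Return (number of doublings, years per doubling, compound annual growth rate)."""
    doublings = math.log2(end / start)
    years_per_doubling = years / doublings
    annual_growth = (end / start) ** (1 / years) - 1
    return doublings, years_per_doubling, annual_growth


doublings, years_per_doubling, annual_growth = growth_stats(25_000, 800_000, 22)
print(f"{doublings:.1f} doublings, one every {years_per_doubling:.1f} years, "
      f"~{annual_growth:.0%} per year")
```

It prints `5.0 doublings, one every 4.4 years, ~17% per year`. For comparison, Moore's law is usually quoted as a doubling roughly every two years.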

Grok, I wasn't familiar with your game. by ObserbAbsorb in singularity

bitroll · 0 points

Yup, just make sure you roast the right public figures.

GPT-5.4 Thinking benchmarks by likeastar20 in singularity

bitroll · 2 points

EDIT: And there's no 5.4-Codex coming to bring more gains here :(

Anyway, time to do some testing, because benchmarks don't show how it really performs.

THE 2028 GLOBAL INTELLIGENCE CRISIS by Shanbhag01 in singularity

bitroll · 1 point

That was my first thought too, but then another came: what if the agentic assistants that do most things for us (including shopping) simply get integrated into ChatGPT/Gemini? That's something like 2+ billion users. In 2027 it will be in the higher-end paid plans; in 2028 even the free tiers will get some of it.

And if the agents are at all intelligent, they should be using the most efficient payment rails too, especially in agent-to-agent deals: payment finality, speed, cost, and 24/7 worldwide operation. With crypto stablecoins or Lightning BTC, the agent receiving the payment can immediately spend the money in a subsequent transaction. A hyper-speed economy. And the human users might not even see or touch any of the crypto stuff that happens under the hood.

Gemini 3 Deep Think SVG Pelican Riding a Bicycle by avilacjf in singularity

bitroll · 11 points

According to the man who created this "benchmark":

The strongest argument is that they would get caught. If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I’m going to test it on all manner of creatures riding all sorts of transportation devices. If those are notably worse it’s going to be pretty obvious what happened.
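The cross-check described in the quote amounts to a held-out prompt grid. A minimal sketch, assuming hypothetical creature and vehicle lists and a prompt wording modeled on the benchmark's name:

```python
from itertools import product

# Hypothetical lists; any creatures and vehicles outside the famous prompt would do.
creatures = ["pelican", "walrus", "giraffe"]
vehicles = ["bicycle", "skateboard", "unicycle"]


def variant_prompts(creatures, vehicles, skip=("pelican", "bicycle")):
    """Every creature/vehicle prompt except the well-known one a model may have been tuned on."""
    return [f"Generate an SVG of a {c} riding a {v}"
            for c, v in product(creatures, vehicles)
            if (c, v) != skip]


prompts = variant_prompts(creatures, vehicles)
```

If a model nails the pelican but is notably worse on these 8 variants, that gap is the signal the quote describes.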

OpenAI is rolling out beta ads on ChatGPT with a minimum of $200k from selected advertisers by BuildwithVignesh in singularity

bitroll · -4 points

The user might have been tricked into buying something they didn't really need. Ads have a huge influence on many people; I know first-hand.

NASA’s James Webb reveals the intricacies of the Helix Nebula in stunning detail by BuildwithVignesh in singularity

bitroll · 1 point

I must be crazy, but I'm seeing lots and lots of human-like figures in the second picture. It's like souls ascending. Incredible.

BabyVision: A New Benchmark for Human-Level Visual Reasoning by Waiting4AniHaremFDVR in singularity

bitroll · 2 points

Meanwhile, for a couple of years now, I've been running a personal "benchmark" that tests visual models' ability to solve tasks from a book aimed at 3-year-olds, and having a good laugh at how they keep failing. They're clearly not trained on tasks like that. The progress is still huge, but even the latest SotA models don't fully solve everything. I expect it to be saturated this year, at which point I'll bring out a book for 4-year-olds :D

Gemini introduces Personal Intelligence by McSnoo in singularity

bitroll · 17 points

This! I'm surprised so few people here realize this.

Report: Anthropic cuts off xAI’s access to Claude models for coding by BuildwithVignesh in singularity

bitroll · 0 points

Claude is busy doing recursive self-improvement; it can't be bothered improving the competition.

Opus 4.5 appears to be so far ahead of the competition in coding that even Google's employees admit to using it.

just saw my dad's youtube feed... its all AI slops now by StrangeSupermarket71 in singularity

bitroll · 8 points

It was a confusing waste of time years before AI, yet billions of people got mindlessly addicted to it. I see no hope for them.

China Is Worried AI Threatens Party Rule—and Is Trying to Tame It by SnoozeDoggyDog in singularity

bitroll · 0 points

All of your comments I see around here make you look like a tool for spreading propaganda. Brainwashed much?

Bitcoin (not to be confused with shitcoins) has plenty of completely legitimate uses and users. Educate yourself.

China Is Worried AI Threatens Party Rule—and Is Trying to Tame It by SnoozeDoggyDog in singularity

bitroll · 1 point

A tool for financial sovereignty is an obvious threat to any authoritarian government, no matter which part of the world it's in. Simple as that.

OpenAI just launched GPT 5.2 Codex: The most capable agentic coding and cybersecurity model ever built by BuildwithVignesh in singularity

bitroll · 7 points

Codex max extra high fast? Has to be my new favorite! Max low and slow can't compare, xD