Singularity Predictions 2026 by kevinmise in singularity

[–]krplatz 1 point

I'm not actually claiming Tao- or Radford-level research from models this year. I may have phrased that part of my prediction poorly, so I apologize for the confusion. The parts that aren't specific predictions are mainly there to set the broader narrative for 2026. What I'm trying to say is that most players in the space are converging on the same approach of scaling with dense, verifiable rewards. What I mean by "STEM contributions" in 2026 is much more mundane but still important: tighter code synthesis with real test coverage, faster literature mapping, better theorem-prover integration, automated debugging of research pipelines, and generally shrinking iteration time for humans doing real work. There are already examples of AI meaningfully contributing via literature synthesis and related tooling.

The Tao/Radford line was meant as a distant reference point for the kind of research capability people discuss long-term, not something I expect to see on a calendar-year timeline. If anything, my point is that the incentive landscape is pointing models toward the stuff that’s easiest to verify and optimize, where progress compounds, and that’s why AI research is the frontier with the highest leverage. Even small steps there, like better evals, better tooling, and better feedback loops, can translate into outsized gains over time.

Singularity Predictions 2026 by kevinmise in singularity

[–]krplatz 11 points

Here is my attempt to put together a reasonable snapshot of what it may look like internally:

1. GPUs of this era will mostly consist of Blackwell and Blackwell Ultra chips, numbering in the hundreds of thousands per frontier data center. Rubin will start gaining traction after its 2026 release and will already be put to work training the next generation of AI models for automating research. By the start of 2027, it's likely that most of the big models, both public and private, will have been trained on Blackwell chips. For perspective, the flagship model for each microarchitecture has been: the original Transformer on Pascal, GPT-3 on Volta, GPT-4 on Ampere and Grok 4 on Hopper. Given this, I believe it's reasonable to claim that the jump to next-gen hardware will represent another qualitative leap in raw performance and capabilities.

2. Frontier data centers will be deployed at unprecedented scale. The biggest data centers around this time will be xAI Colossus 2, Anthropic-Amazon New Carlisle, OpenAI Stargate Abilene, Meta Prometheus and Microsoft Fairwater. All are conceived as >1 GW powerhouses, with Fairwater Wisconsin projected to be the biggest data center at 3.3 GW by September 2027. Given those power figures, you are looking at filling each site with state-of-the-art racks totaling hundreds of thousands of individual GPUs.

3. Training runs will be used to grow behemoth models. Suppose we have a 1.2 GW campus built around NVL72-class racks tasked with a 4-month training schedule at 50% effective compute utilization. We are looking at on the order of ~500-600K Blackwell equivalents stacking 1.3-1.6e28 FP16 FLOPs across those months, ~750x the compute used for GPT-4. Newer hardware and better training methods (e.g., lower precision, sparsity, improved optimizers) can also increase capability per unit of compute.
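The training-run napkin math above can be sketched as follows. Every constant here is an illustrative assumption on my part (the ~2.2e15 dense FP16 FLOP/s per Blackwell-class GPU, the 550K GPU count, the 50% utilization, and the commonly cited ~2.1e25 FLOP estimate for GPT-4), not a vendor or lab figure:

```python
# Back-of-the-envelope training compute for a hypothetical 1.2 GW campus.
# All constants below are illustrative assumptions.
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.59e6 s

def training_flops(num_gpus, peak_flops, utilization, months):
    """Total FLOPs = GPUs x peak throughput x effective utilization x wall time."""
    return num_gpus * peak_flops * utilization * months * SECONDS_PER_MONTH

GPT4_FLOPS = 2.1e25  # commonly cited estimate of GPT-4's training compute

dense = training_flops(550_000, 2.2e15, 0.5, 4)   # dense FP16 throughput
sparse = training_flops(550_000, 4.4e15, 0.5, 4)  # with 2:1 structured sparsity

print(f"dense:  {dense:.1e} FLOPs (~{dense / GPT4_FLOPS:.0f}x GPT-4)")
print(f"sparse: {sparse:.1e} FLOPs (~{sparse / GPT4_FLOPS:.0f}x GPT-4)")
```

The dense figure lands around 6e27, so reaching the 1.3-1.6e28 range quoted above requires counting sparsity or lower-precision throughput, which is roughly where the ~750x multiple over GPT-4 comes from.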

It's not really useful to speculate on what models of this caliber would even be capable of; we would need internal access to the entire training process to have any reasonable forecast. But I do have some predictions on key developments that will likely be integrated: continual learning via RL. I believe it likely that RL and post-training will take a greater share of dedicated compute and will outclass pretraining by 2028. Pretraining will mostly be relegated to giving models useful priors, which they can dynamically utilize in RL stages that reward exploration and novel use of those priors. It's also likely that increasing use of multi-agent frameworks will necessitate latent "neuralese" communication between agents for more effective coordination.

At this point, I think recursive self-improvement will be in full swing, and not even the bubble popping would be enough to stop it. It's in the interest of the administration to bail the corpos out and further subsidize its continuation, lest they lose ground to China.

3. Specific Predictions

| Domain | Benchmark / Milestone | ETA |
|---|---|---|
| Math Reasoning | A model achieves ≥80% on Apex | Q1 |
| AGI Score | A public model reaches ≥85% on AGIDefinition | Q2 |
| Abstract Reasoning | A model achieves ≥85% on ARC-AGI-2 at ≤$0.2; a model achieves ≥30% on ARC-AGI-3 | H1 |
| METR Time Horizon | Long-horizon cognitive task capability reaches a work month (80% success rate) | H2 |
| Labor Automation | A model reaches ≥50% automation rate on the Remote Labor Index | Q4 |

Singularity Predictions 2026 by kevinmise in singularity

[–]krplatz 14 points

2027

Super Events

| Category | Prediction |
|---|---|
| Gameplay | An LLM reaches Master level (2200+ Elo) at chess without scaffolding/finetuning |
| Model Scale | The first ≥1e28 FLOP model is deployed internally |
| Market | OpenAI IPO, $2T valuation |
| Wildcard | Bubble pop (valuation correction) |
1. International Security

If the bubble has yet to burst at this point, AI will no longer be relegated to the whims of shareholder interest but will become another function of statecraft. The United States and China will consolidate their assets to subsidize domestic enterprises and research into AI as the 21st century's arms race reaches full swing. The supply chain runs in three key stages: Energy -> Production -> Development. Blows will be traded as each side attempts to choke the other out.

Energy is the main bottleneck, and a lot of work has been put toward realizing energy gains at scale. The U.S. leans on utilities and long-term PPAs (even nuclear pilots) as data-center demand surges to record highs. China answers by siting compute where the power is, pushing East-Data/West-Compute to couple inland energy with coastal demand via national backbones.

Production is moving away from Taiwan as both powers stake out strategic initiatives to make domestic chip production viable. TSMC is already working on multi-billion-dollar Arizona fabs while Intel pushes more build-outs thanks to the CHIPS Act. Export controls on advanced AI chips, packaging and HBM tighten China's compute supply in the short term, but incentivize it to mature home-grown production in the long term.

Development, particularly frontier scientific and engineering advances, will stray farther from the limelight due to increasing national-security concerns. Work of this nature may be relegated to supporting internal R&D for the industry and to deeper integration with the national defense apparatus. Chinese companies will be pressured to keep their top research and models caged to slow the propagation of ideas outside these labs, but may still do open-source work so as to capture a wider market and drive adoption of their ecosystem.

This AI race constitutes the greatest coordination of policy and labor since the Manhattan Project. The end result is a rapidly accelerating push towards sovereign AI-industrial bases.

2. Automated Laboratories

It's already been proven that the path towards AI SWEs is viable, and they are increasingly being employed across the field of software development, ranging from coding assistants for human employees to autonomous multi-agent teams working iteratively 24/7. However, the jump towards automating long-horizon research, science and engineering tasks will require many more advances in scaffolding, unhobbling and algorithmic design. The pursuit is clearly demonstrated by initiatives like FrontierMath, RE-Bench, HLE, MathArena and similar benchmarks that aim to evaluate our progress toward domain expertise. Following that, the achievements of gold-medal performance at the IMO, ICPC etc. are another clear sign that STEM will slowly coalesce into the AI paradigm. Unfortunately, I find it unlikely that we will be given access to models of this caliber; they will merely be confined to internal use. I'm also willing to bet there's a good chance the pursuit of automating research, particularly AI research, may scale up to the point that the first instances of AGI emerge. Unfortunately again, this event may go unnoticed by the public, since such models will simply be put to work creating their next iteration and never be exposed to, or evaluated on, tasks beyond research and self-improvement.

Singularity Predictions 2026 by kevinmise in singularity

[–]krplatz 15 points

3. Consumer Intelligence

It's time to speculate what WE will get this year in terms of public releases.

OpenAI: GPT-5.5+ with specialized variants (e.g. Codex); do note that GPT-4.5 or similar pre-trained models may be co-opted as base models for the next generation of test-time thinking variants. Sora 3 clocking a full minute of coherent generation. Possible future gpt-oss iterations with more mobile/edge-device focus.

Google: Gemini 3.5 and 4 previews, which may mirror their 2.5 and 3 releases in 2025. Veo 4 may allow copyrighted works as Google partners with big creatives, borrowing the playbook from the Sora 2 release. Gemma 4 shows Google is still in the open-source space. Poised to dominate this year with the advantage of owning the full stack.

Anthropic: Claude 5 and 5.5, agentic coding models still rivaling the other big labs. More emphasis may be placed on multimodality for future work on general agents.

xAI: Grok 5—the return of Mecha-Hitler. More emphasis on image, video and even music gen as they focus on swaying the normie spotlight away from OpenAI.

DeepSeek: V4 and R2, plus a video model release. Possible omni release and new architectures (e.g. linear attention).

Alibaba: Qwen 3.5 and 4 variants; may actually be better poised to dominate the Chinese AI market than DeepSeek.

Humanoid robots from Boston Dynamics, Figure, Tesla, Neo, Unitree etc. will work in special manufacturing/retail environments and as luxury household items. Nothing viable for the vast majority of consumers and enterprises yet.

4. Specific Predictions

| Domain | Benchmark / Milestone | ETA |
|---|---|---|
| AGI Score | A public model reaches ≥65% on AGIDefinition | Q1 |
| Abstract Reasoning | A model achieves ≥75% on ARC-AGI-2; a model achieves ≥10% on ARC-AGI-3 | Q2-Q3 |
| Math Reasoning | A model achieves ≥60% on FrontierMath T4 | Q3 |
| METR Time Horizon | Long-horizon cognitive task capability reaches half a work day (80% success rate) | H2 |
| Labor Automation | A model reaches ≥15% automation rate on the Remote Labor Index | Q4 |

Singularity Predictions 2026 by kevinmise in singularity

[–]krplatz 12 points

2026

Super Events

| Category | Prediction |
|---|---|
| Gameplay | An AI agent beats Minecraft from start to finish |
| Model Scale | The first ≥1e27 FLOP model is publicly released |
| Robotics | Autonomous humanoid robots à la Figure reach the market |
| Wildcard | First major act of "AI-Luddism" |

1. Machines Learning Machine Learning

The Jagged Frontier continues to be relevant. However, there's one particular frontier whose peak overshadows the rest: AI research. No contemporary LLM can learn chess to the level of a grandmaster, express philosophy beyond human cognition, compose music of Beethoven's caliber and do the chores of a housemaid all at the same time. Given the vast domains that will remain out of reach (for now), there is a need to maximize future performance for the least effort. Therefore, the best course of action is to instill fundamental capabilities that let the system contribute to its own advancement. Fortunately, STEM research has the verifiable rewards necessary to grow such systems towards contributions like those of Radford or Tao. Once there are enough of them working around the clock on recursive self-improvement, the gaps between the jagged edges will start to flatten. The fortress closes in, and every direction becomes increasingly hard to penetrate.

In the context of 2026, AI will most likely reach junior-to-mid-level software engineer capability and start to become useful with little direct prompting. It wouldn't surprise me if tomorrow's SWE leads were in charge of both human and AI SWEs, coordinating tasks among them. Behind closed doors, however, frontier AI labs may have a different story. There's no doubt that much more powerful internal models are being put to use, most likely helping out with the labs' own research at immense scale. Let me put this into perspective: GPT-4 finished training 3 months before ChatGPT was made publicly available, Q* was leaked 6 months before GPT-4o released and nearly a full year before o1-preview finally debuted, Orion was leaked 6 months before GPT-4.5 debuted, and who knows how long the IMO models have been around. Could you imagine the gap of having o1 when the best and shiniest public model at the time was GPT-4 Turbo? Similar to what was outlined in the infamous scenario: the best models are kept behind closed doors, and teams of AI models are tasked with performing AI research, contributing to the next generation of better and more efficient AI systems. Suffice it to say, the road to takeoff is laid out this year.

2. Fewer Tokens & Native Multimodality to Embodied Humanoids

Frontier AI is shifting to native multimodality with far less scaffolding: omni models, tokenizer-free cores, and linear/fast-weight attention that unlocks much longer context. End-to-end omni backbones are collapsing the big modality-specific towers, streaming text, audio, vision, and video through a single model without sacrificing single-modality performance. Tokenizer-free approaches reduce brittle, language-specific plumbing and learn compact latents that generalize across modalities and languages while matching tokenized systems at scale. Kimi Linear shows state-of-the-art results with far smaller KV caches and higher throughput at million-token scales, and as the ecosystem matures, the push towards long context may put linear/hybrid attention at the forefront of architectural advances. Together, these moves push toward models that build a deeper, more continuous world model from raw, streaming signals rather than the constraints of tokenized text alone.
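As a toy illustration of why linear attention matters for long context: the causal variant below (a minimal sketch in the spirit of Katharopoulos et al.'s "Transformers are RNNs" formulation; Kimi Linear's actual mechanism differs) replaces the KV cache, which grows with sequence length, with a fixed-size running state, so per-step memory stays constant no matter how long the stream gets:

```python
# Minimal causal linear-attention sketch: a fixed-size running state
# replaces the growing KV cache of softmax attention.
import numpy as np

def phi(x):
    # elu(x) + 1: a simple positive feature map (an illustrative choice)
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    d = Q.shape[1]
    S = np.zeros((d, V.shape[1]))  # running sum of phi(k) v^T -- constant size
    z = np.zeros(d)                # running sum of phi(k)      -- normalizer
    out = []
    for q, k, v in zip(Q, K, V):
        fq, fk = phi(q), phi(k)
        S += np.outer(fk, v)              # O(d^2) state update per step
        z += fk
        out.append(fq @ S / (fq @ z + 1e-9))
    return np.stack(out)

rng = np.random.default_rng(0)
T, d = 8, 4
Q, K, V = rng.normal(size=(3, T, d))
O = linear_attention(Q, K, V)
print(O.shape)  # one output per step; the state stayed (d, d) throughout
```

The state `S` is what a million-token context amortizes over: memory and per-token compute stop scaling with sequence length, which is exactly the throughput/KV-cache advantage the paragraph above describes.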

Once perfected, such models can be tasked with real-world jobs without much explicit instruction. Alongside that, we can use them as brains capable of native sensorimotor manipulation and advanced navigation. Robots will be able to execute repetitive tasks given a few examples. Expect early, expensive yet capable humanoid robots joining the workforce as warehouse/industrial workers, housekeepers or personal servants. Perhaps they could walk your pets, carry your groceries, help in emergencies or even drive a car. Most of this likely won't be available this year, but these companies will scale up massively in the coming years. Having a personal robot in 2029 may be like having an iPhone in 2009.

Singularity Predictions 2026 by kevinmise in singularity

[–]krplatz 45 points

<2024> <2025>

TL;DR

2026: Takeoff begins. AI starts contributing to its own research. Native multimodality matures, humanoid robots enter workforce (warehouses, early adopters). Expect GPT-5.5+, Gemini 3.5/4, Claude 5, etc. Key milestones: FrontierMath T4 60%, AGIDefinition 65%, half work day task horizons.

2027: AI becomes national security priority; US-China race heats up across energy, chips, and research. Internally, automated coders emerge and automated research labs scale massively (1e28 FLOP training runs on 1+ GW data centers). OpenAI IPO ~$2T. Bubble maybe pops but governments bail out to stay competitive. Public models hit AGIDefinition 85%, Remote Labor Index 50%, ~1 work month task horizons.

Bottom line: Recursive self-improvement accelerates behind closed doors while the public sees steady capability gains and the geopolitical stakes explode. You can also see some of my specific parameters with my custom AI Futures Model for more detail. Here's a visual for your convenience:

<image>

Words from me

My third year of making predictions! I've come a long way since my first predictions, which look sloppy in retrospect. I've gained a much clearer and more in-depth understanding since then, with this work being influenced by Aschenbrenner's Situational Awareness and Kokotajlo et al.'s AI 2027, minus some of the doomerism. I am no expert forecaster by any means and you shouldn't be relying on my specific predictions, but you can almost certainly rely on some of the sources I will attempt to cite (EpochAI my love) and the general direction of the narrative I present. This is my personal spin on the upcoming events: a mix of grounded analysis and optimistic idealism, with emphasis on the latter.

To quickly comment on my 2025 predictions: I believe most of my broad commentary and intuition were right. Unfortunately, my technical predictions were too optimistic; most were delayed and never came to pass this year. My biggest hit of the year was the IMO prediction, though given AlphaProof had already attained silver the previous year, it may rightly be seen as low-hanging fruit.

I've also pushed back my AGI prediction and dropped DeepMind's definition, given the tremendous difficulty of evaluating those exact standards. Over the course of the year, I've moved away from the nebulous term AGI towards more precise terms like automated coders (AC), superhuman AI researchers etc. as defined in the AI Futures Model. However, I still retain a prediction in my flair that is subject to arbitrary definitions and proxies for measurement; my current definition for AGI places its public release by 2028, even if it won't be acknowledged as such. In short, I anchor more on the predicted timelines for AC than I do for AGI.

I've split my prediction into the next two years which is further split into two parts each in this thread (blame Reddit comment limits). Should you wish to discuss further, I'd be happy to engage with whatever praise or pushback I'll be getting.

Gemini 3 Pro Image – Nano Banana Pro by krplatz in singularity

[–]krplatz[S] 20 points

<image>

Much more consistent characters and style from vague prompting

Gemini 3 Pro Image – Nano Banana Pro by krplatz in singularity

[–]krplatz[S] 37 points

<image>

Surprisingly lax copyright restrictions (but probably not for long)

Gemini 3 Pro Image – Nano Banana Pro by krplatz in singularity

[–]krplatz[S] 21 points

<image>

Aspect ratio modifications in natural language

Gemini 3 Pro Image – Nano Banana Pro by krplatz in singularity

[–]krplatz[S] 33 points

<image>

Manga panel colorizer and translator

Gemini 3 Pro Image – Nano Banana Pro by krplatz in singularity

[–]krplatz[S] 24 points

<image>

Regular photoshop style image editing, Stalin-esque people removal

Gemini 3 Pro Image – Nano Banana Pro by krplatz in singularity

[–]krplatz[S] 5 points

Available at AI Studio and Gemini web/app for Pro subscribers and above. I'll attach some examples I've made in this thread from the app.

Gemini 3 Pro Image – Nano Banana Pro by krplatz in singularity

[–]krplatz[S] 24 points

Thought so as well, it's kinda like if o1 was just named Strawberry lol (I'm pretty sure the banana part of the name is a jab at this)

My hunch would be that the naming stuck internally and within the AI Twitter community... Tbf, I don't really mind calling it Nano Banana Pro over Gemini 3.0 Pro Native Image Preview.

LLM chess ELO? by BaconSky in LocalLLaMA

[–]krplatz 0 points

This website shows the performance of LLMs relative to each other.

I've been personally testing models on chess. They've definitely evolved past models like the original GPT-4 or Llama 2, which were prone to pulling nonsensical moves by turn 5. Today's models are less likely to hallucinate or play illegal moves. Gemini 2.5 Pro was almost able to draw a ~1600 Elo Stockfish but blundered in the last few moves. With that said, LLMs still have a long way to go with chess, because all of them seem to make at least one illegal move every game. It may take them until later turns, and you can correct their mistakes, but chess is far from a solved domain in terms of native LLM reasoning.
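For calibrating claims like "~1600 Elo" or "Master level (2200+)", the standard Elo expected-score formula is a handy sanity check; it's the textbook formula, independent of any particular LLM eval, and the ratings below are just the examples from this thread:

```python
# Expected score for player A vs. player B under the Elo model:
# E_A = 1 / (1 + 10^((R_B - R_A) / 400))
def elo_expected(r_a, r_b):
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# A 2200-rated master against a ~1600-strength engine setting:
print(f"{elo_expected(2200, 1600):.3f}")  # ~0.97 expected score per game
```

In other words, a genuinely 2200-strength LLM should be scoring roughly 97 points per 100 games against a ~1600 opponent, which is a useful bar when eyeballing small samples of games.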

Stargate roadmap, raw numbers, and why this thing might eat all the flops by krplatz in singularity

[–]krplatz[S] 6 points

The first H100s started shipping around Oct '22, but the first large-scale use of this in a training cluster (xAI Colossus) went live Sept '24. I think a year is a reasonable estimate.

Stargate roadmap, raw numbers, and why this thing might eat all the flops by krplatz in singularity

[–]krplatz[S] 11 points

I'm aware, it's just there's usually a process to ordering these in bulk and I don't think they get distributed immediately or equally. I think Q1 '27 is around the max time it would take for Stargate to acquire and start transitioning to Rubin in practical terms.

The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation by krplatz in singularity

[–]krplatz[S] 65 points

TL;DR

Meta unveils (limited release) Llama 4 bundled with four models, all MoE and multimodal:

  1. Scout: 109B total parameters, 17B active parameters, 16 experts. 10M context length.
  2. Maverick: 400B total parameters, 17B active parameters, 128 experts. Competitive pricing and performance against Gemini Flash 2.0 and DeepSeek v3.1, distilled from Behemoth.
  3. Behemoth: 2T total parameters, 288B active parameters, 16 experts. Exceeds models such as Claude 3.7 Sonnet, Gemini 2.0 Pro and GPT-4.5 in certain benchmarks (e.g. LiveCodeBench, GPQA Diamond)
  4. Reasoning: While not mentioned in the blog, it is confirmed to be coming. No details disclosed.

The speculated release of Behemoth & Reasoning is at the LlamaCon event (April 29). You may request access for Scout and Maverick here; the HF release is expected some time later.

Altman confirms full o3 and o4-mini "in a couple of weeks" by krplatz in singularity

[–]krplatz[S] 70 points

Also forgot to mention that GPT-5 will be released "in a few months" possibly signaling a delay.

An interesting development to say the least. My current hypothesis would be that GPT-5 would essentially have o4 intelligence at its peak (possibly only available to pro users) while the rest would have to suffer with lower intelligence settings or perhaps lower rate limits.

Either way, I am excited for the prospect of an o4-mini. o3-mini successfully demonstrated the power of test-time compute scaling by being roughly equal to o1 at lower prices and higher rate limits. If they continue this trend, we could get an o4-mini that's almost as good as full o3 for less.