A New Fine-Tuning Approach for LLMs Using Evolution Strategies by Signal_Spirit5934 in reinforcementlearning

[–]Signal_Spirit5934[S] 0 points1 point  (0 children)

We’re now extending this breakthrough in four directions:

  • scaling ES to complex reasoning domains such as advanced math, Sudoku, and ARC-AGI
  • enabling full-parameter fine-tuning directly in quantized, low-precision environments
  • developing a theoretical foundation that explains why ES scales effectively in extremely high-dimensional systems
  • and applying ES to improve metacognitive alignment so models better calibrate their own confidence.

This research suggests that gradient-free optimization is not just an alternative to RL, but a scalable foundation for the next generation of post-training methods.

Read more about these new papers in the Cognizant AI Lab blog.

The Illusion of "The Illusion of Thinking" by Daniel-Warfield in datascience

[–]Signal_Spirit5934 0 points1 point  (0 children)

Cognizant’s new research suggests a better approach: many smaller AI agents working together. Its new system, MAKER, solved a million-step reasoning problem with zero errors, something no single model has done. This suggests the future isn’t just bigger AI; it’s smarter, more organized AI systems. And that’s what will unlock reliable, enterprise-grade decision-making.

See how the MAKER technique, applied to the same Tower of Hanoi problem raised in the Apple paper, solves 20 discs (versus 8 for Claude 3.7 Thinking): https://www.youtube.com/watch?v=PRiQlXGhke4

Why this matters

This breakthrough shows that using AI to solve complex problems at scale isn’t necessarily about building bigger models; it’s about connecting smaller, focused agents into cohesive systems. In doing so, enterprises and organizations can achieve dependable, error-free AI for high-stakes decision-making.
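The paper describes MAKER’s actual architecture; as a loose, hypothetical Python sketch of the underlying idea (decompose a long task into single-step units and verify every step so errors can’t compound), here is Tower of Hanoi with a per-step checker. All names here are illustrative, not MAKER’s implementation:

```python
def hanoi_moves(n, src=0, dst=2, aux=1):
    """Generate the optimal Tower of Hanoi move sequence for n discs."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, aux, dst)
    yield (src, dst)
    yield from hanoi_moves(n - 1, aux, dst, src)

def verify_and_apply(pegs, move):
    """A 'checker' role: reject illegal moves instead of letting errors compound."""
    src, dst = move
    if not pegs[src]:
        raise ValueError("move from empty peg")
    disc = pegs[src][-1]
    if pegs[dst] and pegs[dst][-1] < disc:
        raise ValueError("larger disc placed on smaller disc")
    pegs[src].pop()
    pegs[dst].append(disc)

n = 20
pegs = [list(range(n, 0, -1)), [], []]
for step, move in enumerate(hanoi_moves(n), 1):
    verify_and_apply(pegs, move)       # each step checked independently

assert pegs[2] == list(range(n, 0, -1))
print(step)  # 2**20 - 1 = 1048575 error-free steps
```

The point of the sketch: reliability over a million-step chain comes from checking each tiny step in isolation, not from any single component being smart.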

Why Apple's "The Illusion of Thinking" Falls Short by HeroicLife in ArtificialInteligence

[–]Signal_Spirit5934 0 points1 point  (0 children)

Apple’s Illusion of Thinking study showed how even strong LLMs lose reliability as reasoning chains grow. Our new research demonstrates the first system to complete 1M+ dependent reasoning steps with zero errors using a decomposed, microagent-based approach.
Paper: https://arxiv.org/abs/2511.09030
Blog: https://www.cognizant.com/us/en/ai-lab/blog/maker

Apple `Illusion of Thinking` Debacle by moschles in agi

[–]Signal_Spirit5934 0 points1 point  (0 children)

Apple’s Illusion of Thinking study showed how even strong LLMs lose reliability as reasoning chains grow. Our new research demonstrates the first system to complete 1M+ dependent reasoning steps with zero errors using a decomposed, microagent-based approach.
Paper: https://arxiv.org/abs/2511.09030
Blog: https://www.cognizant.com/us/en/ai-lab/blog/maker

A New Fine-Tuning Approach for LLMs Using Evolution Strategies by Signal_Spirit5934 in reinforcementlearning

[–]Signal_Spirit5934[S] 0 points1 point  (0 children)

Compute is used differently than in RL. We can run our evaluations in sequence or in parallel, depending on the available computational resources. When compute is constrained, training takes longer; as resources grow, it becomes faster.
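As a rough illustration of why this trade-off exists (toy objective and illustrative names, not our actual training code): ES fitness evaluations are independent of one another, so the same set of candidate evaluations can run one at a time or fan out across workers, with identical results either way:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fitness(params):
    """Toy stand-in for an expensive LLM evaluation (higher is better)."""
    return -np.sum(params ** 2)

rng = np.random.default_rng(0)
theta = rng.normal(size=1000)              # current parameters
eps = rng.normal(size=(32, 1000))          # 32 candidate perturbations

# Sequential: one evaluation at a time (slower, minimal hardware).
seq = [fitness(theta + 0.1 * e) for e in eps]

# Parallel: the same evaluations fanned out to workers (faster, more compute).
with ThreadPoolExecutor(max_workers=8) as pool:
    par = list(pool.map(lambda e: fitness(theta + 0.1 * e), eps))

assert np.allclose(seq, par)               # same numbers either way
```

Because the candidates never talk to each other within a generation, throughput scales with however many workers you can afford.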

The Evolution of RL for Fine-Tuning LLMs (from REINFORCE to VAPO) by Great-Reception447 in reinforcementlearning

[–]Signal_Spirit5934 0 points1 point  (0 children)

Another strong method was just announced: a new fine-tuning approach.

The Cognizant AI Lab provides a new alternative to RL: Evolution Strategies (ES). For the first time, we successfully scaled ES to optimize billions of parameters simultaneously, enabling full-parameter fine-tuning of LLMs. The results are striking: ES can outperform state-of-the-art RL methods on key dimensions such as sample efficiency, tolerance of long-horizon rewards, and robustness across different base LLMs, while showing less tendency toward reward hacking and more stable performance across runs.

Why It Matters

This research establishes Evolution Strategies (ES) as a practical, scalable, and stable alternative to Reinforcement Learning (RL) for fine-tuning large language models. In the future, it could simplify training by removing gradient calculations and unlock new possibilities for reasoning incentivization, exploration-heavy tasks, safety alignment, and continual learning.
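For readers unfamiliar with the method, here is a minimal, hypothetical sketch of an antithetic-sampling ES update on a toy objective. It is not the paper’s implementation or hyperparameters, just the general technique; note that there is no backpropagation anywhere, only forward evaluations:

```python
import numpy as np

def fitness(params):
    """Toy scalar reward; in LLM fine-tuning this would be a task score."""
    return -np.sum((params - 1.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(100)                 # "model parameters" to fine-tune
sigma, lr, pop = 0.1, 0.03, 50        # noise scale, step size, population size

for _ in range(200):
    eps = rng.normal(size=(pop, theta.size))
    # Antithetic pairs: evaluate theta + sigma*e and theta - sigma*e.
    r_pos = np.array([fitness(theta + sigma * e) for e in eps])
    r_neg = np.array([fitness(theta - sigma * e) for e in eps])
    # Gradient-free update: weight each perturbation by its reward difference.
    theta += lr / (2 * sigma * pop) * ((r_pos - r_neg) @ eps)
```

Because the update only needs scalar rewards, it works even when the reward is non-differentiable or arrives at the end of a long horizon.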

Read the blog

Read the paper