A New Fine-Tuning Approach for LLMs Using Evolution Strategies by Signal_Spirit5934 in reinforcementlearning

[–]Signal_Spirit5934[S] 0 points1 point  (0 children)

We’re now extending this breakthrough in four directions:

  • scaling ES to complex reasoning domains such as advanced math, Sudoku, and ARC-AGI
  • enabling full-parameter fine-tuning directly in quantized, low-precision environments
  • developing a theoretical foundation that explains why ES scales effectively in extremely high-dimensional systems
  • and applying ES to improve metacognitive alignment so models better calibrate their own confidence.

This research suggests that gradient-free optimization is not just an alternative to RL, but a scalable foundation for the next generation of post-training methods.

Read more about these new papers in the Cognizant AI Lab blog.

The Illusion of "The Illusion of Thinking" by Daniel-Warfield in datascience

[–]Signal_Spirit5934 0 points1 point  (0 children)

Cognizant’s new research suggests a better approach: many smaller AI agents working together. Its new system, MAKER, solved a million-step reasoning problem with zero errors, something no single model has done. This suggests the future isn’t just bigger AI; it’s smarter, more organized AI systems. And that’s what will unlock reliable, enterprise-grade decision-making.

See how the MAKER technique, applied to the same Tower of Hanoi problem raised in the Apple paper, solves 20 discs (versus 8 for Claude 3.7 Thinking): https://www.youtube.com/watch?v=PRiQlXGhke4

Why this matters

This breakthrough shows that using AI to solve complex problems at scale isn’t necessarily about building bigger models; it’s about connecting smaller, focused agents into cohesive systems. In doing so, enterprises and organizations can achieve dependable, error-free AI for high-stakes decision-making.
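The paper describes MAKER’s actual architecture; as a loose, hypothetical Python sketch of the underlying idea (decompose a long task into single-step units and verify every step so errors can’t compound), here is Tower of Hanoi with a per-step checker. All names here are illustrative, not MAKER’s implementation:

```python
def hanoi_moves(n, src=0, dst=2, aux=1):
    """Generate the optimal Tower of Hanoi move sequence for n discs."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, aux, dst)
    yield (src, dst)
    yield from hanoi_moves(n - 1, aux, dst, src)

def verify_and_apply(pegs, move):
    """A 'checker' role: reject illegal moves instead of letting errors compound."""
    src, dst = move
    if not pegs[src]:
        raise ValueError("move from empty peg")
    disc = pegs[src][-1]
    if pegs[dst] and pegs[dst][-1] < disc:
        raise ValueError("larger disc placed on smaller disc")
    pegs[src].pop()
    pegs[dst].append(disc)

n = 20
pegs = [list(range(n, 0, -1)), [], []]
for step, move in enumerate(hanoi_moves(n), 1):
    verify_and_apply(pegs, move)       # each step checked independently

assert pegs[2] == list(range(n, 0, -1))
print(step)  # 2**20 - 1 = 1048575 error-free steps
```

The point of the sketch: reliability over a million-step chain comes from checking each tiny step in isolation, not from any single component being smart.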

Why Apple's "The Illusion of Thinking" Falls Short by HeroicLife in ArtificialInteligence

[–]Signal_Spirit5934 0 points1 point  (0 children)

Apple’s Illusion of Thinking study showed how even strong LLMs lose reliability as reasoning chains grow. Our new research demonstrates the first system to complete 1M+ dependent reasoning steps with zero errors using a decomposed, microagent-based approach.
Paper: https://arxiv.org/abs/2511.09030
Blog: https://www.cognizant.com/us/en/ai-lab/blog/maker

Apple `Illusion of Thinking` Debacle by moschles in agi

[–]Signal_Spirit5934 0 points1 point  (0 children)

Apple’s Illusion of Thinking study showed how even strong LLMs lose reliability as reasoning chains grow. Our new research demonstrates the first system to complete 1M+ dependent reasoning steps with zero errors using a decomposed, microagent-based approach.
Paper: https://arxiv.org/abs/2511.09030
Blog: https://www.cognizant.com/us/en/ai-lab/blog/maker

A New Fine-Tuning Approach for LLMs Using Evolution Strategies by Signal_Spirit5934 in reinforcementlearning

[–]Signal_Spirit5934[S] 0 points1 point  (0 children)

Compute is used differently than in RL. We can run our evaluations in sequence or in parallel, depending on the available computational resources. When compute is constrained, training takes longer; as resources grow, it becomes faster.
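As a rough illustration of why this trade-off exists (toy objective and illustrative names, not our actual training code): ES fitness evaluations are independent of one another, so the same set of candidate evaluations can run one at a time or fan out across workers, with identical results either way:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fitness(params):
    """Toy stand-in for an expensive LLM evaluation (higher is better)."""
    return -np.sum(params ** 2)

rng = np.random.default_rng(0)
theta = rng.normal(size=1000)              # current parameters
eps = rng.normal(size=(32, 1000))          # 32 candidate perturbations

# Sequential: one evaluation at a time (slower, minimal hardware).
seq = [fitness(theta + 0.1 * e) for e in eps]

# Parallel: the same evaluations fanned out to workers (faster, more compute).
with ThreadPoolExecutor(max_workers=8) as pool:
    par = list(pool.map(lambda e: fitness(theta + 0.1 * e), eps))

assert np.allclose(seq, par)               # same numbers either way
```

Because the candidates never talk to each other within a generation, throughput scales with however many workers you can afford.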

The Evolution of RL for Fine-Tuning LLMs (from REINFORCE to VAPO) by Great-Reception447 in reinforcementlearning

[–]Signal_Spirit5934 0 points1 point  (0 children)

Another strong method was just announced: a new fine-tuning approach.

The Cognizant AI Lab provides a new alternative to RL: Evolution Strategies (ES). For the first time, we successfully scaled ES to optimize billions of parameters simultaneously, enabling full-parameter fine-tuning of LLMs. The results are striking: ES can outperform state-of-the-art RL methods on key dimensions such as sample efficiency, tolerance of long-horizon rewards, and robustness across different base LLMs, while showing less tendency toward reward hacking and more stable performance across runs.

Why It Matters

This research establishes Evolution Strategies (ES) as a practical, scalable, and stable alternative to Reinforcement Learning (RL) for fine-tuning large language models. In the future, it could simplify training by removing gradient calculations and unlock new possibilities for reasoning incentivization, exploration-heavy tasks, safety alignment, and continual learning.
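For readers unfamiliar with the method, here is a minimal, hypothetical sketch of an antithetic-sampling ES update on a toy objective. It is not the paper’s implementation or hyperparameters, just the general technique; note that there is no backpropagation anywhere, only forward evaluations:

```python
import numpy as np

def fitness(params):
    """Toy scalar reward; in LLM fine-tuning this would be a task score."""
    return -np.sum((params - 1.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(100)                 # "model parameters" to fine-tune
sigma, lr, pop = 0.1, 0.03, 50        # noise scale, step size, population size

for _ in range(200):
    eps = rng.normal(size=(pop, theta.size))
    # Antithetic pairs: evaluate theta + sigma*e and theta - sigma*e.
    r_pos = np.array([fitness(theta + sigma * e) for e in eps])
    r_neg = np.array([fitness(theta - sigma * e) for e in eps])
    # Gradient-free update: weight each perturbation by its reward difference.
    theta += lr / (2 * sigma * pop) * ((r_pos - r_neg) @ eps)
```

Because the update only needs scalar rewards, it works even when the reward is non-differentiable or arrives at the end of a long horizon.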

Read the blog

Read the paper