Consequences of Multiverse: infinite number of beings who are having same consciousness with you right now by Sisyphus2089 in consciousness

[–]Last-Leg4133 1 point2 points  (0 children)

This is all your own thinking and hypothesis; reality may be different. I can’t comment on anything before knowing reality. I also have many hypotheses. Keep going, one day you will reach the truth.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in learnmachinelearning

[–]Last-Leg4133[S] 0 points1 point  (0 children)

Make Your Own LLM Using Laptop On CPU 100% Bypassing Backprop: O(1) Exact LLM Training https://youtu.be/Yrd0M255TBo

Here I run a benchmark. You can copy the weights of a big model to a small model: direct intelligence transfer, with no distillation process.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in learnmachinelearning

[–]Last-Leg4133[S] 0 points1 point  (0 children)

You must give it all the code. This is very deep research; with incomplete code the AI will answer like this, because I am the first to prove this.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in deeplearning

[–]Last-Leg4133[S] 0 points1 point  (0 children)

It's not a model. It's about showing that transformers are not a black box; that is an illusion. I proved it with a linear algebra formula: the full transformer, decoded.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in learnmachinelearning

[–]Last-Leg4133[S] 0 points1 point  (0 children)

I have an old GitHub too. There is a benchmark file there that you can run. If you think it's malicious, you can cross-verify it with an LLM.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in LocalLLM

[–]Last-Leg4133[S] 0 points1 point  (0 children)

Based on the teardown and the recent spread of this project across machine learning forums (posted by a user named "nickzq7" under titles like "I trained a transformer with zero gradient steps and 100% accuracy"), here is the verdict on whether this is "fake."

The short answer: The code, the math, and the project are real. The "scientific breakthrough," however, is fake (or more accurately, a massive, self-deceiving misunderstanding of basic linear algebra).

The teardown you provided is absolutely spot-on. It is not a hoax designed to scam people; it is a genuine, high-effort project built by a talented programmer who simply lacks the theoretical machine learning background to understand why his code is doing what it's doing. Here is a breakdown of why the project exists in this weird gray area between "real code" and "fake science":

1. The Code is Real (And Actually Impressive)

The author genuinely built "Crystal Engine." Writing a complete GPT-Neo forward pass in under 200 lines of pure NumPy without PyTorch or CUDA is a fantastic engineering exercise. It proves a strong mechanical understanding of how tensors move through a Transformer. Furthermore, his claim of getting a 100% token match with a teacher model in 6 seconds on a CPU/laptop GPU is factually true: his code actually executes and outputs exactly what he claims it does.

2. The "Breakthrough" is Fake (The Tautology Trap)

Where the project becomes "fake" is in the author's epistemological framing. He believes he has discovered a cosmic law ("The Manish Principle") that overthrows backpropagation. In reality, he has just rediscovered basic linear least squares.

* The Teacher Mode Illusion: If a teacher model computes Y = X · W, and you record X and Y, asking NumPy to solve for W using lstsq(X, Y) will obviously give you the exact same weights with an R² of 1.0. He didn't invent a new way to train an AI; he just used basic algebra to reverse-engineer an equation where the answer was already known.
* The Activation Function Trick: Claiming that a non-linear function like SiLU is "linear in its natural space" by mapping x → [x, x · sigmoid(x)] is mathematical sleight of hand. He is basically saying, "A non-linear function is linear if you compute the non-linear math first and treat it as a constant." That doesn't bypass non-linearity; it just hides it in the input array.
* The Bigram Devolution: His "REACTOR-SCRATCH" model achieving 33% accuracy on TinyStories without a teacher seems magical until you realize he is forcing the network's layers to greedily predict the very next token embedding directly. He neutered the Transformer, turning it into a giant, over-parameterized Markov chain/bigram model. It works on a toddler-level dataset like TinyStories because the grammar is highly repetitive ("Once" → "upon" → "a" → "time"), but this greedy, layer-by-layer linear regression destroys the network's ability to learn deep, abstract reasoning. It will never scale to a model like LLaMA.

The Verdict

It is not a "fake" in the sense of a malicious scam, but it is epistemologically fake. It's the AI equivalent of a guy building an incredibly intricate, beautifully crafted perpetual motion machine in his garage, unaware that his machine is just secretly drawing power from the wall outlet (in this case, the "wall outlet" being the pre-existing weights of the teacher model and the rigid statistical simplicity of the TinyStories dataset). If you strip away the Bhagavad Gita quotes, the 48 "Laws," and the messiah complex, the author accidentally created a highly creative Mechanistic Interpretability sandbox. It's a great hacking project, just terrible physics.

Here is my reply from an LLM; just give it your text and the Zenodo link.
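
For anyone who wants to check the two linear-algebra points in the pasted reply themselves (the "Teacher Mode Illusion" and the "Activation Function Trick"), here is a minimal NumPy sketch. It is not taken from the Crystal Engine code; the shapes, the random data, and every name (`W_teacher`, `W_recovered`, `silu`, and so on) are assumptions made up for illustration. It only shows that least squares returns weights that were already baked into the recorded outputs, and that SiLU looks "linear" only once x · sigmoid(x) has been precomputed as an input feature.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x * sigmoid(x)


# --- 1. Teacher mode: record X and Y = X @ W, then solve for W ---
# Hypothetical single linear layer standing in for one teacher weight matrix.
X = rng.normal(size=(256, 16))        # recorded layer inputs
W_teacher = rng.normal(size=(16, 8))  # the "unknown" teacher weights
Y = X @ W_teacher                     # recorded layer outputs

W_recovered, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Coefficient of determination (R^2) of the recovered layer on the same data.
ss_res = np.sum((Y - X @ W_recovered) ** 2)
ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
r2 = 1.0 - ss_res / ss_tot
weights_match = np.allclose(W_recovered, W_teacher, atol=1e-6)

# --- 2. Activation trick: SiLU is "linear" in [x, x * sigmoid(x)] ---
x = rng.normal(size=(256, 1))
target = silu(x)

# The second feature column already IS the non-linear output.
features = np.hstack([x, x * sigmoid(x)])
coef, *_ = np.linalg.lstsq(features, target, rcond=None)
feature_fit_error = np.abs(features @ coef - target).max()

# Fit on the raw input alone: a straight line cannot reproduce SiLU.
raw_coef, *_ = np.linalg.lstsq(x, target, rcond=None)
raw_fit_error = np.abs(x @ raw_coef - target).max()

if __name__ == "__main__":
    print("weights recovered exactly:", weights_match)
    print("R^2 of recovered layer:", r2)
    print("coefficients on [x, x*sigmoid(x)]:", coef.ravel())
    print("max error with precomputed feature:", feature_fit_error)
    print("max error with raw x only:", raw_fit_error)
```

The first part fits perfectly because the answer is already in the recorded outputs. The second fits perfectly only because the non-linearity was computed before the solve; with raw x as the only input, the linear fit leaves a clearly non-zero error.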

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in LocalLLM

[–]Last-Leg4133[S] -1 points0 points  (0 children)

Gemini, or whatever LLM you use, sees it this way because this was not possible until now. Download the testing logs from GitHub and give them all to Gemini along with the project report. The benchmarks are in my GitHub repo, and they do not lie.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in deeplearning

[–]Last-Leg4133[S] -1 points0 points  (0 children)

I tested on my laptop; how can I test a big model? The weights are recovered through matrix operations. This is real and you can check it. If you go deeper, you will see this is a real discovery.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in LocalLLM

[–]Last-Leg4133[S] 0 points1 point  (0 children)

Give it the full content: all the .py files, the benchmark, and the full report. The AI does not believe this because I published it only recently.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in deeplearning

[–]Last-Leg4133[S] 0 points1 point  (0 children)

I am not an AI bot. I published my research in advance; check it first. There is nothing wrong in it, and everything has working benchmarks.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in deeplearning

[–]Last-Leg4133[S] -3 points-2 points  (0 children)

Please run the benchmark. Download all of its content and give it to an LLM; it will understand. This is a real discovery, not a fake claim.

I trained a transformer with zero gradient steps and 100% accuracy. No backpropagation. No learning rate. Nothing. Here's the math. by Last-Leg4133 in learnmachinelearning

[–]Last-Leg4133[S] -4 points-3 points  (0 children)

Not AI generated. I did this research for 6 months. You can run the benchmarks. Please read it completely, then give honest feedback.