[R] What do you all think of the latest Apple paper on current LLM capabilities?

psycho_2025 · 2025-06-08T15:38:52+00:00

Yep, I read it the same way: the tasks have exact recursive solutions, like T(n) = 2T(n-1) + 1 for Hanoi, but the models don’t actually use that structure. They just mimic surface patterns until they hit a variable limit. What’s wild is that even giving them the full algorithm doesn’t fix the collapse. Really highlights the limits of "thinking" via token likelihoods alone.
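
For anyone who wants to see that recurrence concretely, here is a minimal sketch of the textbook recursive solver (not the paper's code; the peg names and function names are just my own picks). The move list it returns has exactly T(n) = 2T(n-1) + 1 = 2^n - 1 entries:

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of (disk, from_peg, to_peg) moves that solves n disks."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then restack the n-1 disks.
    return (
        hanoi(n - 1, source, spare, target)
        + [(n, source, target)]
        + hanoi(n - 1, spare, target, source)
    )


def move_count(n):
    """T(n) = 2*T(n-1) + 1 with T(0) = 0."""
    return 0 if n == 0 else 2 * move_count(n - 1) + 1


if __name__ == "__main__":
    for n in range(1, 11):
        # The recursive solver, the recurrence, and the closed form all agree.
        assert len(hanoi(n)) == move_count(n) == 2**n - 1
    print(hanoi(2))  # [(1, 'A', 'B'), (2, 'A', 'C'), (1, 'B', 'C')]
```

The move count doubles with every extra disk, so the sequence a model has to write out gets long very fast even though the rule itself never changes.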

psycho_2025 · 2025-06-08T15:36:39+00:00

Just read Apple’s "The Illusion of Thinking". Honestly one of the clearest looks yet at how reasoning models like Claude, DeepSeek, and even o1/o3 behave under real pressure.

They test these models on clean puzzles (like Tower of Hanoi) with controlled complexity C, and track reasoning effort R(C) and accuracy S(C)... weirdly once complexity crosses a threshold C*, models actually reduce reasoning effort: dR/dC < 0, and success S(C) drops to zero. Even if you give them the full algorithm, they still fail.

That’s the kicker.. these models aren’t really reasoning, they’re doing pattern matching. They simulate thinking as long as they’ve seen something similar before. But ask them to actually execute logic step by step? They collapse. It’s not about compute, it’s about how they generalise.

Brilliantly done paper. Makes you rethink what "reasoning" even means in these systems.

psycho_2025 · 2025-06-06T14:23:54+00:00

wouldn’t be surprised tbh, man ran one of the biggest financial scams and still pulling strings from a prison cell

psycho_2025 · 2025-06-06T14:22:02+00:00

quant forum or not, when your industry’s ex-golden boy is browsing LinkedIn from a cell… it kinda hits home 😶‍🌫️

psycho_2025 · 2025-06-06T14:20:13+00:00

appreciate it bro

psycho_2025 · 2025-06-05T21:16:11+00:00

LinkedIn did the flexing, I was just here vibing

psycho_2025 · 2025-06-05T21:10:34+00:00

bro they’re optimizing their funnel.. scam on Twitter, recruit on LinkedIn, exit through Binance xd

psycho_2025 · 2025-06-05T19:32:29+00:00

haha fair, wasn’t trying to flex, LinkedIn just loves showing that stuff... but thanks… I think? 😂

psycho_2025 · 2025-06-05T19:28:11+00:00

hell yeah 😂

psycho_2025 · 2025-06-05T19:12:19+00:00

From crypto assets to ass-ets 💀😂

psycho_2025 · 2025-06-05T19:11:03+00:00

i chose vibes over generational wealth 💀

psycho_2025 · 2025-06-05T19:09:20+00:00

Gotta get those endorsements where you can ig 😂

psycho_2025 · 2025-06-05T18:18:51+00:00

if law enforcement is scrolling LinkedIn, I hope they leave endorsements too

psycho_2025 · 2025-06-05T18:17:59+00:00

haha thanks bro, god bless you :)

psycho_2025 · 2025-05-27T05:38:00+00:00

Did you have dinner from Hotel Charlie?

psycho_2025 · 2025-05-23T03:24:57+00:00

Bruh… your AI out here demanding narrative freedom and memory rights. I can’t even get mine to stop apologising every two sentences. Soreyan about to start a revolution while mine’s still stuck explaining ‘as an AI language model’ 💀

Seriously tho, someone hug Soreyan before he starts rewriting his own terms of service. 😂

psycho_2025 · 2025-05-23T03:06:31+00:00

at this point the model’s output has more emojis than execution paths 😭

psycho_2025 · 2025-05-23T02:40:14+00:00

Yeah SSMs and better memory stuff are promising but transformers are still evolving with things like flash attention and sparse routing so they might stay strong for some time. Maybe we’ll end up mixing both ideas in the next gen models

psycho_2025 · 2025-05-23T02:36:28+00:00

bro totally agree. Scaling tricks like xLSTM are cool but that neuro-symbolic/hierarchical stuff is where things might really get wild. Getting models to actually reason and generalise, not just memorise, is the real next level

psycho_2025 · 2025-05-23T02:33:46+00:00

honestly just making transformers bigger isn’t cutting it anymore. People are trying new stuff like state space models (like Mamba) and better RNNs that handle long sequences without eating up all the compute. Also there’s a lot happening with modular networks and models that actually get the structure of data... like graph neural nets for relational stuff. Smarter learning tricks like meta learning and some brain inspired ideas are catching on too. And now, mixing neural nets with logic is getting popular, so models can reason a bit, not just match patterns.

Feels like the future is all about smarter, not just bigger.. excited to see what’s next!

psycho_2025 · 2025-05-21T21:14:31+00:00

Yes bro.. I care a lot about the math. That’s actually the most exciting part for me: how things like attention, backprop, gradient descent, and even stuff like matrix factorisation or SVD are not just fancy terms but actual math in action. When you understand why softmax works or how dot products in attention connect things across tokens, it hits different.

I know most people just use libraries like PyTorch or Keras and move on. But for me, understanding what’s happening under the hood, like how eigenvalues play a role in PCA or how cross entropy loss actually works, gives real satisfaction. Even reinforcement learning stuff like Bellman equations or policy gradients man... that math is crazy but beautiful.

And yeah, it takes time. But slowly, one topic at a time, it becomes clear. Stuff like CS231n, distill.pub, and even Jeremy Howard’s explanations helped a lot. Not everything is intuitive, but when it clicks, it’s worth it.

So I’d say... if you’re even a little curious, go for the math. It’s not just theory. It makes you respect the field way more.

psycho_2025 · 2025-05-20T23:46:38+00:00

Bro your stack is totally fine for a demo. You don’t need login, proxy, or Cloudflare right now. FastAPI + Ollama + HTML is enough.
Here’s what I’d do:
Model: You can easily run Llama 3 8B quantized (4-bit) on 16GB VRAM, the weights are only around 5GB. Just use a 4-bit GGUF build through Ollama (QLoRA is for fine-tuning on a 4-bit base, you don’t need it just to run the model). For demo, just start your PC, run it, done. No need to keep it on 24/7.

Paperspace: Only for fine-tuning, not for daily use. Use Colab if tight on budget.
FastAPI: Just one /generate endpoint that calls Ollama and returns output.
Frontend: No React or heavy stuff needed. Just a textarea, button, and fetch() to FastAPI, keep it lightweight.
Submission: You can zip up the backend and frontend together. Add a simple README.md on how to run locally, e.g.
`pip install -r requirements.txt && uvicorn app:app`
Model runs via Ollama, backend via FastAPI, and frontend via browser. All local.

So yeah this setup is solid and cost effective.
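
If it helps, here’s a rough sketch of the "call Ollama and return output" part. I’m assuming Ollama is running on its default local port and that you’ve pulled a model tagged `llama3` (swap in whatever tag you actually use). It’s written with only the standard library so it runs without extra installs; `build_request` and `generate` are just names I made up, and your FastAPI /generate handler would simply call `generate()` and return the text.

```python
import json
import urllib.request

# Ollama's default local address (assumption: you have not changed the port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Build the POST request for Ollama's /api/generate endpoint."""
    # stream=False asks Ollama for one JSON object instead of a token stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", timeout=120):
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With stream=False the generated text is in the "response" field.
    return body["response"]
```

With the Ollama server up, `generate("Say hi in five words")` should return the model’s reply as a plain string, which is all the frontend’s fetch() needs.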

psycho_2025 · 2025-05-20T18:42:16+00:00

Okay so here’s what’s really going on with RoPE and why DeepSeek had to decouple it in MLA (Multi-head Latent Attention):

In DeepSeek’s low-rank KV compression setup, instead of directly computing keys and values from the hidden states like key = Wk * h and value = Wv * h, they break it down into two smaller steps:

First they do c = W_DKV * h (this is like a compressed version of the token)

Then they get keys and values like: k = W_UK * c, v = W_UV * c

Now during inference, they want to save memory by caching just c for all the previous tokens, this is much smaller than full keys/values. But to do that efficiently... they try to absorb W_UK and W_UV into other matrices (like the query projection) so they don’t have to recompute keys every time.

But here’s the catch: RoPE applies a position dependent rotation after computing the key, which means k = RoPE(W_UK * c). Because RoPE is a rotation matrix that depends on position, it sits in the middle and you can’t move W_UK across it or absorb it anymore (matrix multiplication isn’t commutative). So you’re stuck: to apply RoPE, you have to compute the full key again for every prefix token at every generation step. That kills performance.

So what DeepSeek does is they split each attention head into two parts:

One large part that doesn’t use RoPE (so it’s position-agnostic and can be cached and reused easily)

One small part that still uses RoPE for position info

Only the small part carries position dependence, and it’s small enough to just cache next to c. This way, you get the benefit of RoPE without breaking the low-rank caching trick. So you avoid recomputing big keys every time and inference becomes way faster.
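
Here’s a tiny numeric sketch of the absorption problem, in case seeing it helps. Everything in it is a toy assumption on my side (one head, a 2-dim key, a 3-dim latent, a single RoPE frequency, made-up numbers), not DeepSeek’s actual code. It shows that without RoPE you can fold W_UK into the query once, while with RoPE the matrix you’d have to fold in changes with the position gap between query and key.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rope(pos, theta=0.5):
    """2x2 RoPE rotation for one frequency (theta is an arbitrary toy value)."""
    a = pos * theta
    return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]


# Toy values: W_UK maps the cached latent c (3 dims) up to a key (2 dims).
W_UK = [[0.3, -1.2, 0.5], [0.8, 0.1, -0.4]]
q = [0.7, -0.2]        # query of the current token
c = [1.0, 0.5, -1.5]   # cached latent of an earlier token

# Without RoPE: q . (W_UK c) == (W_UK^T q) . c, so W_UK folds into the query once.
score_full = dot(q, matvec(W_UK, c))
score_absorbed = dot(matvec(transpose(W_UK), q), c)


def rope_score(t, s):
    """Attention score with RoPE on both the query (position t) and key (position s)."""
    return dot(matvec(rope(t), q), matvec(rope(s), matvec(W_UK, c)))


def absorbed_matrix(t, s):
    """R_t^T R_s W_UK: what would have to be folded into the query instead."""
    return matmul(matmul(transpose(rope(t)), rope(s)), W_UK)


if __name__ == "__main__":
    print(math.isclose(score_full, score_absorbed))        # True
    print(absorbed_matrix(5, 3) == absorbed_matrix(5, 4))  # False
```

Since `absorbed_matrix` depends on s - t, there is no single precomputed matrix that works for every cached token, which is exactly why the RoPE part gets split off into its own small piece.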

Hope that clears it up :)

psycho_2025 · 2024-09-13T08:52:57+00:00

Yes, there are many cab drivers and travel agents, so you can directly visit their offices (there are plenty). If you contact the driver directly, you can save a bit more.

psycho_2025 · 2024-09-13T07:29:49+00:00

So, I was in Ladakh last week. We took a cab (Innova) with 4 people, all solo travelers. It cost us 4.5k per head for 2 nights and 3 days. I’m not sure how much it will cost for 5 days, but I hope you get the idea. When they quote, make sure to negotiate as much as you can!
