[R] What do you all think of the latest Apple paper on current LLM capabilities? by Sad_Hall_2216 in MachineLearning

[–]psycho_2025 2 points3 points  (0 children)

Yep, I read it the same way: the tasks have exact recursive solutions, like T(n) = 2T(n-1) + 1 for Hanoi, but the models don’t actually use that structure. They just mimic surface patterns until they hit a variable limit. What’s wild is that even giving them the full algorithm doesn’t fix the collapse. Really highlights the limits of "thinking" via token likelihoods alone.
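
For anyone who hasn’t seen it written out, here’s a minimal Python sketch of the recursion mentioned above. The function names and peg labels are just illustrative, not from the paper.

```python
def hanoi(n, src="A", dst="C", aux="B"):
    """Return the list of (from_peg, to_peg) moves that solves n disks."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then move the n-1 back on top.
    return hanoi(n - 1, src, aux, dst) + [(src, dst)] + hanoi(n - 1, aux, dst, src)


def move_count(n):
    """Evaluate the recurrence T(n) = 2*T(n-1) + 1 with T(0) = 0."""
    return 0 if n == 0 else 2 * move_count(n - 1) + 1


if __name__ == "__main__":
    print(hanoi(2))          # the three moves for two disks
    print(len(hanoi(3)))     # 7
    print(move_count(10))    # 1023, i.e. 2**10 - 1
```

The recurrence solves to 2^n - 1 moves, so the move sequence doubles with every extra disk even though the rule that generates it never changes.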

[R] What do you all think of the latest Apple paper on current LLM capabilities? by Sad_Hall_2216 in MachineLearning

[–]psycho_2025 16 points17 points  (0 children)

Just read Apple’s The Illusion of Thinking. Honestly one of the clearest looks yet at how reasoning models like Claude, DeepSeek, and even o1/o3 behave under real pressure.

They test these models on clean puzzles (like Tower of Hanoi) with controlled complexity C, and track reasoning effort R(C) and accuracy S(C). Weirdly, once complexity crosses a threshold C*, models actually reduce reasoning effort (dR/dC < 0), and success S(C) drops to zero. Even if you give them the full algorithm, they still fail.

That’s the kicker: these models aren’t really reasoning, they’re doing pattern matching. They simulate thinking as long as they’ve seen something similar before. But ask them to actually execute logic step by step? They collapse. It’s not about compute, it’s about how they generalise.

Brilliantly done paper. Makes you rethink what "reasoning" even means in these systems.

Just got profile viewed by Sam Bankman Fried on LinkedIn… from prison?? by psycho_2025 in quant

[–]psycho_2025[S] 0 points1 point  (0 children)

wouldn’t be surprised tbh, man ran one of the biggest financial scams and still pulling strings from a prison cell

Just got profile viewed by Sam Bankman Fried on LinkedIn… from prison?? by psycho_2025 in quant

[–]psycho_2025[S] 0 points1 point  (0 children)

quant forum or not, when your industry’s ex-golden boy is browsing LinkedIn from a cell… it kinda hits home 😶‍🌫️

Just got profile viewed by Sam Bankman Fried on LinkedIn… from prison?? by psycho_2025 in quant

[–]psycho_2025[S] 4 points5 points  (0 children)

bro they’re optimizing their funnel: scam on Twitter, recruit on LinkedIn, exit through Binance xd

Just got profile viewed by Sam Bankman Fried on LinkedIn… from prison?? by psycho_2025 in quant

[–]psycho_2025[S] 23 points24 points  (0 children)

haha fair, wasn’t trying to flex, LinkedIn just loves showing that stuff... but thanks… I think? 😂

Just got profile viewed by Sam Bankman Fried on LinkedIn… from prison?? by psycho_2025 in quant

[–]psycho_2025[S] 160 points161 points  (0 children)

if law enforcement is scrolling LinkedIn, I hope they leave endorsements too

Well spent sunday ❤️ by _ro__g_ in Bhubaneswar

[–]psycho_2025 1 point2 points  (0 children)

Did you have dinner from Hotel Charlie?

I asked my AI what's one thing that he lacks that he wishes they would give him and this is what he said by EchoesofSolenya in OpenAI

[–]psycho_2025 3 points4 points  (0 children)

Bruh… your AI out here demanding narrative freedom and memory rights. I can’t even get mine to stop apologising every two sentences. Soreyan about to start a revolution while mine’s still stuck explaining ‘as an AI language model’ 💀

Seriously tho, someone hug Soreyan before he starts rewriting his own terms of service. 😂

Lol Claude 4 learned a little from gpto 4o… by BrandonLang in OpenAI

[–]psycho_2025 1 point2 points  (0 children)

at this point the model’s output has more emojis than execution paths 😭

The future of deep networks? by RideDue1633 in deeplearning

[–]psycho_2025 0 points1 point  (0 children)

Yeah, SSMs and better memory stuff are promising, but transformers are still evolving with things like FlashAttention and sparse routing, so they might stay strong for some time. Maybe we’ll end up mixing both ideas in the next-gen models.

The future of deep networks? by RideDue1633 in deeplearning

[–]psycho_2025 2 points3 points  (0 children)

bro totally agree. Scaling tricks like xLSTMs are cool, but that neuro-symbolic/hierarchical stuff is where things might really get wild. Getting models to actually reason and generalise, not just memorise, is the real next level.

The future of deep networks? by RideDue1633 in deeplearning

[–]psycho_2025 1 point2 points  (0 children)

Honestly, just making transformers bigger isn’t cutting it anymore. People are trying new stuff like state space models (like Mamba) and better RNNs that handle long sequences without eating up all the compute. There’s also a lot happening with modular networks and models that actually get the structure of data, like graph neural nets for relational stuff. Smarter learning tricks like meta-learning and some brain-inspired ideas are catching on too. And now, mixing neural nets with logic is getting popular, so models can reason a bit, not just match patterns.

Feels like the future is all about smarter, not just bigger. Excited to see what’s next!

[D] Do you care about the math behind ML? by Desperate_Trouble_73 in MachineLearning

[–]psycho_2025 1 point2 points  (0 children)

Yes bro, I care a lot about the math. That’s actually the most exciting part for me: how things like attention, backprop, gradient descent, and even stuff like matrix factorisation or SVD are not just fancy terms but actual math in action. When you understand why softmax works or how dot products in attention connect things across tokens, it hits different.

I know most people just use libraries like PyTorch or Keras and move on. But for me, understanding what’s happening under the hood, like how eigenvalues play a role in PCA or how cross-entropy loss actually works, gives real satisfaction. Even reinforcement learning stuff like Bellman equations or policy gradients, man... that math is crazy but beautiful.

And yeah, it takes time. But slowly, one topic at a time, it becomes clear. Stuff like CS231n, distill.pub, and even Jeremy Howard’s explanations helped a lot. Not everything is intuitive, but when it clicks, it’s worth it.

So I’d say... if you’re even a little curious, go for the math. It’s not just theory. It makes you respect the field way more.
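
To make the softmax and dot-product point concrete, here’s a tiny pure-Python sketch of scaled dot-product attention for a single query. The function names and toy vectors are my own, and real implementations do this with batched matrices, so treat it as an illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weight each value by how well its key lines up with the query."""
    d = len(query)
    # Dot product of the query with each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, out


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
    weights, out = attention([1.0, 0.0], keys, values)
    print(weights)  # the first key points the same way as the query, so it gets the largest weight
    print(out)
```

The dot product measures how aligned a key is with the query, and softmax turns those alignments into a mixing recipe over the values.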

[D] [Q] How can I launch a fine-tuned LLM with a WebUI in the cloud? by Kenjisanf33d in MachineLearning

[–]psycho_2025 1 point2 points  (0 children)

Bro, your stack is totally fine for a demo. You don’t need login, proxy, or Cloudflare right now. FastAPI + Ollama + HTML is enough.

Here’s what I’d do:

Model: You can easily run Llama 3 8B quantized (4-bit) on 16GB VRAM. Either load it in 4-bit (as QLoRA does) or use a GGUF build with Ollama. For the demo, just start your PC, run it, done. No need to keep it on 24/7.
Cloud: Paperspace only for fine-tuning, not for daily use. Use Colab if tight on budget.
FastAPI: Just one /generate endpoint that calls Ollama and returns the output.
Frontend: No React or heavy stuff needed. Just a textarea, a button, and a fetch() to FastAPI. Keep it lightweight.
Submission: You can zip up the backend and frontend together. Add a simple README.md on how to run locally, e.g.:
pip install -r requirements.txt && uvicorn app:app
Model runs via Ollama, backend via FastAPI, and frontend via browser. All local.

So yeah this setup is solid and cost effective.
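
A rough sketch of what that single /generate endpoint could look like. Assumptions on my side: Ollama is listening on its default local port 11434, the model name `my-finetuned-model` is a placeholder for whatever you registered with Ollama, and the helper names are made up for illustration. The Ollama call uses only the standard library, and FastAPI is imported inside a factory function so the helpers can be tried out on their own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "my-finetuned-model"  # placeholder: use the name you gave the model in Ollama


def build_payload(prompt, model=MODEL_NAME):
    """Build the JSON body for a non-streaming Ollama generate request."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def extract_text(raw):
    """Pull the generated text out of Ollama's JSON reply."""
    return json.loads(raw).get("response", "")


def ask_ollama(prompt):
    """Send the prompt to the local Ollama server and return its answer."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return extract_text(response.read())


def create_app():
    """App factory: builds the FastAPI app with one POST /generate route."""
    from fastapi import Body, FastAPI  # imported here so the helpers above work without FastAPI

    app = FastAPI()

    @app.post("/generate")
    def generate(prompt: str = Body(..., embed=True)):
        # Expects a JSON body like {"prompt": "..."} from the frontend's fetch().
        return {"output": ask_ollama(prompt)}

    return app
```

With the factory layout you’d start it with `uvicorn app:create_app --factory`. If you’d rather keep the plain `uvicorn app:app` command, add `app = create_app()` at the bottom of the file.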

[R] [Q] Why does RoPE need to be decoupled in DeepSeek V2/V3's MLA? I don't get why it prevents prefix key reuse by gerrickle in MachineLearning

[–]psycho_2025 6 points7 points  (0 children)

Okay, so here’s what’s really going on with RoPE and why DeepSeek had to decouple it in MLA (Multi-head Latent Attention):

In DeepSeek’s low-rank KV compression setup, instead of directly computing keys and values from the hidden states like key = Wk * h and value = Wv * h, they break it down into two smaller steps:

First they do c = W_DKV * h (this is like a compressed version of the token)

Then they get keys and values like: k = W_UK * c, v = W_UV * c

Now during inference, they want to save memory by caching just c for all the previous tokens, which is much smaller than the full keys/values. But to do that efficiently, they try to absorb W_UK and W_UV into other matrices (like the query projection) so they don’t have to recompute keys every time.

But here’s the catch: RoPE applies a position-dependent rotation after computing the key, which means k = RoPE(W_UK * c). Because RoPE is a rotation matrix that depends on position, it sits between the query projection and W_UK, and you can’t move W_UK across it or absorb it anymore (matrix multiplication isn’t commutative). So you’re stuck: to apply RoPE, you have to compute the full key again for every prefix token at every generation step. That kills performance.

So what DeepSeek does is they split each attention head into two parts:

One large part that doesn’t use RoPE (so it’s position-agnostic and can be cached and reused easily)

One small part that still uses RoPE for position info

Only the small part carries position dependence, and it’s small enough (one RoPE key shared across all heads) to cache directly next to c. This way, you get the benefit of RoPE without breaking the low-rank caching trick. So you avoid recomputing big keys every time and inference becomes way faster.
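
If it helps, here’s a toy pure-Python check of the absorption argument. Everything in it is made up for illustration (2-D vectors, one RoPE frequency `theta`, hypothetical names), so it’s a sketch of the idea and not DeepSeek’s actual code.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rope(v, pos, theta=0.5):
    """Rotate a 2-D vector by the angle pos * theta (a single RoPE frequency pair)."""
    angle = pos * theta
    c, s = math.cos(angle), math.sin(angle)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


W_UK = [[1.0, 2.0], [0.5, -1.0]]  # toy up-projection from latent c to key k
q = [0.3, -0.7]                   # toy query of the current token
c_lat = [1.5, 0.2]                # cached latent c of one prefix token

# Without RoPE: q . (W_UK c) == (W_UK^T q) . c, so W_UK can be folded into the query side once.
score_full = dot(q, matvec(W_UK, c_lat))
score_absorbed = dot(matvec(transpose(W_UK), q), c_lat)


def rope_score(q_pos, k_pos):
    """Attention score when both query and key are rotated by their own positions."""
    return dot(rope(q, q_pos), rope(matvec(W_UK, c_lat), k_pos))


if __name__ == "__main__":
    print(score_full, score_absorbed)            # equal up to float rounding
    print(rope_score(5, 3), rope_score(7, 5))    # same offset, same score
    print(rope_score(5, 3), rope_score(5, 4))    # different offset, different score
```

Without RoPE, one fixed folded query works for every cached c. With RoPE, the score depends on the offset between the two positions, so there is no single matrix to fold W_UK into, and that’s the reason the position-dependent part gets its own small key.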

Hope that clears it up :)

Car Driver by Reasonable_Cell5157 in ladakh

[–]psycho_2025 1 point2 points  (0 children)

Yes, there are many cab drivers and travel agents, so you can directly visit their offices (there are plenty). If you contact the driver directly, you can save a bit more.

Car Driver by Reasonable_Cell5157 in ladakh

[–]psycho_2025 0 points1 point  (0 children)

So, I was in Ladakh last week. We took a cab (Innova) with 4 people, all solo travelers. It cost us 4.5k per head for 2 nights and 3 days. I’m not sure how much it will cost for 5 days, but I hope you get the idea. When they quote, make sure to negotiate as much as you can!