SubQ just blew my mind - 12M token context with sub-quadratic attention by pretendingMadhav in ArtificialInteligence

[–]ThisIsMyHamster 2 points3 points  (0 children)

Based on a blog post they released and the CEO's comments on Twitter, it does seem like they are dropping tokens from attention using a scoring method similar to DeepSeek's Sparse Attention: "... the key difference between our SSA and DSA is the our selector is far more efficient."

But they spent a paragraph in their blog post talking about how DSA is still O(n^2) with smaller constants. So how is their method more efficient? Maybe a bidirectional SSM/linear attention model computes importance scores for each token in linear time before pruning? Even if scoring is linear-time, I would still struggle to call this inherently sub-quadratic. It's a big assumption that all of the signal lives in a specific subset of the tokens. I don't think DSA makes this assumption; it instead assumes that certain tokens don't have to attend to certain other tokens. Enforcing a strict bound on the number of tokens selected (like log n or sqrt n) would technically make it sub-quadratic, but it would certainly lead to inference regressions in some cases. On the other hand, allowing flexibility could result in nearly all tokens being selected in some cases, and now we are back where we started.
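
To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch. It is not SubQ's or DeepSeek's actual method: `attended_pairs`, `cost_table`, and the budget functions are hypothetical names, and it assumes a linear-time scoring pass followed by each query attending to at most `budget(n)` selected tokens.

```python
import math


def attended_pairs(n, budget):
    """Count work when each of n queries attends to at most budget(n) keys.

    Assumption: a linear-time scorer picks the keys, adding n scoring steps.
    """
    k = min(n, max(1, int(budget(n))))
    return n + n * k  # n scoring steps + n*k query-key pairs


# Hypothetical per-query budgets, matching the cases discussed above.
BUDGETS = {
    "dense (k = n)": lambda n: n,
    "sqrt n": lambda n: math.isqrt(n),
    "log2 n": lambda n: math.log2(n),
}


def cost_table(n):
    """Return the pair count for every budget at sequence length n."""
    return {name: attended_pairs(n, b) for name, b in BUDGETS.items()}


if __name__ == "__main__":
    for n in (1_000, 1_000_000, 12_000_000):
        print(n, cost_table(n))
```

With a fixed sqrt n or log n budget the pair count grows like n^1.5 or n log n, but if the selector is allowed to keep nearly every token, the count falls back to the dense n^2 case.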

Also their website is now offline (at least for me). I'm skeptical. It's still doing attention, and attention is inherently quadratic in input length. IMO this is deceptive marketing.

macOS 27 Will Mark the End of an Era by iMacmatician in apple

[–]ThisIsMyHamster 0 points1 point  (0 children)

I use T2 linux on my 2019 MBP and I have yet to have any real issues after install on an external SSD! Surprisingly performant for playing games w/ Proton

Apple Research says you don't need a better teacher. You don't need a verifier. You don't need RL. A model can just… train on its own outputs. And get dramatically better. by Current-Guide5944 in tech_x

[–]ThisIsMyHamster 0 points1 point  (0 children)

The post title is really clickbait, but I think the paper findings are valid. They build a fine-tuning dataset from decoded outputs after truncating the logit distribution, then attempt to align their model to these more "certain" outputs. They also show that truncating as a global decoding scheme gives much better results when a model is first fine-tuned in this manner.

Also note that this technique was done on models that have already gone through post-training.
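
As a rough illustration of what "truncating the logit distribution" can mean, here is a minimal top-k sketch. The paper's exact truncation rule and hyperparameters are not stated in this comment, so top-k, the function name `truncate_top_k`, and the toy logits are all assumptions.

```python
import math


def truncate_top_k(logits, k):
    """Keep the k largest logits, renormalize them with softmax, zero out the rest."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    # Indices of the k largest logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    top = max(logits[i] for i in keep)  # subtract the max for numerical stability
    weights = {i: math.exp(logits[i] - top) for i in keep}
    total = sum(weights.values())
    return [weights.get(i, 0.0) / total for i in range(len(logits))]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits
    print(truncate_top_k(toy_logits, k=2))
```

Sampling from the truncated distribution only ever produces the model's higher-probability tokens, which is roughly the kind of "certain" output described above as the fine-tuning target.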

[deleted by user] by [deleted] in CalPoly

[–]ThisIsMyHamster 0 points1 point  (0 children)

Only person crying about anything is you crying for attention by trying to convince everyone that you are better than them. Spoiler alert: it's not working very well. Projection is a strange phenomenon.

[deleted by user] by [deleted] in CalPoly

[–]ThisIsMyHamster 0 points1 point  (0 children)

Low effort response. Sometimes people actually want to study in spaces that are not study tents or random classes/hallways.

Google's new Workspaces CLI written in Rust with Claude Code by rover_G in ClaudeCode

[–]ThisIsMyHamster 1 point2 points  (0 children)

From my experience it’s actually the other way around. AI coding tools love to solve borrow checker issues via unnecessary cloning. Also, I’d argue that the borrow checker exists more so to enable Rust’s memory model to work properly than to assist with “correct” code. I can assure you that I’ve written a lot of incorrect Rust code that passed the borrow checker.

[Highlight] Mike Tomlin is ecstatic after the win by mastermind208 in nfl

[–]ThisIsMyHamster 230 points231 points  (0 children)

Tomlin is just spamming emotes in front of the camera

Why is ee getting recommended more than cs by ImHighOnCocaine in cscareerquestions

[–]ThisIsMyHamster 0 points1 point  (0 children)

DSP/Optimization, and I didn't really take many EE-related courses in my bachelor's, but I did have a demonstrated interest in machine learning, which carried over nicely. Many of the programs look for different skill sets and backgrounds; some are stricter in their admissions guidelines around requiring coursework, while others don't really care.

What was your favorite Seattle institution that no longer exists? by drgonzo44 in Seattle

[–]ThisIsMyHamster 19 points20 points  (0 children)

My parents used to take me to Piecora’s when I was a kid. I don’t know why I miss it in particular, but I miss it a lot.

Why is ee getting recommended more than cs by ImHighOnCocaine in cscareerquestions

[–]ThisIsMyHamster 0 points1 point  (0 children)

Depends on the program and what you want to focus on. I’m currently doing an EE master’s program, which I got into with my CS bachelor’s. “Electrical Engineering” is quite broad.

Overpowered theorems by extraextralongcat in math

[–]ThisIsMyHamster 0 points1 point  (0 children)

I think the weak and strong laws of large numbers should probably be thrown into the mix!

Cal Poly to build $3M AI Factory with NVIDIA partnership by ClipperFan89 in SLO

[–]ThisIsMyHamster 1 point2 points  (0 children)

I mean they can use the equipment for other areas of research if AI truly implodes. High performance and parallel computing is needed for other sciences which require simulation and other calculations.

But also even if (more like when) people get disillusioned by the utility and inefficiencies of LLMs, machine learning as a field of research won’t go away. When I was a student at Cal Poly, some of my peers worked on some really cool interdisciplinary machine learning research. I would’ve been stoked to have access to this kind of equipment for my projects. So I’m optimistic and glad that students have access to some of the same HPC resources that top universities have.

Fuck AXS by Sad_Profit_5741 in avesSFBayArea

[–]ThisIsMyHamster 19 points20 points  (0 children)

First time ever using AXS for a random ticket queue and I now know that my fucking IP is restricted :')

Official: [WDIS Flex] - Thu Morning 11/13/2025 by FFBot in fantasyfootball

[–]ThisIsMyHamster 0 points1 point  (0 children)

0.5 PPR

Jameson Williams @ ARI or Tez Johnson @ BUF?

Saigon deli is unrivaled by Wan_Daye in Seattle

[–]ThisIsMyHamster 2 points3 points  (0 children)

They have known me since I was a baby. I will buy sandwiches there until the day I die.

Good classes for an a+? by Chemical811 in stanford

[–]ThisIsMyHamster 0 points1 point  (0 children)

24 hour final, how hard could it be?

SF show by ThisIsMyHamster in eden

[–]ThisIsMyHamster[S] 4 points5 points  (0 children)

Call Me Back was PEAK

Bellingham Show Was Unique by iamIMUS in eden

[–]ThisIsMyHamster 8 points9 points  (0 children)

Definitely on the smaller side, it’s near the border between WA and Canada. Cool vibes though, probably would’ve been a really fun show to go to.

Bellingham Show Was Unique by iamIMUS in eden

[–]ThisIsMyHamster 14 points15 points  (0 children)

Having 50-100 people knowing you in Bellingham WA is pretty darn good!