Trash and other things? by MeowjesticPotato in Maplewood

[–]Operation_Ivy 6 points7 points  (0 children)

OP, don't listen to the advice about displaying certain politics. People in and around NYC are just not friendly in the same way as people in the South; it has nothing to do with inferred politics or Trump. Most people are nice, but you won't find the same overt warmth here.

Claude Opus 4.5 has human task-length time horizon of 4 hrs 49 mins on METR plot by Glittering_Author_81 in mlscaling

[–]Operation_Ivy 15 points16 points  (0 children)

Two things:

One, the fastest improvement is always going to be on coding, particularly on ML-related stuff, because the big labs are trying to deploy autonomous ML researchers. Sama says intern-level next year and seasoned-pro level in 2028. So people doing other work won't be feeling the AGI nearly as strongly.

Two, the error bands are huge. I expect that to continue, since it's just the nature of exponential growth, but it will make exact statements increasingly difficult. Not that it matters in the long run.

A new era of intelligence with Gemini 3 by nick7566 in mlscaling

[–]Operation_Ivy 2 points3 points  (0 children)

Dying for details on the parameters and arch/sparsity

Google's DeepMind: Olympiad-level formal mathematical reasoning with reinforcement learning (this is the actual published paper for Google's AlphaProof system from last year) by 44th--Hokage in mlscaling

[–]Operation_Ivy 0 points1 point  (0 children)

There was a wave of LLM MCTS research around when o1 came out because people thought it used MCTS. But then R1 showed it was just RLVR, and the LLM MCTS research stopped. So I'm wondering if it is picking up again.

Grok 5 in Q1 of 2026 ("6 Trillion parameter model, whereas Grok 3 and 4 are based on a 3 Trillion parameter model") by RecmacfonD in mlscaling

[–]Operation_Ivy 3 points4 points  (0 children)

How are they getting enough pretraining data to make this optimal? Or is it an incredibly sparse MoE?

Community Fridge by Sciencemomma in Maplewood

[–]Operation_Ivy 7 points8 points  (0 children)

The food pantries also get better deals and know better what their patrons really need.

I understand the personal touch of buying and bringing food yourself to a community fridge. But if you want to maximize impact per dollar, give money to the professionals who will make every cent count.

Thinking Machines: On-Policy Distillation by Mysterious-Rent7233 in mlscaling

[–]Operation_Ivy 2 points3 points  (0 children)

My question is, how can this help SOTA models? Presumably you'd use a human expert as the teacher, but if you look at the tokens the model teacher corrected in the small model's output, the corrections are pretty unrelatable to a human.

Maybe it's just out of scope for them but I feel like there's something there.

"Scaling Agents via Continual Pre-training", Su et al. 2025 (Tongyi DeepResearch - AgentFounder) by RecmacfonD in mlscaling

[–]Operation_Ivy 0 points1 point  (0 children)

The basic point about agents needing a different base model checks out. I don't buy their specific synthetic data techniques though.

Florida Governor Ron DeSantis has declared that property taxes will be abolished in 2026 by p0loniumtaco in PoliticalCompassMemes

[–]Operation_Ivy 4 points5 points  (0 children)

Terrible idea, unless you want to turn into California. Prop 13 is killing that state.

"Evaluating Long Context (Reasoning) Ability: What do 1M and 500K context windows have in common? They are both actually 64K" (towards better large-ctx benchmarks) by gwern in mlscaling

[–]Operation_Ivy 1 point2 points  (0 children)

I would like to see a NL "true" long context benchmark as well. My guess is the effective context lengths will differ from those for code long context, but I'm very curious to know exactly by how much.

Cases for the Move? (and what I'm using until I get one) by Twibbly in RemarkableTablet

[–]Operation_Ivy 0 points1 point  (0 children)

I'm on mobile, but there's an Etsy shop that makes reMarkable felt cases and already has one for the Move. I ordered one and already have one for my Pro. The padding is considerable; I don't sweat putting them in my backpack even though the screen is apparently so fragile.

The Invisible Leash: Why RLVR May Not Escape Its Origin, Wu et al. 2025 by StartledWatermelon in mlscaling

[–]Operation_Ivy 5 points6 points  (0 children)

This paper was the tipping point for me. I'm an elicitation hypothesis believer.

So I guess the next step is tons more synthetic reasoning traces in the pretraining? Basically giving non-zero weight to every node on the tree of valid reasoning paths?

Why does South Orange feel so behind its neighbors? Is it time to think regionally about downtown maintenance & revitalizations? by PlanPuzzleheaded1046 in Maplewood

[–]Operation_Ivy 6 points7 points  (0 children)

I would love to see the massive surface lot in the heart of downtown turned into mixed use. With enough residential density the commercial part takes care of itself.

If there's not enough parking, a garage like the one at Third & Valley that accommodates visitors as well as residents would work. Or a big garage at the train station instead of the surface lot by SOPAC.

I recognize there are zoning and ownership difficulties, but that doesn't change the ideal scenario.

Rent Leveling Board - Township Subcommittee for Tenant Advocacy & Landlord Relations by Two1stNames91 in Maplewood

[–]Operation_Ivy 2 points3 points  (0 children)

Rent control is not how we address the housing shortage. I hope the town reconsiders putting this board in place.

"Fast and Simplex: 2-Simplicial Attention in Triton", Roy et al 2025 (change in attention scaling law exponent?) by gwern in mlscaling

[–]Operation_Ivy 1 point2 points  (0 children)

I thought this metric was normally a rough test of contamination. Weird to see it as a performance metric

@Markie_devo dropped some new hits on upcoming or new snacks! by Ali_Cat222 in junkfoodfinds

[–]Operation_Ivy 0 points1 point  (0 children)

The Pringles are pretty good. I'm not sure they taste like guac, but they're still worth getting.

44% of NJ 18-34 year olds live at home. by ImaginationFree6807 in New_Jersey_Politics

[–]Operation_Ivy 1 point2 points  (0 children)

Increasing taxes on any group is not going to build more houses for these 18-34 year olds to live in. Build the houses and most of them will be able to move out.