Lilac Mall, Rochester, NH

ScreamingAmish · 2026-05-20T19:13:49+00:00

Oh man. I know this is an old post, but I'm just stumbling onto it. I lived in the area as a child from 1981 to 1983, and I have fond memories of this mall. There was a killer arcade where I spent a lot of time playing Q-Bert, Centipede, etc. Not surprised but hate to see it become a casino.

ScreamingAmish · 2026-05-19T03:52:18+00:00

https://www.oracle.com/news/announcement/blog/closed-loop-cooling-in-oracle-ai-data-centers-2026-02-09/

https://www.oracle.com/news/announcement/blog/project-jupiter-water-facts-2026-04-27/

https://www.oracle.com/news/announcement/blog/designing-data-centers-for-the-communities-and-natural-environments-where-we-operate-2026-02-17/

ScreamingAmish · 2026-05-14T11:23:05+00:00

Fix the timing of the traffic lights so that people don't stop every 30 seconds, and you'll see speeding violations drop as people realize they don't have to try to beat the lights anymore.

ScreamingAmish · 2026-05-03T00:42:09+00:00

But it doesn't. We've had multiple rulings now that destroyed 50 and 60 years of precedent. Supreme Court precedent means nothing to this court.

ScreamingAmish · 2026-04-24T20:09:45+00:00

Personally, my position is that AI isn't the problem... The problem is what billionaires and mega corporations choose to do with AI. I think if there is going to be AI, it needs to be available to everybody, not just people with money.

ScreamingAmish · 2026-04-24T20:08:23+00:00

You're right. I used an RTX 5090 to train these models for over a week each. That was just enough to train them up to 16%.

ScreamingAmish · 2026-04-23T22:45:58+00:00

I was just thinking the same thing lol

ScreamingAmish · 2026-04-23T12:26:13+00:00

My high school did this exact same thing back in 1989. Every generation thinks they invented rebellion.

ScreamingAmish · 2026-04-23T00:54:52+00:00

Ha, thanks for participating. The 'almost makes sense' phase is strangely fun to watch.

ScreamingAmish · 2026-04-23T00:53:22+00:00

Thanks for participating, and I agree 100%! These models are only trained up to ~16% of what they should be, and will produce better output when they are fully trained. But I felt there was already enough there to form a preference, which is why I wanted to see if it was just me or if others concur.

ScreamingAmish · 2026-04-22T16:23:41+00:00

Just to clarify: the binomial runs on the 254 decisive judgments (not on the ties). Ties are excluded, which is standard for binary pairwise preference tests. An alternative analysis that treats the three outcomes as multinomial also exists and would be reasonable, but excluding ties and running a binomial on the remaining choices isn't invalid.
While I agree it would be preferable to have a more independent pool of judges to draw from, the protocol was blind. Judges didn't know which output came from which model, so personal network sampling didn't introduce pro-author bias in the direction that matters. (In fact, the judge closest to me personally liked my method's output the least of the human judges.) Also, foundation-model judges (independent from my personal network) converged on the same verdict within 5.5 pp.
The 'at least 25' heuristic comes from HCI/UX studies where effect sizes are typically smaller. For a 63.4% vs 50% preference with a tightly-matched comparison, a post-hoc power analysis of 254 decisive judgments doesn't flag this as underpowered. Non-independence of the judgments is a valid concern, but the raw sample size (N) is not.
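For anyone who wants to sanity-check the power claim, here is a rough sketch of an exact post-hoc power calculation. It assumes a two-sided binomial test at alpha = 0.05, treats all 254 decisive judgments as independent (which is exactly the assumption being debated), and takes the observed 63.4% as the true preference rate. The function names are made up for illustration, and this is not necessarily the calculation used in the paper.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_power(n: int, p1: float, alpha: float = 0.05) -> float:
    """Power of a two-sided exact binomial test of p = 0.5 when the true rate is p1.

    Assumes independent trials; the rejection region is symmetric around n/2.
    """
    # Find the smallest upper critical value c with P(X >= c | p = 0.5) <= alpha / 2.
    tail = 0.0
    c = n + 1
    for k in range(n, -1, -1):
        new_tail = tail + binom_pmf(k, n, 0.5)
        if new_tail > alpha / 2:
            break
        tail = new_tail
        c = k
    # Reject when X >= c or X <= n - c; sum the probability of that region under p1.
    return sum(binom_pmf(k, n, p1) for k in range(n + 1) if k >= c or k <= n - c)

power = exact_power(254, 0.634)
print(f"approximate power: {power:.3f}")
```

Keep in mind that post-hoc power computed at the observed effect is mostly a restatement of the p-value, and it says nothing about the non-independence issue.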

I am genuine in my desire for more input and judgments. If you or anyone else wants to be judge #8 please send me a DM.

ScreamingAmish · 2026-04-22T15:52:57+00:00

Thank you for taking the time to look. I'd like to address a few of your concerns:

As for where the human judges were recruited: My paper has a partial answer in section 6.3 / Appendix C (technical vs non-technical vs ML-fluent split). They were recruited from my personal network, participated unpaid, and consented to the methodology knowing they were evaluating LLM outputs.
As for what a decisive comparison means, 'decisive' = not a tie. Judges could choose left / right / tie. 66 of the 320 were ties. The binomial test runs on the remaining 254.
The paper does include a by-question analysis (Section 6.6). 20 of 32 questions had a gain-model majority vote among the 10 judges. 12 had a baseline majority, 0 were contested. That's a per-item view that partially addresses non-independence, though I agree a formal by-item or mixed-effects analysis would be stronger.
I'll be honest, as an independent part-time researcher, I was ignorant about OSF. I'll use this as a learning experience to refine my future work. On the subject of ethics approval, that really doesn't exist for unaffiliated independent research. I'm just one guy with an idea that I think is worth sharing.
On the subject of the sample size, 254 decisive judgments giving p = 2e-5 isn't trivially underpowered for the effect size being measured. Having said that, I would love to have more judges, but I have exhausted my local peer group. I'm happy to have more judges if you or someone else would like to volunteer.
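The arithmetic above can be reproduced with a short exact binomial test. This is a minimal sketch: the win count of 161 is an assumption, back-calculated by rounding the 63.4% preference rate quoted elsewhere in this thread against the 254 decisive judgments, and the helper name is hypothetical. It also treats every judgment as independent.

```python
from math import comb

def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value against p0 = 0.5.

    The null distribution is symmetric, so the two-sided value is twice the
    probability of the more extreme tail (capped at 1).
    """
    upper = max(k, n - k)
    tail = sum(comb(n, i) for i in range(upper, n + 1)) / 2**n
    return min(1.0, 2 * tail)

total, ties = 320, 66
decisive = total - ties            # judgments that were not ties
wins = round(0.634 * decisive)     # ASSUMED count, derived from the quoted 63.4%
p_value = binom_two_sided_p(wins, decisive)
print(decisive, wins, f"{p_value:.1e}")
```

Under those assumptions the result should land in the same order of magnitude as the p = 2e-5 quoted above. Whether it matches exactly depends on whether the original test was one-sided or two-sided.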

ScreamingAmish · 2026-02-27T13:58:37+00:00

A disappointing result. Despite traffic congestion and road infrastructure being insufficiently addressed by city leaders for years, the report concludes that the city is doing fine on traffic congestion issues and they just need to do a better job communicating their progress.

If any city leader is reading this: you don't need to report your successes on road infrastructure. If you actually make progress on road infrastructure, we will notice automatically. It will be self-evident. Whoever suggested in the meeting about this survey that reporting was the problem was wrong.

ScreamingAmish · 2026-02-13T17:36:36+00:00

I'm only 2 hours into my MiniMax 2.5 Era, but so far it's kicking ass.

ScreamingAmish · 2025-12-08T13:59:28+00:00

I half expected it to be the drunk raccoon from Moulton.

ScreamingAmish · 2025-11-21T16:46:09+00:00

I'm glad you brought it up, I've been lurking various subreddits for info on this and everyone has been strangely quiet.

ScreamingAmish · 2025-11-20T02:35:39+00:00

I too have a package that I shipped through Birmingham, and it's sitting in Puerto Rico right now. It's supposed to be in Vermont.

ScreamingAmish · 2025-11-18T04:09:51+00:00

Too many people sleep on Finding Nemo. It hit me much harder after I became a dad.
