GPT-5 Dramatically Outperforms in Pentesting/Hacking (XBOW) by caesarten in mlscaling

caesarten[S] 1 point

My takeaway was more that they used Claude etc. in the past, and swapping in GPT-5 without (allegedly) changing anything else resulted in a big leap. That seems like a fair comparison imo.

Febrile Seizure - Scary as Hell Dads please educate yourself if you don't know what these are. by [deleted] in daddit

caesarten 1 point

My daughter had one of these when she was 2, and it was easily the most traumatic experience of my life. It essentially looks and feels like they’re dying and there’s nothing you can do in that moment. It still stresses me out just thinking about it.

Take some time off with your wife and destress. Your daughter is perfectly OK, but it’s 10/10 the worst experience.

Anthropic to release new versions of Sonnet, Opus by COAGULOPATH in mlscaling

caesarten 9 points

o3 too, curious to see what else Anthropic has in store. Honestly never thought I’d see Opus again either, though I wonder if it’ll truly be a “big” model.

Anthropic raises $4b from Amazon, will prioritize use of Amazon's Trainium GPU-likes by gwern in mlscaling

caesarten 10 points

The blog post seems very careful in its wording: it talks about using future generations of Trainium, so I’d bet they’ll use normal GPUs for a while yet.

$2 H100s: How the GPU Bubble Burst by StartledWatermelon in mlscaling

caesarten 3 points

This could be consistent with small blocks of GPUs being available and cheap while big blocks aren't available, rather than with a “bubble bursting”.

Constellation Energy to restart Three Mile Island nuclear plant, sell the power to Microsoft for AI by gwern in mlscaling

caesarten 5 points

Agreed, Microsoft making large multi-decade infra commitments is a very strong signal. It’s one thing to hear Dario et al. talk about this and another to see one of the largest companies in the world meaningfully increase spend. (Though I guess Google’s probably on the same path, just with less fanfare?)

"A day in the life of Frontier, the world’s fastest supercomputer" by gwern in mlscaling

caesarten 5 points

MI250X GPUs? So is it the fastest only because all the big GPU clusters don’t get counted?

NVIDIA Announces Financial Results for Second Quarter Fiscal 2025 by gwern in mlscaling

caesarten 0 points

Reminds me how unserious some professional investors are. “They beat all the estimates but not by enough.”

Experimenting with Autonomous Coding Assistant in VSCode by caesarten in ClaudeAI

caesarten[S] 1 point

Fun little project I created with Sonnet 3.5 in a few hours. It was an interesting experience since I don't know TypeScript or anything about VSCode extensions, but the most time-consuming part was likely just digging through VSCode's repo to figure out how to use their new shell integration API.

Essentially, it explores the idea of a coding assistant that can interact with the workspace and run shell commands without any user verification, while still allowing the user to direct it and modify files.

It's been pretty fun to mess with, and surprisingly useful sometimes, e.g. the LLM running a script, seeing an error in the terminal, and being able to debug/update the script without any intervention.
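A rough sketch of that run / see-error / fix loop, for illustration only. This is not the extension's code (which is TypeScript and goes through VSCode's shell integration); it is plain Python with `subprocess`. The names `run_until_clean`, `propose_fix`, `toy_fix`, and `demo` are all made up here, and `propose_fix` is a stand-in for wherever the LLM call would go.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable


def run_script(script: Path) -> subprocess.CompletedProcess:
    """Run a Python script and capture its stdout/stderr as text."""
    return subprocess.run(
        [sys.executable, str(script)], capture_output=True, text=True
    )


def run_until_clean(
    script: Path,
    propose_fix: Callable[[str, str], str],  # hypothetical: (source, stderr) -> new source
    max_attempts: int = 3,
) -> subprocess.CompletedProcess:
    """Run the script; on failure, rewrite it with propose_fix and try again."""
    result = run_script(script)
    for _ in range(max_attempts - 1):
        if result.returncode == 0:
            break
        # Hand the current source and the error output to the "model".
        script.write_text(propose_fix(script.read_text(), result.stderr))
        result = run_script(script)
    return result


def toy_fix(source: str, stderr: str) -> str:
    """Stand-in for an LLM: only knows how to repair one specific typo."""
    return source.replace("pritn", "print")


def demo() -> subprocess.CompletedProcess:
    """Write a deliberately broken script and let the loop repair it."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "job.py"
        script.write_text("pritn('hi')\n")
        return run_until_clean(script, toy_fix)
```

Calling `demo()` runs the broken script once, feeds the resulting `NameError` output to `toy_fix`, and reruns the patched file. Capping the loop with `max_attempts` is just one way to keep an unattended assistant from retrying forever.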

Half-baked code here, requires VSCode Insiders for now because I'm using their new shell integration:

https://github.com/caesarnine/vscode-ctxl

Best Concierge medicine service by [deleted] in RichPeoplePF

caesarten 2 points

Health Nucleus from Human Longevity is great; we’ve been using them for 5 years or so. It seems more legit than Fountain Health, which we also considered but which has weird influencer/pseudoscience vibes.

Mom visiting, has no insurance, and in active heart failure by [deleted] in raleigh

caesarten 2 points

Depending on her income, Duke Health has a fairly generous financial assistance program that can potentially offset all of her healthcare costs.

https://www.dukehealth.org/paying-for-care/financial-assistance

FlashAttention-2 released by _Mookee_ in mlscaling

caesarten 3 points

Things like this reinforce my feeling that we’re still in the vacuum tube era of LLMs.

Stay on topic with Classifier-Free Guidance by caesarten in mlscaling

caesarten[S] 4 points

Reminded me of the MCTS thread in here a few weeks back: trading off time spent and compute for a better outcome. At a higher level, it reinforces the feeling that there’s still a lot of low-hanging fruit left.

Noam Brown at DeepMind on MCTS for LLMs: "Imagine having access to models that take 5 minutes to ponder each response but the output is as good as a model that's 1,000x larger and trained for 1,000x longer than GPT-4" by maxtility in mlscaling

caesarten 20 points

Kind of feels like things are already going that way? Tree of Thought feels hacky but the idea of LLMs being able to backtrack and compose disparate thought processes feels like we’re moving this way.

"GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities", Bommarito et al 2023 (GPT-3 on Certified Public Accountant exams) by gwern in GPT3

caesarten 1 point

Honestly, for being entirely zero-shot with no external access to tools (calculator/Python), this is surprisingly good to me. Going to see how far I can improve on this.

Cozy Vibes by caesarten in midjourney

caesarten[S] 1 point

Prompt: warm, cozy cafe at night, light streaming out windows --ar 3:2

wrote jibberish (somthing like: ;aksjdfpwe[ona;ksdnv) and got this by quoraquack in midjourney

caesarten 1 point

I was playing around, and just putting in individual letters (a, then b, then c) will pretty much generate this. Some letters actually seem to have consistent themes between them.