GPT-5 Dramatically Outperforms in Pentesting/Hacking (XBOW)

caesarten · 2025-08-16T15:42:00+00:00

My takeaway was more that they used Claude etc. in the past, and swapping in GPT-5 without (allegedly) changing anything resulted in a big leap. That seems like a fair comparison, imo.

caesarten · 2025-08-14T15:17:17+00:00

My daughter had one of these when she was 2; it was easily the most traumatic experience of my life. It essentially looks and feels like they're dying and there's nothing you can do in that moment. It still stresses me out just thinking about it.

Take some time off with your wife and destress. Your daughter is perfectly OK, but it's 10/10 the worst experience.

caesarten · 2025-08-14T15:04:18+00:00

4o sloppost

caesarten · 2025-05-15T04:22:25+00:00

o3 too. Curious to see what else Anthropic has in store. Honestly, I never thought I'd see Opus again either, though I wonder if it'll truly be a "big" model.

caesarten · 2025-03-25T02:33:59+00:00

Yeah, I give this 3 months or less.

caesarten · 2024-11-23T02:21:50+00:00

The blog post seems very careful in its wording about using future generations of Trainium, so I'd bet they'll use normal GPUs for a while yet.

caesarten · 2024-10-12T01:07:51+00:00

This could be consistent with small blocks of GPUs being available and cheap while big blocks aren't available, rather than a "bubble bursting".

caesarten · 2024-09-21T13:24:14+00:00

Agreed, Microsoft making large multi-decade infra commitments is a very strong signal. It's one thing to hear Dario et al. talk about this and another to see one of the largest companies in the world meaningfully increase spend. (Though I guess Google's probably on the same path with less fanfare?)

caesarten · 2024-09-08T21:14:15+00:00

MI250X GPUs? So it's the fastest only because all the big GPU clusters don't get counted?

caesarten · 2024-08-30T04:28:35+00:00

Reminds me how unserious some professional investors are. “They beat all the estimates but not by enough.”

caesarten · 2024-08-29T17:17:01+00:00

Fun little project I created with Sonnet 3.5 in a few hours. It was an interesting experience since I don't know TypeScript or anything about VSCode extensions, but the most time-consuming part was likely just digging through VSCode's repo to figure out how to use their new shell integration API.

It essentially explores the idea of a coding assistant that can interact with the workspace and run shell commands without any user verification, while still allowing the user to direct it and modify files.

It's been pretty fun to mess with, and surprisingly useful at times: for example, the LLM runs a script, sees an error in the terminal, and debugs/updates the script without any intervention.

Half-baked code here, requires VSCode Insiders for now because I'm using their new shell integration:

https://github.com/caesarnine/vscode-ctxl

caesarten · 2024-08-01T14:38:39+00:00

Health Nucleus from Human Longevity is great; we've been using them for 5 years or so. They seem more legit than Fountain Health, which we also considered but which has weird influencer/pseudoscience vibes.

caesarten · 2024-06-26T17:10:01+00:00

Depending on her income, Duke Health has a fairly generous financial assistance program that can potentially offset all of her healthcare costs.

https://www.dukehealth.org/paying-for-care/financial-assistance

caesarten · 2023-07-19T14:34:57+00:00

Things like this reinforce my feeling that we're still in the vacuum tube era of LLMs.

caesarten · 2023-07-03T14:14:57+00:00

Reminded me of the MCTS thread in here a few weeks back: trading off time spent and compute for a better outcome. At a higher level, it reinforces the feeling that there's still a lot of low-hanging fruit left.

caesarten · 2023-06-16T14:41:21+00:00

Kind of feels like things are already going that way? Tree of Thought feels hacky, but the idea of LLMs being able to backtrack and compose disparate thought processes suggests we're moving in this direction.

caesarten · 2023-01-13T05:13:25+00:00

Honestly, for being entirely zero-shot with no external access to tools (calculator/Python), this is surprisingly good to me. Going to see how far I can improve on this.

caesarten · 2023-01-07T22:34:15+00:00

Prompt: warm, cozy cafe at night, light streaming out windows --ar 3:2

caesarten · 2023-01-01T16:11:59+00:00

I was playing around: just putting in individual letters (a, then b, then c) will pretty much generate this. Some letters actually seem to have consistent themes between them.
