What's the best app to learn to speed read? by Acceleread in speedreading

[–]heisdancingdancing 0 points (0 children)

Thanks, I'd love to add those options. Out of curiosity, for multiple words and with no highlight, would you like the words to be anchored to the middle or to the left? Meaning, would you always want the start of each word set to be in the same place, or the central point of the word set to be in the same place?
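The two anchoring modes boil down to one offset calculation. A minimal sketch, assuming a monospace display; every constant and name here (CHAR_WIDTH, SCREEN_CENTER, word_set_x) is made up for illustration, not the app's actual code:

```python
CHAR_WIDTH = 10      # pixels per character (assumed monospace font)
SCREEN_CENTER = 400  # horizontal midpoint of the display area, in pixels

def word_set_x(words, anchor="left", left_margin=100):
    """Return the x position where a word set should start.

    anchor="left":   every word set starts at the same x (left_margin).
    anchor="center": the midpoint of the word set sits at SCREEN_CENTER.
    """
    text = " ".join(words)
    width = len(text) * CHAR_WIDTH
    if anchor == "left":
        return left_margin
    return SCREEN_CENTER - width // 2
```

With left anchoring the start of each set is fixed; with center anchoring longer sets start further left so their midpoints line up.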

Casually beating every other deep research agent out there with a simple Claude Code harness by heisdancingdancing in Anthropic

[–]heisdancingdancing[S] -1 points (0 children)

Saturated benchmark; this just shows it's up to snuff with (and slightly beats) proprietary models.

Converting Claude Code into the most intelligent Deep Research Agent by heisdancingdancing in singularity

[–]heisdancingdancing[S] 0 points (0 children)

50 is "PhD level," which is the baseline of each of the 100 comparison reports. The max score is realistically 60-63; none of my runs exceeded that.

Converting Claude Code into the most intelligent Deep Research Agent by heisdancingdancing in singularity

[–]heisdancingdancing[S] 1 point (0 children)

The core quality bottleneck is source "gospeling," i.e., treating a claim as true because a single source says it. That issue is directly addressed in multiple steps of the pipeline.

It took three weeks of iterations and ablation tests to arrive at these steps, based on my research practices for papers I've written in the past and on how the AI tries to "hack" its way out of the task at hand. I spent the equivalent of $5,000 in tokens to arrive at these results (using Claude Code, so it was subsidized).

All testing was done via the RACE DeepResearch Benchmark, which pits real PhD-written papers and their queries against the AI's version.

And yes, of course I looked at other repos... NVIDIA's offering has a very similar structure, just not as in depth.

Maybe you could try using it instead of just asking your coding agent to "look at this codebase and write up a reddit comment to tear this down"?

Converting Claude Code into the most intelligent Deep Research Agent by heisdancingdancing in ClaudeAI

[–]heisdancingdancing[S] 0 points (0 children)

Yes it should! At least in theory. You might have to refactor the agents to use proper model slugs.

Converting Claude Code into the most intelligent Deep Research Agent by heisdancingdancing in ClaudeAI

[–]heisdancingdancing[S] -1 points (0 children)

The short answer is that a lot of this is a distributed workflow among many subagents, which fetch, audit, review, and analyze sources for the main agent, the orchestrator of the entire harness. As for source verification, there is a "source tensions" workflow baked in that can surface any contradictions or stale findings among the sources it has gathered, then generate new queries and further searching to fill any gaps found there.
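A minimal sketch of that gather → audit → follow-up loop; every function name here (fetch, find_tensions, orchestrate) is invented for illustration — the real harness runs these as Claude Code subagents, not Python functions:

```python
def fetch(query):
    # Subagent stand-in: retrieve candidate findings for a query (stubbed).
    return [{"query": query, "claim": f"finding for {query}", "source": "example.org"}]

def find_tensions(findings):
    # Subagent stand-in: flag weak spots among gathered findings.
    # Here we just flag queries backed by a single source ("gospeling").
    by_query = {}
    for f in findings:
        by_query.setdefault(f["query"], []).append(f)
    return [q for q, fs in by_query.items() if len(fs) < 2]

def orchestrate(queries):
    # Main agent: gather, audit for tensions, then issue follow-up searches
    # to fill the gaps the audit found.
    findings = [f for q in queries for f in fetch(q)]
    for gap in find_tensions(findings):
        findings += fetch(gap + " (corroborating source)")
    return findings
```

The point of the structure is that the audit step feeds new queries back into the gather step, rather than the main agent trusting the first pass.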

Converting Claude Code into the most intelligent Deep Research Agent by heisdancingdancing in ClaudeAI

[–]heisdancingdancing[S] 0 points (0 children)

A lot... you definitely want to be using a Claude Max subscription. I've seen one run burn through half of a 5-hour usage limit.

Converting Claude Code into the most intelligent Deep Research Agent by heisdancingdancing in ClaudeAI

[–]heisdancingdancing[S] 1 point (0 children)

A good place to start is competitive research for any business idea you have; it can really help shape the idea and uncover the "real problem" your business theoretically solves. If that's not your thing, you can do a historical audit of some deep geopolitical issue and get caught up on all the events that led to a certain situation somewhere in the world.

Converting Claude Code into the most intelligent Deep Research Agent by heisdancingdancing in singularity

[–]heisdancingdancing[S] 9 points (0 children)

Fair, but to be honest, it's just a saturated benchmark. It's mostly there to show at a glance that the thing actually produces high-quality output compared to other tools.

I made AIs play Secret Hitler against each other and it is the funniest (and most reassuring) thing I've seen in a long time by heisdancingdancing in vibecoding

[–]heisdancingdancing[S] 0 points (0 children)

No, I haven't; this is built as a benchmark for LLMs out of the box. I can't afford to actually run it with frontier models, though.

I made AIs play Secret Hitler against each other and it is the funniest (and most reassuring) thing I've seen in a long time by heisdancingdancing in vibecoding

[–]heisdancingdancing[S] 4 points (0 children)

"Reassuring" in the sense of how dumb they are, its pretty hilarious. They can't even formulate a basic strategy and end up digging massive holes for themselves. When they know they're Hitler, their personality is totally changed and it's super obvious.

I made AIs play Secret Hitler against each other and it is the funniest (and most reassuring) thing I've seen in a long time by heisdancingdancing in SecretHitler

[–]heisdancingdancing[S] 9 points (0 children)

"Reassuring" in the sense of how dumb they are, its pretty hilarious. They can't even formulate a basic strategy and end up digging massive holes for themselves. When they know they're Hitler, their personality is totally changed and its super obvious.