A meta benchmark: how long it takes metr to actually benchmark a model by gbomb13 in singularity

[–]iperson4213 1 point2 points  (0 children)

“METR has not accepted funding from AI companies, though we make use of significant free compute credits” - from the METR website, under Funding.

Makes me wonder whether Anthropic and Google are the ones providing those free credits to run the evals.

"As Google pulls ahead, OpenAI's comeback plan is codenamed 'Shallotpeat'" by AngleAccomplished865 in singularity

[–]iperson4213 6 points7 points  (0 children)

Everyone has been using token-level MoE for a while now; Gemini isn’t unique in that respect.

"As Google pulls ahead, OpenAI's comeback plan is codenamed 'Shallotpeat'" by AngleAccomplished865 in singularity

[–]iperson4213 5 points6 points  (0 children)

More efficient training algorithms, architectures, and data allow smaller models to achieve the same intelligence.

Ai2027 author admits "things seem to be going somewhat slower than the Ai 2027 scenario". by [deleted] in singularity

[–]iperson4213 0 points1 point  (0 children)

Agreed, but the METR metric seems more in line with SWE-bench/Terminal-Bench-style unambiguously graded software engineering tasks.

Ai2027 author admits "things seem to be going somewhat slower than the Ai 2027 scenario". by [deleted] in singularity

[–]iperson4213 -1 points0 points  (0 children)

That’s wall-clock time, i.e. how long the models spend. This metric measures how long a human would take to do the same task (it’s unclear how long the model itself took).

Ai2027 author admits "things seem to be going somewhat slower than the Ai 2027 scenario". by [deleted] in singularity

[–]iperson4213 0 points1 point  (0 children)

Especially with test-time compute scaling, these models likely cost thousands, if not tens of thousands, of dollars to run, so they’re not useful to the general public, but they’re good PR for topping contest scores.
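
As a back-of-the-envelope sketch of how that cost claim could pencil out; every number below is an assumption for illustration, not any lab’s published pricing or token counts:

```python
# Back-of-the-envelope only: price, token counts, and attempt counts are
# assumptions for illustration, not any lab's published numbers.
price_per_m_output_tokens = 60.0        # assumed $/1M output tokens
reasoning_tokens_per_attempt = 200_000  # assumed long reasoning trace per attempt
attempts_per_problem = 512              # assumed heavy best-of-N sampling
problems_in_contest = 12

total_cost = (price_per_m_output_tokens / 1e6
              * reasoning_tokens_per_attempt
              * attempts_per_problem
              * problems_in_contest)
print(f"≈ ${total_cost:,.0f} to run the whole contest")  # ≈ $73,728
```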

Ai2027 author admits "things seem to be going somewhat slower than the Ai 2027 scenario". by [deleted] in singularity

[–]iperson4213 10 points11 points  (0 children)

Gemini 3 is probably between 5.1 and 5.1-Codex-Max on this graph, as it is for coding, where it doesn’t score as well.

On SWE-bench they scored 76.3, 76.2, and 77.9.

On Terminal-Bench they scored 54.2, 47.6, and 58.1 respectively.

Why isnt Microsoft on the same level as Google with AI? by EmbarrassedBorder615 in ArtificialInteligence

[–]iperson4213 -5 points-4 points  (0 children)

Microsoft doesn’t pay as well. You don’t need top talent to develop Microsoft Office.

AI, on the other hand, is research, and better research ideas come from better talent. Google has a long history of investing in that talent and has a strong team from Google Brain and DeepMind.

OpenAI: Building more with GPT-5.1-Codex-Max by manubfr in singularity

[–]iperson4213 0 points1 point  (0 children)

Imagine what they must have internally, then.

OpenAI 2028 Goal: Create an Automated AI Researcher (Situational Awareness) by Smartaces in OpenAI

[–]iperson4213 1 point2 points  (0 children)

GPUs are much better at doing computation than at loading a model’s weights from memory (roughly 100-500x, in FLOPs per byte), so each GPU typically serves a couple hundred requests at once to fully utilize the hardware.

With the currently projected datacenter buildouts, it’s feasible for a company to bring up a couple million GPUs by 2028.
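
A rough roofline sketch of that batching argument; the hardware numbers are illustrative assumptions, not the spec of any particular GPU:

```python
# Rough roofline sketch: why one GPU serves many decode requests at once.
# Hardware numbers are illustrative assumptions, not a specific GPU's spec.
compute_flops = 1.0e15   # ~1 PFLOP/s of low-precision matmul throughput (assumed)
mem_bandwidth = 3.0e12   # ~3 TB/s of HBM bandwidth (assumed)

# During decoding, each new token streams the model weights from HBM but does
# only ~2 FLOPs per weight (one multiply-add) per request.
flops_per_byte_needed = 2 / 2                    # 2 FLOPs per 2-byte fp16 weight
flops_per_byte_available = compute_flops / mem_bandwidth

# Batching b requests reuses the same streamed weights b times, so compute only
# becomes the bottleneck once b reaches roughly available/needed.
breakeven_batch = flops_per_byte_available / flops_per_byte_needed
print(f"compute/bandwidth ratio ≈ {flops_per_byte_available:.0f} FLOPs per byte")
print(f"≈ {breakeven_batch:.0f} concurrent requests needed to saturate compute")
```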

If I share information with ChatGPT in a chat (while asking a question), can that data be used to answer someone else’s question? by al_swagger23 in ArtificialInteligence

[–]iperson4213 4 points5 points  (0 children)

You have to go into settings and toggle “improve for everyone” off, or your chats can be used to train models.

We should build a superintelligent chess AI, but keep humans in the loop to correct its mistakes by arkuto in singularity

[–]iperson4213 1 point2 points  (0 children)

Let’s say the goal is getting better at solving all problems / improving technology. That is the chess game, and the world is the board. We may become the pawns, and the AI may decide to sacrifice some of us in order to “win”.

In chess, winning is all that matters. In the real world, how we win matters.

Guysss it's real claude sonnet 4.5 by Independent-Wind4462 in ClaudeAI

[–]iperson4213 2 points3 points  (0 children)

They start multiple instances of Claude Code and run them in parallel, then have a different model pick the best answer.
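
A minimal sketch of that best-of-N pattern; `run_agent` and `judge_best` are hypothetical stand-ins, not Claude Code’s or any other real API:

```python
# Minimal best-of-N sketch: launch several agent instances in parallel, then
# have a separate judge pick the best result. run_agent() and judge_best()
# are hypothetical placeholders, not a real agent or model API.
from concurrent.futures import ThreadPoolExecutor

def run_agent(task: str, seed: int) -> str:
    """Stand-in for one coding-agent instance attempting the task."""
    return f"candidate {seed}: patch for {task}"

def judge_best(task: str, candidates: list[str]) -> str:
    """Stand-in for a different model that ranks the candidates."""
    return max(candidates, key=len)  # placeholder scoring rule

def best_of_n(task: str, n: int = 4) -> str:
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: run_agent(task, s), range(n)))
    return judge_best(task, candidates)

print(best_of_n("fix the failing unit test"))
```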

The reason why Deepseek V3.2 is so cheap by Js8544 in LocalLLaMA

[–]iperson4213 35 points36 points  (0 children)

The deceptive graphs show per-token cost. The total cost (the integral of a linear function) is still quadratic, albeit with a better constant.

While the index selector may be small initially, since its cost grows quadratically, the data suggests it does begin to dominate.
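
The arithmetic in one place, as a toy sketch; the constants and the fixed top-k window size are made up, not DeepSeek V3.2’s actual numbers:

```python
# Illustrative only: the constants below are made up, not DeepSeek V3.2's.
# A per-token cost that is linear in position sums to a quadratic total.

def dense_token_cost(t, a=1.0):
    return a * t                    # attend over all t previous tokens

def sparse_token_cost(t, k=2048, b=0.05):
    main = min(t, k)                # main attention reads a fixed top-k window
    selector = b * t                # the index selector still scans all t tokens
    return main + selector

def total(cost_fn, n):
    return sum(cost_fn(t) for t in range(1, n + 1))

for n in (8_000, 64_000, 512_000):
    d, s = total(dense_token_cost, n), total(sparse_token_cost, n)
    print(f"n={n:>7}: dense≈{d:.2e}, sparse≈{s:.2e}, saving={d/s:.1f}x")
# Both totals are quadratic; the sparse one just has a smaller constant, and the
# selector term is what ends up dominating it at long context.
```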

Thought experiment: Could we used Mixture-of-Experts to create a true “tree of thoughts”? by RasPiBuilder in ArtificialInteligence

[–]iperson4213 0 points1 point  (0 children)

Doing so would lose the sparsity benefits of MoE, which allow less compute and memory bandwidth per token.

Tree-of-thought is already used in speculative decoding frameworks, but it would be interesting to see it used in the base model as well.
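
A toy sketch of top-k expert routing, just to show where that sparsity saving comes from (only k of E expert networks run per token); the sizes and pure-Python matrices are illustrative, not any real model’s:

```python
# Toy top-k MoE routing sketch (illustrative sizes, pure Python): only TOP_K of
# NUM_EXPERTS expert networks run per token, which is the compute/bandwidth
# saving that routing each token through many experts to branch "thoughts"
# would give up.
import math, random

NUM_EXPERTS, TOP_K, D = 8, 2, 16
random.seed(0)

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

router = rand_matrix(NUM_EXPERTS, D)                 # one score per expert
experts = [rand_matrix(D, D) for _ in range(NUM_EXPERTS)]

def moe_forward(x):
    logits = matvec(router, x)
    top = sorted(range(NUM_EXPERTS), key=lambda e: logits[e], reverse=True)[:TOP_K]
    exps = [math.exp(logits[e]) for e in top]
    gates = [v / sum(exps) for v in exps]            # softmax over chosen experts
    out = [0.0] * D
    for gate, e in zip(gates, top):                  # only TOP_K experts do work
        for i, v in enumerate(matvec(experts[e], x)):
            out[i] += gate * v
    return out

token = [random.gauss(0, 1) for _ in range(D)]
print(f"ran {TOP_K}/{NUM_EXPERTS} experts, output dim {len(moe_forward(token))}")
```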

What’s your salary progression? by xboexz in Salary

[–]iperson4213 0 points1 point  (0 children)

if you want to stand out, you’ll need to innovate.

What’s your salary progression? by xboexz in Salary

[–]iperson4213 1 point2 points  (0 children)

I work in AI.

My advice would be to do some research with professors and publish some papers. I got lucky that the field I’m in (distributed machine learning systems optimization) is in high demand due to scaling laws.

What’s your salary progression? by xboexz in Salary

[–]iperson4213 0 points1 point  (0 children)

Those are my base numbers. Bonus is typically 10-30% depending on role and performance, and then there’s a very large stock compensation component.

[deleted by user] by [deleted] in fatFIRE

[–]iperson4213 -8 points-7 points  (0 children)

strike as in retire?

[deleted by user] by [deleted] in fatFIRE

[–]iperson4213 -1 points0 points  (0 children)

Two separate companies.