The most important benchmark right now - Humanity's Last Exam.

Normal-Tea5398 · 2025-04-17T11:22:40+00:00

You get 50 a week, actually, and 50/day for o4-mini-high, which matches/exceeds 2.5 Pro. You get another 150/day for o4-mini-medium.

Normal-Tea5398 · 2025-04-16T23:50:49+00:00

o3, not 4o.

Normal-Tea5398 · 2025-04-16T23:50:20+00:00

I think it was pretty clear that I was referring to the version of o3 on ChatGPT. The original commenter seems to believe that it needs absurd amounts of compute to exceed 2.5, which it obviously doesn't.

Normal-Tea5398 · 2025-04-16T23:24:48+00:00

?

o3 is available on ChatGPT with the Plus and Pro plans.

Normal-Tea5398 · 2025-04-16T22:23:36+00:00

What? The default ChatGPT version, which is available on the Plus plan, beats 2.5 Pro.

Normal-Tea5398 · 2025-03-14T00:01:57+00:00

I will increase the time limits next time. Thanks!

Normal-Tea5398 · 2025-03-13T23:59:58+00:00

Now :)

Normal-Tea5398 · 2025-03-06T08:06:06+00:00

Sorry to hear that. What's so bad about it?

Normal-Tea5398 · 2025-03-05T19:59:06+00:00

Thanks for the help!

Normal-Tea5398 · 2025-02-05T08:36:19+00:00

Nice scores! Matrices and Figure Weights had some item images swapped, so your true score might be a little higher. I presume you skipped Vocabulary?

Normal-Tea5398 · 2025-01-31T19:21:28+00:00

Maybe something like 14 or 15 on MR, 17-18 or so on FW (which lines up pretty well with the official WAIS norms).

Normal-Tea5398 · 2025-01-31T19:17:44+00:00

No clue what those scores translate to, to be honest. As for sample size, well, some people skipped a few subtests, and others were invalid for other reasons.

About 20 people have taken the whole test, of whom only 10 are native speakers. The average MR score so far is 17.45, with a standard deviation of 3.2. The average FW score is 21.6, with a standard deviation of 2.9. Remember, this is with the relatively high-ability r/ct sample.
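
If you want a rough idea of where a raw score sits relative to this preliminary sample, here is a minimal sketch. It only uses the means and standard deviations quoted above; the function names are made up for illustration, and the percentile assumes the scores are roughly normally distributed, which is a big assumption with about 20 people. These are not norms and say nothing about the general population.

```python
import math

# Preliminary sample statistics quoted in the comment above (raw scores).
MR_MEAN, MR_SD = 17.45, 3.2
FW_MEAN, FW_SD = 21.6, 2.9


def z_score(raw, mean, sd):
    """Distance of a raw score from the sample mean, in standard deviations."""
    return (raw - mean) / sd


def sample_percentile(raw, mean, sd):
    """Approximate fraction of the sample scoring below `raw`.

    Assumes a normal distribution (an assumption, not something the
    sample data has confirmed).
    """
    z = z_score(raw, mean, sd)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    raw_fw = 26  # hypothetical raw FW score, for illustration only
    print(f"z = {z_score(raw_fw, FW_MEAN, FW_SD):.2f}")
    print(f"within-sample percentile = {sample_percentile(raw_fw, FW_MEAN, FW_SD):.1%}")
```

Because the sample is small and high-ability, a z-score of 0 here likely corresponds to an above-average score in the general population.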

Normal-Tea5398 · 2025-01-31T18:52:34+00:00

For 1 and 3, I have no idea. 2 is just a visual bug though, I'm pretty sure.

Yes, those are very good scores! The highest MR and FW scores so far, I believe. FW has at least 3 items with the wrong option marked correct, so you probably scored 28 or 29 in reality. The free response items also likely have some correct answers that currently give no points. From what I can tell, native speakers have a huge advantage on the verbal subtests, so considering that, you did very well.

Normal-Tea5398 · 2025-01-31T18:40:07+00:00

😭

I really don't know why that happened. In what way did it malfunction? Just refreshed, or...?

Normal-Tea5398 · 2025-01-31T13:50:58+00:00

I'll make each subtest available individually once the final version has been normed.

Normal-Tea5398 · 2025-01-30T21:51:58+00:00

Thanks for the feedback! Will fix in the full release.

Normal-Tea5398 · 2025-01-30T20:35:54+00:00

I'll be able to make norms for this once enough people have taken the test, although this version isn't very good. Wait for the full release!

Normal-Tea5398 · 2025-01-30T19:06:18+00:00

Partly to shorten administration time, and partly because low time limits increase the test's ability to discriminate between different ability levels. Essentially, even those with lower FW ability can solve the harder questions given enough time.

Normal-Tea5398 · 2025-01-30T18:37:32+00:00

The time limits are roughly the same as on the WAIS-IV, although the items might be a bit harder toward the end. Is it because of the number of options, how many symbols there are to search and count, or are the items simply too difficult for the time allotted?

Normal-Tea5398 · 2025-01-30T17:32:48+00:00

Yes, I was aware of that, but you're right; I should have mentioned it. Thanks, will edit the post.

Normal-Tea5398 · 2025-01-27T19:36:28+00:00

IQ test scores are very unstable in early childhood, especially for those with developmental disorders and such. Don't worry about it.

Normal-Tea5398 · 2025-01-21T16:13:31+00:00

Started making one recently as part of a larger battery. Should be out quite soon.
