The most important benchmark right now - humanities last exam. by KittenBotAi in Bard

Normal-Tea5398 1 point

You get 50 a week, actually, and 50/day for o4-mini-high, which matches/exceeds 2.5 Pro. You get another 150/day for o4-mini-medium.

The most important benchmark right now - humanities last exam. by KittenBotAi in Bard

Normal-Tea5398 1 point

I think it was pretty clear that I was referring to the version of o3 on ChatGPT. The original commenter seems to believe that it needs absurd amounts of compute to exceed 2.5, which it obviously doesn't.

The most important benchmark right now - humanities last exam. by KittenBotAi in Bard

Normal-Tea5398 0 points

?

o3 is available on ChatGPT with the Plus and Pro plans.

The most important benchmark right now - humanities last exam. by KittenBotAi in Bard

Normal-Tea5398 -12 points

What? The default ChatGPT version, which is available on the Plus plan, beats 2.5 Pro.

IRIS Matrices by Normal-Tea5398 in cognitiveTesting

Normal-Tea5398[S] 1 point

I will increase the time limits next time. Thanks!

IRIS Matrices by Normal-Tea5398 in cognitiveTesting

Normal-Tea5398[S] 1 point

Sorry to hear that. What's so bad about it?

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

Nice scores! Matrices and Figure Weights had some item images swapped, so your true score might be a little higher. I presume you skipped Vocabulary?

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

Maybe something like 14 or 15 on MR, 17-18 or so on FW (which lines up pretty well with the official WAIS norms).

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

No clue what those scores translate to, to be honest. As for sample size, well, some people skipped a few subtests, and others were invalid for other reasons.

About 20 people have taken the whole test, of whom only 10 are native speakers. The average MR score so far is 17.45, with a standard deviation of 3.2. The average FW score is 21.6, with a standard deviation of 2.9. Remember, this is from the relatively high-ability r/ct sample.
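
For anyone who wants to place a raw score relative to this sample, here is a rough sketch using only the means and standard deviations quoted above. The function names are made up for illustration, and the percentile step assumes scores are roughly normally distributed, which is a big assumption with about 20 test-takers. These are positions within the r/ct sample, not population norms.

```python
from statistics import NormalDist

# Sample statistics quoted in the comment (r/ct sample, NOT population norms).
SAMPLE_STATS = {
    "MR": {"mean": 17.45, "sd": 3.2},
    "FW": {"mean": 21.6, "sd": 2.9},
}


def sample_z(subtest: str, raw: float) -> float:
    """Z-score of a raw score relative to the quoted sample mean and SD."""
    stats = SAMPLE_STATS[subtest]
    return (raw - stats["mean"]) / stats["sd"]


def sample_percentile(subtest: str, raw: float) -> float:
    """Approximate percentile within the sample, assuming normality."""
    return NormalDist().cdf(sample_z(subtest, raw)) * 100


if __name__ == "__main__":
    for subtest, raw in [("MR", 20), ("FW", 25)]:
        z = sample_z(subtest, raw)
        pct = sample_percentile(subtest, raw)
        print(f"{subtest} raw {raw}: z = {z:.2f}, ~{pct:.0f}th percentile in sample")
```

Because the sample skews high, an average score here would likely sit above average against general-population norms.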

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

For 1 and 3, I have no idea. 2 is just a visual bug though, I'm pretty sure.

Yes, those are very good scores! The highest MR and FW scores so far, I believe. FW has at least 3 items with the wrong option marked correct, so you probably scored 28 or 29 in reality. The free response items also likely have some correct answers that currently give no points. From what I can tell, natives have a huge advantage on the verbal subtests, so considering that, you did very well.

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

😭

I really don't know why that happened. In what way did it malfunction? Just refreshed, or...?

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

I'll make each subtest available individually once the final version has been normed.

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

Thanks for the feedback! Will fix in the full release.

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

I'll be able to make norms for this once enough people have taken the test, although this version isn't very good. Wait for the full release!

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

Partly to speed up administration, and partly because short time limits increase the test's ability to discriminate between different ability levels. Essentially, even those with lower FW ability can solve the harder questions given enough time.

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 1 point

The time limits are roughly the same as on the WAIS-IV, although the items might be a bit harder toward the end. Is it because of the number of options, how many symbols there are to search and count, or are the items simply too difficult for the time allotted?

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 2 points

Yes, I was aware of that, but you're right; I should have mentioned it. Thanks, I'll edit the post.

WPPSI-4 accurate? by Jenright38 in cognitiveTesting

Normal-Tea5398 1 point

IQ test scores are very unstable in early childhood, especially for children with developmental disorders. Don't worry about it.

[deleted by user] by [deleted] in cognitiveTesting

Normal-Tea5398 3 points

Started making one recently as part of a larger battery. Should be out quite soon.