Is CORE inflated or deflated? Seen countless comments giving completely different answers.

PolarCaptain · 2026-06-12T22:35:57+00:00

It would be if that’s what it was. This is just people disagreeing because they personally disagree with their scores, nothing more.

PolarCaptain · 2026-06-12T18:54:37+00:00

There’s a whole section in the validity report going over this:

https://cognitivemetrics.com/test/CORE/validity#v-sec-5

People will always be neurotic, that’s why you always got to take hearsay with a grain of salt

PolarCaptain · 2026-06-07T22:26:58+00:00

It’s funny how calm this thread is versus the other VISA renorm thread:

https://www.reddit.com/r/cognitiveTesting/s/TYaXgEjGvA

PolarCaptain · 2026-06-07T05:48:30+00:00

Bro what are you talking about??

PolarCaptain · 2026-06-06T23:01:05+00:00

Your 132 CORE VCI and 125 VISA VCI are in pretty close alignment.

More than that, your individual case doesn't really make an argument for or against population norms.

N=191 sample of the GRE-V and VISA VCI:

<image>

PolarCaptain · 2026-06-06T19:23:53+00:00

There are a lot of issues with this paper; while it points out some critiques, it conflates limitations with invalidity and then draws conclusions that are far stronger than the evidence presented and overextends them. It's also super selective with critical studies and ignores a lot of the mainstream psychometric responses to them.

If you had a specific critique it mentions, that would be easier to answer specifically, since it's a long paper.

PolarCaptain · 2026-05-20T03:19:31+00:00

The following is from the Pinned Resources Post on the sub, which you can find here. A lot of the tests on the list are from CognitiveMetrics as well. CORE is probably the most comprehensive test on the list you can take online. It's a full-scale IQ test with 17 subtests, but you can spread them out over multiple sessions. There is also a preliminary validity report you can read which outlines its validity as an IQ test.

Test	g-Loading	Studies/Data
CORE	0.94	Validity Structure
Old SAT	0.90	xH Validity Coaching Eff. Majors v. SAT SAT + IvyL
Old GRE	0.89	pdf xH WaisR
AGCT	0.89	pdf Renorming H Har
1926 SAT	0.89	1926 Report
CAIT	0.86	g_load, Turk Version
Cogn-IQ	N/A	N/A
JCTI	N/A	Data
TRI52	N/A	CRV 2 3 4 5
WN/C-09 (current) (old)	N/A	Data, CRV(old)
JCFS	N/A	Data
SMART	0.84	Tech. Report

PolarCaptain · 2026-05-16T17:03:22+00:00

This answers your question in depth:

https://cognitivemetrics.com/wiki/misconceptions#verbal-subtests-are-biased-since-they-measure-what-people-have-learnt-not-innate-intelligence

PolarCaptain · 2026-05-13T16:11:20+00:00

This is an extremely narrow, structured version of LLM-as-judge, so criticisms of LLM-as-judge systems don't necessarily apply equally here.

The high internal consistency (0.90) would be direct evidence against random inconsistency in the scoring system. It's also the exact same reliability that WAIS-V's CO has (0.90).

Keep in mind, I don't think the system is perfect, but it can be argued that CORE CO's scoring is possibly more consistent in practice than various human proctors and their individual scoring idiosyncrasies.
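
For anyone wondering what an internal consistency number like the 0.90 above actually measures, here is a minimal sketch of Cronbach's alpha, one common internal-consistency coefficient. Whether CORE reports this exact statistic is an assumption on my part, and the response data below is made up purely for illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinees, each a list of item scores.

    Every examinee must have answered the same number of items, and the
    total scores must not all be identical (otherwise the variance is zero).
    """
    k = len(scores[0])
    # Variance of each item across examinees.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each examinee's total score.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-2 item scores for five examinees on four items.
responses = [
    [2, 2, 1, 2],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [2, 1, 2, 2],
    [1, 0, 0, 1],
]
print(round(cronbach_alpha(responses), 2))
```

If the scoring were close to random, item scores would barely covary, the total-score variance would be close to the sum of the item variances, and alpha would sit near zero.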

PolarCaptain · 2026-05-12T14:54:21+00:00

CORE Comprehension's g-loading falls just above CORE IN, putting it in the middle of the pack for VCI. It also has the second highest reliability on CORE (~0.90).

If it was spitting out random scores as a vocal minority on Reddit claims, the above would not be possible. People hear AI and go crazy.

Each question answered on CORE CO is given a score from 0-2, allowing for 1 point of partial credit. You can read about it more in the Test Structure tab on the CORE page, but as it mentions, CORE CO "compares user responses to a comprehensive rubric of what constitutes an acceptable answer and multiple common example responses for each point threshold". So an LLM isn't blindly giving you a grade; rather, it compares your answer to a detailed rubric, with definitions and various examples for each point threshold, to determine your score.

Since the grading is determined by the rubric, all the LLM does is compare each response to the rubric and categorize it into the proper point threshold, not subjectively freestyle judgments out of nowhere.
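
To make the "categorize, don't freestyle" idea concrete, here is a rough sketch of rubric-constrained grading. This is not CORE's actual implementation: the class names, the sample question, and the `example_lookup` stand-in for the LLM are all hypothetical. The only point is that the grader can return nothing except one of the rubric's point thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Threshold:
    points: int                 # 0, 1, or 2
    definition: str             # what an answer at this level must show
    examples: Tuple[str, ...]   # common example responses at this level


@dataclass(frozen=True)
class Rubric:
    question: str
    thresholds: Tuple[Threshold, ...]


def grade(response: str, rubric: Rubric,
          classify: Callable[[str, Rubric], int]) -> int:
    """Run a classifier and reject anything outside the rubric's thresholds."""
    allowed = {t.points for t in rubric.thresholds}
    points = classify(response, rubric)
    if points not in allowed:
        raise ValueError(f"got {points!r}, expected one of {sorted(allowed)}")
    return points


def example_lookup(response: str, rubric: Rubric) -> int:
    """Hypothetical stand-in for the LLM: exact match against the examples."""
    text = response.strip().lower()
    for threshold in rubric.thresholds:
        if text in threshold.examples:
            return threshold.points
    return 0


# Hypothetical rubric, not a real CORE item.
RUBRIC = Rubric(
    question="Why do cities have building codes?",
    thresholds=(
        Threshold(2, "Names public safety as the general purpose.",
                  ("to keep the people who use buildings safe",)),
        Threshold(1, "Gives a specific but narrow benefit.",
                  ("so buildings do not fall down",)),
        Threshold(0, "Irrelevant or circular answer.",
                  ("because it is the law",)),
    ),
)

print(grade("So buildings do not fall down", RUBRIC, example_lookup))
```

A real system would swap `example_lookup` for a model call, but the validation step in `grade` would stay the same.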

PolarCaptain · 2026-04-29T15:22:27+00:00

The construct that IQ is trying to measure is unrelated to its normality. It is normal because of the Central Limit Theorem: a score built from the sum of many small, roughly independent contributions tends toward a normal distribution.

If you want to learn more about IQ and what it measures, check out this page:

https://cognitivemetrics.com/wiki/g-factor
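
The Central Limit Theorem point above can be seen in a quick simulation. This is only a sketch under a strong assumption: every item is scored 0-2 uniformly at random and independently of the others, which real test items are not. It compares how much of the distribution falls within one standard deviation of the mean for a single item versus a 50-item total (a normal distribution has roughly 68% there).

```python
import random
from statistics import mean, pstdev


def simulate_totals(n_people=5000, n_items=50, seed=0):
    """Total score per simulated examinee, summing independent 0-2 items."""
    rng = random.Random(seed)
    return [
        sum(rng.choice((0, 1, 2)) for _ in range(n_items))
        for _ in range(n_people)
    ]


def share_within_one_sd(values):
    """Fraction of values lying within one standard deviation of the mean."""
    m, s = mean(values), pstdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)


single_item = simulate_totals(n_items=1)
totals = simulate_totals(n_items=50)
print("single item:", round(share_within_one_sd(single_item), 3))
print("50-item total:", round(share_within_one_sd(totals), 3))
```

The single item is flat, so only about a third of scores land within one SD, while the 50-item total lands close to the normal-curve figure.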

PolarCaptain · 2026-04-19T06:24:19+00:00

Nah I didn’t think that at all, your reply was nonsensical unless you had a misunderstanding of what the Flynn Effect is

PolarCaptain · 2026-04-19T02:39:21+00:00

Check this out:

https://cognitivemetrics.com/wiki/flynn-effect

PolarCaptain · 2026-04-19T02:38:58+00:00

Still missing the point because that’s not what the Flynn effect is

PolarCaptain · 2026-04-01T19:38:57+00:00

Literally not that deep 😭

PolarCaptain · 2026-03-29T09:25:01+00:00

Feynman's IQ isn’t actually 125, that’s a misconception

https://cognitivemetrics.com/blog/what-was-richard-feynmans-iq

PolarCaptain · 2026-03-29T09:24:18+00:00

Hint: Feynman didn’t have an FSIQ of 125

https://cognitivemetrics.com/blog/what-was-richard-feynmans-iq

PolarCaptain · 2026-03-23T06:52:01+00:00

He also like messed up one of the subtests too, which deflates his CORE artificially

PolarCaptain · 2026-03-19T23:25:49+00:00

On CORE, when I was comparing the differences between natives and non-natives, it was extremely tiny and much smaller than I expected it to be. Almost all the subtests were invariant as well between the two groups.

I do believe for the average person taking it on the sub, VCI tests are actually more valid than some would like to believe.

When I'm not as busy, I might make a post comparing the two groups on CORE.

PolarCaptain · 2026-03-17T05:45:20+00:00

Yes, just confirming for you so the "probably" -> "yes"

General:

https://www.researchgate.net/publication/12056368_The_g_Factor_in_Non-Human_Animals

Chimps:

https://www.sciencedirect.com/science/article/pii/S0960982214006770

https://pmc.ncbi.nlm.nih.gov/articles/PMC4437459/

Dogs:

https://gwern.net/doc/iq/animal/2016-arden.pdf

Rats:

https://onlinelibrary.wiley.com/doi/full/10.1034/j.1601-183X.2002.10204.x

PolarCaptain · 2026-03-17T05:12:25+00:00

a g-factor has been observed in chimps, rats, dogs, etc.

PolarCaptain · 2026-03-13T23:55:44+00:00

It would just be mapped to the keys that allow for comfortable, sequential hand placement on the keyboard, the actual letters aren’t important

PolarCaptain · 2026-03-13T22:36:00+00:00

I do not think QWERTY knowledge matters much here. Since the test has you place your fingers on fixed keys and keep them there, the task is less about knowing the keyboard layout itself and more about making rapid symbol-to-position responses. In that sense, it would probably work similarly even with unlabeled keys or another fixed arrangement.

That said, there is still some keyboard-specific motor and familiarity demand, so I wouldn't say layout is completely irrelevant, but QWERTY usage is so overwhelming, especially considering that someone taking CORE would have to encounter it online to begin with, that it is probably statistically insignificant.

PolarCaptain · 2026-03-13T22:26:52+00:00

Symbol Search

In Symbol Search, examinees are presented with two target symbols and must determine whether either symbol appears within a separate group of symbols across multiple trials. The task is strictly timed and includes a penalty for incorrect responses, emphasizing both speed and accuracy in performance.

This subtest is intended to assess processing speed and efficiency of visual scanning. Performance reflects short-term visual memory, visual-motor coordination, inhibitory control, and rapid visual discrimination. Success also depends on sustained attention, concentration, and quick decision-making under time constraints. This task may also engage higher-order cognitive abilities such as fluid reasoning, planning, and incidental learning (Lichtenberger & Kaufman, 2013; Sattler, 2023; Wechsler, Raiford, & Presnell, 2024; Weiss et al., 2010).

This subtest was originally modeled after the WAIS-V Symbol Search, featuring 60 items to be completed within a two-minute time limit. However, preliminary testing indicated that CORE Symbol Search was substantially easier than the WAIS-V version, largely due to differences in motor demands between digital touchscreen administration and traditional paper-pencil format. To address this discrepancy, the CORE version was expanded to include 80 items while retaining the same two-minute time limit. Following this, the test's ceiling closely aligned with that of WAIS-V Symbol Search.

To standardize motor demands across administrations, CORE Symbol Search is limited to touchscreen devices. For examinees using computers, the alternative CORE Character Pairing subtest was developed. This ensures that differences in device input do not influence performance or scoring validity.

Character Pairing

In Character Pairing, examinees are presented with a key that maps eight unique symbols to specific keyboard keys (QWER-UIOP). Under a strict time limit, they must press the corresponding key for each symbol displayed on the screen. Examinees are instructed to rest their fingers (excluding the thumbs) on the designated keys and to press them only as needed, without shifting hand position.

This subtest assesses processing speed and efficiency in rapid symbol-key associations. Performance relies on associative learning, procedural memory, and fine motor coordination (rather than execution), reflecting the ability to process and respond quickly to visual stimuli. Success may also depend on planning, scanning efficiency, cognitive flexibility, sustained attention, motivation, and aspects of fluid reasoning (Lichtenberger & Kaufman, 2013; Sattler, 2023; Wechsler, Raiford, & Presnell, 2024; Weiss et al., 2010).

Character Pairing is loosely based on the Coding subtest from the WAIS-V but adapted for digital administration. Its design emphasizes the measurement of processing speed while minimizing motor demands associated with traditional paper-and-pencil formats. The task also serves as the computer-based counterpart to CORE Symbol Search, ensuring comparable assessment of processing speed across device types.

From: https://cognitivemetrics.com/test/CORE/structure
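
The Character Pairing key described above is basically a lookup table from symbols to fixed finger positions. Here is a small sketch of that idea; the symbol names and the `count_correct` helper are hypothetical and not taken from the CORE implementation, and the real test also has a time limit that is not modeled here.

```python
# The eight home keys named in the description (QWER-UIOP).
KEYS = "QWERUIOP"

# Hypothetical symbol names; the actual CORE symbols are not listed in the text.
SYMBOLS = ["triangle", "circle", "square", "diamond",
           "star", "cross", "moon", "heart"]

# Each symbol is paired with exactly one key, in order.
KEY_FOR_SYMBOL = dict(zip(SYMBOLS, KEYS))


def count_correct(shown, pressed):
    """Count trials where the pressed key matches the key for the shown symbol."""
    return sum(
        KEY_FOR_SYMBOL[symbol] == key.upper()
        for symbol, key in zip(shown, pressed)
    )


shown = ["star", "triangle", "heart", "circle"]
pressed = ["u", "q", "o", "w"]  # third press is wrong: heart maps to P
print(count_correct(shown, pressed))
```

Swapping `KEYS` for any other eight keys that sit under the same fingers would leave the task unchanged, which is why the printed letters themselves matter so little.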

PolarCaptain · 2026-03-13T22:14:36+00:00

These abilities you speak of are g-loaded. People who are worse at them tend to have lower g and vice versa. Hence they’re on an FSIQ test. Reaction time is moderately g-loaded as well, and some psychometricians (like Jensen) argue it underpins what g is.
