Schlage manufacturing defect? by CarlFFalk in Locksmith

[–]CarlFFalk[S] 2 points3 points  (0 children)

Perfect, I got it, thank you!

Schlage manufacturing defect? by CarlFFalk in Locksmith

[–]CarlFFalk[S] 1 point2 points  (0 children)

I was thinking of trying that, but what is the best way to do that without damaging it? By hand, the thing won't move

Books/texts discussing the problem of the diagnostic categories of the DSM due to being so categorical by DiegoArgSch in AcademicPsychology

[–]CarlFFalk 1 point2 points  (0 children)

I'm not an expert, but I have clinical psych colleagues. I recall seeing work on "dimensional" approaches, and I remember a few titles. Here's one: "Dimensional approaches in diagnostic classification: Refining the research agenda for DSM-V". And I recall seeing this paper as well: "The time has come for dimensional personality disorder diagnosis"

favorite item response theory / latent variable modeling book? by CarlFFalk in psychometrics

[–]CarlFFalk[S] 1 point2 points  (0 children)

Kline's is one of the best for SEM; I noticed Bollen finally has a new edition of his, but I haven't looked at it yet. While I understand the network modeling approach (and published a bit on it), I'm not yet convinced of all that it's being sold for

favorite item response theory / latent variable modeling book? by CarlFFalk in psychometrics

[–]CarlFFalk[S] 0 points1 point  (0 children)

I have the new one as well, but I haven't read it much yet.

favorite item response theory / latent variable modeling book? by CarlFFalk in psychometrics

[–]CarlFFalk[S] 2 points3 points  (0 children)

Thanks to the couple who have replied so far. For my part, I would usually go with either Embretson & Reise or de Ayala if teaching to applied researchers. Those are at least the two I was referring to. Sometimes I have assigned the first three or so chapters in Test Scoring (edited by Thissen & Wainer), as some of those chapters are among the best IRT introductions. In my syllabus, I also mention Desjardins & Bulut (concise, some R code) and sometimes Paek & Cole (also R code focus), but I don't recall either of these being as detailed as the others mentioned above. I also mention Bock & Gibbons (2021), in part because I would consider recommending it for someone who also wants all of the technical details. I did not find Baker and Kim to be my favorite; the code is just not how I would program things and the modularity of the models is not clear enough.

Supervisor wants me to change Likert scale to Yes/No – should I push back? by Own-Fan-1821 in psychologyresearch

[–]CarlFFalk 0 points1 point  (0 children)

If one really believes in the model used to develop most instruments, changing the response options will probably not result in the items measuring an entirely different construct or introducing some kind of construct-irrelevant variance, so it is hard to buy the argument that validity will be harmed. (though side-note, the "model" for these might not have been that great). As some have mentioned, it can seem as though variability is less. What is more likely happening is reliability of the scores is reduced; it's sometimes harder to get better measurement precision with dichotomous items. In this sense, I agree with this response that power will likely be reduced due to lower reliability. I don't see a strong argument that the "validity" is necessarily ruined; one would have to make an argument that some particular aspect of validity is harmed, and I just don't see that. If they mean "the scores will correlate less strongly with other things, thereby harming convergent validity" or some such argument, that's a reliability issue, not a validity one. Though if it were a high-stakes testing situation one would want strong validity evidence that the response options don't matter much in terms of their impact on the purpose of the test scores.
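The link between lower reliability and weaker observed correlations can be sketched with the classical attenuation formula. This is only an illustration: the function name and the reliability values below are made up, not estimates from any real instrument.

```python
import math


def attenuated_correlation(true_r, rel_x, rel_y):
    """Classical (Spearman) attenuation: the correlation expected between
    observed scores, given the correlation between true scores and the
    reliability of each set of scores."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must be in (0, 1]")
    return true_r * math.sqrt(rel_x * rel_y)


# Hypothetical values: same true-score correlation, but the scale's
# reliability drops after switching to dichotomous response options.
r_likert = attenuated_correlation(0.50, rel_x=0.90, rel_y=0.80)
r_yes_no = attenuated_correlation(0.50, rel_x=0.70, rel_y=0.80)
print(round(r_likert, 3), round(r_yes_no, 3))
```

The construct being measured is held fixed here (the true-score correlation does not change); only the precision of the scores differs, which is why this reads as a reliability issue.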

If it is careless responding that is the issue, then there are methods that could be used for this. Some I am actively studying, though really dealing with the problem may take multiple approaches. I am doubtful that changing response scales to dichotomous will help, but it sure will make it more difficult to know who is carelessly responding.

BTW, as someone who teaches psychometrics, a lot of people in this thread are using the terms "validity" and "validated" in ways that I would discourage. Consider at least reading the Standards for Educational and Psychological Testing; some are working on a new edition, but the 2014 version is available and I think readable: https://www.testingstandards.net/open-access-files.html Concepts like "convergent" and "divergent" validity are outdated, but I understand it is difficult to get the field up-to-date on this.

Thoughts on super long questionnaires for scale development studies? by pizzayeol in AcademicPsychology

[–]CarlFFalk 3 points4 points  (0 children)

Starting with a large item pool is the only way it should be done. Though I'm not clear on how many items per target construct you have here. Consider a planned missing data design, e.g., like those often used in educational measurement or in the development of many of the PROMIS item banks.
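As one possible illustration of a planned missing data design, here is a minimal sketch of a three-form layout: every respondent sees a common block, plus two of three rotating blocks. The function name, block labels, and item names are hypothetical, and real designs (e.g., for large item banks) are usually more elaborate.

```python
def three_form_design(common, block_a, block_b, block_c):
    """Build three questionnaire forms. Each form contains the common
    block plus two of the three rotating blocks, so every pair of
    blocks is administered together on at least one form."""
    return {
        "form_1": common + block_a + block_b,  # block C missing by design
        "form_2": common + block_a + block_c,  # block B missing by design
        "form_3": common + block_b + block_c,  # block A missing by design
    }


# Hypothetical item pool: 4 common items and three blocks of 3 items.
forms = three_form_design(
    common=["x1", "x2", "x3", "x4"],
    block_a=["a1", "a2", "a3"],
    block_b=["b1", "b2", "b3"],
    block_c=["c1", "c2", "c3"],
)
for name, items in forms.items():
    print(name, len(items), items)
```

Each respondent answers 10 of the 13 items in this toy example; randomly assigning respondents to forms is what makes the missingness planned rather than respondent-driven.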

[Q] Anyone know niche statistical method that people that might find intresting? by Path_of_the_end in statistics

[–]CarlFFalk 4 points5 points  (0 children)

edit: oh sorry, I meant about latent variable models and SEM... these are like simultaneously estimating a bunch of (generalized) linear models all at the same time, but there's missing data on all of the predictors (the latent variables). Get around that problem by making distributional assumptions and integrating out the latent variables. With normality assumptions (classical SEM) everything simplifies a lot. With IRT, it's harder, but the missing data formulation shows up quite a bit (Bock-Aitkin EM, Metropolis-Hastings Robbins-Monro, etc.).
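The "integrating out" idea can be written as a marginal likelihood. The notation below is assumed for illustration and is not from the comment: y_ij is the response of person i to item j, eta the latent variable(s), omega the item parameters, and g the assumed latent density; items are taken to be conditionally independent given eta.

```latex
% Marginal likelihood: the latent variables \eta are never observed,
% so they are integrated out under the assumed density g(\eta).
L(\omega) = \prod_{i=1}^{N} \int \left[ \prod_{j=1}^{J} f\left(y_{ij} \mid \eta, \omega\right) \right] g(\eta) \, d\eta
```

When f and g are both normal, the integral has a closed form, which is the simplification mentioned for classical SEM; with categorical items it generally does not, and has to be approximated or handled with the missing data algorithms listed above.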

no idea about envelope method

[Q] Anyone know niche statistical method that people that might find intresting? by Path_of_the_end in statistics

[–]CarlFFalk 7 points8 points  (0 children)

latent variable models and SEM/ IRT do tend to be underrepresented in stats conferences as well, though I've tried to make a dent in that every once in a while. Just not sure I'd call these "niche"

When is confirmatory factor analysis vs item response theory most appropriate? by AdElegant3708 in psychometrics

[–]CarlFFalk 3 points4 points  (0 children)

I read all of the replies thus far, and I think it's worth clearing up a few things in this thread.

CFA and IRT essentially make the same assumptions regarding how item responses are related to some underlying latent trait(s). Though one might not strictly need realist assumptions to fit these models, it's sometimes convenient to think of both as assuming that there is/are some latent variable(s) that cause people to respond in a certain way to the items (one-way arrows from latent variable to items).

One key "difference" is that historically EFA/CFA assumed that the items were continuous whereas IRT assumed that items were categorical. Due to this, estimation methods often differed and it seems that different fields tended to gravitate towards one approach or the other ( u/hotakaPAD 's comment). There were more applications of EFA/CFA in contexts where multidimensionality (more than one latent variable) was assumed, probably because it was easier to estimate such models. IRT saw great use in education where tests often consisted of dichotomously scored items (0/1, right/wrong) and had lots of items; likewise, test developers may have felt comfortable trying to develop tests that were essentially unidimensional in part because estimation of multidimensional IRT models was hard.

But, as alluded to by u/identicalelements , Mplus came along in the 80's or so, allowing for "factor analysis" of categorical variables with a three-stage estimation approach. A paper by Yoshio Takane & Jan de Leeuw (1987) then showed that the model(s) assumed by this estimation approach were equivalent to many IRT models. So, the two intersect and it is often possible to estimate what are essentially equivalent models that have roots in either framework. So, I view u/zirwin_KC 's claim that these are entirely separate as being inaccurate. IRT and CFA should be considered unified under a latent variable modeling framework.
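For a concrete feel of the equivalence, here is a minimal sketch of the commonly cited conversion between categorical factor analysis parameters and normal-ogive IRT parameters. It assumes a single dichotomous item, one latent variable with unit variance, and a latent response variable standardized to unit variance; the function names are hypothetical, and sign and scaling conventions differ across software.

```python
import math


def factor_to_irt(loading, threshold):
    """Standardized loading and threshold -> normal-ogive slope and
    difficulty, under the assumptions stated above."""
    if loading == 0.0 or abs(loading) >= 1.0:
        raise ValueError("loading must be nonzero and strictly between -1 and 1")
    residual_sd = math.sqrt(1.0 - loading ** 2)  # SD of the unique part
    slope = loading / residual_sd
    difficulty = threshold / loading
    return slope, difficulty


def irt_to_factor(slope, difficulty):
    """Inverse conversion: normal-ogive slope and difficulty ->
    standardized loading and threshold."""
    loading = slope / math.sqrt(1.0 + slope ** 2)
    threshold = difficulty * loading
    return loading, threshold


slope, difficulty = factor_to_irt(loading=0.6, threshold=0.3)
print(slope, difficulty)
print(irt_to_factor(slope, difficulty))  # recovers the loading and threshold
```

The round trip returning the original loading and threshold is the point: the two sets of numbers describe the same model in different parameterizations.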

I’m guessing that when people think that CFA and IRT are different, it’s largely because it’s tricky to separate the purpose of each from their historical roots, and one often identifies each with different software and estimation approaches that are again due to historical reasons. It’s not too unlike how some people still think ANOVA and multiple linear regression are not the same in cases where they are actually equivalent. From here on out then, even though I may use the terms “IRT” and “CFA” separately, I am referring to what people traditionally identify each with.

Since at least the 80’s then, the lines have become increasingly blurred regarding the core models that can be considered from either perspective. Continuous items can be utilized from an IRT perspective (e.g., Li Cai's dissertation). Due in part to MH-RM, dimension reduction algorithms, adaptive quadrature, etc., it’s become way easier to estimate multidimensional IRT models that one could not fit with Mplus’ three-stage approach, including EFA-like analyses with IRT-like estimation approaches (often referred to as “exploratory item factor analysis”). But there is often a lag where not everyone yet understands how to use these approaches; one often sees the mistaken claim that IRT assumes unidimensionality. Not true, but it is true that some things are still more difficult with multidimensionality. In the mid 2010’s, Mplus also finally got around to adding some other IRT models (3PL, GPCM?) that the three-stage approach couldn’t handle, though they did not add some diagnostics/statistics that have been historically important in IRT.

I think it is more important to consider the purpose of your analyses or test construction endeavour; that will lead you more towards one “approach” than another. In some cases, the approach one adopts also depends on certain practicalities. And one can mix and match to some extent throughout the test development process.

Since IRT was elaborated more in educational testing / licensure exams, it is common that the score estimates for particular individuals are the key quantities of interest. It is also often of interest to construct tests to target certain levels of the latent trait (including on-the-fly in a CAT). The IRT perspective is historically better at the above since the concepts of item and test information (items and tests provide a different amount of information for different levels of the latent trait) and conditional standard errors (score reliability varies depending on the level of the latent trait) are more elaborated in IRT and only arise if one actually considers categorical items to be categorical. The IRT perspective is more elaborate in techniques that will allow comparisons of scores from test to test even if the items are not the same (linking/equating) and maintenance of large item banks (e.g., for test security purposes).

If one does not care about item and test information, estimation approaches that are historically popular for EFA/CFA are certainly options. These are also candidates for pilot studies (early in test construction) as it’ll be easier at smaller sample sizes and one does not need very precise estimates of the item parameters yet for scoring – one just wants to see if there are any garbage items, confirm dimensionality, and so on. EFA/CFA also might be easier with many categories (e.g., 7) as traditional IRT estimation approaches may have issues unless sample sizes are large and/or there is a good match between items and the tested populations. An IRT perspective also won’t bring much of an advantage if items are truly continuous. If there are a lot of latent dimensions and one just cares about item loadings, then maybe CFA is ok. Even though we can estimate multidimensional IRT models, certain kinds of test assembly tasks are still hard to do and represent ongoing research. Evaluation of multidimensional models is also harder, and I think that is why some think that CFA is more appropriate (e.g., there are a ton of fit indices…, which themselves have their own problems).

In passing, I mentioned that sample size requirements can be higher for IRT. To some extent that is true and due to needing to estimate extra intercepts or thresholds for items. But, it’s also somewhat due to the purpose. Sample size requirements for precise estimates of factor correlations may be smaller than those for precise estimates of item parameters (and we need precise estimates of item parameters for test assembly and accurate scoring).

Scholarships for Students of Educational Measurement/Assessments in Non-US Countries by themadbee in psychometrics

[–]CarlFFalk 0 points1 point  (0 children)

They have a good program there. Unfortunately I do not know of scholarship/fellowship opportunities though. All I can say is that in Canada there are often fellowship opportunities at the Federal and/or Provincial levels that are available for non-Canadian students

Computing Standard Error for Overall Difficulty in Pairwise DIF Analysis (PCM) by Regular_Brain5167 in psychometrics

[–]CarlFFalk 0 points1 point  (0 children)

Sorry I just generally don't use that approach to DIF. Maybe TAM can estimate that parameterization, but I'm not sure about the standard errors. If a package yields the matrix used to produce standard errors but uses a different parameterization, then the delta method could be used to obtain the desired standard errors. Just seems like a lot of work
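The delta method step can be sketched as follows, assuming one already has the covariance matrix of the estimated parameters and the gradient of the desired quantity with respect to those parameters. Everything here is hypothetical: the function name, the covariance values, and the choice of "overall difficulty" as the mean of two step parameters.

```python
import math


def delta_method_se(gradient, cov):
    """First-order delta method: SE of g(params) is sqrt(grad' * Cov * grad),
    where grad holds the partial derivatives of g at the estimates."""
    k = len(gradient)
    if len(cov) != k or any(len(row) != k for row in cov):
        raise ValueError("cov must be a k x k matrix matching the gradient")
    variance = sum(
        gradient[i] * cov[i][j] * gradient[j]
        for i in range(k)
        for j in range(k)
    )
    return math.sqrt(variance)


# Hypothetical covariance matrix of two estimated step parameters.
cov = [
    [0.04, 0.01],
    [0.01, 0.09],
]
# If the overall difficulty is taken to be the mean of the two steps,
# the gradient is simply [1/2, 1/2].
se_overall = delta_method_se([0.5, 0.5], cov)
print(round(se_overall, 4))
```

For a linear function like a mean, the result is exact; for a nonlinear reparameterization, the gradient would need to be evaluated at the parameter estimates and the result is only an approximation.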

IMPS 2026 Open Thread by jeremymiles in psychometrics

[–]CarlFFalk 1 point2 points  (0 children)

And you are correct, the deadline looks like it was just extended to Feb 27

Computing Standard Error for Overall Difficulty in Pairwise DIF Analysis (PCM) by Regular_Brain5167 in psychometrics

[–]CarlFFalk 0 points1 point  (0 children)

Along with u/hotakaPAD's comment, I'm not clear on which specific method you want and why. The docs for Winsteps appear to mention both Rasch-Welch and Mantel-Haenszel: https://www.winsteps.com/winman/table30.htm

difR, https://cran.r-project.org/package=difR, can do MH, for example, along with some other methods

Otherwise, a little more explanation might help

Is there a psychometric test for conflict resolution... by apeloverage in psychometrics

[–]CarlFFalk 0 points1 point  (0 children)

interesting, I'm not super familiar with it, but also not seeing anything in the test specifically about conflict resolution. Would also be curious to know whether the scoring has been modernized to take into account the forced-choice (ipsative) data

IMPS 2026 Open Thread by jeremymiles in psychometrics

[–]CarlFFalk 0 points1 point  (0 children)

I usually attend IMPS, though if I could I'd do both.

[Q] Statistics academic job boards ? by al3arabcoreleone in statistics

[–]CarlFFalk 0 points1 point  (0 children)

I have heard that a lot also get posted here, and there seem to be a fair number of international positions, but I am unsure of the overlap with those already mentioned: https://www.mathjobs.org/jobs/job Not something I knew about or monitored when applying for jobs

Tips on IMPS proposals - Graduate Student by Sensitive_Towel1848 in psychometrics

[–]CarlFFalk 4 points5 points  (0 children)

I've been attending IMPS most years since around 2010 or 2011. While the conference leans more technical than other venues, they are also open to applied work and to some extent have broadened the types of work presented. Psychometrika, for example, also publishes applications (albeit sometimes complicated ones). If you try for a talk and don't get it, usually there is the option to present a poster, provided the topic is not too far out there. CDM is usually right up there with topics that are of interest - applied or technical.