On High-Range Test Construction 1: Antjuan Finch on PDIT & GIQ
Author(s): Scott Douglas Jacobsen
Publication (Outlet/Website): Phenomenon
Publication Date (yyyy/mm/dd): 2024/08/16
Abstract
Antjuan Finch is the Author of After Genius: On Creativity and Its Consequences, The 3 Sides of Man, and Applied Theory. He created the Creative Attitudes Inventory (CAT) and the Public Domain Intelligence Test (PDIT). Finch discusses the Public Domain Intelligence Test (PDIT) and the Static General Intelligence Quicktest (GIQ).
Scott Douglas Jacobsen: This series will be exploratory, taking note of some of the people’s resources in the high-range test environment and then presenting this for public consumption. You developed the Public Domain Intelligence Test (PDIT) and the Static General Intelligence Quicktest (GIQ). Naturally, there is a start for everything in high-range test development. What was the origin of your idea for developing high-range tests?
Antjuan Finch: Rather than starting with the focus of developing high-range IQ tests, I simply observed the available offerings for free IQ tests online and thought that there could be an opportunity to create something with more easily identifiable backing in existing research. To that end, the Public Domain Intelligence Test was created: a free intelligence test constructed from open-source, previously validated items from elsewhere. As I suggested, I knew that there was an opening in the free cognitive assessment space for such a product, but I was actually surprised when it garnered so much attention, now having over 40,000 users. The Static General Intelligence Quicktest was born from a similar impulse: I’d noticed that most comprehensive intelligence tests could be dramatically shortened without sacrificing nearly any construct validity, and really only a negligible amount of measurement accuracy. And so I set about creating a test that would maximize convergent validity with full-length intelligence tests, delivered in roughly the shortest amount of time conceivable, with the added bonus of being constructed in a way that lets me generate an infinite number of parallel versions of the test to buttress against cheating (more on this later).
Jacobsen: What tests stood out in your early thoughts?
Finch: I focused most strongly on tests with a diversity in item types, and on shortened versions of longer tests.
Jacobsen: How did those tests form a template, if at all, for the PDIT and the GIQ?
Finch: The PDIT was my best attempt at making a mirror – using open-source science – of a common abbreviated WAIS form, the WASI-2. To that end, I just wanted sources of VCI and PRI proxies; in other words, good vocabulary/verbal and reasoning/non-verbal item sets. Meanwhile, my rules for the Static Quicktest were a bit less constrained: as long as it was reliable and correlated well with comprehensive tests in general, I was free to just think up all of the item types for the test. Nonetheless, to maximize g-loading, I ended up roughly paralleling the weighting structure of the WAIS-IV, with 80% of the final score coming from Verbal and Non-Verbal items and the last 20% from items that rely on rote computation rather than pure reason or knowledge. From there, I decided to break the Crystallized, Fluid, and Cognitive Storage and Efficiency constructs out into their iconic, or often-referenced, constituent parts. For example, crystallized intelligence, referring to one’s ability to assimilate learned information, is often thought and shown to be assessed well by tests of vocabulary development, reading comprehension, and grammatical sensitivity. In fact, I picked SAT cloze items for the Public Domain Intelligence Test (PDIT) precisely because that item type has been shown to measure each of those facets well. Likewise, the reasoning aspect of Fluid Reasoning can be separated into the classic split between eductive, deductive, and inductive reasoning; I picked the non-verbal matrices for the PDIT because that item type has also been shown to reasonably tap each of those facets. From there, I selected pure – or as pure as could reasonably be found or currently made – items reflecting each of those facets: items separately for vocabulary acquisition, reading comprehension, grammatical sensitivity, and eductive, deductive, and inductive reasoning, each with further nuances between the questions within each set.
The same reasoning was applied with the Cognitive Storage and Efficiency items. This all took about a day; the freedom of not having to rely on open-source and preexisting materials made the process go much quicker for the Quicktest than for the Public Domain Intelligence Test.
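As a rough illustration of the weighting Finch describes (80% of the score from Verbal and Non-Verbal items, 20% from rote computation), a composite could be computed as below. The exact split between Verbal and Non-Verbal, the 0–100 scaling, and the function name are my assumptions for demonstration, not values published for the GIQ:

```python
# Illustrative composite scoring with a WAIS-IV-style weighting:
# 80% of the score from Verbal and Non-Verbal reasoning items,
# 20% from rote-computation (processing) items. The specific
# weights and scaling are placeholder assumptions, not the GIQ's.

def composite_score(verbal: float, nonverbal: float, processing: float) -> float:
    """Combine three section scores (each already scaled 0-100)
    into one weighted composite on the same 0-100 scale."""
    weights = {"verbal": 0.40, "nonverbal": 0.40, "processing": 0.20}
    return (weights["verbal"] * verbal
            + weights["nonverbal"] * nonverbal
            + weights["processing"] * processing)

print(round(composite_score(80.0, 70.0, 90.0), 2))  # 78.0
```

The point is only that the construct weighting, not the item content, is what the GIQ mirrors from the WAIS-IV.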
Jacobsen: To quote the GIQ introductory content in full:
Originally designed such that thousands of forms of the test may be produced, allowing for retakes to be more validly performed in quick succession, and a bolster against cheating, this static version of the test was designed to mirror the content of the WAIS-IV, using the formatting of the Wonderlic Personnel Test. Put simply, the test assesses the full spectrum of psychometric g, using cutting-edge theory, combined with a well-tested format.
To do this, the test assesses 3 factors: Crystallized Intelligence, Fluid Reasoning, and Cognitive Processing and Efficiency, using 8 item types.
This test has 50 items and takes 12 minutes to complete. Click here to begin.
How is this test adaptable and resistant to cheating? Chris Cole of the Mega Society has been working with others on a cheat-resistant test, too, one that is adaptive.
Finch: The items of the test are constructed in a way such that equally valid, yet alternate and completely new versions of the test could be algorithmically generated by a machine; in which case, memorizing or practicing the test currently displayed on that website wouldn’t really help anyone to hack or game the test.
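The slot-based idea behind parallel forms can be sketched in a few lines. This is a toy illustration under my own assumptions: the item bank, slot structure, and function names are hypothetical, and Finch’s actual generator (which may build items from templates rather than sample a bank) is not public:

```python
# Toy sketch of parallel-form generation: each generated form
# fills the same (item type, difficulty) slots, but with freshly
# drawn items, so memorizing one form doesn't transfer to the next.
# The bank contents and slot labels here are purely illustrative.
import random

ITEM_BANK = {
    ("vocabulary", "easy"): ["v-e-1", "v-e-2", "v-e-3"],
    ("vocabulary", "hard"): ["v-h-1", "v-h-2", "v-h-3"],
    ("matrices", "easy"):   ["m-e-1", "m-e-2", "m-e-3"],
    ("matrices", "hard"):   ["m-h-1", "m-h-2", "m-h-3"],
}

def generate_form(seed: int) -> list[str]:
    """Produce one form: identical slot structure every time,
    but different concrete items depending on the seed."""
    rng = random.Random(seed)
    return [rng.choice(items) for _slot, items in sorted(ITEM_BANK.items())]

form_a = generate_form(seed=1)
form_b = generate_form(seed=2)
# Given calibrated items of matched difficulty, the forms are
# psychometrically parallel even though they differ item by item.
```

With calibrated items, scores from any two seeds should be interchangeable, which is what allows valid retakes in quick succession.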
Jacobsen: WAIS is referenced as the gold standard in academic work. Is this relevant when developing a test that taps into g?
Finch: Yes. If a test is to be useful as, and understood as, a measure of IQ, its results ought to be easily interchangeable with the results of commonly used tests, or of what professionals typically consider good IQ tests. Put another way, the test should maximize the g shared across tests of g; it should load primarily on the results from tests which are each comprised of a diverse set of cognitive tasks (g).
Jacobsen: Why use the formatting of the Wonderlic Personnel Test?
Finch: It seems intuitive that if I took one cognitive test one day, and then took a totally different type of cognitive test 40 years from now, the results would most likely be less correlated than if I took them minutes apart. And so I had a theory that part of why tests like the Wonderlic Personnel Test, and even more so the TOGRA, maintain results that are so well correlated with more comprehensive assessments is that the quality that accounts for results across cognitive tests gets tapped a bit more when the tests are done in quick succession, or, even more so, when you cycle through the items from each of the sections over and over again, as happens with the Wonderlic and TOGRA. To summarize, I thought that putting the subtests into one quickshot form might further amplify convergent validity, and I knew it could be possible to do that without sacrificing much reliability while also bringing the length of the test all the way down to 12 minutes.
Jacobsen: Are there any areas in which the WAIS-V taps into a wider definition of g than the GIQ, given that the GIQ uses the WAIS-IV as the structure it mirrors?
Finch: To be determined. Though, I didn’t mirror the WAIS-IV’s content exactly, only its construct weighting; in fact, due to its algorithmic escalation and facet focus at the third-stratum, conceptual level, it could even test a construct that’s broader than the WAIS-IV’s.
Jacobsen: Why are 8 item types the standard?
Finch: To ensure that you’re testing the quality that’s general across cognitive tests, you want to make sure that your results are generalizing across multiple types of items. The easiest way to do that is to just put a diverse set of items in your test.
Jacobsen: How are crystallized intelligence, fluid reasoning, and cognitive processing and efficiency brought together in the GIQ?
Finch: I believe I answered this well enough earlier.
Jacobsen: How do we know they are well-balanced in the assessment of g in this particular test?
Finch: This was also answered well previously: I tested the third-stratum-level concepts first and then weighted the second-order facets the same as the second-order factors are weighted for the WAIS-IV.
Jacobsen: How do you ensure this is the case?
Finch: At the end of the day, and this goes beyond my previous answers, if it wasn’t done well enough then it wouldn’t correlate so strongly with the results across professional tests for intelligence.
Jacobsen: To quote the PDIT in full:
Verbal (Gc) Test
Crystallized Intelligence (Gc) refers to one’s ability to use acquired knowledge to solve problems. Because crystallized intelligence deals with learned information, Gc increases with age and educational attainment and can be tested well by assessments of verbal ability, such as vocabulary and cloze tests. What’s more, the items in this test were pulled from publicly accessible, old SATs (Scholastic Aptitude Tests), so this assessment should provide a near-perfect measure of crystallized intelligence. Moreover, the SATs that this test was derived from are considered valid measures of intelligence and were accepted for admission purposes to many high IQ societies, including the International High IQ Society and Triple Nine Society.
To answer each question, test-takers must select the option which best completes each sentence. An example would be selecting “gradual” to complete the sentence “Medieval kingdoms did not become constitutional republics overnight; on the contrary, the change was ——-.”
This test has 30 questions and a 15-minute time limit. The questions are ordered from least to most difficult. For an accurate score, do not use any aids to complete this test, and take it only once.
Non-verbal (Gf) Test
Fluid Intelligence (Gf) refers to one’s ability to recognize patterns within, and make sense of, novel information. Because fluid intelligence deals with novelty, it can be tested well by assessments of reasoning ability which are comprised of non-verbal, foreign, and abstract items. Moreover, for unknown reasons, fluid intelligence tends to increase until early adulthood (the mid-20s to early 30s), and then to decline precipitously until death. What’s more, the norm for this test was extrapolated from the results of 705 teenagers and young adults, so relatively older people may receive seemingly deflated scores on this test, as the scores here are not age-adjusted.
To answer the questions on this test, test takers should select the options that complete the patterns that are presented to them.
This test has 30 questions and a 15-minute time limit. The questions here span a wide range of difficulty and complexity and are placed in a pseudo-random order. For an accurate score, do not use any aids to complete this test, and take it only once.
Obviously, this test is more involved. The interesting part is the separation between the verbal and the non-verbal content, Gc versus Gf. What is a cloze test?
Finch: It’s not more involved; it only takes longer to complete. That separation may well be informative for many people, but the g-loading for that test is undoubtedly lower than the SGIQ’s because it has only two item types. A cloze test is a sentence completion test where a sentence is missing parts and one is tasked with filling in the missing blank(s) with the most fitting answer available.
Jacobsen: How well have the new SATs done at measuring general intelligence? Are the old SATs better at measuring general intelligence? What year separates new and old in this definition of the SATs?
Finch: It appears that the old SAT probably tested a broader set of items, and most likely did so in a broader set of ways, although I don’t believe there was an overly clean cut-off for when this happened; it was more of a gradual thing. Nonetheless, these tests were made to predict academic performance, and in doing so they can’t escape testing crystallized intelligence, and in doing that, they won’t escape the ineliminable part of crystallized intelligence that loads with fluid intelligence, which leads to a modest g-loading for the test overall. One has to sacrifice a moderate amount of the variance in the test, but the results on the new SAT can be converted to reasonable IQ results. Maybe unsurprisingly, the results for many standardized tests used for admissions to colleges and graduate schools are actually extremely highly correlated, and so concordance tables are somewhat easily produced for all of them. Once that is done, and once you also have the IQ conversions for a few of these tests, you can then, without much added work, convert the scores on all of them into IQ approximations, as I’ve shown here. I actually find the results of all this to be pretty fascinating; you can take that table and predict the IQ averages for universities that have been documented in peer-reviewed research. I should add that some people might think the results on that table look far too low, but I believe that’s only because so many people have been misled about what others’ results may be; often only experts have come to really understand what a well-motivated and well-trained 135+ IQ person actually looks like. Moreover, much of what is going on with a lot of tests beyond that point is experimental, and is not associated with much output that most would view as impressive, since other somewhat beneficial traits become improbable to coexist with yet another outlier trait.
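The conversion Finch describes ultimately rests on the standard deviation-IQ transform: express a score as a z-score against its norm group, then rescale to a mean of 100 and a standard deviation of 15. The sketch below shows only that transform; the norm mean and SD used in the example are placeholder values, not figures from his concordance table:

```python
# Converting a standardized-test score to a deviation-IQ estimate:
# find the score's z-score in its norm group, then rescale to the
# IQ metric (mean 100, SD 15). The norm parameters in the example
# are placeholders, not real concordance-table values.

def score_to_iq(score: float, norm_mean: float, norm_sd: float) -> float:
    z = (score - norm_mean) / norm_sd  # standing within the norm group
    return 100.0 + 15.0 * z            # rescale to IQ units

# A score two SDs above a hypothetical mean maps to IQ 130:
print(score_to_iq(1500.0, norm_mean=1050.0, norm_sd=225.0))  # 130.0
```

Chaining such transforms through a concordance table is what lets one set of IQ conversions cover several admissions tests at once.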
Jacobsen: Why zero in on 30 questions and 15 minutes for each test? Do the same time limit and question count necessarily measure their respective components of intelligence to the same degree?
Finch: Not necessarily; that parallel was mostly a stylistic consideration. That both the verbal and non-verbal sections are made from “fill in the blank” type questions was also somewhat of a stylistic detail; I thought that a bit of symmetry and parsimony in appearance wouldn’t hurt.
Jacobsen: What is the evidence for the curve of increase, stabilization, and decline in the components of intelligence? These seem obvious and are common knowledge. I want to make everything explicit for educational purposes and as a reminder. Maybe a restatement of a truism in a new way can give new insight, too.
Finch: This is one of the most well-established findings in the study of cognition. For an easily readable and very hard to reasonably rebut study on this topic, one should read the paper “IQ and Ability Across the Adult Lifespan,” which looks at the raw WAIS-IV scores for each age group in its manual and finds that the average 64-year-old suffers the equivalent of about a 30-point loss in processing speed over their lifetime.
Jacobsen: Why select a pseudo-random order rather than a completely random order or a logically progressed order?
Finch: I preserved the order of the items from the study that first validated them, which I did not conduct.
Jacobsen: Where did the sample of 705 people come from, for the test?
Finch: Being a test consisting of previously validated, open-source content, the 705 participants for the Non-Verbal section came from the initial sample used to validate the items in the research conducted prior to my using them for a more general assessment, supplemented by additional samples.
Jacobsen: How could you age-adjust the scores, if at all?
Finch: I would just need a few more participant samples. Though, I’m not so interested in doing that as I believe that doing so would make the results less informative, or at least more confusing.
License
In-Sight Publishing by Scott Douglas Jacobsen is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Based on a work at www.in-sightpublishing.com.
Copyright
© Scott Douglas Jacobsen and In-Sight Publishing 2012-Present. Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Scott Douglas Jacobsen and In-Sight Publishing with appropriate and specific direction to the original content. All interviewees and authors co-copyright their material and may disseminate for their independent purposes.
