Conversation with Michael Isom on New Test Developments and Old Tests: Member, World Genius Directory (2)
Author(s): Scott Douglas Jacobsen
Publication (Outlet/Website): In-Sight: Independent Interview-Based Journal
Publication Date (yyyy/mm/dd): 2022/04/15
Abstract
Michael Isom grew up in the birthplace of hip-hop, the South Bronx, New York, during its original emergence. Having also lived through its rise and urban renaissance of the mid-80s through the early 90s, Michael was able to experience many of the culture’s core lessons of true aboriginal history with respect to cultural identity, knowledge of self, responsibility through adherence to law, studiousness towards becoming the adept, and mastery of one’s being as thematic underpinnings of the rap music produced in that era. In later years, after completing high school, he decided to pursue an undergraduate degree in Forensic Psychology and graduate education in Public Policy specializing in Management and Operations. Afterwards, he obtained an M.B.A. in Strategic Management in the wake of the dot-com era. In 2001, during the Super Bowl XXXV (Baltimore Ravens vs. New York Giants) halftime intermission, Michael incidentally discovered what may have been the first online IQ test, by the late Nathan Hasselbauer, founder of the New York High IQ Society, which soon after became the International High IQ Society. Having scored well past the 95th percentile requirement for entry, Michael was contacted years later by Victor Hingsberg of Canada, and was invited to take the test required to become a member of his newly established Canadian High IQ Society. After meeting its 98th percentile passing requirement and before moving on to TORR (99.86th percentile or 145 IQ requirement), Michael discovered what is undisputedly the most advanced cognitive assessment platform for IQ testing in the world: IQExams.net. After completing a battery of 40+ tests within a 1 1/2-year span of signing up, a clear picture of Michael’s scoring attributes emerged within the spatial, numerical, verbal, and mixed-item logical areas, with a subsequent RIQ (Real IQ) calculation of 152. As his foray into the High Range Testing world continued, he happened to stumble upon a challenge issued by the ZEN High IQ Society: two untimed IQ test submissions with a minimum IQ score of 156 (SD 15) are required for entry, and those submissions have to come from a pre-selected set of untimed high range tests. Since Michael already met half the requirements with his first-attempt score on VAULT (163), he only needed one other test to qualify – hence Dr. Jason Betts’ test battery: Lux25, WIT, and Mathema are listed as accepted tests for ZEN. Scoring 156 on Lux25 not only satisfied the entry requirement, but it also accompanied the rest of his scores on Betts’ test battery for a 151 TrueIQ. With the above experience, Michael decided to gain more exposure to high range tests from other authors. After taking both the MACH and SPARK tests simultaneously (scoring 168 and 165, respectively, on the first attempt), he proceeded towards a specific numerical test, GIFT Numerical III, on which he scored 164. After also gaining entry into both the SATORI and TRIPLE4 High IQ Societies, he completed the untimed G.E.T. (Genius Entrance Test) mixed-item test in minimal time. After receiving a final score of 162, he returned to IQExams.net and executed one of the most gifted performances on any tightly timed spatial IQ test he’s ever taken. His recent first-attempt score of 160 on the incredibly challenging gFORCE IQ test exemplifies that cognitive fortitude can be taken to the brink, while spatial design and difficulty are taken to the next level. He discusses: newer test developers and old tests.
Keywords: intelligence, IQ, IQ tests, Jason Betts, Michael Isom, Xavier Jouve, World Genius Directory.
Conversation with Michael Isom on New Test Developments and Old Tests: Member, World Genius Directory (2)
*Please see the references, footnotes, and citations, after the interview, respectively.*
Scott Douglas Jacobsen: Are there particular newer test developers whose tests you’ve taken that people should keep an eye on? Others are more known because they are more prominent.
Michael Isom: There are a number of emerging test developers, more so in the verbal space. And even among the incumbent developers, it’s quickly learned that certain persons may not even take synonyms into account in the creation process.
This, for example, results in increased subjectivity risk, in which scoring consistency can be adversely affected. And such debates about objectivity control are also pervasive across different test types (e.g., verbal, spatial, logical, numerical, mixed), where verbal, being the most abstract, usually carries the highest risk of heightened subjectivity.
So how do you squelch or best minimize these irregularities? One of my favorite verbal tests ever created is the VAULT test, which exercised notable subjectivity control by keeping the terms very culture-fair and more tangibly graphic or concrete. They weren’t too abstract, but still retained certain graduated increments of consistency over the exam’s progression.
In comparison to other tests, it’s still one of the very best verbal tests as a critical reference point, in my opinion. New test creators will expand testing for the high range into new areas. And will eventually innovate a divergent reach of the high range, in which consistent correlation of verified actions in the world will be used to support more detailed accuracy of high range test results.
High Range (HR) items tend to change often in comparison to their proctored equivalents. The initial tradeoff is the ability to test adaptation to item-type change under pressure vs. long-term data accrual of highly repetitive item types over more granular increments.
Perspectives about the rate of item-type change on an IQ test are sharply delineated between high-range (HR) test creators and professional psychometricians, the latter of whom tend to repeat their items very often. The same question types can appear ten or more times before changing even once. Many in the HR testing space have issues with such non-variability, but one of several reasons for it is that university-trained psychometricians are required to test the difficulty-level increases and the calculi for that particular item type over very small incremental progressions.
So, they have to rigorously test minute gradients of scale in difficulty, whereas test developers in the HR space have a lot more freedom to change the item type. However, they’re accruing less data for any specific item type, as it changes more often – thus revealing a hidden trade-off, which brings me to the next point: standard and HR tests target very different components or areas of cognition. The commonplace response is “Okay, you still need to go to a licensed psychologist to get your IQ tested. You can’t do it using an HR testing scenario.”
I’ve often disputed it because the assertion incorrectly assumes that all dimensions of both an HR and standard IQ test are one and the same in exact precision. And while there’s considerable overlap, there are some very stark differences in terms of how they are steered, with each potentially having mutually exclusive aspects of cognitive targeting with measured components moving inversely to each other.
Standard IQ tests usually hinge much more on processing speed than their HR testing equivalents, for example. HR IQ tests, especially those that are untimed, hinge more on depth. So what you see is this movement towards an inflection point of equilibrium, in which one cognitive dimension trades off against the other. In other words, speed trades off against depth.
The default test structure starts off favoring speed with simpler items. As the difficulty increases by item and type, depth becomes more important. However, speed at some point will most likely be sacrificed in favor of depth, which in a substantial number of cases is arguably harder to measure. For example, at 140 and above, it’s very hard to tell the difference between a 140 and a 160 on most tests, due in part to those tests being most sensitive in the 85 to 115 range, indicating a possible breakdown in discrimination as one gets closer to 130+.
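To make that breakdown concrete, here is a minimal, illustrative sketch in Python, assuming a standard two-parameter item response theory (2PL IRT) model rather than any particular test’s actual design: when item difficulties cluster near the population mean, the test carries almost no information at high ability, and the standard error of measurement balloons.

```python
import numpy as np

def item_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta (z units)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, size=40)        # item discriminations
b = rng.normal(0.0, 1.0, size=40)         # difficulties near the mean

for iq in (100, 130, 140, 160):
    theta = (iq - 100) / 15.0             # IQ -> z-score ability
    info = item_info(theta, a, b).sum()   # total test information
    sem = 15.0 / np.sqrt(info)            # standard error in IQ points
    print(iq, round(sem, 1))
```

With items like these, the error bars at 140 and 160 overlap heavily, so the two scores are statistically indistinguishable, which is exactly the ceiling problem described above.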
At the 140-and-above mark, you need the HR testing scenario to measure difficulty levels not normally associated with the general population. Even though a significant number of HR tests are timed, these are critically focused on processing speed in relation to the scale of difficulty.
There are quite a few parallel debates going on that will actually affect the evolution of the HR IQ test over time. For example, some creators adhere to very strict logic paths on both multiple-choice and open-ended tests, timed and untimed. Others seek to measure targeted amalgams or interfaces of logic, imagination, and perception within the same or at times an even more advanced context.
But it appears to become more complex as more test creators arrive and grow the space. And it could be in part due to noted differences in the manner in which general intelligence is measured. One of the caveats concerns HR tests that seem to measure general intelligence (at differing grades), but may in fact be measuring highly congruent logic patterns between test author and test taker. Such parallels may still have questionable sampling applicability, especially if the N values with respect to Cronbach’s Alpha and Pearson R correlations are too low. In other words, large sample sizes are a supreme factor.
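As a point of reference, Cronbach’s Alpha itself is simple to compute once an item-score matrix exists; the scarce ingredient is the sample. A minimal sketch in Python (the data here are invented purely for illustration):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's Alpha for an (n_testees, n_items) item-score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                           # number of items
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Toy sample: 5 testees on 4 dichotomous items (far too few in practice)
demo = [[1, 1, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
print(cronbach_alpha(demo))  # 0.8 on this toy data
```

With an N this small, the estimate is practically meaningless, which is why large sample sizes are a supreme factor.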
Dr. Xavier Jouve may have been the first to invent an online computer-adaptive format (JCTI, TRI-52) in matrix testing that attained a 90+% correlation with the WAIS Subtest for Matrix Reasoning. The sample size was around 300. As a matter of fact, he’s the only one to have administered the JCTI and the WAIS to the same populations. And from anecdotal observation, it appears that even though he previously sold (for a very minimal sum) IQ test certificates for JCTI results, the RIX (Reasoning Index) range more accurately applies only in comparison to the WAIS Subtest for Matrix Reasoning – not necessarily the overall FSIQ (Full-Scale IQ). Several persons who’ve scored in the 130 range on the JCTI reported FSIQs of 140+ on the WAIS. And it’s not that such examples qualify as markers or guidelines for extrapolation, but similar patterns are well noted.
A recent question appeared on social media concerning how test-takers feel about certain spatial designs and their style in HR testing. But there’s an area of subjectivity where a testee can do very well on spatial tests from certain test designers but may have problems with those from certain others, because the visual orientation may be closer to modeled patterns of the former. So in a sense, it goes back to the overlap of (1) the affinity between designer and testee logic paths and (2) a more objective measure of general intelligence.
For example, I did unusually well on the MACH and SPARK tests, both of which I took simultaneously. However, I could see certain affinities in the design of those tests, as positive performance indicators. Because of how my train of thought was naturally oriented towards that particular style in the items, it was definitely a key intuitive advantage.
And given that a very large challenge concerns the correct interpretation of the question, if a testee is in tune with the actual test items, then the testee will not only clearly see the underlying question at hand, but also divergent patterns, whose indication can reveal logic traps or other conceptual detours to be avoided.
To illustrate, Dr. Jason Betts’ test battery (Asterix, Lux25, Mathema, and the World Intelligence Test) is one of the most accurate HR test batteries I’ve experienced. It has 97 questions among those four specific tests that hone in on measuring several cognitive dimensions. And overall, it seems almost impervious to guessing and luck by mere coincidence.
One of the unusual aspects of his particular type of test structure is the intentional crossroads built into very specific items to see if one can accurately discern the correct path among the competing mirrors. At the same time, this acts as a preventive measure against excessive time leverage, which may result in score inflation. And in a sense, it prevents persons from seeing past what is termed their “TrueIQ”.
The JCTI by Dr. Xavier Jouve established one of the most unique presentations of alternative matrices and still remains a paramount reference in cognitive assessment design to this day. I might have been one of the last people to receive a certificate before Cerebrals Society operations halted.
What I found the most unusual about the JCTI was that it was the first computer-adaptive IQ test to home in on a testee’s IQ area early, from where the questions get more difficult in response to correct answers on challenging items. Later on, I took his TRI-52 test, which I believe gave me a much more striking result.
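To sketch the general idea of a computer-adaptive format (a generic illustration of the concept only; Dr. Jouve’s actual JCTI/TRI-52 item-selection rules are not public, and every name below is hypothetical):

```python
import math
import random

def run_adaptive(bank, answer_fn, n_items=20, step=1.0):
    """bank: item difficulties in z units; answer_fn(b) -> True if correct."""
    theta = 0.0                       # start the estimate at the mean
    unused = list(bank)
    for _ in range(n_items):
        # present the unused item whose difficulty sits closest to theta
        i = min(range(len(unused)), key=lambda j: abs(unused[j] - theta))
        b = unused.pop(i)
        theta += step if answer_fn(b) else -step  # move the estimate
        step *= 0.8                   # shrink the step as it settles
    return theta

# Simulated testee with true ability 2.0 z (roughly IQ 130):
true_theta = 2.0
print(run_adaptive(
    bank=[d / 10 for d in range(-30, 41)],
    answer_fn=lambda b: random.random() < 1 / (1 + math.exp(-(true_theta - b))),
))
```

The staircase homes in on the testee’s level within a handful of items, which is the same economy that lets an adaptive test find the relevant difficulty band early.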
Regarding anything remotely close in design, I’ve only found one specific test with a similar concept: the LDSE, or Long Duration Spatial Examination, created by psychometrician Hans Sjoberg. Then again, it was based on the JCTI. However, it boasts an unusually high correlation to professionally proctored IQ tests, as evidenced by a Pearson R correlation of 0.95 with N = 20 reporting scores.
In bringing everything together, you’re starting to see a situation where initial testing opportunities may occur on social media to see if potential items or similar variants are likely to be stable later on, in the official release phase. This usually supports a better Cronbach’s Alpha. Small puzzles or items may be unexpectedly released to get a glimpse of actual vs. anticipated answer patterns. Test creators will obviously make the adjustments to correct for unintended distortions in the answer expectancy range.
It’s normal for someone to posit “I created this particular puzzle, and here’s the opportunity to solve it. This numerical sequence, verbal analogy, or spatial item has to be tested to get a feel for the expected answer pattern that supports a reasonable Cronbach’s Alpha.” It definitely applies more so to open-ended items. And a key benefit is that the test designer gets to learn from the response mechanisms that accrue in relation to each item, prior to the official test release.
So subsequent test releases can offer a better idea of what to expect over time. And preliminary norms tend to be better adjusted if arising out of initial item examination and subsequent beta testing. Based on mere observation, wide swings moving from the preliminary to the first norm tend to be accompanied by test reliability challenges, looming not too far behind. In other words, even though the preliminary norm is not as important as the first official and beyond, paying very close attention to precise estimates and assumptions in scoring differentiation and test progression scale can improve the transition to the first norm.
And quite a bit of feedback can be gained, even from small samples. In comparison, standardized tests or proctored IQ tests have a massive accrual of IQ test data from possibly hundreds of thousands of people over so many decades of highly controlled administrations.
The only way that the HR testing space could possibly match such incumbent advantages by direct correlation is through consistent increases in score-pair reporting numbers and accuracy from gold-standard tests such as the WAIS, SB, Raven’s 2, or the Cattell. One of the most overlooked statistical factors is the sample size associated with the Pearson R value. If one can move in the direction of N = 30 or greater, in terms of official score reporting by testees, then test correlation with the general populace also becomes better supported, because the number of subsequent norm adjustments is minimized.
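The Fisher z-transform makes the sample-size point precise, because it turns a reported Pearson R and its N into a confidence interval; a small sketch (the N = 20 figure echoes the LDSE example above):

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher z."""
    z = math.atanh(r)              # Fisher z of the observed r
    se = 1.0 / math.sqrt(n - 3)    # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

print(pearson_ci(0.95, 20))    # roughly (0.87, 0.98): wide at N = 20
print(pearson_ci(0.95, 300))   # roughly (0.94, 0.96): tight at N = 300
```

Even an impressive r = 0.95 carries a wide interval at N = 20, which is why pushing reporting numbers toward N = 30 and beyond matters so much.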
At present, there is something else occurring that’s a bit more divergent, but supports the evolution of high range examination long-term. Where proctored exams and academia are positioned on one side and HR or online testing uniquely on the other, there is something called the International Cognitive Ability Research project, or ICAR, which was developed between the University of Cambridge in England and Northwestern University in Evanston, Illinois.
This particular project by mere virtue of its application demonstrated that online testing has several distinct advantages over more traditional modes. One of which is that it allows for very fast data accrual and scale in real-time (i.e., 97,000+ participants in the overall project), with the only limiting factors outside competent application development being bandwidth and network reliability. Two specific ICAR tests have been able to successfully overcome these in their rapid and vast market reach.
One test is the ICAR16 and another is the ICAR60, both of which deal with visual-spatial rotations and simplified spatial and object orientations (i.e., odd-one-out anti-patterns or visual sequence fit of presented objects).
One of the original ICAR creators is a psychometrician who had several active tests on IQExams.net termed the Cambridge tests, which were noted to have very high correlations with the WAIS. Although there may have been disagreement about the repetitive nature of the items presented, this serves as the spark where professionally licensed psychologists or psychometricians on an individual basis are willing to work with test developers, statisticians, and technologists in the HR testing space.
Emergent platforms like IQExams.net are very transparent with test statistics. Other incumbent developers may also have specialized knowledge as mathematicians, statisticians, and psychometricians. And a handful usually accrues data analysis archives with some of the best HR test stats available.
Newcomers can better exploit the value of getting near-instantaneous feedback on targeted test items. Although a small advantage, it does serve somewhat as a compensatory measure against the time-accrual constraints previously faced by more established developers, especially those who started pre-internet. Now, the feedback loop can be better incorporated into more reliable test designs.
Time allotment in HR testing is another challenging aspect of cognitive measurement design, and can even impact test reliability in a way that poses arguably more risk than item subjectivity. Working memory capacity and processing speed are two of several cognitive aspects typically associated with this dimension. But at its most precise application, it can control for certain other elements in strategic thinking about time tradeoffs, during an actual test.
For example, I can increase the difficulty level of the next several items by strategically placing what appears to be a time-consuming item just ahead of them. What the testee needs to realize is that what I’m really testing for is the ability to identify the shortcut in the current item that affords more time on the others. Therefore, seeking to set the time within a more precise range of cognitive pressure can give insight into how IQ relates to strategic time allocation.
In psychometric terms, the standard at the university level (i.e., if you look at the SAT, GRE, and GMAT) is about 2 minutes per test item. It doesn’t necessarily have to be that exactly. In other cases, you have a few HRTs that are close to a minute per item, although most timed HRTs in comparison allow substantially more than 2 minutes per item for testing difficulty within the higher IQ ranges.
Footnotes
[1] Member, World Genius Directory.
[2] Individual Publication Date: April 15, 2022: http://www.in-sightpublishing.com/isom-2; Full Issue Publication Date: May 1, 2022: https://in-sightpublishing.com/insight-issues/.
*High range testing (HRT) should be taken with honest skepticism grounded in the limited empirical development of the field at present, even in spite of honest and sincere efforts. The higher the general intelligence score, the greater the variability in, and margin of error of, the general intelligence scores, because of the greater rarity in the population.
License
In-Sight Publishing by Scott Douglas Jacobsen is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Based on a work at www.in-sightpublishing.com.
Copyright
© Scott Douglas Jacobsen and In-Sight Publishing 2012-Present. Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Scott Douglas Jacobsen and In-Sight Publishing with appropriate and specific direction to the original content. All interviewees and authors co-copyright their material and may disseminate for their independent purposes.