On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test

2024-08-15

Publisher: In-Sight Publishing

Publisher Founding: March 1, 2014

Web Domain: http://www.in-sightpublishing.com

Location: Fort Langley, Township of Langley, British Columbia, Canada

Journal: In-Sight: Independent Interview-Based Journal

Journal Founding: August 2, 2012

Frequency: Three (3) Times Per Year

Review Status: Non-Peer-Reviewed

Access: Electronic/Digital & Open Access

Fees: None (Free)

Volume Numbering: 12

Issue Numbering: 3

Section: E

Theme Type: Idea

Theme Premise: “Outliers and Outsiders”

Theme Part: 31

Formal Sub-Theme: High-Range Test Construction

Individual Publication Date: August 15, 2024

Issue Publication Date: September 1, 2024

Author(s): Scott Douglas Jacobsen

Word Count: 2,953

Image Credits: Daniel Shea.

International Standard Serial Number (ISSN): 2369-6885

Abstract

Daniel Shea, M.Sc., is the founder and CEO of Chatoyance. Shea holds a Master’s degree in Computer Science from the University of New Hampshire and has several years of industry experience in software engineering. He has published freelance articles analyzing foreign exchange market strategies, as well as software that analyzes fractals in the foreign exchange markets. Drawing on his experience with software design and financial markets, he started Chatoyance with the intent of transforming the way independent investors approach the foreign exchange market. Shea discusses: his interest in test construction; the earlier tests and Chris Cole and Dean Inada; the origin of and inspiration for his involvement; the contributions of Cole and Inada; training in general statistics and software engineering; skills and considerations in test construction; help with problem schemas, adaptivity, user interfaces, and renorming; verbal problems and replicability across other problem types; the mistakes test-takers tend to make in their thought processes and their assumptions about time commitments; the most appropriate means by which to norm and re-norm a test; the Adaptive IQ Test website; tests and test constructors; and the making of a test.

Keywords: adaptive generative test challenges, adaptive IQ Test, challenges in test-taking assumptions, Chris Cole, Daniel Shea, Dean Inada, Adaptive IQ Test development, dynamic test development, Glen Wooten, high-range IQ societies, item curve adaptation, John Fahy, Mega and Titan Test item analysis, multidimensional high-range tests, Nathan Hays, norming and renorming high-range tests, problem schemas and adaptivity, Rick Rosner, test security and leakage, verbal problems in high-range tests, Werner Couwenbergh.

Scott Douglas Jacobsen: When did your interest in test construction truly come to the fore?

Daniel Shea: My involvement came about from conversations with Chris Cole and Dean Inada. There had been an effort to implement an adaptive, generative test many years ago, but it reached a point where conceiving of new high-range questions became increasingly difficult and there were some technical challenges in actually coding a platform to take such a test. Since I had some background on the technical side, I offered to assist.

Jacobsen: What were the general realizations about the earlier tests of Ronald Hoeflin (Mega Society), e.g., the Mega Test, the Titan Test, the Ultra Test, and the Hoeflin Power Test, and what led you to work in coordination with others, i.e., Chris Cole and Dean Inada, to develop a more dynamic test? This form of test development began before you.

Shea: These tests, and other high-range tests available today, are untimed and unsupervised, which introduces many self-evident problems, chief among them that people leak answers or collaborate with others. Some of these issues may have been less prevalent at the time these tests were originally constructed in the 1980s and 1990s, but for several years now, many of the answers to these tests have been made available on various message boards or Usenet groups. In some instances, the answers are incorrect or there are multiple answers floating around which muddy the waters, but this is not always the case.

A test should not be entirely discarded just because one or two answers have been leaked. On the other hand, if enough answers have been leaked that one could achieve a sufficiently close score to a given society’s cutoff, that society may need to take a vote on whether to continue to allow the test to be used for admission. There is an ongoing effort to identify tests that have been compromised to such a degree, but that judgment call is not an exact science.

Much of the background on the motivation for a dynamic test has been covered in Chris Cole’s September 2001 article “How to Protect High-Range Tests” in Noesis #155. To quote, “In looking at many tests, there is a certain pattern that appears. It is possible to classify the problems into groups. For example, Ron Hoeflin has a group of problems about cells formed by intersecting various solids such as spheres, cubes, etc. The solution to one member of this group (say, three cubes) does not help much in the solution of another (say, two cones and a sphere). Yet it might be the case that there is an underlying mathematics that yields the answers to all of the problems in the group. Then a very large number of problems could be generated, where the solution to one problem would not help in the solution of another. This would be ideal for creating an on-line test, because cheating would be impossible.” I would probably caution that this does not make cheating outright impossible, but introduces another layer of security.

Jacobsen: Similarly, what was the origin and inspiration for joining this small team – the facts and the feelings?

Shea: In a way, the fact that the team was so small made it easier to join. There was a website, mental-testing.com, that had an initial version of the adaptive test, but it was not working at the time that I joined, so the decision was made to rewrite it from the ground up. Greenfield projects in general allow more degrees of freedom and less rigidity in their development. The ability to make some sort of impact, even if only on a technical level, was appealing. There is also the fact that the Ultra Test and the Power Test, which are the only tests used for Mega Society admission at this point in time, will eventually be spoiled in their entirety, at which point there will be no viable test for admission without some suitable replacement.

Jacobsen: As an open credit to Cole and Inada, what have been each of their major contributions to the development of the Adaptive IQ Test (2003-present)? (Anyone else, too?) For example, “How to Protect High-Range Tests” by Chris Cole comments on the difficulty of keeping test questions and high-range tests uncompromised in the internet era, the cost of open-sourcing test creation and norming, and the possibility of designing high-range tests that use more foundational principles of mathematics to generate questions (through schemas). Subsequently, “Reply to Chris Cole on Norming High-Range Tests” by Dean Inada proposed estimating, for each problem, the probability of a correct answer as a function of a person’s percentile rank, so that the relative hardness of each problem for each person could be taken into account. In essence, they were discussing some of the foundations for what would become the Adaptive IQ Test.

Shea: The background discussed in those articles serves as the foundation for what the Adaptive IQ Test has become in its current iteration. Dean Inada, in his response article, writes “we’ll want a better method of norming the tests than simply ranking people by the number of questions they get correctly, since one person may be asked harder questions than another. I suggest a method that tries to estimate for each question the probability of getting it right or wrong as a function of a person’s percentile rank in the population, this rank is estimated by multiplying the generally increasing and decreasing functions for the problems gotten right and wrong.” The Adaptive IQ Test implements this, modeling an individual curve for the test-taker based on their responses to each administered item and its item curve, and presenting a problem variant accordingly.
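To make Inada’s description more concrete, here is a minimal Python sketch of the idea as quoted: each item has a curve giving the probability of a correct answer as a function of percentile rank, and a test-taker’s rank curve is the normalized product of those curves for items answered correctly and of their complements for items answered incorrectly. The logistic shape of the item curves, the rank grid, and the function names (`logistic_item_curve`, `rank_curve`, `expected_rank`) are illustrative assumptions, not the Adaptive IQ Test’s actual code.

```python
import math
from typing import Callable, List, Sequence, Tuple

ItemCurve = Callable[[float], float]  # P(correct | percentile rank r), r in (0, 1)


def logistic_item_curve(location: float, slope: float) -> ItemCurve:
    """Hypothetical item curve: an increasing logistic function of rank."""
    def p(r: float) -> float:
        z = slope * (r - location)
        # Numerically stable logistic, so large slopes do not overflow math.exp.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)
    return p


def rank_curve(responses: Sequence[Tuple[ItemCurve, bool]],
               grid_size: int = 999) -> Tuple[List[float], List[float]]:
    """Estimate a rank curve from (item_curve, answered_correctly) pairs."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    # Percentile rank is uniform in the population, so start from a flat curve.
    density = [1.0 / grid_size] * grid_size
    for curve, correct in responses:
        # Right answers multiply in an increasing function of rank,
        # wrong answers the decreasing complement 1 - p(r).
        density = [d * (curve(r) if correct else 1.0 - curve(r))
                   for r, d in zip(grid, density)]
        total = sum(density)
        density = [d / total for d in density]  # renormalize to avoid underflow
    return grid, density


def expected_rank(grid: List[float], density: List[float]) -> float:
    """Mean of the rank curve, a one-number summary of the graph."""
    return sum(r * d for r, d in zip(grid, density))


if __name__ == "__main__":
    easy = logistic_item_curve(location=0.90, slope=40)
    hard = logistic_item_curve(location=0.999, slope=400)
    grid, density = rank_curve([(easy, True), (hard, False)])
    print(f"estimated rank: {expected_rank(grid, density):.3f}")
```

A flat starting curve is a natural choice here because percentile rank is, by definition, uniformly distributed in the population; the full `density` list is what would be graphed, rather than a single number.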

Jacobsen: You do not have a formal background in psychometrics; most people in the high-range construction space do not. However, how has your training in general statistics and software engineering, i.e., the skills used at Chatoyance, helped with the work on the Adaptive IQ Test?

Shea: As noted, I do not have a formal background in psychometrics. My involvement in the project has been largely technical in nature, drawing on prior general software engineering skills to implement the problem schemas and the adaptive component, design the user interfaces for each problem (some may require drawings, some may require filling in a grid, etc.), automate the norming and curving of each item as results come in, and so on. Indeed, the largest challenge has been conceiving of suitable problem schemas, which I am happy to brainstorm, though I of course defer to those with deeper backgrounds than my own. Between that and ensuring that all problem variants are similarly challenging, work is ongoing.
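The adaptive component has to decide which item to present next, in line with the site’s promise that test-takers are “automatically asked questions that will help make your estimated rank more accurate.” The article does not state the selection criterion, so this Python sketch assumes one common in adaptive testing: pick the candidate item that minimizes the expected variance of the test-taker’s rank curve after they answer. The logistic item curves, the example numbers, and all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List

ItemCurve = Callable[[float], float]  # P(correct | percentile rank r)


def logistic(location: float, slope: float) -> ItemCurve:
    """Hypothetical increasing item curve (numerically stable logistic)."""
    def p(r: float) -> float:
        z = slope * (r - location)
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)
    return p


def variance(grid: List[float], density: List[float]) -> float:
    mean = sum(r * d for r, d in zip(grid, density))
    return sum(d * (r - mean) ** 2 for r, d in zip(grid, density))


def expected_posterior_variance(grid: List[float], density: List[float],
                                curve: ItemCurve) -> float:
    """Average the post-answer variance over both possible answers."""
    result = 0.0
    for correct in (True, False):
        weights = [d * (curve(r) if correct else 1.0 - curve(r))
                   for r, d in zip(grid, density)]
        prob = sum(weights)  # predicted probability of this answer
        if prob > 0:
            posterior = [w / prob for w in weights]
            result += prob * variance(grid, posterior)
    return result


def pick_next(grid: List[float], density: List[float],
              candidates: Dict[str, ItemCurve]) -> str:
    """Choose the item whose answer is expected to sharpen the curve most."""
    return min(candidates,
               key=lambda name: expected_posterior_variance(grid, density, candidates[name]))


if __name__ == "__main__":
    grid = [(i + 1) / 100 for i in range(99)]
    # Current rank curve: mass spread evenly over ranks 0.90 to 0.99.
    density = [0.1 if r >= 0.895 else 0.0 for r in grid]
    candidates = {"easy": logistic(0.50, 50), "matched": logistic(0.945, 200)}
    print(pick_next(grid, density, candidates))
```

Minimizing expected posterior variance is only one option; maximizing Fisher information at the current estimate is another widely used rule in adaptive testing.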

Jacobsen: What skills and considerations, in an overview, seem important for both the construction of test questions and making an effective schema for them?

Shea: The questions in the current alpha version of the test were largely derived from existing problems authored by Ron Hoeflin. The sense was that the problems themselves were not fundamentally at fault; rather, it took more effort to vet a suitable problem than it did for someone to leak it.

With that said, deriving a schema that generates problems of similar difficulty is a challenge, and often requires restricting the degrees of freedom for the generator itself. For instance, the Mega and Titan item analysis has shown that the interpenetrating solid questions tend to be among the most challenging, but the degree to which they are challenging varies significantly. Consider the three interpenetrating solid questions on Ron Hoeflin’s Power Test, which are lifted from the Mega and Titan Tests. There is a notable difference in difficulty among the interpenetrating cube and tetrahedron, the three interpenetrating cubes, and the two interpenetrating cones and one cylinder. It would not be good practice to include a general schema for any configuration of interpenetrating solids. Rather, you would need to classify these by difficulty and generate them separately. But where does this classification come from? Item analysis gets you started, but at a certain point, you also depend on a sufficient number of people taking the test to get a better idea of the difficulty and signal of each variant.
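The classify-then-generate step described above can be sketched in Python, assuming item analysis yields an observed pass rate for each variant. The tier cut points, variant identifiers, pass rates, and function names below are hypothetical; as Shea notes, any such tiers would need revisiting as more people take the test.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def bucket_by_difficulty(pass_rates: Dict[str, float],
                         edges: Sequence[float]) -> Dict[int, List[str]]:
    """Group variants into difficulty tiers by observed pass rate.

    pass_rates: variant id -> fraction of test-takers who answered correctly.
    edges: pass-rate cut points; a variant moves one tier harder for every
    edge its pass rate falls below, so tier 0 is the easiest.
    """
    tiers: Dict[int, List[str]] = defaultdict(list)
    for variant, rate in sorted(pass_rates.items()):
        tier = sum(1 for edge in edges if rate < edge)
        tiers[tier].append(variant)
    return dict(tiers)


def draw_variant(tiers: Dict[int, List[str]], tier: int, seed: int) -> str:
    """Draw one variant from a single tier, never mixing tiers.

    A per-test-taker seed keeps the draw reproducible for that person, while
    different people can receive different variants of similar difficulty.
    """
    return random.Random(seed).choice(tiers[tier])


if __name__ == "__main__":
    # Hypothetical item-analysis figures, for illustration only.
    rates = {"variant-A": 0.62, "variant-B": 0.31,
             "variant-C": 0.28, "variant-D": 0.07}
    tiers = bucket_by_difficulty(rates, edges=(0.5, 0.2))
    print(tiers)
    print(draw_variant(tiers, tier=1, seed=1234))
```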

Jacobsen: How do you help with problem schemas, adaptivity, user interfaces, and renorming? How are the problem schemas developed from the Mega, Titan, and Ultra Tests, e.g., the six sides question from the Ultra Test (problem 45) and grid sequences from the Power Test (problems 32-36)?

Shea: In some ways, it is difficult to discuss particular schemas at length because doing so may reveal the underlying pattern in the process. Many schemas are derived programmatically, while some do not have a proven underlying pattern but are bucketed in the same schema, such as the interpenetrating solid variants discussed prior.

User interfaces are designed according to the requirements of the problem. The most challenging interfaces have been those for the sixth side problem, which requires drawing on a canvas and scoring the answer in a way that accommodates any orientation of the object, and the three dice problem, whose challenge was less with the user interface per se and more with the backend construction of each variant.
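Orientation-tolerant scoring can be sketched as follows, under assumptions not stated in the article: the canvas drawing is rasterized to a grid of filled (`#`) and empty (`.`) cells, position on the canvas is ignored by cropping to the filled region, and the drawing counts as correct if any of its four rotations equals the answer. Whether mirror images should also count is a design decision, so it is left as a flag. All names are hypothetical.

```python
from typing import List

Grid = List[str]  # rows of a rasterized drawing; "#" marks a filled cell


def crop(grid: Grid) -> Grid:
    """Trim empty rows and columns so position on the canvas does not matter."""
    rows = [i for i, row in enumerate(grid) if "#" in row]
    if not rows:
        return []
    cols = [j for j in range(len(grid[0])) if any(row[j] == "#" for row in grid)]
    return [row[cols[0]:cols[-1] + 1] for row in grid[rows[0]:rows[-1] + 1]]


def rotate(grid: Grid) -> Grid:
    """Rotate a rectangular grid 90 degrees clockwise."""
    return ["".join(row[c] for row in reversed(grid)) for c in range(len(grid[0]))]


def orientations(grid: Grid, allow_reflection: bool) -> List[Grid]:
    """All four rotations, plus their mirror images if reflections count."""
    shapes = []
    for _ in range(4):
        shapes.append(grid)
        if allow_reflection:
            shapes.append([row[::-1] for row in grid])
        grid = rotate(grid)
    return shapes


def matches(drawing: Grid, answer: Grid, allow_reflection: bool = False) -> bool:
    """Score a drawing as correct if some orientation of it equals the answer."""
    shape, target = crop(drawing), crop(answer)
    if not shape or not target:
        return shape == target
    return any(candidate == target for candidate in orientations(shape, allow_reflection))
```

For example, an L-shaped answer `["#..", "#..", "##."]` matches a rotated copy drawn elsewhere on the canvas, but not its mirror image unless `allow_reflection=True`.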

Norming is automatically done after each test has been completed. This also backfills prior test-takers, whose estimates are updated accordingly. In the interest of fairness, there are two metrics presented: the immutable estimate per the norm at the time of the test’s completion and the most recent estimate per the latest norm.
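The bookkeeping described here, an immutable estimate at completion plus a latest estimate backfilled on every renorm, can be sketched in Python. To keep the example short, the norm is a toy raw-score percentile rather than the item-curve approach described earlier; `TestRecord`, `Registry`, and `toy_percentile` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Responses = List[bool]  # hypothetical: one entry per scored item, True if correct


@dataclass
class TestRecord:
    responses: Responses
    estimate_at_completion: float  # fixed: the norm as it stood at completion
    latest_estimate: float         # refreshed every time the norm changes


def toy_percentile(all_scores: List[int], score: int) -> float:
    """Stand-in norm: mid-rank percentile of a raw score among all tests."""
    below = sum(1 for s in all_scores if s < score)
    equal = sum(1 for s in all_scores if s == score)
    return (below + 0.5 * equal) / len(all_scores)


class Registry:
    def __init__(self, estimate: Callable[[List[int], int], float] = toy_percentile):
        self.estimate = estimate
        self.records: Dict[str, TestRecord] = {}

    def complete_test(self, taker_id: str, responses: Responses) -> TestRecord:
        if taker_id in self.records:
            raise ValueError("test already completed for this taker")
        # Renorm using every completed test, including the new one.
        scores = [sum(r.responses) for r in self.records.values()] + [sum(responses)]
        new_estimate = self.estimate(scores, sum(responses))
        self.records[taker_id] = TestRecord(responses, new_estimate, new_estimate)
        # Backfill: every prior latest_estimate moves with the new norm,
        # but estimate_at_completion is never rewritten.
        for record in self.records.values():
            record.latest_estimate = self.estimate(scores, sum(record.responses))
        return self.records[taker_id]
```

In this toy run, a first test-taker with two correct answers starts at 0.5; after a second, lower-scoring test-taker finishes, the first person’s latest estimate rises to 0.75 while their estimate at completion stays at 0.5.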

Jacobsen: How are verbal problems capable of presenting appropriately challenging problems with variation in type while sustaining similarity of difficulty? Is this replicable across other problem types, e.g., spatial, numerical/quantitative, matrices, etc.?

Shea: Verbal problems in particular have been quite tricky. In the current form of the test, there are trial questions which are presented to the test-taker but do not impact their estimated curve. These trial questions include some, but not all, of the verbal questions. This is in part because verbal problems that have a clean generalization tend to be quite easy to solve. Unlike problems with a more mathematical or logical approach, verbal problems tend to be self-contained and, if generalized at the high range, risk producing variants that are far more esoteric than others. This class of problems continues to present the greatest challenge.
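One simple way to model trial questions, sketched in Python under the assumption that each administered item carries a trial flag: trial responses are stored separately and filtered out before the test-taker’s estimate is computed. Keeping them for later calibration of the trial item is our assumption about their purpose, not something the interview states. The `Item` type and `split_responses` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Response = Tuple[str, bool]  # (item_id, answered_correctly)


@dataclass(frozen=True)
class Item:
    item_id: str
    is_trial: bool  # shown to the test-taker but excluded from their curve


def split_responses(items: Dict[str, Item],
                    responses: List[Response]) -> Tuple[List[Response], List[Response]]:
    """Partition responses into (scored, trial), preserving their order.

    Only the scored list would feed the test-taker's estimate; the trial list
    is kept aside, e.g. to calibrate the trial item before it is ever scored.
    """
    scored: List[Response] = []
    trial: List[Response] = []
    for item_id, correct in responses:
        target = trial if items[item_id].is_trial else scored
        target.append((item_id, correct))
    return scored, trial
```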

Jacobsen: Potentially, what mistakes do test-takers tend to make in their thought processes, and what assumptions do they make about time commitments on these high-range tests, such that they get artificially low scores?

Shea: In terms of time commitments, at this point, there is no limit on how long a test may take to complete. Historically, a time limit would have been more difficult to enforce, as most high-range tests are made available in their entirety to the public. Some approaches are taken to minimize leakage of the questions themselves, such as Paul Cooijmans requiring test-takers to request a copy of the test directly, though my understanding is that this is done to prevent public discussion of the questions and, in turn, their answers, rather than to limit the time taken to complete the test. Timed tests do allow for some measurement of processing speed, as well as a standardization of test-taking conditions. However, these particular tests are already administered without supervision, in whichever environment the test-taker prefers, because the questions require a significant amount of time to answer; timing the test could therefore give an unfair advantage to those who simply have more free time to commit.

As far as thought processes, I do not have enough insight into individual test-takers to make broad generalizations about their personal approaches to these problems. From what I have witnessed myself through discussions with others, there is, perhaps unsurprisingly, a tendency to overthink a question or use complicated reasoning to justify a suspected answer, thereby getting it wrong. Almost every time, the answer is clean; like learning how a magic trick is performed, the question once looked impossible but suddenly seems deceptively simple.

Jacobsen: What are the most appropriate means by which to norm and re-norm a test when, in the high-range environment so far, sample sizes tend to be small and self-selected, attracting a limited pool of test-takers who tend toward a particular personality type?

Shea: Since norming is performed on test completion, the process has little overhead. To accommodate small sample sizes, an initial item curve is provided for questions when one is known. For example, if a schema is adopted from a prior test such as the Ultra Test, then the item curve for that problem is used as the seed for this test. In other cases, such as novel schemas with no prior item curve to draw from, the curve starts out flat and is gradually shaped based on test-takers’ answers to other questions.
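The seeding described here can be sketched with pseudo-counts, which is our assumption rather than a documented detail of the test. Each item’s curve is stored per percentile-rank bin and seeded from a prior test’s item curve when one exists, or flat (0.5 in this sketch) when none does. Each new response is spread across bins in proportion to the test-taker’s estimated rank curve, since their exact rank is unknown. The class name and the `seed_weight` and `flat_value` parameters are hypothetical.

```python
from typing import List, Optional, Sequence


class ItemCurve:
    """P(correct) per percentile-rank bin, smoothed toward a seed curve."""

    def __init__(self, n_bins: int, seed: Optional[Sequence[float]] = None,
                 seed_weight: float = 5.0, flat_value: float = 0.5):
        # Seed from a prior test's item curve when one exists; otherwise flat.
        self.seed: List[float] = list(seed) if seed is not None else [flat_value] * n_bins
        if len(self.seed) != n_bins:
            raise ValueError("seed curve must have one value per rank bin")
        self.seed_weight = seed_weight  # how many observations the seed is worth
        self.correct = [0.0] * n_bins   # soft count of correct answers per bin
        self.total = [0.0] * n_bins     # soft count of all answers per bin

    def update(self, rank_density: Sequence[float], correct: bool) -> None:
        """Add one response, weighted by the test-taker's rank curve (sums to 1)."""
        for k, w in enumerate(rank_density):
            self.total[k] += w
            if correct:
                self.correct[k] += w

    def probability(self, k: int) -> float:
        """The seed dominates while data are sparse; observations take over later."""
        w = self.seed_weight
        return (w * self.seed[k] + self.correct[k]) / (w + self.total[k])
```

Because each test-taker’s rank curve itself depends on the item curves, a real system would presumably refit the two jointly or iteratively; this sketch shows only a single update step.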

With these sorts of tests, the small sample size continues to be a problem, but part of this high barrier to entry may stem from how these tests have historically been administered, in terms of both accessibility and the cost of scoring. By making the test available online and free of charge, the hope is that more people will be motivated to try it out.

As far as the types of personalities that are drawn to high-range tests, I defer to Grady Towers’ observations in Noesis #141 regarding the types of personalities that exist across different societies and the corresponding tests used for their admission. Perhaps there is something to be said for stressing both verbal and non-verbal aptitude.

Jacobsen: The Adaptive IQ Test website opens with a series of claims:

This is an online IQ test that contains several innovative features. Here are some reasons to take this test.

  1. As you answer more questions, the estimate of your rank in the population becomes more accurate.
  2. You see a graph of your estimated rank, not just a single number.
  3. You are allowed to skip questions and come back to them.
  4. You are automatically asked questions that will help make your estimated rank more accurate.
  5. As more people take the test, the graphs become more accurate.
  6. There are a number of anti-cheating devices being used.
  7. The results of this test may be used for acceptance into various high IQ societies.

Have prospective or actual test-takers, or the merely curious, ever needed clarification on any of these points? If so, they can be addressed here.

Shea: Some of these points are better characterized as statements of fact about the functionality of the test itself, such as the ability to skip questions. One point to clarify about items 1 and 5 is that the estimate for a completed test may change over time as the test is repeatedly normed. There are plenty of cases across other IQ tests where an individual completes the test and receives an estimate only for subsequent test-takers to receive a lower estimate with the same raw score due to the ceiling being lowered through norms over time, and vice versa. As the adaptive test is normed here, all estimates are updated in unison, preventing this discrepancy between raw scores and percentile estimates across different test-takers. As mentioned earlier, both the estimate at the time of the test’s completion and the most up-to-date estimate are presented for completeness.

Jacobsen: What tests and test constructors have you considered good?

Shea: The gold standard for high-range testing has always been Ron Hoeflin’s series of tests. These serve as the foundation for many of the existing questions in the current early version of the Adaptive IQ Test. Beyond him, there are many test constructors who have quite novel test items that could serve as inspiration.

There is value in multidimensional tests that select for both high-range spatial and verbal problems. I again cite Grady Towers, who wrote of this back in 1998 over the course of several letters published in Noesis #141, where he reflected on the implications for high IQ societies that admit members on the basis of tests that stress both verbal and spatial skills as opposed to one or the other.

Jacobsen: What have you learned from helping in the making of a test?

Shea: It is important to not let “perfect” be the enemy of “good.” There will always be shortcomings with any approach. Care needs to be taken to minimize these shortcomings and accommodate them to the extent possible.

A second lesson, perhaps, is that there is a high-range test vacuum of sorts, and that vacuum is being filled with any number of experimental high-range tests. This is not necessarily an issue in itself, as many of these test items are intriguing and derived from historical best practices, including those in the very test being discussed here. That said, ideally, those with a formal background in psychometrics would be more involved. I am happy to help where I can, but I also recognize my own limits in this space.

Jacobsen: Thank you for the opportunity and your time, Daniel.

Shea: Thank you for giving me the chance to highlight this project! I feel the need to stress that it is very much in an alpha state and that development is ongoing, but that progress is being made. Special thanks go to Chris Cole and Dean Inada for the decades of work that they put into this long before I arrived, Werner Couwenbergh for his hard work on the interpenetrating solid variants, those who provided input thus far (John Fahy, Nathan Hays, Rick Rosner, and Glen Wooten, among others), and everyone who has provided feedback. I am but a vessel, helping to bring this to fruition where possible.

Bibliography

None

Footnotes

None

Citations

American Medical Association (AMA 11th Edition): Jacobsen S. On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test. August 2024; 12(3). http://www.in-sightpublishing.com/high-range-7

American Psychological Association (APA 7th Edition): Jacobsen, S. (2024, August 15). On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test. In-Sight Publishing. 12(3).

Brazilian National Standards (ABNT): JACOBSEN, S. On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test. In-Sight: Independent Interview-Based Journal, Fort Langley, v. 12, n. 3, 2024.

Chicago/Turabian, Author-Date (17th Edition): Jacobsen, Scott. 2024. “On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test.” In-Sight: Independent Interview-Based Journal 12, no. 3 (Summer). http://www.in-sightpublishing.com/high-range-7.

Chicago/Turabian, Notes & Bibliography (17th Edition): Jacobsen, S. “On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test.” In-Sight: Independent Interview-Based Journal 12, no. 3 (August 2024). http://www.in-sightpublishing.com/high-range-7.

Harvard: Jacobsen, S. (2024) ‘On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test’, In-Sight: Independent Interview-Based Journal, 12(3). <http://www.in-sightpublishing.com/high-range-7>.

Harvard (Australian): Jacobsen, S 2024, ‘On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test’, In-Sight: Independent Interview-Based Journal, vol. 12, no. 3, <http://www.in-sightpublishing.com/high-range-7>.

Modern Language Association (MLA, 9th Edition): Jacobsen, Scott. “On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test.” In-Sight: Independent Interview-Based Journal, vol. 12, no. 3, 2024, http://www.in-sightpublishing.com/high-range-7.

Vancouver/ICMJE: Jacobsen S. On High-Range Test Construction 7: Daniel Shea, M.Sc., the Adaptive IQ Test [Internet]. 2024 Aug; 12(3). Available from: http://www.in-sightpublishing.com/high-range-7.

License & Copyright

In-Sight Publishing by Scott Douglas Jacobsen is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. ©Scott Douglas Jacobsen and In-Sight Publishing 2012-Present. Unauthorized use or duplication of material without express permission from Scott Douglas Jacobsen strictly prohibited, excerpts and links must use full credit to Scott Douglas Jacobsen and In-Sight Publishing with direction to the original content.
