Matthew Scillitani on High-Range Psychometrics, Validity, and Test Security
Author(s): Scott Douglas Jacobsen
Publication (Outlet/Website): Vocal.Media
Publication Date (yyyy/mm/dd): 2026/04

Matthew Scillitani is a psychometrics practitioner at Neurolus Psychometrics focused on developing supervised, time-limited high-range ability examinations. He co-launched The Mental Inventor with Paul Cooijmans as an empirical testbed for a central measurement question: whether performances can be validly differentiated in the extreme right tail under proctored conditions. His approach emphasizes procedural integrity—identity verification, approved proctoring, and rule enforcement—alongside cautious claims about interpretation until reliability and validity evidence is established. He highlights emerging threats to unsupervised testing, including AI-assisted responding and large-scale collaboration, and advocates peer review before formal reclassification.
Scott Douglas Jacobsen interviews Matthew Scillitani on the psychometric ambitions and safeguards behind supervised, time-limited high-range testing at Neurolus Psychometrics and The Mental Inventor. Scillitani explains that exploratory validity work may begin at 50 submissions, with stronger analyses at 100 or 250, using prior candidate data to reduce sample requirements. He stresses moderate cross-section correlations as evidence of broad reasoning, transparent reporting of selection bias, and strict standards for excluding compromised sittings. The discussion also addresses score uncertainty, interpretive restraint, third-party misuse, and evolving security threats, including answer leakage, collaboration, and AI-era integrity concerns.
Scott Douglas Jacobsen: What empirical threshold would move from exploratory data collection to a formal validation study?
Matthew Scillitani: We plan to start exploring construct validity at 50 submissions, with follow-up analyses at 100 and, if needed, 250 submissions.
Intuitively, these samples sound too small, but we are not generating norms from nothing. In many cases, we already know candidates’ prior scores on related exams. This allows us to use methods like rank equation, which reduce sample size requirements compared with traditional norming methods that rely on an unselected population.
That said, 50 submissions may only support an exploratory analysis. The initial goal is to observe trends and possible construct measurement. If results are inconclusive, we would continue collecting data and re-evaluate at 100 and again at 250 submissions.
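The interview does not spell out how rank equation would be carried out. One common reading, assumed here, is that raw scores on the new exam and the same candidates' prior scores on related exams are each sorted, then paired rank by rank to give a provisional conversion table. The Python sketch below follows that reading. The function name, the averaging of tied raw scores, and the sample numbers are all hypothetical and are not taken from Neurolus Psychometrics.

```python
def rank_equate(raw_scores, prior_scores):
    """Pair the i-th lowest raw score with the i-th lowest prior score.

    Assumption: each candidate contributes one raw score on the new exam
    and one prior score on a related exam. Tied raw scores receive the
    mean of the prior scores that fall on their ranks.
    """
    if len(raw_scores) != len(prior_scores):
        raise ValueError("each raw score needs a matching prior score")

    raw_sorted = sorted(raw_scores)
    prior_sorted = sorted(prior_scores)

    # Collect the prior scores that land on each raw score's ranks.
    by_raw = {}
    for raw, prior in zip(raw_sorted, prior_sorted):
        by_raw.setdefault(raw, []).append(prior)

    # Average within ties to get one provisional value per raw score.
    return {raw: sum(vals) / len(vals) for raw, vals in by_raw.items()}


# Hypothetical data: four candidates' raw scores and prior scaled scores.
table = rank_equate([12, 7, 9, 7], [150, 130, 145, 140])
```

With only a handful of pairs, such a table is very coarse, which is consistent with treating the 50-submission stage as exploratory.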
Jacobsen: The exam includes verbal, numerical, and spatial items. What evidence would reflect broad reasoning ability rather than an unusually strong specialty cognitive profile?
Scillitani: The appropriate approach here is to examine the relationship between the three sections. If they correlate very highly, it suggests they measure more or less the same thing. Ideally, there is a moderate positive intercorrelation across sections, such as 0.4 to 0.6.
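The section comparison described in this answer can be illustrated with a pairwise Pearson correlation across the verbal, numerical, and spatial section scores. This is a minimal sketch, assuming one score per section per candidate. The section names come from the interview, while the function names and sample scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def section_intercorrelations(sections):
    """Correlation for every pair of sections, keyed by section names."""
    return {
        (a, b): pearson(sections[a], sections[b])
        for a, b in combinations(sections, 2)
    }


# Hypothetical section scores for four candidates.
scores = {
    "verbal": [1, 2, 3, 4],
    "numerical": [2, 1, 4, 3],
    "spatial": [4, 3, 2, 1],
}
r = section_intercorrelations(scores)
```

Each value in `r` can then be compared against the 0.4 to 0.6 band mentioned above. With samples as small as 50, the confidence interval around each coefficient will be wide.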
Jacobsen: Eligibility is limited to English-speaking adults who can arrange an approved proctor. How will you estimate the selection bias built into it?
Scillitani: This exam necessarily produces a selective sample because candidates must be English-speaking adults, find a proctor, and be willing to sit for a challenging exam.
We intend to document this clearly and publish aggregate (anonymized) candidate characteristics in the statistical reports. This includes country, age, sex, and other relevant demographics, making the sample in question clear to both researchers and candidates.
Jacobsen: If results begin to differ systematically by proctor type or testing environment, what would count as enough distortion to justify excluding sittings?
Scillitani: The decision to exclude data is serious because post hoc removal of inconvenient results can permanently damage the integrity of our research. It is best to exclude data only when there is clear, well-documented evidence that the sitting was objectively compromised by cheating or improper testing procedures.
That evidence does not need to be a confession; it may also appear in the statistics or in the record of the sitting. Examples include anomalous response patterns, such as impossibly similar responses in two submissions from the same town, or a documented mishap, such as a candidate needing to exit the exam early.
We will internally document any exclusions so that peer reviewers can judge for themselves whether those exclusions were justified.
Jacobsen: Retesting is not permitted. How do you plan to estimate the uncertainty around an individual high-end score?
Scillitani: Uncertainty will be estimated psychometrically via reliability and the standard error of measurement.
Scores, outliers or not, should always be understood in the context of their margin of uncertainty, which candidates and organizations will know when the first statistical report is published.
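In classical test theory, the standard error of measurement is the score standard deviation multiplied by the square root of one minus the reliability. The sketch below applies that textbook formula and builds an approximate 95% band around an observed score. The interview does not say which reliability coefficient will be used, and the numbers here are hypothetical placeholders rather than published exam statistics.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def score_band(score, sd, reliability, z=1.96):
    """Approximate band around an observed score (z=1.96 for about 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return (score - z * sem, score + z * sem)


# Hypothetical values: scale SD of 15, reliability of 0.91, score of 130.
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(130, 15, 0.91)
```

A band like this assumes measurement error is equally large across the whole scale. In the extreme right tail, where few items discriminate, the real uncertainty may be larger.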
Jacobsen: You provide scaled scores. How do you prevent the scale from encouraging stronger conclusions than intended?
Scillitani: This is both a technical and ethical issue. At this early stage, scaled scores are used because the exam is not yet standardized, and we do not want the terminology to imply greater normative maturity than is warranted.
We also do not present these scores as measuring I.Q. or any other construct, either in score reports or on the website. The score conversion table exists only to provide candidates and organizations with a point of reference, not to make any claims the data cannot yet support.
Jacobsen: If outside organizations use the exam, where will responsibility begin and end in preventing overclaiming or misuse of results?
Scillitani: Third parties are responsible for the claims they make. However, that does not mean that publishers have no responsibility at all. We provide clear documentation, interpretive limits, and as much statistical information as possible so that nobody is misled.
Jacobsen: As AI systems and answer-sharing methods improve, how will you update the exam to preserve security?
Scillitani: AI is not yet a major concern because the exam procedure disallows electronics, which makes AI tools inaccessible during a sitting. But there are more immediate security concerns, such as answer leakage.
Unusually similar response patterns or geographically clustered irregularities will be flagged for review. And if a specific location or proctoring option shows signs of compromise, we will investigate and resolve the issue in the fairest way possible.
Over time, this may require tactical countermeasures, but I cannot say publicly what they would be. Measures may already be in place to identify compromised sittings so that legal action can be taken against the culprit.
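The actual countermeasures are not public, so the following is only a generic illustration of how unusually similar response patterns might be flagged for review. It computes the share of items on which two submissions gave identical answers and lists pairs at or above a cutoff. The function names, the 0.9 threshold, and the sample data are hypothetical. A real screening procedure would also need to account for agreement expected by chance and for the fact that matching correct answers are far less suspicious than matching wrong ones.

```python
from itertools import combinations


def agreement(a, b):
    """Share of items on which two submissions gave the same answer."""
    if len(a) != len(b) or not a:
        raise ValueError("submissions must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def flag_similar_pairs(responses, threshold=0.9):
    """Return (id_1, id_2, share) for pairs at or above the threshold."""
    flagged = []
    for first, second in combinations(sorted(responses), 2):
        share = agreement(responses[first], responses[second])
        if share >= threshold:
            flagged.append((first, second, share))
    return flagged


# Hypothetical answer sheets keyed by anonymized candidate ID.
sheets = {
    "cand_1": ["x", "y", "z", "w"],
    "cand_2": ["x", "y", "z", "w"],
    "cand_3": ["x", "q", "r", "s"],
}
flags = flag_similar_pairs(sheets)
```

A flag of this kind is a prompt for human review, not evidence of cheating on its own, in line with the standard for exclusions described earlier in the interview.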
Jacobsen: Thank you very much for the opportunity and your time, Matthew.
Last updated May 3, 2025. These terms govern all In Sight Publishing content—past, present, and future—and supersede any prior notices. In Sight Publishing by Scott Douglas Jacobsen is licensed under a Creative Commons BY‑NC‑ND 4.0; © In Sight Publishing by Scott Douglas Jacobsen 2012–Present. All trademarks, performances, databases & branding are owned by their rights holders; no use without permission. Unauthorized copying, modification, framing or public communication is prohibited. External links are not endorsed. Cookies & tracking require consent, and data processing complies with PIPEDA & GDPR; no data from children < 13 (COPPA). Content meets WCAG 2.1 AA under the Accessible Canada Act & is preserved in open archival formats with backups. Excerpts & links require full credit & hyperlink; limited quoting under fair-dealing & fair-use. All content is informational; no liability for errors or omissions: Feedback welcome, and verified errors corrected promptly. For permissions or DMCA notices, email: scott.jacobsen2025@gmail.com. Site use is governed by BC laws; content is “as‑is,” liability limited, users indemnify us; moral, performers’ & database sui generis rights reserved.
