Ask A Genius 869: Hey, guess what? More on IQ Tests!

2024-03-31

Author(s): Rick Rosner and Scott Douglas Jacobsen

Publication (Outlet/Website): Ask A Genius

Publication Date (yyyy/mm/dd): 2024/01/07

[Recording Start] 

Rick Rosner: All right, so, talking about high IQ tests: IQ tests 120 years ago, or when they were first conceived of by Binet, were supposed to be on a scale of one to five, given to kids to see what kind of educational resources they might need. So, score a one, you’re dumb, and you need educational resources, and you score a three, you’re average; you can just be flopped into a classroom. And if you score a five, you’re smart, and you need different educational resources. Then Terman at Stanford, I believe, and I might have all this wrong, decided to put it on a 100-point scale where 100 is average, and I believe he came up with the ratio score, which is: if you’re four years old but you score as well as the average eight-year-old on an IQ test, you get 8 divided by 4 is 2, times 100, which gives you an IQ of 200. If you’re four and you score like a six-year-old, 6 divided by 4 is 1.5, times 100, which gives you an IQ of 150. That also gives IQ scores a false precision, since they’re two- and three-digit numbers that seem to be very precise, which is just not the case.
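As a rough illustration of the ratio score described above, the arithmetic can be sketched in a few lines of Python. The function name `ratio_iq` and its parameter names are made up for this example; only the formula (mental age divided by chronological age, times 100) comes from the conversation.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100.

    Both names are hypothetical labels for this sketch.
    """
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100 * mental_age / chronological_age


# The two cases mentioned in the conversation.
print(ratio_iq(8, 4))  # a four-year-old scoring like an average eight-year-old
print(ratio_iq(6, 4))  # a four-year-old scoring like an average six-year-old
```

The result is a plain ratio, which is part of why the two or three digits look more precise than they are.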

A different means of scoring the tests, a semi-different one, was developed for adults, which is population rarity: if you score better than all but about one out of 750 adults, that gives you a rarity of three standard deviations, and we’re going to set a standard deviation as being worth 15 or 16 points on a 100-point scale. So, scoring that high gives you an IQ of 148, using 16 points per standard deviation. If you score higher than all but one person in three million, that’s five standard deviations. A standard deviation is a measure of the width of a bell curve, a standard curve of something like height or running speed or anything that’s called normally distributed, where there’s an average and people fall on either side of the average, with most people falling pretty close to the middle. So, that leads to questions for kids: “Are you a five-year-old who is as smart as a seven-year-old, or as smart as an average three-year-old?” If it’s a three-year-old, that’s 3 over 5, which gives you an IQ of 60. For adults, it’s a rarity within the population.
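To make the rarity-based score concrete, here is a minimal Python sketch that turns a "one in N" rarity into a number of standard deviations on a normal curve, and then into an IQ. It assumes scores are exactly normally distributed with a mean of 100; the function names `z_from_rarity` and `deviation_iq` are invented for this example.

```python
from statistics import NormalDist


def z_from_rarity(one_in: float) -> float:
    """Standard deviations above the mean for a score beaten by only 1 in `one_in` people."""
    return NormalDist().inv_cdf(1 - 1 / one_in)


def deviation_iq(one_in: float, sd: float = 16.0, mean: float = 100.0) -> float:
    """IQ on a scale with the given mean and points per standard deviation (assumed values)."""
    return mean + sd * z_from_rarity(one_in)


print(round(z_from_rarity(750), 2))        # close to three standard deviations
print(round(deviation_iq(750, sd=16)))     # IQ with 16 points per standard deviation
print(round(deviation_iq(750, sd=15)))     # same rarity with 15 points per standard deviation
print(round(z_from_rarity(3_000_000), 2))  # close to five standard deviations
```

The same rarity gives a different IQ number depending on whether a test uses 15 or 16 points per standard deviation, so a quoted score only means something alongside its standard deviation.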

So, the childhood IQ score gives you an idea of how smart somebody is because you’re comparing people to people: you’re comparing the person being tested, who might be five or eight or whatever, to kids of different ages and saying, well, this person is as smart as an average third grader or fifth grader, which is an understandable and fairly concrete indication of what a kid’s intelligence is. Again, it’s based on other people, other people’s abilities. With the adult scale, which is rarity in the population, you’re also comparing the person to other people. It’s different and, in a way, kind of less concrete and more abstract, because you know what a fifth grader can do. You take a classroom of fifth graders, and you see what the average kid can do in terms of spelling and math, what kind of words they know and how well they can read; that’s reasonably concrete. Then you take an adult IQ, and you just say this person’s smarter than two people out of three, and this other person’s smarter than nine people out of 10, and that’s not as grounded a measure.

Then, you start talking about people with IQs above 150. Most people take IQ tests as kids to see where they should be placed or if they need extra educational resources. Few people take IQ tests as adults because there’s no need. Similarly, there’s no need to measure people’s IQs above 150, and that’s where most IQ tests stop, because if somebody can score 150, you know they’re really smart. What does it matter? If they’re that smart, they can go and find educational resources themselves as an adult. Adults who talk about their IQs are weirdos, and Stephen Hawking has called them losers. People demonstrate their intelligence as adults by succeeding or not in the world. So, measuring anything above 150 is itself a little absurd, but it has become a sport rather than any kind of diagnostic tool.

If you have a kid and that kid is scoring a 200, a four-year-old scoring like an eight-year-old, that is a fairly exceptional situation, and it might be worthwhile knowing that, apparently, that kid has an IQ of 200 versus another kid who’s got an IQ of 140. So, yeah, the family is going to deal with that, but when you get into these adult tests that try to measure IQs over 150, it’s a sport. It’s like the world’s strongest man. It’s just a thing that’s fun-ish or semi-interesting, but you don’t need a guy who can pick up a rock that’s two and a half feet in diameter, a big circular stone or a guy who can pull a truck with his teeth. It’s cool, and you can make a TV show out of it, but it’s a sport that doesn’t have much value outside of being a sport. It is similar to people taking IQ tests and trying to get a 180, but you could also ask if an IQ 180 means anything. There’s the idea of general intelligence that somebody who’s smart will be smarter at any kind of puzzle than somebody who’s less smart, but you could ask the question, “Can you figure out if somebody’s got a 180 IQ versus a 170 IQ and if you took somebody with a 180 IQ, would they be generally smarter on hard puzzles than somebody with a 170 IQ or does the idea of general intelligence not apply the higher you go?”

The whole thing gets kind of nebulous, but it makes sense that it would. It makes sense that in the future, when we get artificial general intelligence (AGI), there may be artificial intelligences that are generally smarter and could have IQ equivalents, so an AGI might be smarter than all but one person in two million. On the other hand, what people are afraid of is that AGI will just keep getting smarter and smarter. An AGI that has an IQ of 160 today might have an IQ of 185 three months from now. Another issue is that we don’t know whether puzzles can keep getting harder beyond a certain IQ, because when you look at a lot of IQ items that are supposed to be super hard, they’re made hard by just stacking a bunch of sub-items together in a chain. The difficulty is working your way through the chain, and those problems kind of suck.

There are all sorts of problems with measuring ultra-high IQs, but the way you do it is kind of straightforward when you create an IQ test. If you’re Ron Hoeflin, you write a bunch of IQ problems, and you’ve got a pool of people who like taking these tests and are good at them, and you go through several iterations of the test where you write a hundred problems, and you give those problems to people in, say, sets of 20, and you see how smart people do on the problems. If 20 out of 20, or 100 out of 100 people that you’ve given this one problem to, get it right, you throw out that problem because it’s no good at distinguishing among smart people; it’s too easy. Similarly, if zero out of 100 get a problem right, then you throw that out because it’s too hard; it doesn’t distinguish among levels of intelligence. And you get feedback from your test takers, and people say this problem doesn’t have as well-defined an answer as your other problems, or there are two possible answers, or are we really sure that this number that we’re supposed to come up with is proven to be the answer to this problem, etc. Anyway, you go through, and you do quality control, or Ron did quality control, until, for the Mega test and then the Titan test, and then several later tests, he had 48 really solid items. Then you look at everybody’s raw score, which is from 0 to 48, and when people submit their answers, they also submit their scores on other IQ tests or other tests such as the SAT or the GRE or the LSAT that can be converted into IQ scores.
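The item-screening step described above, where problems that everybody gets right or everybody gets wrong are thrown out, might look something like the following sketch. The function name `screen_items`, the item labels, and the trial data are all hypothetical; this only illustrates the rule as stated in the conversation, not Hoeflin's actual procedure, which also relied on written feedback from test takers.

```python
def screen_items(responses: dict[str, list[bool]]) -> list[str]:
    """Keep only items that at least one person solved and at least one person missed.

    `responses` maps a (hypothetical) item label to one True/False per test taker.
    """
    kept = []
    for item, answers in responses.items():
        if not answers:
            continue  # nobody attempted it, so there is nothing to judge
        n_correct = sum(answers)
        if 0 < n_correct < len(answers):
            kept.append(item)
    return kept


# Made-up trial data: 20 test takers per item.
trial = {
    "item_a": [True] * 20,                # everyone got it right: too easy
    "item_b": [False] * 20,               # nobody got it right: too hard
    "item_c": [True] * 7 + [False] * 13,  # mixed results: distinguishes among takers
}
print(screen_items(trial))
```

An item solved by everyone or by no one gives every test taker the same result, so it cannot separate one level of ability from another.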

So, the SAT, when it was first set up, was set up to be scored like an IQ test. An IQ test has a mean of 100 and a standard deviation of 15, 16 or 24, depending on which test you’re looking at. The SAT was set up to have a mean of 500 and a standard deviation of 100. So, a score of 800 on a section of the SAT equals three standard deviations, which equals an IQ of 148. Now, the SAT, because it’s a fairly big business, because millions of people take it every year, would get renormed. Every year, they would compare people’s scores on various items, so the mean did not stay at 500 from year to year and decade to decade, and the standard deviation would change every year. The SAT, over time, had difficulty in convincing a lot of people that it was really necessary. So, the SAT would periodically renorm and reset the test to show that it was this statistically legitimate, academically helpful thing, that it was a good part of a kid’s college application packet, and that it would inform the people who were deciding which kids to let into a school. A high SAT score was supposed to say this person has a good chance of doing well at your school. Over time, people found that the SAT really didn’t help or add anything to a kid’s application package. Knowing a kid’s SAT did not help you determine whether this kid was going to be successful at your college, and then COVID killed it because it was hard to administer when everybody was isolated. So, most US colleges and universities now don’t require it.
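The SAT-to-IQ arithmetic above can be sketched as a simple rescaling. This assumes the nominal design values mentioned in the conversation (SAT section mean 500, standard deviation 100; IQ mean 100, standard deviation 16). Since the actual mean and standard deviation drifted from year to year, a real conversion would need year-specific figures that are not given here. The function name `sat_section_to_iq` is made up for this example.

```python
def sat_section_to_iq(
    score: float,
    iq_sd: float = 16.0,
    sat_mean: float = 500.0,   # nominal design value, not a real yearly mean
    sat_sd: float = 100.0,     # nominal design value, not a real yearly SD
    iq_mean: float = 100.0,
) -> float:
    """Rescale an SAT section score to an IQ-style score via standard deviations."""
    z = (score - sat_mean) / sat_sd
    return iq_mean + iq_sd * z


print(sat_section_to_iq(800))            # three standard deviations, 16 points each
print(sat_section_to_iq(800, iq_sd=15))  # same score with 15 points per standard deviation
```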

Anyway, to get back to norming, and I’m talking a lot, but somebody submits their answers to the Mega test to Ron, and then they also submit three scores they’ve gotten on other tests, say the SAT or the Stanford-Binet when they were a kid. And say this person gets a 23 on the Mega, and they self-report; they could be bullshitting, but most people probably aren’t. They report that they got a 142 on the Stanford-Binet and they got a 720 on the SAT verbal and a 750 on SAT math, and that becomes a data point, or several data points, for Ron, where the person who got a 23 reported IQ scores or IQ-equivalent scores of 142 and then 130; he looks up what a 720 on SAT verbal in 1981 equals in terms of IQ, or in terms of rarity, and he does the conversion. So, this person, according to the self-reported scores, has an IQ of 141, and then another 4,000 people take the Mega test. Among them, they report 10,000 different scores on IQ tests, and Ron plugs all this in. He expects that somebody who gets 43 questions right on the Mega test, which just a few dozen people did, is going to report super high IQ scores, and he plugs in everything, and he comes up with the IQ that he thinks each number of correct answers on the Mega test corresponds to. And more people took the Mega test than any other ultra-high IQ test ever.
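One very simplified way to picture this norming step is to group the self-reported IQ-equivalent scores by raw Mega score and take a typical value for each group. The sketch below uses the median. That is an assumption chosen for illustration, not Hoeflin's actual statistical method, which the conversation does not spell out. The function name `norm_raw_scores` and all the data points are hypothetical, apart from the raw score of 23 paired with a reported 142.

```python
from collections import defaultdict
from statistics import median


def norm_raw_scores(reports: list[tuple[int, float]]) -> dict[int, float]:
    """Map each raw score to the median of the IQ-equivalent scores reported with it.

    `reports` holds (raw_score, reported_iq_equivalent) pairs. Using the median
    is an illustrative choice, not the method used for the Mega test.
    """
    by_raw: dict[int, list[float]] = defaultdict(list)
    for raw, iq_equiv in reports:
        by_raw[raw].append(iq_equiv)
    return {raw: median(values) for raw, values in sorted(by_raw.items())}


# Hypothetical self-reports; only (23, 142) echoes the example in the conversation.
reports = [(23, 142), (23, 139), (23, 141), (30, 150), (30, 154)]
print(norm_raw_scores(reports))
```

With thousands of test takers and many reported scores, a table like this covers most raw scores, but the highest raw scores rest on very few reports, so those estimates are the least certain.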

So, his norming of the Mega test should be the most convincing and maybe the most accurate of any high IQ test ever, and according to the self-reported scores and Ron’s calculations, a perfect score on the Mega test, I think, corresponded to an IQ score of 190 plus, standard deviation 16; is that correct?

Scott Douglas Jacobsen: I think so.

Rosner: All right. So, people in this small community were convinced that this was a legitimate thing and that it seemed reasonable. You’re assuming people are telling the truth about their other IQ scores, and you’re assuming that people aren’t cheating on the Mega test, though early on, it was fairly hard to cheat, and then later, it became super easy to cheat. The Mega test came out in 1985 in Omni magazine, which is roughly ten years before the internet, but then once the internet came along, people were able to contaminate all the… So, it was hard to cheat on the Mega test in the 80s. In the 90s and beyond, it was easy to cheat on the Mega test because you could look up the answers that people had shared on the internet. Also, Google made it easy to search for answers to verbal problems, but early on, cheating wasn’t so much of a problem on the Mega. More recently, somebody has renormed the Mega test, and you can talk about that because I don’t know how that works.

Jacobsen: The short of the long is David Redvaldsen published, as far as I can tell, a peer-reviewed paper with a statistical analysis of the Mega test and the Titan test with reference to how high they can measure. It appears to be the first real mainstream academic presentation of the high-range testing world.

Rosner: So, who is this guy, and where was he published?

Jacobsen: In the journal Psych, his name is David Redvaldsen. The published paper was from 2020, but the norms were 2019, so obviously, this went through the review process. There was a resubmission on October 18th, 2019, after an original submission was received on August 8th, 2019. It was revised on October 25th, 2019. Accepted on April 28th, 2020.

Rosner: I assume this is a standard process; you submit a paper to a legitimate journal, and they say they like it, but we have these issues with it. Fix these issues, and it’s publishable, right?

Jacobsen: Yes. The title of the paper is “Do the Mega and Titan tests yield accurate results? An investigation into two experimental intelligence tests.” This is from the Department of Sociology and Social Work at the University of Agder in Norway. The abstract is short. I’ll read it in full. “The Mega and Titan Tests were designed by Ronald K. Hoeflin to make fine distinctions in the intellectual stratosphere. The Mega Test purported to measure above-average adult IQ up to and including scores with a rarity of one in a million of the general population. The Titan Test was billed as being even more difficult than the Mega Test. In this article, these claims are subjected to scrutiny. Both tests are renormed using the normal curve of distribution. It was found that the Mega Test had a higher ceiling and a lower floor than the Titan Test. While the Mega Test may thus seem preferable as a psychometric instrument, it is somewhat marred by a number of easy items in its verbal section. Although official scores reported to test-takers are too high, it is likely that the Mega Test does stretch to the one-in-a-million level. The Titan Test does not. Testees who had previously taken standard intelligence tests achieved average scores of 135–145 IQ on those. Since the mean of all scores on the Mega and Titan Tests was found to be IQ 137 and IQ 138, respectively, testees had considerable scope to find their true level without ceiling effects. Both are unusual and non-standard tests which require a great deal of effort to complete. Nevertheless, they deserve consideration as they represent an inventive experimental method of measuring the very highest levels of human intelligence and have been taken by enough subjects to allow norming.”

So, he subjects this to proper scrutiny. Ron Hoeflin responded to this after I presented it to Richard May and, I think, the other editors, who may still be the editors of Noesis, the journal of the Mega Society. I don’t know if they showed it to him or if he knew about it before. Regardless, his response was published after I had shown it. In the first paragraph of that response, Hoeflin says, “I am not a statistician.” So, he’s making the admission that he’s not a statistician, tipping the hat to Redvaldsen in his statistical analysis. That’s an important line in the response from Hoeflin, because this is in the 2020s, and this is the publication of a paper examining the two tests with, as far as I know, the most test takers, although now they’re obviously compromised and cannot be used for admission to the Mega Society, although the power of the tests can be.

[Recording End]

License

In-Sight Publishing by Scott Douglas Jacobsen is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Based on a work at www.in-sightpublishing.com.

Copyright

© Scott Douglas Jacobsen and In-Sight Publishing 2012-Present. Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Scott Douglas Jacobsen and In-Sight Publishing with appropriate and specific direction to the original content. All interviewees and authors co-copyright their material and may disseminate for their independent purposes.