Conversation with Bob Williams on Public and Professional Definitions of Intelligence, General Intelligence, National Intelligence, Age 16, and Validity and Reliability of Alternative Tests: Retired Nuclear Physicist (2)
Author(s): Scott Douglas Jacobsen
Publication (Outlet/Website): In-Sight: Independent Interview-Based Journal
Publication Date (yyyy/mm/dd): 2020/12/22
Abstract
Bob Williams is a Member of the Triple Nine Society, Mensa International, and the International Society for Philosophical Enquiry. He discusses: intelligence in the public consciousness; consciousness within those who spend more time thinking about it, in professional circles; the scientific constructs; the majority opinion definition of general intelligence; other peripheral, though respected, definitions of general intelligence; most noteworthy and prominent names in psychometrics history; arguments for national intelligences; the form of data gathering on the national intelligences; age 16 as a capstone; tests measure g; scores extrapolated beyond their highest range; and the range of validity and reliability of these alternative tests.
Keywords: Bob Williams, chronometrics, g, general intelligence, intelligence, IQ, psychometrics.
*Please see the footnotes, bibliography, and citation style listing after the interview.*
Scott Douglas Jacobsen: Let’s talk about the abstract concept of “intelligence” first. What, fundamentally, is meant by intelligence in the public consciousness?
Bob Williams[1],[2]*: People inherently understand that some people are able to do complicated tasks that are beyond the abilities of average people, and they are certainly aware of dullness. While the benefits of intelligence grow stronger as it increases, the consequences of low intelligence are much more serious. Most states have legal definitions of the threshold of retardation–usually IQ 70. Each 5 points or so in the down direction adds limitations to learning ability, learning speed, and the ability to manage personal affairs. One of the most convincing sources of information about what can and cannot be done by the population as a whole is the National Adult Literacy Survey (NALS). The test is done for the federal government by Educational Testing Service. About 92 million adults (out of 191 million) were functioning in levels 1 or 2, meaning that they could perform only basic and elementary tasks. Most of this reflects low intellectual ability or age-related decline.
I think the public understands that bright people do better in school and that they are needed in cognitively demanding careers. The thing they don’t seem to get is that intelligence is not evenly distributed between groups nor within groups. They also grossly overestimate the role of the environment in determining intelligence.
Jacobsen: What is meant by consciousness within those who spend more time thinking about it, in professional circles?
Williams: Intelligence researchers do not study consciousness. I have not encountered any casual discussions of it. Scientists (including social scientists) like to measure things, analyze measurements, and construct models that are able to predict other things. Consciousness doesn’t lend itself to such treatment, so it falls into the abstract world of philosophy. Most people seem to regard consciousness as sentience or as self-awareness. A few animal studies have reported various experiments that may test some aspects of self-awareness, such as the mirror test. So far, such tests are yes/no outcomes with little that can be modeled or analyzed.
Jacobsen: Now, to the scientific constructs, e.g., general intelligence, what is meant by general intelligence?
Williams: General intelligence, g, is the common resource that is involved in all cognitive tasks. Jensen described g as a distillate, in the sense that it is the thing that remains when the less essential factors are eliminated. At the psychometric level, g is unitary; at the neurological level, it is not. Charles Spearman found that when he tested people on unrelated tasks, the people who did well on one task were likely to do well on all tasks and vice versa. He called this finding the positive manifold. In the process of devising ways to analyze data, he invented factor analysis and from that, he was able to discover g in 1904.
The public is generally unaware of g and its central importance to the understanding of intelligence. Unfortunately, g is not the kind of thing that people study. It, as with everything we know about intelligence, is a statistical parameter and is a latent trait. We can determine g for a group of people by using a hierarchical factor analysis or other methods (bifactor analysis or principal components analysis). Each method has its advantages in certain applications, but the differences in results are insignificant.
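As a rough illustration of the positive manifold and of pulling a general factor out of a set of test scores, the sketch below simulates five made-up tests that all draw on one shared latent factor, checks that every pairwise correlation is positive, and then computes loadings on the first principal component (one of the methods named above). The loadings, sample size, and seed are assumptions chosen for illustration, not values from any real battery, and principal components tend to overstate loadings somewhat compared with a proper factor analysis.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_component_loadings(corr, iterations=200):
    """Loadings on the first principal component, found by power iteration."""
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Eigenvalue via the Rayleigh quotient; loading = eigenvector * sqrt(eigenvalue).
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    return [c * math.sqrt(eigenvalue) for c in v], eigenvalue


# Hypothetical loadings of five made-up tests on one shared latent factor.
TRUE_LOADINGS = [0.8, 0.7, 0.6, 0.5, 0.4]

rng = random.Random(0)
scores = [[] for _ in TRUE_LOADINGS]
for _ in range(5000):
    g = rng.gauss(0, 1)  # shared latent factor for this simulated person
    for t, lam in enumerate(TRUE_LOADINGS):
        specific = rng.gauss(0, 1)  # test-specific part, independent of g
        scores[t].append(lam * g + math.sqrt(1 - lam ** 2) * specific)

k = len(TRUE_LOADINGS)
corr = [[pearson(scores[i], scores[j]) for j in range(k)] for i in range(k)]
loadings, eigenvalue = first_component_loadings(corr)

positive_manifold = all(corr[i][j] > 0 for i in range(k) for j in range(k) if i != j)
print("positive manifold:", positive_manifold)
print("first-component loadings:", [round(x, 2) for x in loadings])
print("share of variance:", round(eigenvalue / k, 2))
```

The simulated people never receive a "g score" directly; the common factor is recovered only from the pattern of correlations, which is the sense in which g is a latent trait.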
Jacobsen: What is the majority opinion definition of general intelligence?
Williams: Within cognitive science, I think virtually everyone has accepted that intelligence is well represented by g. Today essentially all intelligence research is related to g. The easy way out of definitions is to skip “intelligence” entirely and simply discuss g. If we get into the definition of intelligence, we have many definitions from psychologists over the past century. I will give you two of them. My favorite is from Carl Bereiter: “Intelligence is what you use when you don’t know what to do.” This is a surprisingly accurate, concise, and elegant definition. The second definition is the one used by Linda Gottfredson: “Intelligence is a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings–‘catching on,’ ‘making sense’ of things, or ‘figuring out’ what to do.” [Linda Gottfredson – Mainstream Science on Intelligence; The Wall Street Journal; December 13, 1994] This definition is the one most often cited since 1994.
{My answer (above) is based on what I think you were asking. It turns out that “general intelligence” is commonly used in reference to g, which we have discussed in various ways.}
Jacobsen: What are some other peripheral, though respected, definitions of general intelligence?
Williams: Most of the definitions that are credible are similar, as one would expect. If they are respected by cognitive scientists, they must address the things we all see and understand in connection with the word. Here are a few that are worthwhile:
“Individuals differ from one another in their ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought.” American Psychological Association
“. . . that facet of mind underlying our capacity to think, to solve novel problems, to reason and to have knowledge of the world.” M. Anderson
“. . . the resultant of the process of acquiring, storing in memory, retrieving, combining, comparing, and using in new contexts information and conceptual skills.” Humphreys
“The ability to carry on abstract thinking.” L. M. Terman
Jacobsen: Who are the most noteworthy and prominent names in psychometrics history who studied general intelligence as a career?
Williams: Given the long history of the study of intelligence, we could name many people who have contributed to our present day understanding. Progress and activity level in cognitive science have followed a curve that increased slowly at first, then turned upward as rapid advances came from brain imaging and genetics (all made possible by advanced computer technology). I will list a few of the early names, then those whom I know personally who have made major contributions.
The first person who studied intelligence, made measurements, and wrote about his findings was Sir Francis Galton. He is clearly the father of cognitive science. People naturally think of Alfred Binet and Lewis Terman as important figures because of their contributions to the development of testing. Terman also famously conducted a longitudinal study of high IQ cohorts (called Termites).
Charles Spearman was one of the most important and possibly THE most important of all intelligence researchers. He invented statistical methods that were needed to study intelligence (now used widely in other fields), discovered g, invented the first matrix test (developed and carried to the market by his student John C. Raven), and produced a range of insightful observations which remain accurate today.
William Stern deserves mention because he was the originator of the ratio method of determining IQ. The method left us with a test name (IQ) and showed that intelligence could be graded as a function of age and performance.
David Wechsler rescued us from the limited usefulness of the ratio method by introducing the deviation quotient that is now the standard for IQ measurement. He is also known for the Wechsler set of IQ tests, which remain as the most important of all cognitive tests.
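The difference between the ratio method and the deviation quotient can be sketched in a few lines. The ratio form (mental age divided by chronological age, times 100) and the deviation form (a z-score within a same-age norm group, rescaled to mean 100 and SD 15) are the standard textbook definitions; the norm-group mean and SD used below are hypothetical numbers chosen only to make the arithmetic visible.

```python
from statistics import NormalDist


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age over chronological age, times 100."""
    return 100.0 * mental_age / chronological_age


def deviation_iq(raw_score, norm_mean, norm_sd, sd_scale=15.0):
    """Deviation IQ: position within a same-age norm group, rescaled to mean 100."""
    z = (raw_score - norm_mean) / norm_sd
    return 100.0 + sd_scale * z


# A 10-year-old performing like a typical 12-year-old.
print(ratio_iq(12, 10))
# The same mental age at chronological age 40: the ratio collapses,
# one commonly cited reason the method breaks down for adults.
print(ratio_iq(12, 40))

# Hypothetical norm group for one age band: raw-score mean 50, SD 10.
iq = deviation_iq(raw_score=65, norm_mean=50, norm_sd=10)
percentile = NormalDist(mu=100, sigma=15).cdf(iq) * 100
print(iq, round(percentile, 1))
```

Because the deviation form depends only on standing within an age-matched group, it does not require mental age to keep rising with chronological age.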
Arthur Jensen was clearly the most important researcher in the second half of the 20th century. He convinced his peers that g theory was the only correct basis for understanding intelligence; today that reality permeates intelligence research. Jensen was centrally involved in the study of chronometrics for measuring and studying intelligence. He was a prolific writer of books and papers (totaling approximately 400), many of which remain the standard references on specific topics today. Two were of particular importance: Bias in Mental Testing (1980) and The g Factor (1998). I am grateful that I had the opportunity to meet him and have numerous conversations with him at ISIR conferences. The first time I met him was in 2004. He asked me about my interests and I told him that I was particularly interested in the biological foundations of intelligence. He said he had some papers that would interest me and asked that I write my address. Within a week, I received a large envelope stuffed with these papers.
Thomas Bouchard was the founder of the Minnesota Twins Study, which was a huge breakthrough in the understanding of the high heritability of intelligence. He was particularly patient with me when I asked endless questions at the conferences. His graduate students are central figures in cognitive science today.
Richard Lynn led the way in understanding the evolution of intelligence and (later) its slow decline. He displayed the strength of Jensen and a handful of others who dared to study race differences and sex differences. He was the first to study national level intelligence and demonstrated that it was responsible for the wealth of nations (except where there is natural resource wealth, such as oil). This work led many researchers to vastly expand the amount of national level data collected and to show the extensive number of parameters that are influenced by it.
Brain imaging was started by Richard Haier, when he first applied positron emission tomography to study glucose uptake rates as a function of intelligence. This led to the brain efficiency hypothesis which has been repeatedly confirmed by various other forms of measurement. Haier and Rex Jung simultaneously discovered the intelligence centers of the brain, then joined forces to produce the P-FIT model that is the standard (so far) neurological model. Jung also investigated creativity with brain imaging and revealed important brain characteristics that relate to it.
Jacobsen: How does this construct g, more precisely, map onto arguments for national intelligences?
Williams: As mentioned above, Richard Lynn opened the door to national intelligence studies. His book IQ and the Wealth of Nations showed a strong correlation between mean national IQ and national wealth and productivity. In this case, the difference between IQ and g doesn’t really matter because only the most powerful predictor (g) is active, even when the discussions use IQ, because the non-g factors are lost via cancellation when very large populations are studied. Now that we have national and regional level data pouring in from all over the world, we can see that the geographic effects exist within nations. McDaniel and others have shown that US states show the same relationships between IQ and wealth as do nations. Today we have detailed IQ data on a regional basis for many nations, including the US, China, Japan, Italy, India, Vietnam, etc. With the exception of India, IQ generally increases from south to north within nations in the northern hemisphere. These nations also show the regional relation to IQ and per capita income.
The g construct is usually thought of as the three stratum model with g at stratum III, broad abilities at stratum II, and narrow abilities at stratum I. If you look at stratum II, you can divide the broad abilities into g and non-g parts. The g parts define stratum III and the non-g parts are residuals that have little predictive validity (except possibly in the right tail). In national level studies the residuals are lost or minimized due to their randomness. We can, however, see high spatial abilities in East Asians, accompanied by low verbal abilities. These differences are large enough to have consequences.
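The cancellation argument is, at bottom, the standard error of a mean: if the non-g residuals are independent and average zero, their contribution to a group average shrinks with the square root of the group size. A minimal simulation of that claim follows; the residual spread of 9 points, the group sizes, and the seed are all assumptions made for illustration. It covers only random residuals, so a systematic difference that is shared within a group would not cancel this way.

```python
import math
import random


def spread_of_group_mean(n, residual_sd, reps, rng):
    """RMS of the group-average residual across `reps` simulated groups of size n."""
    total = 0.0
    for _ in range(reps):
        group_mean = sum(rng.gauss(0.0, residual_sd) for _ in range(n)) / n
        total += group_mean ** 2
    return math.sqrt(total / reps)


RESIDUAL_SD = 9.0  # hypothetical spread of the non-g part, in IQ points
rng = random.Random(1)

results = {}
for n in (1, 100, 10_000):
    observed = spread_of_group_mean(n, RESIDUAL_SD, reps=100, rng=rng)
    expected = RESIDUAL_SD / math.sqrt(n)  # standard error of a mean
    results[n] = (observed, expected)
    print(f"n={n:>6}  observed={observed:.3f}  expected={expected:.3f}")
```

At the individual level (n = 1) the residual is as large as its full spread; in a group of ten thousand it contributes under a tenth of a point to the average in this setup.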
Jacobsen: What is the form of data gathering on the national intelligences to make them more legitimate or less legitimate depending on the form of interpretation of the analysis?
Williams: It is important to convert all test data to a single standard before attempting to compare them. Richard Lynn developed the means to do this with the Greenwich IQ Standard. It basically uses white British as the standard, so all test scores are compared as if they were normed against the same group.
One of the early criticisms of Lynn’s work was that (at that time) there were relatively few studies and many of them were convenience studies that were random and were reported by many researchers. The criticisms may have seemed sound to those making them, but now that we have a large amount of data, the results have not changed much, other than to show strong consistency. Another criticism was that Lynn estimated the IQs of some nations by using measured IQs of neighboring nations. Some critics objected strongly to this estimation. After data was collected, the estimates turned out to be surprisingly accurate.
Jacobsen: With age 16 as a capstone, what is the degree of difference in the variability between males and females at that age? Is this played out differentially in terms of self-identification in sociocultural constructs of the self seen in gender, often confused with biological and genetic sex differentiation?
Williams: I haven’t seen data showing differences in variability as a function of age, but with respect to intelligence, males appear to reach their advantage at the mean (4-6 points) around age 16. The difference in standard deviation between the sexes is 5 to 15% (males higher). In real world outcomes (the things we use as measures of external validity) males dominate a grossly disproportionate number of cognitive arenas. In Charles Murray’s book Human Accomplishment: The Pursuit of Excellence in the Arts and Sciences 800 B.C. to 1950, he was largely measuring eminence. Of the 4,002 people he reported over that time frame, only 2% were women. Of course, much of that can be attributed to limited opportunity for women, so resolution of the cause is difficult. Side story… At the ISIR conference in 2006, we discussed sex differences in intelligence in an open session. Jensen believed that there was no difference, but his friend Helmuth Nyborg had been trying to show him the reality of it for some time. Anyway, Jensen made the observation that on any credible list of the top 100 composers, there would not be a single woman listed. He often commented on music in relation to various topics, as he considered becoming a professional musician (clarinet).
Unfortunately, I cannot comment on self-identification, as it is something that is studied and debated in different circles. There has, however, been excellent work on outlooks and preferences as a function of sex. The best of this is from the Longitudinal Study of Mathematically Precocious Youth. The limitation of this study is that it applies to very bright cohorts in the 99th percentile, although some of the findings have been reported for less restricted range data sets. Among the things they found were that women showed a marked preference for jobs involving fewer hours of work per week; and they placed a significantly higher value on family, social involvement, community service, friendships, and giving back to the community.
Besides life preferences, brain structures, brain activity, and connectivity differ by sex to such an extent that, when correlations are computed for activity involving specific volumes of the brain, the correlation coefficients sometimes have opposite signs for males and females. One interesting comparison that was made involved male and female subjects solving the same math problem. The male and female participants were matched for IQ. Males used the frontal and parietal lobes for solving the problem and females used only the frontal lobe.
These are just examples of the rather large number of sex differences that brain imaging has shown.
Jacobsen: What tests measure g the best? What are the ranges of those tests with standard deviations?
Williams: The most heavily g loaded tests are clearly the best, since the whole reason we can use IQ tests is that they are sufficiently g saturated that they can be used as proxies for g. In recent years, researchers have been urging the use of comprehensive tests, such as the WAIS or Woodcock-Johnson, because they do a better job. It also happens that these two tests can report g at the individual level.
Gilles Gignac and Timothy Bates did a study on the correlation between brain volume and test quality. They showed that the correlation increases as test quality increases. [see Intelligence 64 (2017) 18–29] This is expected because g reflects the biology (structure and global properties) of the brain. From their paper, here are the things they identified as determining test quality (examples of “excellent” given on the right):
number of subtests: 9+
dimensions: 3+ (e.g., fluid intelligence, crystallised intelligence, processing speed)
testing time: 40+ minutes
correlation with g: ≥ 0.95
In the past, researchers were often inclined to accept Spearman’s indifference of the indicator in situations that would draw criticism today. Spearman was (as usual) right, but only in a general sense. It is certainly true that a single dimension test, such as the Raven’s Progressive Matrices, can give a good measure of intelligence, but even that popular test has received some criticism for having a lower g loading than the comprehensive tests (and lower than some prior claims) and for the presence of factors (as can be found in a factor analysis) that are not reported. At one time, researchers sometimes took the RPM score as a g score.
[The indifference of the indicator is based on the fact that every correlation with g is with the same g. So a vocabulary test can be used to estimate (quite well) g as can a test of analogies. Both of these give us a good estimate of the same g. There is, however, a greater fidelity when multiple measures are used, particularly in an omnibus test.]
The reason for emphasis on comprehensive tests is that they examine more of the relatively few stratum II factors. Examining more broad abilities gives a more complete picture. You can imagine trying to make out the image in a puzzle; it is better defined when more pieces are in place than with fewer.
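The gain in fidelity from using multiple measures can be sketched under a deliberately simple assumption: a single-factor model in which each standardized subtest has some loading on g and the test-specific parts are uncorrelated. Under that assumption, the correlation between an equally weighted sum of subtests and g follows directly from the loadings. The loading of 0.7 is hypothetical, and real batteries also contain broad-ability factors that this sketch ignores.

```python
import math


def composite_g_correlation(loadings):
    """Correlation of an equally weighted sum of standardized subtests with g.

    Assumes a one-factor model: the covariance of subtests i and j is
    loadings[i] * loadings[j], and each subtest has unit variance.
    """
    total = sum(loadings)  # covariance of the composite with g
    cross = total ** 2 - sum(l * l for l in loadings)  # sum of l_i * l_j for i != j
    composite_variance = len(loadings) + cross
    return total / math.sqrt(composite_variance)


# Hypothetical subtests, each loading 0.7 on g.
for k in (1, 3, 9):
    print(k, round(composite_g_correlation([0.7] * k), 3))
```

A single subtest correlates with g at exactly its own loading, while adding more subtests of the same quality pushes the composite steadily closer to g, which mirrors the puzzle-pieces image above.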
Jacobsen: How are these scores extrapolated beyond their highest range for some individuals who claim more than 4-sigma scores on these mainstream intelligence tests?
Williams: Of professional IQ tests, I don’t know the procedures used, but I can tell you the claimed ceilings of a few. The WISC-V added extended range in 2019 and claims a ceiling of 210. The DAS claims 175. I assume that the extrapolations are simply extensions of the norming data above the range where there are no data points. Naturally, this means an increased measurement error and requires an assumption that the distribution remains Gaussian in that range (I think that an argument can be made that this has not been demonstrated).
Hobby tests have claimed very high ceilings, but they have not established valid support for the claimed ranges. I have read a few of the arguments used to explain their norming and have not seen anything I believe would withstand close scrutiny. There are so many deficiencies associated with hobby test designs, in addition to norming, that I think they should be considered entertainment only. I know there are some people who will disagree, but they have not come forth with sound support for the tests. If the tests are not used by clinical psychologists or intelligence researchers (as shown by their use in scholarly journal papers) I fail to see how they can be considered as meaningful measurement instruments.
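The norming problem at high ceilings can be made concrete with the Gaussian assumption itself. The sketch below computes how rare a score would be if the distribution really were normal with mean 100 and SD 15, and how many such people a random norming sample would be expected to contain. The sample size of 2,200 is a hypothetical figure, not taken from any test manual.

```python
from statistics import NormalDist

IQ = NormalDist(mu=100, sigma=15)


def rarity(score):
    """Return (upper-tail proportion, 'one in N') for a score, assuming a Gaussian."""
    # By symmetry, the upper tail above `score` equals the lower tail below its
    # mirror image; computing it this way avoids subtracting from 1.0.
    tail = IQ.cdf(2 * IQ.mean - score)
    return tail, 1.0 / tail


def expected_above(score, sample_size):
    """Expected number of people above `score` in a random norming sample."""
    tail, _ = rarity(score)
    return tail * sample_size


NORMING_SAMPLE = 2_200  # hypothetical sample size, for illustration only

for score in (130, 145, 160, 175):
    tail, one_in = rarity(score)
    print(f"IQ {score}: about 1 in {one_in:,.0f}; expected in a sample of "
          f"{NORMING_SAMPLE}: {expected_above(score, NORMING_SAMPLE):.4f}")
```

Under these assumptions a sample of a few thousand is expected to contain well under one person at 4 sigma, so any norm in that region is an extrapolation of the curve and not a count of observed cases.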
Jacobsen: What is the range of validity and reliability of these alternative tests compared to the aforementioned mainstream intelligence tests?
Williams: For alternate tests, the disclosures vary from no mention to numbers that reflect an attempt to make some measurements, but which do not result in a full presentation of the things a real test must demonstrate: a high reliability coefficient; norming data (including group size and selection criteria) and method that is appropriate to the claimed ceiling; a predictive validity that is supported by meaningful external measurements; a demonstration of construct validity; a clear standard deviation of 15, or a proper conversion to 15 in the reporting of the score; measurement of at least three broad abilities; identification of a properly determined g loading for the test, where that loading is near or above 0.80; demonstrated invariance by population group, age, and sex (or exclusion of groups where invariance has not been shown); age corrected scoring; citations in the peer reviewed scholarly literature; and demonstrated use by professionals.
Of these, the demonstration of external (predictive) validity is the most important. If the scores do not predict differences in real life outcomes, they are meaningless. Take a hypothetical score of 160 and one of 190 by the same test. This huge, 2 standard deviation difference should produce large differences in external measures, such as the probability of earning a PhD, income, wealth, number of scholarly papers published, number of books published, probabilities of receiving world class honors (for example, those received by Richard Feynman: Putnam Fellow · Nobel Prize in a science · Albert Einstein Award · Oersted Medal · National Medal of Science for Physical Science · Foreign Member of the Royal Society), patents awarded, corporations founded, major accomplishments (think of Musk, Gates, and Zuckerberg), etc. If there is not a difference in such external measures, there is no reason to believe that the test scores have meaning.
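One item in the list above, a standard deviation of 15 or a proper conversion to 15, is simple arithmetic, and it matters whenever scores from different scales are compared. A minimal sketch follows, assuming every scale shares a mean of 100; the function name and the example score of 148 are made up for illustration, and the SD values of 16 and 24 are ones that have been used on some historical scales.

```python
def to_sd15(score, source_sd, mean=100.0):
    """Re-express a score reported on another SD scale as an SD-15 score."""
    z = (score - mean) / source_sd  # distance from the mean in SD units
    return mean + 15.0 * z


# The same reported number means different things on different scales.
for source_sd in (15, 16, 24):
    print(f"148 on an SD-{source_sd} scale -> {to_sd15(148, source_sd):.1f} on SD 15")
```

The hypothetical 160-versus-190 comparison above is a 2 standard deviation gap only because both scores are taken to be on the SD-15 scale; on an SD-24 scale the same 30 points would be 1.25 standard deviations.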
Appendix I: Footnotes
[1] Retired Nuclear Physicist.
[2] Individual Publication Date: December 22, 2020: http://www.in-sightjournal.com/williams-2; Full Issue Publication Date: January 1, 2021: https://in-sightjournal.com/insight-issues/.
*High range testing (HRT) should be taken with honest skepticism grounded in the limited empirical development of the field at present, even in spite of honest and sincere efforts. The higher the general intelligence score, the greater the variability in, and margin of error of, that score, because of its greater rarity in the population.
License
In-Sight Publishing by Scott Douglas Jacobsen is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Based on a work at www.in-sightpublishing.com.
Copyright
© Scott Douglas Jacobsen and In-Sight Publishing 2012-Present. Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Scott Douglas Jacobsen and In-Sight Publishing with appropriate and specific direction to the original content. All interviewees and authors co-copyright their material and may disseminate for their independent purposes.