On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g

2024-11-22

Publisher: In-Sight Publishing

Publisher Founding: March 1, 2014

Web Domain: http://www.in-sightpublishing.com

Location: Fort Langley, Township of Langley, British Columbia, Canada

Journal: In-Sight: Independent Interview-Based Journal

Journal Founding: August 2, 2012

Frequency: Three (3) Times Per Year

Review Status: Non-Peer-Reviewed

Access: Electronic/Digital & Open Access

Fees: None (Free)

Volume Numbering: 13

Issue Numbering: 1

Section: E

Theme Type: Idea

Theme Premise: “Outliers and Outsiders”

Theme Part: 32

Formal Sub-Theme: High-Range Test Construction

Individual Publication Date: November 22, 2024

Issue Publication Date: January 1, 2025

Author(s): Bob Williams

Author(s) Bio: Bob Williams is a Member of the Triple Nine Society, Mensa International, and the International Society for Philosophical Enquiry.

Word Count: 5,114

Image Credits: Photo by Nicoledit on Unsplash.

International Standard Serial Number (ISSN): 2369-6885

*Original authorship December, 2021.*

*Please see the footnotes, bibliography, and citations, after the publication.*

Abstract

The Flynn Effect (FE), characterized by consistent increases in IQ test scores over time, has been observed globally but varies significantly across nations and demographics. Initial studies highlighted these gains, with later research attributing them to environmental, behavioral, and methodological factors rather than changes in general intelligence (g). Notably, FE gains are higher in fluid intelligence measures than crystallized ones, vary by age and test type, and sometimes reverse, as seen in several developed nations. These reversals point to the saturation and decline of positive factors, coupled with the influence of negative causes such as dysgenic fertility. Analyses suggest the FE operates on non-g factors, with minimal evidence linking it to actual intelligence improvements. Methodological artifacts, including test-taking behaviors and scoring techniques, contribute significantly to the observed gains. Future research, leveraging genetic markers and polygenic scores, may further elucidate the complex interplay of factors underlying the FE’s variability and reversals.

Keywords: Dysgenic fertility, environmental factors, fluid intelligence, Flynn Effect, general intelligence, IQ score gains, test artifacts.

Background 

The thing we now call the Flynn Effect was initially discovered by researchers in the 1940s as an increase in IQ test scores. Papers reporting such gains were published by Smith (1942), Tuddenham (1948), Lynn (1982), and Flynn (1984). This effect did not have a name until the publication of The Bell Curve in 1994. Herrnstein and Murray named it the Flynn Effect (see page 307). Subsequently, researchers began to look at the effect and have since published a huge number of papers that attempt to make sense of what is happening. They found that gains were large enough to be of concern. If real, they suggest a large change in intelligence; but if not real, they at least reveal an instability in IQ tests. 

Examples of IQ test score increases per decade: U.S. 3.0 points; Japan 7.7 points; and Argentina 6.9 points. Imagine a 50-year span: the larger of these gains would amount to over two standard deviations (a very large difference). James Flynn initially noted that these gains are so large that it would mean that the average IQ in the United States in 1918 would have been 75, if scored against the norms at the time of his writing. Various similar observations (including Dutch data) showed that the gains are unlikely to be real, yet when the public and pop science magazines heard about the effect, they assured us that mankind was becoming brilliant. Clearly that was not happening, but even some researchers began to suggest that people were getting smarter. 
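The arithmetic behind these figures is easy to check. The sketch below is illustrative only: it assumes the conventional IQ standard deviation of 15 points (not stated above) and a constant rate of gain, and the function name is hypothetical.

```python
# Per-decade gains quoted in the text (IQ points per decade).
GAINS_PER_DECADE = {"U.S.": 3.0, "Japan": 7.7, "Argentina": 6.9}

SD = 15.0  # assumption: conventional IQ standard deviation


def cumulative_gain(points_per_decade, years):
    """Total gain in IQ points over `years`, assuming a constant rate."""
    return points_per_decade * years / 10.0


for nation, rate in GAINS_PER_DECADE.items():
    gain = cumulative_gain(rate, 50)
    print(f"{nation}: {gain:.1f} points = {gain / SD:.2f} SD over 50 years")
```

Under these assumptions the U.S. rate accumulates to one standard deviation over 50 years, while the Japanese and Argentine rates exceed two.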

One early question was whether the FE was real. In terms of something that can be shown to be related to another intelligence-related measure, is the effect more than random differences in data? Rushton used principal components analysis to look at gains on the WISC-R and WISC-III and found a cluster, meaning that the gains were a reliable phenomenon. The cluster was independent of the cluster formed by breeding group differences, inbreeding depression and g loadings, which tells us that the gains are not a Jensen Effect (meaning that they are not g loaded). Other researchers showed a similar result by using the Method of Correlated Vectors. [01] 

Today it is common for researchers to accept that the FE is a score gain of about 3 points per decade. But when we look at the changes in scores on a nation by nation basis, we find gains that are much higher and much lower. This tells us that whatever is causing the changes in one place is acting differently or is due to a totally different cause from that in a place where the changes are quite different in magnitude. We have negative FEs (reversal) in at least seven nations, again pointing to different causes or different stages of individual causes. 

The message that will emerge from this discussion is that the FE is not a single thing, but is the sum of many parts that vary over time and place. If we compute a FE in one nation it will be different in both magnitude and characteristics from a similar computation in a different nation at the same time. But if we compute it in one nation at one time and then compute it again in the same nation but at a different time, the result can be different in magnitude, sign, and component causes.

Characteristics 

To get an idea of how inconsistent the FE has been let’s examine how it has played out in different studies: 

  • Gains mostly in the low IQ range; gains mostly in the high IQ range; gains uniform for the full IQ distribution. 
  • The effect is seen in preschool children; some papers argue that the FE is caused by education. 
  • Different age groups, within the same study, can show different FEs. 
  • Gains using the same test are higher for measures of fluid intelligence and lower for crystallized intelligence. 
  • Different tests give different FE changes. Many references point to the Raven’s Progressive Matrices (RPM) test as showing the largest gains. 
  • When FE gains have been tested for invariance, the result is consistently that invariance is not supported between age cohorts. This importantly means that IQ tests operate differently for different age groups. 
  • Some component measures, such as spatial ability, show gains, while others, such as vocabulary, show losses. 
  • Gains seem to be in ability differentiation, not in g. [02] 
  • Some studies (Northern and Central Europe) show a significant sex difference, with larger FE for women than men. 
  • The FE is regional, showing larger gains in regions experiencing rapid development. 
  • Within individual nations, some show rapid FE gains, then slow gains, then no gains, then a reversal (IQ scores declining). 

Reversal 

It is important to consider the cases in which large FEs declined and then reversed over a period of years. The reversals are difficult to explain by most of the causes that are otherwise plausible. FE gains have turned into losses in Norway, Denmark, Britain, the Netherlands, Finland, France, and Estonia. None of these nations show parallel effects that might relate to declining nutrition, physical traits (height), education, less complex environmental stimulation, etc. Woodley et al. reported a literature search that identified reported negative FEs in 13 nations. This study reported more rapid FE declines when less g loaded tests were used and identified immigration from low IQ nations as contributing to the net IQ decline. 

How can the negative FE be explained? The answer lies in the FE consisting of numerous causes, with varying effect sizes and different saturation points. These effects reach their maximum effect and then cease to cause changes (up or down). When the causes that increase test scores decline to insignificant levels we are left with negative causes that are still active. One known negative cause is dysgenic fertility (bright people having fewer children than dull people). This effect seems to be continuing at a slow, but steady rate in developed nations. The dysgenic effect will be discussed after the positive causes are considered (below). 

Some researchers have found that the negative FE is even larger than the positive FE. Pietschnig & Gittler found a 4.8 point per decade decline in German-speaking nations. They attribute the reversal to saturation of positive FE factors. Dutton & Lynn found a 3.8 point decline in France over ten years. Platt et al. reported a large U.S. study that showed a positive FE for IQs above 130 and a negative FE for IQs below 70 (all from the same data). In a separate study, Woodley reported a loss of 4.5 points per decade in the Netherlands. These negative FEs are larger than the often claimed average FE gain of 3.0 points per decade. 

What causes the FE? 

Various papers have investigated what they describe as THE causes of the FE. If they found some supporting evidence, they have typically presented it without noting that there are obviously many other likely contributors. Some of the things that have been considered as candidate causes: 

  • Education  
  • Decreased family size 
  • Increased exposure to testing 
  • Heterosis 
  • Exposure to artificial light  
  • More complex visual environment 
  • Nutrition and improved health care 
  • Child rearing practices 
  • Abstract reasoning 
  • Speed of test completion 
  • Slower life history speed 
  • Testing artifacts 

Among other potential causes, migration, fertility, and mortality have been investigated and found to not show correlations with the FE. 

Education 

More years of education is supported as a cause in some studies; some researchers argue that it has the largest effect. There are, however, effects that are opposed to this cause. Numerous reports show declines in Gc (crystallized intelligence) and increases in Gf (fluid intelligence). Education should show gains during school years, but some studies have found larger gains among adults. Other studies have found that both Gc and full-scale gains were negligible, while Gf shows gains. This is the opposite of what would be expected from education-driven gains. As previously noted, the FE has been shown for preschool children. They have shown IQ gains of 3.9 points per decade (higher than the often stated average FE gain of 3 points per decade). This range of different findings is typical of attempts to verify specific causes. 

Decrease in family size 

Smaller family sizes would cause a gain in mean scores because it would disproportionately remove more people with slightly lower IQs and retain those with higher IQs due to the birth order effect. The (related) well established negative correlation between IQ and fertility rate is the focus of study for the decline in g that has been studied extensively. In Iceland polygenic scores [03] were used to predict educational achievement and showed a negative correlation with fertility in Icelandic and US data. This cause is convincingly established and points to a decline that is a Jensen Effect. [04] 

Increased exposure to testing 

Arthur Jensen pointed (The g Factor) to increased test-wiseness related to more frequent testing in schools as a non-g factor in increased test scores. One of the most convincing demonstrations of this came from the 72-year range of tests in Estonia. [Olev Must, Jan te Nijenhuis, Aasa Must, & A. van Vianen, (2009). Comparability of IQ scores over time. Intelligence, 37, 25–33.] When I discussed this with Olev Must, he told me that one of the good outcomes of the communist period was that they never threw any documents away. Hence, they had National Intelligence Test results for this long period. Analysis of the results showed a clear trend of increased guessing (more test items tried and more errors, but also with the expected gains). This effect was predicted by Chris Brand in 1996. He wrote: “The correct strategy for testees is: When in doubt, guess.” Today this testing artifact is known as the Brand Effect. Michael Woodley insightfully noted that gains that had been described as Jensen Effects, based on subtest scores showing more gains on more g loaded test items, could be explained as Brand Effects. The more g loaded subtests are also more difficult and are much more likely to involve increased guessing. 

Heterosis 

Mingroni argued that broadened ranges of breeding (to villages that were far enough away to be outside of the breeding group in consideration) would account for a larger gene pool that could lead to increased intelligence. Since this would be a genetic effect, it should show up (if real) as a gain in g. His explanation was offered with the observation that environmental effects on intelligence are small, [shown by MZ twins reared apart and adoption studies] so there must be something else happening. Of course, there is–testing artifacts, such as the Brand Effect. The heterosis explanation is consistent with secular trends in height, growth rate, myopia, asthma, autism, ADHD, and head circumference. But the effect has not been observed and it is inconsistent with FE gains in Europe before increased immigration. The developmental gains are inconsistent with IQ gains in various nations.  

Exposure to artificial light  

The basis of this suggestion (from Jensen) is that the pineal gland can be stimulated in animals (poultry farms do this), causing faster maturity and increased metabolism. In humans, there is little doubt that we have experienced increased amounts of artificial light from area lighting, computer screens, and television. There is, however, no data reported on this potential effect, so it cannot be accepted until a proper study shows that it is actually linked to the FE. 

More complex visual environment 

There is no doubt that our environments have become more complex with the development of advanced communications, video streaming, computers, smart phones, and ever increasing automobile features. Some researchers have suggested that these environmental factors have led to changes that contribute to the FE. Armstrong and Woodley reported a significant correlation between rule-dependence and FE gains that mimic the gains seen in retesting (gains on specificity). One obvious appearance of this is in progressive matrices tests, which have been shown to be subject to learning, not only from repeat testing, but also from progressing through the test. Tests such as the Raven’s Progressive Matrices (RPM) show a maximum g loading only when first encountered. This general effect, of learning rule-based processes, exists throughout our increasingly complex environment. 

Nutrition and improved pre-natal health care 

It is a virtual certainty that our food and health care (specifically pre-natal) have had direct impact on birth weights, height, and developmental quotients (DQs). Richard Lynn has published several papers showing the rather rapid advances in these physical measures and has implied that they translate into IQ gains. His argument makes sense, particularly in connection with head size, which is positively correlated with skull size and brain size. There are many studies showing the positive correlation between brain volume and IQ. When high quality IQ tests are used, this correlation is about r = +0.40. In 2018 researchers determined that the cause of this correlation is lower neurite density, which promotes more efficient neurite orientation, and more complete arborization in larger brains. This means that larger brains are more efficient.

The nutrition argument, as with most FE outcomes, has problems. Nutrition, as it relates to vitamins, supplements, etc., has not been shown to improve intelligence in developed nations. [In undeveloped nations insufficient intakes of iron, iodine, and folate have been found to depress intelligence.] The nations presently experiencing a negative FE have not shown nutritional decline. Gains in IQ due to these factors would make sense if they were linked to IQ gains in the lower half of the intelligence spectrum, but the gains in such things as height have been concentrated in the upper half. Flynn argued that height gains were not happening at times when IQ gains were observed. 

The primary supporter of FE gains in this category was Richard Lynn. His papers discuss DQs and the other physiological factors that have been linked to improved nutrition. The strong implication from these papers is that the FE gains he has suggested are gains in g because g is known to be most strongly related to the biological aspects of intelligence. The curious thing is that Lynn has also argued that psychometric g is decreasing due to the dysgenic consequences of high fertility among dull people and low fertility among bright people. [Dysgenics: Genetic deterioration in modern populations] His arguments are vectors pointing in opposite directions. 

Child rearing practices 

The inherent problem with explaining FE gains or losses as the result of child rearing practices is that the FE has been found in essentially every nation that has been examined, despite large differences in child rearing practices. Additionally, adoption studies have shown that adopted children reach adulthood with a zero correlation between their IQs and those of their adoptive parents and adoptive siblings. In short, the shared environment does not impact adult intelligence. [There is a temporary shared environmental variance that vanishes around age 12.] 

Abstract reasoning 

As previously noted, FE gains have been larger in tests of abstract reasoning than on tests of Gc. The RPM has consistently shown a substantial positive FE. When tests are evaluated, the item level difficulty increases as a function of the abstractness of the item. As discussed above, increased difficulty can lead to increased guessing that results in a FE gain. Another result is that when tests are compared over time, the more abstract words show lower miss rates over the time range being evaluated. This result supports Flynn’s interpretation that the FE is driven (at least in part) by increased abstract thinking ability. 

Speed of test completion 

The Brand Effect is the result of increased guessing, but there is another related effect due to the behavioral trend of students taking the tests faster. Younger cohorts work faster. Increased test taking speed results in more test items attempted, more missed and more with correct responses. The change in speed of test taking results in a significant lack of invariance. Shiu et al. showed a 38% difference in item functioning between age groups. Must and Must showed that when invariant test items were examined, there was little or no FE. When speeded items were examined there was a large, positive FE. [Speediness is determined at the subtest level by the fraction of test items that were not attempted.] 
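The speededness measure described in the bracketed note can be sketched in a few lines. This is a minimal illustration, not the procedure used in any of the cited studies: the data layout (one list per test taker, with `None` marking an item that was not attempted) and the function name are assumptions made here.

```python
def speededness(responses):
    """Fraction of subtest items not attempted, pooled over test takers.

    `responses` holds one list per person; None marks an item that
    was not attempted (a hypothetical coding convention).
    """
    total = sum(len(person) for person in responses)
    if total == 0:
        raise ValueError("no items to score")
    skipped = sum(item is None for person in responses for item in person)
    return skipped / total


# Hypothetical subtest data: 1 = correct, 0 = wrong, None = not attempted.
older_cohort = [[1, 1, 0, None, None], [1, 0, None, None, None]]
younger_cohort = [[1, 1, 0, 0, 1], [1, 0, 1, 0, None]]

print(speededness(older_cohort), speededness(younger_cohort))
```

A cohort that works faster leaves fewer items unattempted, so the same subtest yields a lower value for it.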

Slower life history speed 

Michael Woodley and various co-authors have argued that the FE is related to slowing life history speed. This concept is related to environmental gains in safety, food supply, and other survival needs. As living conditions improve, people are inclined to shift their priorities towards such things as education, nutrition, age when first child is born, smaller families, wellbeing, and lifestyles. This model is functionally similar to a movement from r-strategy (more offspring and less protection of young) to K-strategy (fewer offspring and significant parental protection). When populations are maturing in favorable survival conditions, they move from fast to slow life history speed. This shift is accompanied by lower fertility rates, more education, and improved nutrition; all of these could contribute to changes seen in the FE. 

Woodley noted that life history speed is not a genetic effect, but rather a behavioral change. [Michael Woodley (2012). A life history model of the Lynn–Flynn effect. Personality and Individual Differences, 53(2), 152–156.] In the context of the FE, this is consistent with various demonstrations that the FE is not a Jensen Effect. [04] 

In various places and time spans, it is reasonable to claim that there are changes related to slower life history speed. This is consistent with FE gains and societal behavior. But with at least seven European nations (all highly developed) showing a FE reversal, the life history speed model would presumably have to show a reversal (faster LHS). This reversal has not been evident. 

Testing artifacts 

We have already looked at the Brand Effect (increased guessing) and test taking speediness as causes of the FE. Some researchers were misled to believe that they were seeing increases in g, when they were actually seeing different rates of guessing as a function of g loading and item difficulty. Another artifact, directly related to IQ tests, is the use of classical test theory (CTT) instead of item response theory (IRT). Most IQ tests are scored using CTT. This method applies equal weight to each test item and simply combines subtest scores to produce an IQ score. One obvious problem with this approach is that it gives equal weight to easy and difficult test items. IRT is based on item level difficulty, as determined by the item characteristic curve. IQ can be determined by establishing the level of item difficulty beyond which guessing is indicated. IRT is understood to be the superior method. 
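The difference between the two scoring approaches can be illustrated with a toy example. The sketch below is not the procedure of any cited study: it assumes a three-parameter logistic (3PL) model, one common IRT formulation, and every item parameter, response pattern, and function name is hypothetical.

```python
import math

# Hypothetical item parameters for a five-item test (3PL model).
DIFFICULTY = [-2.0, -1.0, 0.0, 1.0, 2.0]  # easy -> hard
DISCRIMINATION = 1.0  # assumption: same for every item
GUESSING = 0.2        # assumption: chance of a lucky guess


def ctt_score(responses):
    """Classical test theory: every correct item counts equally."""
    return sum(responses)


def p_correct(theta, b):
    """3PL item characteristic curve: P(correct | ability theta)."""
    logistic = 1.0 / (1.0 + math.exp(-DISCRIMINATION * (theta - b)))
    return GUESSING + (1.0 - GUESSING) * logistic


def log_likelihood(theta, responses):
    """Log-likelihood of a response pattern at ability theta."""
    total = 0.0
    for u, b in zip(responses, DIFFICULTY):
        p = p_correct(theta, b)
        total += math.log(p) if u else math.log(1.0 - p)
    return total


def irt_ability(responses):
    """Maximum-likelihood ability by grid search over theta in [-4, 4]."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses))


consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
erratic = [1, 0, 0, 1, 1]     # hard items right, easier ones wrong

theta_consistent = irt_ability(consistent)
theta_erratic = irt_ability(erratic)
print(ctt_score(consistent), ctt_score(erratic))
print(theta_consistent, theta_erratic)
```

Both patterns earn the same CTT score of 3, but the likelihood-based estimate is lower for the erratic pattern, because misses on easier items make it more plausible that the successes on hard items owe something to guessing.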

Beaujean and Osterlind scored the National Longitudinal Survey of Youth data set using both CTT and IRT. Results are shown below: 

Peabody Picture Vocabulary Test-Revised 

CTT FE of 0.44 points per year 

IRT FE of 0.06 points per year 

Peabody Individual Achievement Test-Math 

CTT FE of 0.27 points per year 

IRT FE of 0.13 points per year 

These results do not need explanation. They are substantial and are entirely the result of scoring the same test results using CTT and the superior IRT. 

Another artifact is present in numerous studies; it is that the FE is measured at two different times, using different tests or different revisions of the same test. These differences introduce measurement errors due to different test items being used and practice effects when the same items are used. There is no literature that has sorted out the impact of this category of error, but the qualitative aspects of these are obvious and most likely relate to the inconsistent and confusing outcomes that are common in FE literature.

Are FE changes g loaded? 

Perhaps the most important factor to be established about the FE is its g loading. If it is a change in g (a Jensen Effect), then we would have real increases in intelligence. If it is not g loaded, then the changes are in something else; this could be changes in non-g factors that relate to intelligence or simply artifacts that should be treated as noise. 

Most researchers have tried to determine if the effect they were examining is a Jensen Effect or not.  Almost all have found that it is not g loaded and it is likely that those who claimed a change in g were mistaking a Brand Effect for intelligence gains. As mentioned in the background section, one of the most obvious ways to appreciate that the FE is hollow is to consider the magnitude of changes that have been reported in various nations. Over relatively short spans of time the FE gains have been outrageously large, suggesting that past generations were at the level of retardation as compared to present populations.  Nothing we have seen in real world behavior is consistent with such a massive change in intelligence. 

The only confirmed FE changes have been those associated with environmental effects. We already know that nothing in the environment has been found that actually increases intelligence; ergo, FE gains would not show g variation if they are caused by the environment. At this point in time we can safely say that the primary factors contributing to the FE are environmental (including behavioral). 

An excellent study of the FE and a biological marker was done by Nettelbeck and Wilson in Australia. Two studies were done at different times (1981 and 2001). The studies were done in the same school and same grade levels using the same test (Peabody Picture Vocabulary Test). They also measured inspection time (IT) on both occasions, using the same Gerbrands tachistoscope. IT is an excellent biological marker of intelligence. The results showed the predicted FE gains (5 points) over the 20-year period, but the IT results were unchanged. This is exactly what would be expected if the gains were not a Jensen Effect. I asked Nettelbeck if there were any observable differences in SES or nutrition between the two groups. He said that the area served by the school was stable and that there were no observable differences in such things as nutrition or standard of living. 

Principal components analysis of FE gains (discussed above) showed that there was no overlap between FE gains and purely genetic factors (racial differences and inbreeding depression). Must et al. used the method of correlated vectors [01] to test for g loading and found no g loading. 

Jensen presented a particularly convincing argument that shows another way to demonstrate a lack of g loading in FE changes. He stated that the definitive test of whether FE gains are hollow or not is to apply the predictive bias test. This means that two points in time would be compared on the basis of an external criterion (real world measurement, such as school grades). If the gains are hollow, the later time point would show underprediction, relative to the earlier time. This assumes that the later test has not been renormed. In actual practice tests are periodically renormed so that the mean remains at 100. The result of this recentering is that the tests maintain their predictive validity, indicating that the FE gains are indeed hollow. 

Finally, there has been a dysgenic effect on intelligence in developed nations for the past 100 to 150 years, caused by the negative correlation between intelligence and fertility rate. This effect is shown by measures that load on g (reaction time, vocabulary, color sensitivity, and backward digit span). These measures have shown movement in the direction that indicates lower intelligence. [See At Our Wits’ End: Why We’re Becoming Less Intelligent and What It Means for the Future, by E. A. Dutton & M. A. Woodley of Menie. Exeter, UK: Imprint Academic.] The rate of decline in g is slow, but its existence means that g is not increasing, since this is a single parameter that cannot show a net gain and loss over the same period of time. [Also see Lynn, R. (2011). Dysgenics: Genetic deterioration in modern populations (revised ed.). London: Ulster Institute for Social Research and Herrnstein, R. J., & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in American Life. New York: Free Press.] Besides these books there is a large number of scholarly papers also showing a decline in g.

Understanding non-g effects 

IQ tests measure variances in g, non-g residuals of broad abilities, and uniqueness (specificity + random error). [Specificity = s, random error = e] The sum of these variances must equal 100%. The FE appears to operate on residuals and uniqueness. Causes such as the Brand Effect, faster test taking, and method of scoring do not change any aspect of cognitive ability, so they are confined to the uniqueness variance.  That leaves a broad number of candidate causes to necessarily appear as increases in non-g parts of broad abilities. 
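The variance bookkeeping in this paragraph can be written out as a small check. The sketch is illustrative only: the shares used in the example are hypothetical round numbers, not estimates from any test, and the function name is made up.

```python
def variance_shares(g, non_g, specificity, error):
    """Return the variance shares plus uniqueness (specificity + error).

    Shares are proportions of total test variance and must sum to 1.
    """
    shares = {
        "g": g,
        "non_g": non_g,
        "specificity": specificity,
        "error": error,
    }
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares sum to {total}, expected 1.0")
    shares["uniqueness"] = specificity + error
    return shares


# Hypothetical shares, for illustration only.
example = variance_shares(g=0.50, non_g=0.20, specificity=0.20, error=0.10)
print(example)
```

On the argument above, artifacts such as guessing and scoring method would fall in the uniqueness share, while the remaining candidate causes would fall in the non-g share.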

If an IQ test is given to a large group, then factor analyzed (with hierarchical factor analysis), the result is that factors are shown for numerous narrow abilities; these are usually called Stratum I. When these are factored, they produce a few broad ability factors at Stratum II. The common variance in the Stratum II factors defines g at Stratum III. [Some tests can produce up to four strata and others may produce g at Stratum II.] If g is factored out of Stratum II, the residuals are orthogonal to g. If these are tested for external validity, essentially none is found. The ability of IQ tests to predict important life outcomes is almost entirely the result of the test g variance. 

In the case of FE gains and losses, the test scores are reflecting changes in the variance due to these non-g factors. The presence of the group factors (Stratum II) was known by Spearman and researchers since his discovery of g. Group factors are real abilities, even after g is removed. In that sense, they can and do show up in tests, causing drift up or down as environmental factors are expressed as non-g variance.  These factors have been carefully studied with respect to score changes due to education. Learned material may show up as specificity variance, if the test calls upon such material. Another related cause of s-loading is test familiarity, seen when the same test is re-administered. Gains from familiarity with the test are not gains in intelligence, but can show up as s-loading. 

Future research and polygenic scores 

In The g Factor, Jensen discussed an idea he called an anchor point. This would be a true biological marker of intelligence (g). [The discussion (above) of IT by Nettelbeck and Wilson can be regarded as a comparison of FE gains against an anchor point.] If psychometric scores increased, they could be measured against the anchor to show that they are or are not increases in g. The anchor would not move if the psychometric scores were hollow. If the anchor increased, there would be a gain in real intelligence.  The measurement Jensen suggested was RT (reaction time, a chronometric measure). At this point, it is fairly obvious that the FE gains are hollow, but Jensen’s idea can now be done genetically by recording polygenic scores for groups being studied. If the polygenic scores increase, we have a direct measure of a change in real intelligence. Monitoring polygenic scores would also serve to confirm or disconfirm the decline in g that has been discussed by Dutton / Woodley and Lynn. Given the huge increase in genome data banks, it is inevitable that such data will be used in the future to give excellent indications of real population changes in intelligence.

Conclusions 

  • FE gains and losses are due to an unknown number of small causes that may appear in different combinations at different times or different places. 
  • Gains and losses are not Jensen Effects and as such do not represent changes in real intelligence. 
  • Reversal happens when negative causes (lowering intelligence) are larger than those causing gains. This happens when the causal effects reach saturation. 
  • Causes of the test score instability are associated with the environment and with test artifacts. 

Notes 

[01] The method of correlated vectors is used to determine whether an external variable is related to g. It is a somewhat complex method that is fully explained in Appendix B of Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger. The basic process is to create a column vector from the g loadings of subtests and then correlate that with a vector that consists of measurements of a factor that is external to the test. If there is a positive correlation, then the external variable is g loaded. There have been papers that challenge the merits of this method as valid for all situations. It is, however, widely used to demonstrate that various measures are related to g.
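The basic process described in this note can be sketched as follows. This is a simplified illustration, not Jensen's full procedure (which includes further steps such as correcting for subtest reliability): the g loadings and score gains are hypothetical, and a plain Pearson correlation is assumed.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for six subtests, for illustration only.
g_loadings = [0.80, 0.75, 0.70, 0.60, 0.55, 0.45]
score_gains = [0.10, 0.15, 0.20, 0.30, 0.35, 0.40]  # gains in SD units

r = pearson(g_loadings, score_gains)
print(f"correlation between g loadings and gains: {r:.2f}")
```

With these made-up numbers the gains are largest on the least g loaded subtests, so the correlation is negative, which is the pattern that would indicate the gains are not g loaded.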

[02] Broad abilities (typically Stratum II factors in a Cattell-Horn-Carroll test) can be divided into g and non-g parts. In determining the g loading of a test, g is the common element in the Stratum II factors. If g is factored out of the Stratum II factors, the non-g parts can be identified as residuals of each broad ability.  

These residuals are real abilities, but typically show little, if any, predictive validity when tested independently from g. At high levels of intelligence, Charles Spearman (who invented factor analysis and discovered g) contended that the differentiation between cognitive abilities shifts towards increased importance of the non-g (residuals). This is known as Spearman’s Law of Diminishing Returns and remains in dispute because it is vexingly difficult to prove. The use of “ability differentiation” in the document is a reference to the non-g broad abilities. 
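
The idea in note [02] of factoring g out of the broad abilities can be illustrated with simulated data. This is a rough sketch under strong assumptions: g is treated as a known score (in practice it is a latent factor estimated from a factor model), the loadings are hypothetical, and ordinary least squares stands in for the factor-analytic step. The helper name `non_g_residuals` is invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 5000

# Assumption: g is observed directly here; real analyses estimate it.
g = rng.standard_normal(n_people)

# Hypothetical g loadings for three broad (Stratum II) abilities.
loadings = np.array([0.8, 0.7, 0.6])
specific = rng.standard_normal((n_people, loadings.size))

# Each broad ability = g part + non-g part, scaled to unit variance.
abilities = g[:, None] * loadings + specific * np.sqrt(1.0 - loadings**2)

def non_g_residuals(scores, g_scores):
    """Regress each ability column on g (with intercept) and return the residuals."""
    design = np.column_stack([np.ones_like(g_scores), g_scores])
    beta, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return scores - design @ beta

residuals = non_g_residuals(abilities, g)

# By construction the residuals carry no linear relationship to g.
corr_with_g = [float(np.corrcoef(g, residuals[:, j])[0, 1])
               for j in range(loadings.size)]
print(corr_with_g)
```

The residual columns are the non-g parts of each broad ability in this toy setup; their correlation with g is zero up to floating-point error.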

[03] Polygenic scores – The success of genome-wide association studies resulted in the initial identification of 1,271 single nucleotide polymorphisms associated with intelligence. These variants have been used to create polygenic scores, which can be used to predict IQ from the variants that are present in the DNA of a given person. See: Using DNA to predict intelligence; Sophie von Stumm, Robert Plomin; Intelligence 86 (2021) 101530. Also see: Robert Plomin – Blueprint: How DNA Makes Us Who We Are, Penguin Books Ltd., 2018, ISBN 9780241282076.
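
A polygenic score is commonly computed as a weighted sum of allele counts, with one weight per variant taken from a genome-wide association study. The sketch below shows only that arithmetic. The function name `polygenic_score`, the four variants, the three people, and the weights are all hypothetical; real scores use many more variants and published effect sizes.

```python
import numpy as np

def polygenic_score(allele_counts, weights):
    """Weighted sum of allele counts (0, 1, or 2 copies) for each person."""
    counts = np.asarray(allele_counts, dtype=float)
    w = np.asarray(weights, dtype=float)
    if counts.shape[-1] != w.shape[0]:
        raise ValueError("one weight is required per variant")
    return counts @ w

# Hypothetical data: rows are people, columns are variants.
counts = [
    [0, 1, 2, 1],
    [2, 2, 1, 0],
    [1, 0, 0, 1],
]
# Hypothetical per-variant weights (stand-ins for GWAS effect sizes).
weights = [0.02, 0.01, 0.03, 0.01]

scores = polygenic_score(counts, weights)
print(scores)

# A group-level comparison, as in the anchor idea, would average the scores.
print(scores.mean())
```

Averaging such scores across a large sample is what would allow one cohort to be compared with another.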

[04] Jensen Effect – An effect that is related to g is considered to be a Jensen Effect. Because g can be used as the very definition of intelligence, a Jensen Effect means that the thing being observed is related to real biological intelligence, and not to an artifact or factor that is not g loaded.

Footnotes

None

Citations

American Medical Association (AMA 11th Edition): Williams B. On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g. November 2024; 13(1). http://www.in-sightpublishing.com/high-range-27

American Psychological Association (APA 7th Edition): Williams, B. (2024, November 22). ‘On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g’. In-Sight Publishing. 13(1).

Brazilian National Standards (ABNT): WILLIAMS, B. On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g. In-Sight: Independent Interview-Based Journal, Fort Langley, v. 13, n. 1, 2024.

Chicago/Turabian, Author-Date (17th Edition): Williams, B. 2024. “On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g.” In-Sight: Independent Interview-Based Journal 13, no. 1 (Winter). http://www.in-sightpublishing.com/high-range-27.

Chicago/Turabian, Notes & Bibliography (17th Edition): Williams, B. “On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g.” In-Sight: Independent Interview-Based Journal 13, no. 1 (November 2024). http://www.in-sightpublishing.com/high-range-27.

Harvard: Williams, B. (2024) ‘On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g’, In-Sight: Independent Interview-Based Journal, 13(1). http://www.in-sightpublishing.com/high-range-27.

Harvard (Australian): Williams, B 2024, ‘On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g’, In-Sight: Independent Interview-Based Journal, vol. 13, no. 1, http://www.in-sightpublishing.com/high-range-27.

Modern Language Association (MLA, 9th Edition): Williams, Bob. “On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g.” In-Sight: Independent Interview-Based Journal, vol. 13, no. 1, 2024, http://www.in-sightpublishing.com/high-range-27.

Vancouver/ICMJE: Williams B. On High-Range Test Construction 27: Bob Williams, The Flynn Effect: A testing phenomenon, not psychometric g [Internet]. 2024 Nov; 13(1). Available from: http://www.in-sightpublishing.com/high-range-27.

License & Copyright

In-Sight Publishing by Scott Douglas Jacobsen is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. ©Scott Douglas Jacobsen and In-Sight Publishing 2012-Present. Unauthorized use or duplication of material without express permission from Scott Douglas Jacobsen strictly prohibited, excerpts and links must use full credit to Scott Douglas Jacobsen and In-Sight Publishing with direction to the original content.

6 Comments
  1. Ryan

    I’ve always found the Flynn Effect intriguing, especially how it varies so much between nations and over time. The point about test artifacts like the Brand Effect is something I hadn’t thought about before—it’s surprising how much guessing strategies could play a role. I was curious about the part mentioning polygenic scores as a way to anchor intelligence measures. It sounds promising, but how do researchers balance that with the influence of environmental factors? It feels like there’s still a lot of overlap that’s tough to untangle.


    • Bob Williams

      IQ is 85% heritable, when measured in adults.  Psychometric g is up to 91% heritable.

      See Haier, R. J. (2017). The Neuroscience of Intelligence, Cambridge University Press and Is there a dysgenic secular trend towards slowing simple reaction time? Responding to a quartet of critical commentaries; Michael A. Woodley, Jan te Nijenhuis, Raegan Murphy; Intelligence 46 (2014) 131–147.

      That leaves little room for environmental effects.  So far, all such effects have turned out to be negative (lower intelligence).  After age 12, there is no C variance, leaving a small E variance to lower intelligence.

      If you are not familiar with the ACE model, this will help:

      A = additive polygenic
      C = shared environment (family)
      E = nonshared environment

      Heritability is around 40% in very young children, but rises until adulthood.  During the rise (known as the Wilson Effect), the C variance vanishes around age 12.

      Polygenic scores are amazingly accurate, when used with very large groups, such as a national genomic dataset.  Davide Piffer did this and showed a strong result:

      A review of intelligence GWAS hits: Their relationship to country IQ and the issue of spatial autocorrelation; Davide Piffer; Intelligence 53 (2015) 43–50:

      “The average between-population frequency (polygenic score) of nine alleles positively and significantly associated with intelligence is strongly correlated to country-level IQ (r = .91). Factor analysis of allele frequencies furthermore identified a metagene with a similar correlation to country IQ (r = .86). The majority of the alleles (seven out of nine) loaded positively on this metagene.”

      So far, 1,271 SNPs have been identified as associated with high intelligence.  The estimated number of SNPs that determine intelligence is 10,000 to 40,000.  The average effect size of the identified SNPs is 0.01%.  So, we either have to hope for a lot more SNPs to be added to the number used for these polygenic scores, or there would need to be a way to do the anchoring calculation with a very large N.  It might be possible, even with the number of SNPs in use today, if they are sufficiently indicative of change.


  2. Bob Williams

    I tried to send you a reply three times. I don’t see it repeated here, so I don’t know if you received it or not. If you want a reply, you can email me at:

    voltan@gmail.com


    • Scott Douglas Jacobsen

      Bob had trouble uploading his longer response. Here it is:

      IQ is 85% heritable, when measured in adults. Psychometric g is up to 91% heritable.
      See Haier, R. J. (2017). The Neuroscience of Intelligence, Cambridge University Press and Is there a dysgenic secular trend towards slowing simple reaction time? Responding to a quartet of critical commentaries; Michael A. Woodley, Jan te Nijenhuis, Raegan Murphy; Intelligence 46 (2014) 131–147.

      That leaves little room for environmental effects. So far, all such effects have turned out to be negative (lower intelligence). After age 12, there is no C variance, leaving a small E variance to lower intelligence.

      If you are not familiar with the ACE model, this will help:
      A = additive polygenic
      C = shared environment (family)
      E = nonshared environment

      Heritability is around 40% in very young children, but rises until adulthood. During the rise (known as the Wilson Effect), the C variance vanishes around age 12.

      [Bob’s image could not be loaded, but it is labelled as a chart of the Wilson Effect.]

      Polygenic scores are amazingly accurate, when used with very large groups, such as a national genomic dataset. Davide Piffer did this and showed a strong result:

      A review of intelligence GWAS hits: Their relationship to country IQ and the issue of spatial autocorrelation; Davide Piffer; Intelligence 53 (2015) 43–50:

      “The average between-population frequency (polygenic score) of nine alleles positively and significantly associated with intelligence is strongly correlated to country-level IQ (r = .91). Factor analysis of allele frequencies furthermore identified a metagene with a similar correlation to country IQ (r = .86). The majority of the alleles (seven out of nine) loaded positively on this metagene.”

      So far, 1,271 SNPs have been identified as associated with high intelligence. The estimated number of SNPs that determine intelligence is 10,000 to 40,000. The average effect size of the identified SNPs is 0.01%. So, we either have to hope for a lot more SNPs to be added to the number used for these polygenic scores, or there would need to be a way to do the anchoring calculation with a very large N. It might be possible, even with the number of SNPs in use today, if they are sufficiently indicative of change.
