The "peer reviewed" complaint is rich considering he regularly posts findings (that often contradict the published literature) to his personal Twitter (e.g. failing to replicate the Witherspoon 2007 finding of same-race people ~always being more genetically similar than different-race people) without bothering to put them in papers.
No one ever complains about people discussing preprints, unless the results are posted by the wrong people. So we already know this is a special pleading issue.
"Surprisingly, only one SNP reached genome-wide significance for educational attainment in the Tan et al. study"..."but we can still leverage the Ea3/Ea4 SNP set and reweight them using the direct-effect betas from the family GWAS"
Why stop with the SNPs marked as significant in the between-family GWAS? Why not just use the complete set of SNPs that people vary on? It's not just adding significant SNPs which adds signal to a PGS, it's also reducing the error with which differences in effects between significant SNPs are estimated, which happens with every increase in statistical power. Likewise, in any set of 1000 SNPs which contains no individually-significant SNPs, it should still be possible to say that at least some have trait effects even if you can't say which. With EA4 for example, in the Add Health validation sample, 9.1% of variance can be explained using only significant hits, but with a PGS that also uses variants which aren't individually significant, R^2 rises to 15.8% (see p.440):
https://not-equal.org/content/pdf/misc/10.1038.s41588-022-01016-z.pdf
If the situation with FGWAS is similar, applying the hybrid approach only to significant EA4 hits would capture only 9.1/15.8 = 57.6% of reliable signal in the FGWAS PGI. It's this principle which even makes the hybrid approach valid to begin with. I wouldn't be surprised if there were the power to do proper polygenic selection analysis, dividing SNPs into MAF+LD bins and then generating control PGIs by redistributing the effect sizes among SNPs sharing a common bin.
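For readers wondering what "redistributing the effect sizes among SNPs sharing a common bin" could look like in practice, here is a minimal sketch. It is not anyone's actual pipeline: the function name, the use of quantile bins, the default of 5 x 5 bins, and the choice of an LD score as the LD measure are all assumptions made for illustration. Each call produces one set of control weights; repeating it many times would give a null distribution to compare the real PGI against.

```python
import numpy as np

def control_pgi_weights(betas, maf, ld_score, n_maf_bins=5, n_ld_bins=5, rng=None):
    """Shuffle effect sizes among SNPs that share a MAF x LD bin (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    betas = np.asarray(betas, dtype=float)
    maf = np.asarray(maf, dtype=float)
    ld_score = np.asarray(ld_score, dtype=float)

    # Interior quantile edges, so each bin holds roughly the same number of SNPs.
    maf_edges = np.quantile(maf, np.linspace(0, 1, n_maf_bins + 1)[1:-1])
    ld_edges = np.quantile(ld_score, np.linspace(0, 1, n_ld_bins + 1)[1:-1])

    # Combine the two bin indices into a single joint bin label per SNP.
    bin_id = np.digitize(maf, maf_edges) * n_ld_bins + np.digitize(ld_score, ld_edges)

    shuffled = betas.copy()
    for b in np.unique(bin_id):
        idx = np.flatnonzero(bin_id == b)
        # Permute effect sizes only among SNPs inside the same joint bin.
        shuffled[idx] = rng.permutation(betas[idx])
    return shuffled, bin_id
```

The point of the bin constraint is that the control scores keep the same effect-size distribution and roughly the same frequency and LD properties as the real score, so any difference between real and control results is not attributable to those properties alone.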
When you want to rank groups, you want variants with maximum evolutionary signal, not maximum within-group predictive signal. These goals lead to opposite trade-offs. If you use all SNPs from a GWAS, you will be using lots of tag variants with very little signal and potentially lots of bias, on top of the many variants without any true signal (non-sig). The rankings of groups will be mostly random when you include many variants without much signal. For this reason, what you want to do is prune variants down to a smaller set that has convergent validity of selection, that is, shows a strong positive manifold. This means they are all picking up on the same directional selection. Correlations between variants far away from each other that aren't in recombinatorial LD (aren't close) are evidence of directional selection in themselves, but don't tell us the direction. This is thus the first crucial test. As such, there is little point in comparing PGS for groups if the variants don't show a positive manifold to begin with. Any PGS differences that remain could be ones from genetic drift, but are more likely to be noise.
Using the same filtering criteria across replications is also crucial to ensure reproducibility, otherwise one will just mess around with p values until they give the result they want, like Gusev is doing.
You are conflating within and between population predictive validity. I explained in my post these are fundamentally different and there are many reasons why it is better to use only significant SNPs, both theoretical and empirical
Also, aren't the Tan et al. weights not exclusively based on sibling GWAS? Aren't they also based on SNIPAR, where child genotypes are controlled for parental genotypes?
This post is framed as a "response to Gusev" but then proceeds to regularly quote Scott Alexander and ascribe his statements to me. This is par for the course for the broader sloppiness and incoherence of the work of Piffer and colleagues. I will reiterate the points I made previously: cross-population PGS comparisons have no theoretical basis whatsoever and are confounded by multiple simultaneous processes so as to make the results completely uninterpretable.
I made two core claims in my post (https://theinfinitesimal.substack.com/p/how-population-stratification-led). First, population stratification has been shown to be substantial *especially* on traits like EA/IQ and will lead to false estimates that mirror environmental population differences. The same methods have estimated much weaker stratification on height in modern biobanks, making it a poor comparison trait. This point is completely ignored here (in fact the post erroneously claims that population stratification is not an issue). Second, family GWAS analyses of IQ produce completely different group rankings than estimates using population GWAS data, further demonstrating the problems with population based polygenic scores. This point is also completely ignored here. Thus the post is not a "response to Gusev" but a rehash of prior justifications the authors have made.
The rest of the post is a dump of nonsensical results and just so stories: EA polygenic scores trained in East Asian populations inexplicably performing more poorly in East Asian target populations than those trained in European populations; bizarrely stronger associations between "regional IQ" and SCZ PGS than EA/IQ PGS; nonsensical group results that constantly shift with each analysis (East Asians on top; then Europeans on top; then Amish on top; then East Asians below Middle Easterners; and on and on); the use of a hybrid approach explicitly criticized in Zaidi & Mathieson; undefined and nonsensical adjustments for "LD Decay" that are never confirmed to work; "replication" that reuses the same target data and overlapping GWAS data and is conducted by Piffer himself or his colleagues. The goal appears to be simply to generate such a large number of figures and AI generated descriptions as to exhaust the reader.
This team of authors have apparently put a decade into this ideological project and yet the output -- as well as this "debunking" -- is a complete embarrassment to the field. Anyone promulgating this low quality nonsense should be ashamed.
What's your problem? I clearly stated that I was debunking the claims "particularly those found in Gusev’s blog and his discussion with Astral Codex Ten". Where exactly did I misquote you? I quoted your answers to ACT. Other than that, your comment looks like a rant and unprofessional. You talk about nonsensical and undefined adjustments because you didn't bother to read the paper where I explained how they are done. You use the word "nonsensical" several times because you are a lazy scientist who cannot engage in serious debate.
Just leaving this for other readers who may be confused: I've now pointed out a multitude of errors or nonsensical findings in Piffer's analysis and his response is "What's your problem?"
My question was related to your accusation of having misquoted you. I replied to the answers you gave to Astral Codex Ten in your interview. Which statements by Scott Alexander did I incorrectly ascribe to you?!
It appears there may be some fundamental misunderstandings regarding my methodology. Your replication attempt does not align with standard practices, particularly in your choice of statistical thresholds, which are not analytically sound. In population genetics, the signal strength of these variants is well-established to be subtle at the population level—a distinction that seems to be overlooked in your analysis. You seem to be conflating population level and individual level predictive validity. My approach maintains consistent SNP filtering criteria across studies precisely to ensure methodological rigor and reproducibility. Arbitrarily adjusting these parameters, as your analysis appears to do, risks introducing bias and undermines the validity of any conclusions drawn.
Thank you for confirming that you don't know what you're talking about. My replication analysis used the same exact methods and reference data as Tan et al. and default thresholds.
Buddy you must be either lying or kidding. I checked the Tan et al. sumstats for both EA and cognitive performance and there is literally only 1, I mean ONE SNP whose p value for the direct effect is <5e-8. So unless you have access to unpublished sumstats, this is just BS!!
I think this discussion is well past the point of being a waste of time.
Here's the description of how polygenic scores were constructed in Tan et al: "We compute PGIs separately from DGE and population effect estimates for each phenotype. The PGI weights were computed using PRS-CS [54]. We use the EUR LD reference panel provided in PRS-CS, which was constructed using UK Biobank data and comprises 1,117,425 SNPs from HapMap3."
And here's my code replicating exactly that analysis with PRS-CS using the Tan et al. DGE effect estimates into the 1000 Genomes data: https://github.com/sashagusev/tan2024_cog_hsq/tree/main/pgs
It's become very clear that you have no clue what you're doing and are stringing together technical terms and hoping no one notices or bothers to engage.
Now try to say this again without being insulting. If you can't, that should tell you something.
No. It's not good to insult people, but it's very appropriate to insult shoddy research. Piffer's analyses are genuinely meritless and embarrassing and there's no reason to mince words about that. I pointed out numerous specific analytical errors too, so you should be capable of decoupling your knee-jerk emotional response and focusing on the facts.
The additive model of GWAS can be "stupid" in a certain sense, because it ignores epistasis (genetic interactions) and other non-additive effects. SNPs can also be on regulatory elements, not just on protein-coding genes, which are context-dependent – for example, an SNP has a positive effect next to one allele, but negative next to another – meaning the overall effect of SNPs is not additive, but combinatorial. This is similar to the combination of poker cards – so you don't have to pay attention to the values of the cards, but you have to find the rules of the game! There are studies that show that epistasis contributes to the missing heritability in polygenic traits, like IQ, and non-additive models (e.g., GenoBoost) improve PGS accuracy. In reality, the majority of the variance (~90%) appears to be additive in large samples, and detecting epistasis is difficult because a huge number of interactions need to be tested. Moreover, to figure out the exact rules of the game, full-genome projects per person would be needed! In the not too distant future, if every newborn's genome project becomes mandatory, it will be provided from the data side. For recognizing the rules of the game, deep learning (AI)-based PGS studies are needed, which also model non-additive effects. In my opinion, this computational capacity does not yet exist…
I hope it's true—it's like a cheat code. If we can crack the genetic code, we could address societal issues like crime, poverty, and even physical attractiveness. I don’t understand why some people think we can’t or shouldn’t fix these just because they’re tied to genetics. I’d argue the opposite: if it’s genetic, it’s a solvable problem. Personally, I’m more interested in enhancing looks than boosting IQ. Call me shallow, but I’d prefer a society with an average IQ of 85 where people look attractive over one with an average IQ of 115 where the average person looks unattractive. I guess I’m aiming for a world of superhumans.
Perhaps so, but you just did it again to me. I don't think you are coming across as you think you are. I imagine it is your intention to be precise and sternly-worded. That is not how it reads. Is this a thing with Ivy-League researchers? A few guys I worked with at Geisel were the same.
Well, that was a long slog for someone who is not a professional and you lost me more than once, but I thank you for it. I am over from the ACX discussion. I blog as Assistant Village Idiot and in that role I am always looking for tells to get a psychiatric social worker through material over his head. The first is to notice who is fighting fair. Here is also one of remarkable persistence from my personal experience in acute psychiatric emergencies. When someone believes they know what your bad motives are, nothing you say will ever convince them you are factually correct. Even fairly decent people will be unable to process it somehow. They are sure that there must be some mistake. Because you came up with a bad-person answer, there must be a bad person motive. I noticed this immediately in um, one of your critics. While this is weaker in the rationalist community, it is still powerful. Two things help this: you will have to find something you agree with them on, especially if they are beleaguered by fools on the topic and would welcome the occasional defender. Second, remember Ben Franklin's advice about getting people to do you a favor. Just being thorough and polite will not sway them. And to be fair, it usually doesn't sway any of us.
Similarly, when anyone challenges your credentials, no credentials will be good enough for them. "Well, you might be the dean of an Ivy-League med school, but I'll bet you've never studied psychonutrition. Your whole school doesn't have a single course about it."
"Some traits (EA3, SCZ) show no LD-related bias at all"
Shouldn't your EA4/height graphs use LDD-adjusted data then? Smaller set of variants can't be that big of a deal if Piffer 2013 worked out.
The polygenic score for African height is super interesting. We really need additional data on West Africans to understand the propensity for psychosis. Studies on brain size are also interesting
The provincial Chinese validation is splendid but it would also be good to explicitly test if the higher PGS scores of certain regions are mediated by genetic similarity to Euro ancestry. (Maybe the paper does this and i just haven't read it)
The western provinces with Euro (Xinjiang) or Mongolian admixture do have particularly higher height PGS, whereas IQ peaks in the East. Unfortunately I did not have access to individual level data so I couldn't test for admixture
maybe raw geographic distance from Europe could make a rough control?
Also you computed avg PGI scores somehow right (based on avg allele frequencies maybe)? I’d think it would be possible to compute avg differences in allele frequency if it’s possible to compute avg scores on different PGIs.
Yes I used allele freqs. Fst requires individual genotypes but Nei distance can indeed be computed using allele frequencies without individual data
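For anyone who wants to see why allele frequencies alone are enough for both quantities mentioned here, a rough sketch follows. It assumes biallelic SNPs and uses Nei's standard genetic distance, D = -ln(J_xy / sqrt(J_x * J_y)); the function names are made up for illustration, and the population-mean score formula (twice the effect-allele frequency times the weight, summed over SNPs) is an assumption about how such averages are typically formed, not a description of the code actually used in the post.

```python
import numpy as np

def mean_pgs(freqs, weights):
    """Expected population-average score from effect-allele frequencies (hypothetical helper)."""
    freqs = np.asarray(freqs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # The expected effect-allele count per person at a SNP is 2 * frequency.
    return float(np.sum(2.0 * freqs * weights))

def nei_distance(p_x, p_y):
    """Nei's standard genetic distance between two populations, biallelic SNPs only."""
    p_x = np.asarray(p_x, dtype=float)
    p_y = np.asarray(p_y, dtype=float)
    # Average within-population identity: chance two random alleles match.
    j_x = np.mean(p_x**2 + (1 - p_x) ** 2)
    j_y = np.mean(p_y**2 + (1 - p_y) ** 2)
    # Average between-population identity: one allele drawn from each population.
    j_xy = np.mean(p_x * p_y + (1 - p_x) * (1 - p_y))
    return float(-np.log(j_xy / np.sqrt(j_x * j_y)))
```

Both functions take one frequency per SNP per population, so they can be run on published frequency tables without individual-level genotypes. Note that this simple version applies no sample-size correction.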
i'd reckon that in 1000 genomes or whatever it'd be possible to train some kind of function to use avg absolute allele frequency differences to rank order the genetic similarities of pairs of populations in a way that closely approximates the rank ordering of pairwise FST distances. I can understand tho, if even possible, if it'd be too much trouble to actually bother with.
On the other hand it would be cool if we could formally show that polygenic score differences are orders of magnitude bigger than ancestry differences within China
yes it is possible. I could test it on 1KG and apply it to the Chinese provinces but I think we wouldn't gain much insight as the ancestry differences between Chinese provinces are very small compared to the difference between Chinese and Europeans. Like we are talking about Fst<0.03
This doesn't make sense. I believe the peak of E Asian IQ is in the very far east (Korea, Japan, Eastern China), where there is virtually no Euro ancestry. It seems clear that the E Asian seaboard is its own independent center of high cognitive ability, with its corresponding distinct cognitive profile (higher spatial, moderate to high verbal).