The idea that a trait or disease is “in your genes” is a relatively new scientific concept, and it could not have emerged without the study of countless human genomes. Today, over-the-counter DNA test kits can offer insights into your ancestry as well as your genetic risk for disease. But what are those results based on?
Genome-wide association studies (GWAS) examine genetic variants, differences in the DNA sequence from one person to the next, across large groups of individuals to determine how those variants are associated with a given trait. Some variants are common and benign, such as those that contribute to hair color, while others can cause disease or increase disease risk.
Each human genome is chock full of information and genetic variants, so identifying patterns requires collecting and analyzing an enormous amount of data. Unfortunately, the majority of the available data does not apply to everyone.
Approximately 80% of all GWAS are run using samples from individuals of European descent, yet these individuals make up only 16% of the global population.
The dominance of European DNA in the available genetic data is not only out of step with the global population, but also means scientists know more about the underlying genetics of people of European ancestry than they do about people of other ancestries.
“This can actually trickle down into health disparities, because [with the data available] we can do a better job of figuring out who’s going to get sick if you are of European descent versus if you are not, which is obviously problematic,” says Elizabeth Atkinson, PhD, an instructor in the Mass General Analytic and Translational Genetics Unit and a Broad Institute-affiliated researcher.
Elizabeth Atkinson, PhD
Dr. Atkinson and her team are working to make genetic research more inclusive with the development of a new statistical framework and software package called Tractor, which is designed to include individuals of mixed ancestry. In a recent Nature Genetics research paper, they found their method was able to discover new associations that would have been missed in traditional GWAS while simultaneously boosting the strength of a study’s results.
Why Traditional Studies Use European-descent Samples
One potential explanation is that early genome collections more often focused on individuals of European descent, creating a cycle that made it ever easier to use European-ancestry samples in GWAS. “Research builds off of existing knowledge, so if there are a lot of public data sets using European individuals, people would follow along in that path and extend that work,” Atkinson explains.
Another explanation lies in the challenges of studying the genetic complexity of individuals with mixed backgrounds and of cohorts that include people of multiple ancestries.
All humans evolved in Africa, but then slowly dispersed and migrated across the globe. The differences in demographic history “result in different patterns of genetic variation, some of which might be informative for health, but a lot of it is just by chance,” says Atkinson.
Over time, the frequencies of genetic variants shift in each population, partly by chance (genetic drift) and partly in response to that population’s history and environment. Thus, if these population differences are not properly corrected for, “you could get a significant association with a variant and think it predisposes you to a disease, when it actually is just more common in a certain population of people involved in the study,” Dr. Atkinson explains.
For this reason, most previous studies have limited themselves to a single ancestry group at a time. Because the predominant ancestry in most cohorts is European, individuals of other ancestries end up excluded.
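To see how uncorrected population differences can produce a false association, here is a minimal Python simulation. It assumes two hypothetical populations, "A" and "B", that differ both in how common a variant is and in their average trait value, while the variant itself has no effect at all. The allele frequencies, trait baselines and sample sizes are illustrative assumptions, not values from any study.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_per_pop=5000, seed=1):
    """Simulate a variant with NO effect in two populations with different baselines."""
    rng = random.Random(seed)
    # Hypothetical (allele frequency, average trait value) for each population.
    populations = {"A": (0.1, 0.0), "B": (0.6, 1.0)}
    records = []
    for pop, (freq, baseline) in populations.items():
        for _ in range(n_per_pop):
            # Count alternate alleles on the two chromosome copies (0, 1 or 2).
            genotype = int(rng.random() < freq) + int(rng.random() < freq)
            # The trait depends only on population membership, never on the genotype.
            trait = baseline + rng.gauss(0.0, 1.0)
            records.append((pop, genotype, trait))
    return records


records = simulate()

# Naive analysis: pool everyone together.
pooled_r = pearson([g for _, g, _ in records], [t for _, _, t in records])

# Stratified analysis: compare people only within their own population.
within_r = {
    pop: pearson([g for p, g, _ in records if p == pop],
                 [t for p, _, t in records if p == pop])
    for pop in ("A", "B")
}

print(f"pooled correlation: {pooled_r:.3f}")
print("within-population correlations:", {p: round(r, 3) for p, r in within_r.items()})
```

With these settings the pooled correlation comes out clearly positive (roughly 0.3) even though the variant does nothing, while within each population it hovers around zero. Splitting people into separate groups is the crudest possible correction; real GWAS typically adjust for ancestry with covariates such as genetic principal components.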
Tractor: The Impact of Including Admixed Populations in Genetic Studies
Admixed populations are groups of individuals whose genomes contain contributions from multiple continental ancestries. Examples of admixed populations include African American and Latinx individuals.
Multiple ancestries make for a more challenging statistical analysis, which could be another reason admixed individuals have historically been excluded from GWAS, says Atkinson.
But just because it is a challenge does not mean it is impossible.
“This large fraction of the US and global populace can actually be appropriately studied, either in a cohort alone, or alongside in a mixed collection of people from multiple different ancestry groups. This way, you don’t have to just exclude them.”
In traditional GWAS, a scientist would account for a person’s multiple ancestries by looking at the big picture across all of their chromosomes and estimating overall proportions, called global ancestry. For example, the global ancestry of a Latinx person could be 40% European, 40% Indigenous and 20% African.
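A global ancestry figure like this can be thought of as a summary of finer-grained, segment-by-segment ancestry labels. In the hypothetical example below, one person's two copies of a single 100 Mb toy chromosome are labeled as European ("EUR"), Indigenous ("NAT") or African ("AFR"). The labels, coordinates and tuple layout are assumptions for illustration, and the sketch does not infer ancestry from DNA, which real local ancestry software does. Adding up segment lengths reproduces the 40/40/20 split, while the second function asks which ancestry each chromosome copy carries at one position.

```python
from collections import defaultdict

# Hypothetical local-ancestry calls for one admixed person on a single toy
# chromosome of 100 Mb. Each tuple is (haplotype, start_bp, end_bp, ancestry),
# where haplotypes 0 and 1 are the two inherited copies of the chromosome.
SEGMENTS = [
    (0, 0, 40_000_000, "EUR"),
    (0, 40_000_000, 100_000_000, "NAT"),
    (1, 0, 20_000_000, "AFR"),
    (1, 20_000_000, 60_000_000, "EUR"),
    (1, 60_000_000, 80_000_000, "NAT"),
    (1, 80_000_000, 100_000_000, "AFR"),
]


def global_ancestry(segments):
    """Collapse local-ancestry segments into genome-wide (global) proportions."""
    totals = defaultdict(int)
    for _haplotype, start, end, ancestry in segments:
        totals[ancestry] += end - start
    total_length = sum(totals.values())
    return {ancestry: length / total_length for ancestry, length in totals.items()}


def local_ancestry_at(segments, position):
    """Return the ancestry label of each haplotype at one base-pair position."""
    calls = {}
    for haplotype, start, end, ancestry in segments:
        if start <= position < end:
            calls[haplotype] = ancestry
    return calls


print(global_ancestry(SEGMENTS))                # {'EUR': 0.4, 'NAT': 0.4, 'AFR': 0.2}
print(local_ancestry_at(SEGMENTS, 50_000_000))  # {0: 'NAT', 1: 'EUR'}
```

The global summary throws away where each ancestry sits: two people with the same 40/40/20 split can have very different segment patterns along their chromosomes.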
However, the Tractor method uses local ancestry to look more closely at the individual pieces of each chromosome and which ancestry each piece comes from. This way, variants are compared only with variants on segments of the same ancestry, which limits confounding factors and false associations.
Latino individuals typically have a mixture of Native American/Indigenous, African and European ancestries. In the above example, scientists on the Tractor team have color-coded sections of each pair of chromosomes to show which sections of the chromosomes can be attributed to a given ancestry. Each line is divided in half to represent one chromosome out of each pair.
Red represents European ancestry, blue represents Native American/Indigenous and green represents African ancestry.
In the above version of the same individual’s chromosomes, the Tractor software was used to highlight only the pieces of the chromosomes that are related to European ancestry. This way, analyses can be run on only the European tracts, for example, without confounding from other ancestries. All ancestries in the admixed person can be utilized and appropriately calibrated against genetic pieces of similar ancestry.
Additionally, instead of saying this individual is 40% European, scientists can see exactly where a person’s European ancestry appears within the genome.
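Isolating tracts of one ancestry amounts to splitting each genotype according to the local ancestry of the chromosome copy it sits on. The helper below is a hypothetical sketch of that bookkeeping, not Tractor's own code: for one person at one variant, it counts how many haplotypes of each ancestry are present and how many alternate alleles those haplotypes carry. It assumes phased genotypes (we know which allele lies on which chromosome copy) and local ancestry calls at the same position; the function name and ancestry labels are illustrative.

```python
from collections import Counter


def split_by_ancestry(alleles, ancestries, labels):
    """Split one person's genotype at one variant according to local ancestry.

    alleles    -- (allele on haplotype 0, allele on haplotype 1); 0 = reference, 1 = alternate
    ancestries -- (local ancestry of haplotype 0, local ancestry of haplotype 1) at this variant
    labels     -- every ancestry label to report, including ones absent at this site
    Returns {ancestry: (haplotypes of that ancestry, alternate alleles on those haplotypes)}.
    """
    haplotype_counts = Counter()
    alt_counts = Counter()
    for allele, ancestry in zip(alleles, ancestries):
        haplotype_counts[ancestry] += 1
        alt_counts[ancestry] += allele
    return {label: (haplotype_counts[label], alt_counts[label]) for label in labels}


# A heterozygous person whose alternate allele sits on a European-ancestry
# haplotype, while the other copy of this region is of African ancestry.
print(split_by_ancestry((1, 0), ("EUR", "AFR"), ["EUR", "AFR", "NAT"]))
# {'EUR': (1, 1), 'AFR': (1, 0), 'NAT': (0, 0)}
```

Repeating this for every person and variant yields per-ancestry counts, so an analysis restricted to, say, European tracts can use only the European columns.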
Making these distinctions also allows scientists to generate ancestry-specific effect size estimates, meaning they can more accurately measure the impact of a given variant in people of certain ancestries rather than use a general average.
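Below is a minimal sketch of how ancestry-specific effect sizes can be estimated, as a simplified stand-in rather than Tractor's implementation. It regresses a trait on the number of African-ancestry haplotypes at a variant plus a separate alternate-allele count for each ancestry, and compares that with a standard model that uses a single total allele count. Everything here is an assumption for illustration: a hypothetical two-way (African/European) admixed cohort, made-up allele frequencies and effects in which the variant matters only on African-ancestry haplotypes, and a hand-written least-squares solver.

```python
import random


def ols(rows, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y].
    m = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def simulate(n=20_000, seed=7):
    """Hypothetical two-way admixed cohort; the variant acts only on African-ancestry haplotypes."""
    rng = random.Random(seed)
    p_afr_haplotype = 0.5                   # assumed chance a haplotype is African ancestry here
    allele_freq = {"AFR": 0.2, "EUR": 0.2}  # assumed alternate-allele frequencies
    true_effect = {"AFR": 0.4, "EUR": 0.0}  # assumed per-allele effects on the trait
    tractor_rows, standard_rows, trait = [], [], []
    for _ in range(n):
        n_afr = alt_afr = alt_eur = 0
        for _haplotype in range(2):
            ancestry = "AFR" if rng.random() < p_afr_haplotype else "EUR"
            allele = int(rng.random() < allele_freq[ancestry])
            if ancestry == "AFR":
                n_afr += 1
                alt_afr += allele
            else:
                alt_eur += allele
        y = true_effect["AFR"] * alt_afr + true_effect["EUR"] * alt_eur + rng.gauss(0.0, 1.0)
        # Ancestry-aware model: intercept, local-ancestry count, one allele count per ancestry.
        tractor_rows.append([1.0, n_afr, alt_eur, alt_afr])
        # Standard model: intercept plus the total alternate-allele count.
        standard_rows.append([1.0, alt_afr + alt_eur])
        trait.append(y)
    return tractor_rows, standard_rows, trait


tractor_rows, standard_rows, trait = simulate()
_, _, beta_eur, beta_afr = ols(tractor_rows, trait)
_, beta_standard = ols(standard_rows, trait)
print(f"ancestry-specific effects: AFR {beta_afr:.2f}, EUR {beta_eur:.2f}")
print(f"single pooled effect:      {beta_standard:.2f}")
```

With these settings the ancestry-specific estimates should land near the simulated 0.4 and 0.0, while the single pooled estimate falls in between (around 0.2), an average that describes neither group. The sketch leaves out covariates a real analysis would include and draws each haplotype's ancestry independently, which real admixed genomes do not do.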
Effect sizes are critical data points: drawn from GWAS summary statistics and combined across many variants, they are used to calculate a polygenic risk score, which estimates a patient’s risk of developing a disease based on their genes. Because existing effect sizes and GWAS summary statistics come predominantly from European individuals, they are less likely to be accurate for non-Europeans.
For example, Mass General researcher Alicia Martin, PhD, published a paper in Nature Genetics that found there was a fivefold reduction in accuracy of prediction when applying polygenic risk scores based on European-descent genomes to African-descent populations.
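In its simplest form, a polygenic risk score is a weighted sum: for each variant, the number of alternate alleles a person carries multiplied by that variant's effect size, added up across variants. The sketch below assumes exactly that form (real scores also involve choosing and filtering variants), and the variant names and effect sizes are hypothetical. Scoring the same person with two different sets of weights shows how effect sizes estimated in one group can change the result for someone else.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of alternate-allele counts (0, 1 or 2) weighted by per-allele effect sizes.

    dosages      -- {variant_id: number of alternate alleles the person carries}
    effect_sizes -- {variant_id: per-allele effect size from GWAS summary statistics}
    Variants missing from the person's data are skipped.
    """
    return sum(beta * dosages[variant]
               for variant, beta in effect_sizes.items()
               if variant in dosages)


# Hypothetical genotypes and two hypothetical sets of effect sizes for the same
# variants, e.g. as if estimated in cohorts of different ancestries.
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
weights_study_a = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}
weights_study_b = {"variant_1": 0.02, "variant_2": 0.15, "variant_3": 0.20}

print(round(polygenic_risk_score(person, weights_study_a), 3))  # 0.15
print(round(polygenic_risk_score(person, weights_study_b), 3))  # 0.19
```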
Tractor can help bring admixed and non-European-descent individuals into gene discovery efforts and make the resulting findings more accurate for them, potentially leading to improved patient outcomes. But it also simply strengthens study results.
“Not only are we better controlling for population structure and allowing for the study of admixed people, but actually we get a boost in power,” Dr. Atkinson notes. To test this, the team used Tractor with simulated data where an effect was known. When comparing the results of a standard GWAS and Tractor results, researchers found that Tractor often yielded more powerful results for admixed cohorts.
“This gives us the ability to discover variants that are affecting a disease that you would have been unable to find using the prior tools.”
Better Science for All Moving Forward
Dr. Atkinson’s main motivation for this work was to allow and encourage the study of more diverse populations in large scale genomics efforts. “There are notable health disparities across ancestries that could be partially addressed by inclusion of more diverse ancestries in GWAS efforts,” she says.
Her work to develop a well-calibrated method to study diverse populations could help to improve the health of those groups, but also improve health for all.
“I think it’s really important to highlight that, yes, we are doing a better job at figuring out the genetic basis of disease in these diverse and admixed populations, but incorporating this data is also informative for everybody’s health,” Atkinson explains. “This is improving our understanding of the genetic basis of disease for people of all ancestries by leveraging the diversity found in admixed genomes.”
About the Mass General Research Institute
Massachusetts General Hospital is home to the largest hospital-based research program in the United States. Our researchers work side-by-side with physicians to develop innovative new ways to diagnose, treat and prevent disease.