The effect of bias in genomic studies
The Human Genome Project accelerated genomics research. Is further progress in the field of genomics being halted by bias?
Bias appears to affect the field of genomics at several levels. Racial bias exists in genomic databases, making it more difficult and expensive to diagnose genetic diseases and plan treatment for African–American patients.
Historical bias also exists, with researchers less likely to study genes that are potentially implicated in diseases if those genes are less well-researched.
Historical bias
Genes that have been shown to have potentially significant implications in human disease are being ignored due to historical bias. Previous studies have reported that research is focused on only 2000 coding genes, out of approximately 20,000 genes in the human genome. Now a study, published recently in PLOS Biology, examines why biomedical researchers continually study the same 10% of human genes.
The team found that historical bias plays a major role, with old policies, funding mechanisms and career paths as the main driving forces.
“We discovered that current research on human genes does not reflect the medical importance of the gene,” commented Thomas Stoeger (Northwestern University; IL, USA). “Many genes with a very strong relevance to human disease are still not studied. Instead, social forces and funding mechanisms reinforce a focus of present-day science on past research topics.”
The researchers analyzed approximately 15,000 genes, applying a systems approach to the data to uncover underlying patterns. The result explains not only why certain genes are not studied, but also the extent to which any given gene has been studied.
They discovered that post-docs and PhD students who focus on less well-categorized genes are 50% less likely to become independent researchers. In addition, policies implemented to encourage innovative research have instead resulted in more research on well-categorized genes – the small group of genes that were the focus of much research prior to the Human Genome Project.
Since the completion of the Human Genome Project in 2003, many novel technologies have been developed to study genes. However, studies on less than 10% of genes account for more than 90% of research papers, and approximately 30% of genes have not been studied at all. One of the key goals of the Human Genome Project was to expand study of the human genome beyond the small group of genes that scientists at the time were continuously studying. Given these findings, the project appears to have fallen short of that goal.
“Everything was supposed to change with the Human Genome Project, but everything stayed the same,” said Luis Amaral (Northwestern University). “Scientists keep going to the same place, studying the exact same genes. Should we be focusing all of our attention on this small group of genes?”
“The bias to study the exact same human genes is very high,” continued Amaral. “The entire system is fighting the very purpose of the agencies and scientific knowledge which is to broaden the set of things we study and understand. We need to make a concerted effort to incentivize the study of other genes important to human health.”
The Northwestern team will now use their research to build a public database identifying genes that have been less well studied but could play an important role in human disease.
Racial bias
It has previously been shown that two of the top genetic databases contain considerably more genetic data on people of European descent than on people of African descent. When the researchers behind a recent study compared their own dataset of 642 whole-genome sequences from people of African ancestry with current genomic databases, they discovered a clear bias toward European genetic variants.
“The ability to accurately report whether a genetic variant is responsible for a given disease or phenotypic trait depends in part on the confidence in labelling a variant as pathogenic,” explained the authors of the paper, published in Nature Communications. “Such determination can often be more difficult in persons of predominantly non-European ancestry, as there is less known about the pathogenicity of variants that are absent from or less frequent in European populations.”
In the study, the researchers distinguished pathogenic annotated variants, which have been identified as disease-causing in online databases, from non-annotated variants (NAVs), which have not been identified as disease-causing.
“While we cannot be sure which of these variants are truly disease-causing (actual ‘needles’ rather than haystack members) without additional functional or association-based evidence, we believe that discrepancies between true pathogenicity and annotated pathogenicity are a major source of the biases we report,” the authors commented. “A likely contributor to this incongruity is that databases are missing population-specific pathogenicity information, and with regard to the results we report here, African-specific pathogenicity data.”
The researchers discovered that NAVs show the strongest positive correlation with African ancestry. Therefore, genetic variants that are disease-causing for this population (the needles) are likely to be in the NAV category (the haystack), and as such, will be harder to find. The difficulty stems from the sheer volume of NAVs, which is even greater in people of African descent because of their higher degree of genetic variation.
The databases must now be expanded to include a wider range of ancestries. Doing so would dramatically change clinical genetics and diagnoses for African–American patients, whose genomes are currently more difficult and more expensive to analyze than those of patients of European descent.
Although these biases may seem difficult to fix, there is hope that awareness will set in motion the steps needed to overcome them and move research forward.