
First draft of the human pangenome reference published

Written by Aisha Al-Janabi (Assistant Editor)

The Human Pangenome Reference Consortium has announced the first draft of a human ‘pangenome’, which incorporates DNA data from 47 individuals.

The Human Pangenome Reference Consortium has used genomic data from 47 individuals to generate the first draft of a human pangenome reference, capturing more diversity than the previous human reference genome. In doing so, the Consortium added 119 million base pairs of sequence that are absent from the existing reference.

The Human Genome Project was launched in October 1990 with goals that included mapping the human genome and determining the sequence of its 3.2 billion base pairs. On April 14, 2003, 50 years after the discovery of the double helix structure of DNA, the completion of the Human Genome Project was announced. Most of the resulting sequence was based on one individual: around 70% of it came from a single donor.

The genome sequence announced in 2003 accounted for only about 92% of the human genome, and it was not until March 31, 2022 that the Telomere-to-Telomere Consortium announced the first truly complete, gapless human genome sequence. This was made possible by the emergence of technologies such as long-read DNA sequencing.

However, a reference genome based mostly on a single person does not represent the genetic diversity of the global population. This introduces reference bias, which reduces the accuracy of genetic analyses, restricts variant discovery and impedes understanding of how genomic variants influence disease.

“Everyone has a unique genome, so using a single reference genome sequence for every person can lead to inequities in genome analysis,” explained Adam Phillippy (National Human Genome Research Institute; MD, USA), a co-author of the study. “For example, predicting a genetic disease might not work as well for someone whose genome is more different from the reference genome.”

To address this, the Human Pangenome Reference Consortium was launched in 2019 to assemble genomes from a more diverse population. In May 2023, the Consortium announced the first draft of the human pangenome, which incorporates near-complete genome assemblies from 47 individuals from every continent except Antarctica. Unlike the existing linear reference, which presents a single sequence, the pangenome represents multiple alternative versions of the human genome sequence simultaneously.

The pangenome not only introduces more genetic diversity into the reference map but also adds 119 million base pairs of previously missing sequence. This new information should improve the analysis of human genomes for drug discovery, diagnosis and genome-guided precision medicine.

Some are concerned that data from the Human Pangenome Reference Consortium could be commodified, as has happened with previous projects such as the Human Genome Diversity Project, without compensating or otherwise benefiting the individuals and communities without whom these databases could not exist. In a 2022 interview with BioTechniques, Claudia Gonzaga-Jauregui (Principal Investigator of the Mendelian Genomics and Precision Health Laboratory; Juriquilla, Mexico) discussed underrepresentation in genomic data. She described the responsibility that projects working with underrepresented and marginalized groups have to ensure that participants benefit from taking part, for example through the return of results and genetic counseling, and to build relationships with their communities.

To address these concerns, the Human Pangenome Reference Consortium includes an ethics group dedicated to guiding informed consent and exploring the regulatory issues that would arise if the data were used in a clinical setting.

The Human Pangenome Reference Consortium hopes to add complete genetic data from 350 individuals to its reference database by mid-2024.

“The human pangenome reference will enable us to represent tens of thousands of novel genomic variants in regions of the genome that were previously inaccessible,” commented Wen-Wei Liao (Yale University; CT, USA), co-first author of the paper. “With a pangenome reference, we can accelerate clinical research by improving our understanding of the link between genes and disease traits.”