Scientists with the Human Pangenome Reference Consortium have released a new high-quality collection of reference human genome sequences that includes genomes of 47 people, with the goal of increasing that number to 350 by mid-2024.
The original reference human genome sequence is nearly 20 years old and has been regularly updated as technology advances and researchers fix errors and discover more regions of the human genome.
However, it is fundamentally limited in its representation of the diversity of the human species, as it consists of genomes from only about 20 people, and most of the reference sequence is from only one person.
“Everyone has a unique genome, so using a single reference genome sequence for every person can lead to inequities in genomic analyses,” said Dr. Adam Phillippy, a researcher with the National Human Genome Research Institute at the National Institutes of Health.
“For example, predicting a genetic disease might not work as well for someone whose genome is more different from the reference genome.”
The current reference human genome sequence has gaps that reflect missing information, especially in areas that are repetitive and hard to read.
Recent technological advances such as long-read DNA sequencing, which reads longer stretches of the DNA at a time, helped researchers fill in those gaps to create the first complete human genome sequence.
This complete human genome sequence, released in April 2022 by the Telomere-to-Telomere (T2T) Consortium, is incorporated into the current pangenome reference.
Using advanced computational techniques to align the various genome sequences, the Human Pangenome Reference Consortium constructed a new human pangenome reference. Each assembly in the pangenome covers more than 99% of the expected sequence and is more than 99% accurate.
It also builds upon the previous reference genome sequence, adding over 100 million new bases.
While the previous reference genome sequence was…