Finding the needle in the genomic haystack with DRAGEN

Illumina scientists have tailor-made research solutions for detecting genes, located in difficult-to-read regions, that cause both common and rare diseases

DRAGEN 4.2 software includes several targeted callers for detecting copy number variants in high-homology regions. Photo: Illumina
February 28, 2024

When scientists want to sequence a DNA sample on an Illumina system, they don’t try to read all 3 billion base pairs of the human genome at once. Instead, they slice the DNA into short fragments of about 500 base pairs that are easier to work with and faster to read.

DNA samples, in the form of a small amount of tissue or fluid, usually contain many cells, and thus many copies of the organism’s genome. Once the system captures images of the fragments, it reassembles the data into one complete sequence by comparing where the fragments overlap.

Think of it like tossing several identical copies of a book into a paper shredder, each one at a random angle. You can’t reassemble any individual copy like you would a puzzle, because all the pieces are the same shape. (Especially if you don’t have an intact book to reference.) But, since each copy of the book was shredded in different random locations from the others, you can match up fragments from different copies based on where the text overlaps.
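The shredded-book idea can be sketched in a few lines of Python. This is only a toy illustration of overlap-based merging, not how a sequencing system or DRAGEN actually assembles reads; the function names `overlap_length` and `greedy_assemble` and the `min_overlap` parameter are hypothetical, and the greedy strategy is an assumption chosen for brevity.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the pieces as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three strips of text, cut at different places in different "copies of the book"
strips = ["the_quick_b", "ick_brown_", "rown_fox"]
assembled = greedy_assemble(strips)
print(assembled)
```

With strips cut at different places, the overlaps are enough to rebuild the sentence. If the text contained long repeated passages, the same strips could fit together in more than one way, which is the difficulty the rest of this article describes.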

If the species of interest has never been sequenced before, scientists must rely on these overlaps to stitch fragments into contiguous regions, known as “contigs,” and build a reference genome from them. Fortunately, human reference genomes are available thanks to the Human Genome Project, completed in 2003, and the ongoing work of the . Any two humans share about 99.9% of their base pairs, so scientists can identify an individual’s genetic variants by comparing their sequence to existing references.

Unfortunately, in many regions of the human genome, the sequence of base pairs is highly repetitive. Entire genes, many thousands of base pairs long, may be duplicated multiple times, with only a handful of base pair variations to differentiate the copies. Furthermore, the number of duplications of a given gene, and the specific differences between the copies, frequently vary from person to person.

These regions of “high homology” are notoriously difficult to analyze, even with a reference genome available. Fragments from them are likely to “fit” in several possible locations, leaving the system with low confidence that it’s aligned them correctly.
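A small Python sketch makes the ambiguity concrete. It assumes a made-up reference in which one short segment appears twice with a single differing base; the sequences, the `candidate_positions` helper, and its `max_mismatches` parameter are all hypothetical, and real aligners use far more sophisticated indexing and scoring.

```python
def candidate_positions(reference: str, read: str, max_mismatches: int = 0) -> list[int]:
    """Return every start position where `read` fits `reference` within the mismatch limit."""
    positions = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches <= max_mismatches:
            positions.append(start)
    return positions


# Hypothetical reference: two near-identical copies that differ at a single base
COPY_A = "ACGTACGGAT"
COPY_B = "ACGTACGCAT"
REFERENCE = "TTGACC" + COPY_A + "CCCTTT" + COPY_B + "GGATCA"

ambiguous_read = "ACGTAC"    # identical in both copies
informative_read = "ACGGAT"  # covers the one base that distinguishes copy A

print(candidate_positions(REFERENCE, ambiguous_read))    # two equally good placements
print(candidate_positions(REFERENCE, informative_read))  # a single placement
```

Only the read that happens to cover the distinguishing base can be placed with confidence. In a real high-homology region, such bases may be thousands of base pairs apart, so most short reads look like the ambiguous one.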

This matters because many genetic diseases result from having an atypical number of copies of specific genes, or a variant in just one gene of a multigene family with many copies. To screen for these diseases, sequencing systems and data analysis pipelines must be sophisticated enough to accurately detect variants even in high-homology regions.

One way to tackle this is to perform longer reads. If the fragments are long enough to bridge the homologous region, reads can be mapped unambiguously to different copies of that region in the reference genome. For example, Illumina Complete Long Reads can create reads up to about 10,000 base pairs. But some homologous regions are still longer than that.

Luckily, Illumina scientists have developed solutions to this problem even for short reads. Targeted callers are tailor-made to quickly and accurately detect the copy number (and other variants) of genes associated with specific diseases. Read on to learn about three of these diseases (congenital adrenal hyperplasia, alpha thalassemia, and atherosclerosis) and how DRAGEN Secondary Analysis software version 4.2 pierces the fog to locate their genetic origins.

Without the gene CYP21A2, our adrenal glands can’t synthesize certain hormones, and our kidneys can’t retain salt. Photo: Getty Images

CYP21A2 and congenital adrenal hyperplasia

Our adrenal glands, located atop our kidneys, help regulate the levels of sodium and potassium in our blood. They do this by synthesizing the hormones cortisol and aldosterone, with the help of the protein 21-hydroxylase.

That protein is encoded by the gene CYP21A2, which unfortunately sits in just one of two copies of the highly homologous RCCX region. The other copy of RCCX contains a very similar “pseudogene,” CYP21A1P, which has no function. This homology confuses not just gene sequencing, but human reproduction: When parental chromosomes recombine to form their child’s DNA, they might mistakenly mix genetic material between the functional CYP21A2 and the nonfunctional CYP21A1P, impairing the resulting gene’s ability to code for 21-hydroxylase, or they might delete the gene entirely.

As with many inherited diseases, a child usually shows no symptoms as long as they have one functioning copy of CYP21A2; they’re only a carrier for the condition. But having two nonfunctional copies leads to congenital adrenal hyperplasia (CAH).

CAH caused by 21-hydroxylase deficiency . If a developing fetus’s adrenal glands have a decreased level of 21-hydroxylase, they can’t synthesize as much cortisol and aldosterone as their body needs. Then, the excess materials they would’ve used for synthesis accumulate and instead form androgens, or male sex hormones. This is called “simple virilizing CAH.” Girls with excessive androgens may develop ambiguous genitalia; boys may not show any external signs and require targeted screening to diagnose.

Children born without either functional CYP21A2 gene, and thus with a complete lack of 21-hydroxylase, can’t synthesize cortisol and aldosterone at all, and their kidneys can’t retain salt. This severe “salt-wasting CAH” causes dehydration, diarrhea, vomiting, and adrenal crisis within days or weeks of birth, and is often fatal.

The CYP21A2 targeted caller in DRAGEN 4.2 is a research-use-only tool specially geared to detect a wide range of variation in the gene: It can discern the number of copies of the RCCX region, gene deletions, gene conversions, and 33 different small variants in the gene or the pseudogene. Illumina scientists tested the caller on a large selection of publicly available data from , including healthy individuals, CAH carriers, and 16 CAH cases. The caller successfully detected the RCCX copy number, full gene deletions, and small variants in every case.

To learn about how the CYP21A2 caller works and the methods used to test it in greater detail, read this article by Jonathan Belyeu, Fabian Klötzl, Eric Roller, Emma Newman, Vitor Onuchic, and Mitchell Bekritsky on Illumina’s Genomics Research Hub.

About 5% of the world’s population has some variant of alpha thalassemia, a genetic deficiency of hemoglobin in the red blood cells. Illustration: Canva/Science Photo Library

HBA and alpha thalassemia

By weight, human red blood cells are composed of about 35% hemoglobin, which is responsible for transporting oxygen. (Most of the rest is water.) There are a few different combinations of hemoglobin proteins, but an essential ingredient of them all is alpha hemoglobin, encoded by the genes HBA1 and HBA2.

People typically have four total copies of these genes, two on each copy of chromosome 16. Children who inherit only three copies will be carriers: they still produce enough alpha hemoglobin and don’t generally need treatment. But children who inherit two or fewer copies of HBA1 and HBA2 will have some form of alpha thalassemia, an autosomal recessive blood disorder.

Insufficient hemoglobin causes anemia, in which red blood cells are undersized or disintegrate entirely, and the number of missing copies of HBA1 and HBA2 directly affects alpha thalassemia’s severity. A person with two missing copies has “alpha thalassemia trait,” which causes mild anemia. Three missing copies entail “hemoglobin H disease,” or HbH, which requires blood transfusion therapy. A person without any copies has “hemoglobin Bart’s hydrops foetalis,” which is usually fatal.
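The relationship between copy count and severity described above amounts to a simple lookup, sketched here in Python. The function name `describe_hba_status` and the label strings are hypothetical, the code assumes a typical total of four copies and does not model people who carry extra copies, and it is an illustration of the article’s summary, not a clinical classification.

```python
HBA_TYPICAL_COPIES = 4  # two copies on each copy of chromosome 16, per the text

# Number of missing copies -> description used in this article
HBA_STATUS_BY_MISSING_COPIES = {
    0: "typical",
    1: "carrier",
    2: "alpha thalassemia trait",
    3: "hemoglobin H disease",
    4: "hemoglobin Bart's hydrops foetalis",
}


def describe_hba_status(copies_present: int) -> str:
    """Map a total HBA1 + HBA2 copy count (0 to 4) to the article's description."""
    if not 0 <= copies_present <= HBA_TYPICAL_COPIES:
        raise ValueError("this sketch only covers 0 to 4 copies")
    missing = HBA_TYPICAL_COPIES - copies_present
    return HBA_STATUS_BY_MISSING_COPIES[missing]


for copies in range(HBA_TYPICAL_COPIES, -1, -1):
    print(copies, "copies:", describe_hba_status(copies))
```

The lookup shows why an accurate copy number matters: each step down in copy count corresponds to a different category.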

, alpha thalassemia “is probably the most common monogenic disorder in the world and is especially frequent in Mediterranean countries, South-East Asia, Africa, the Middle East and in the Indian subcontinent.” In some regions, more than 30% of the population may be carriers. Some scientists theorize that this prevalence arose because it actually confers an evolutionary advantage in these regions, since carriers are less susceptible to malaria.

In all, it is estimated that about 5% of the world’s population has some variant of alpha thalassemia, and both the American College of Obstetricians and Gynecologists and the American College of Medical Genetics recommend screening for the condition in people who are pregnant or who plan to reproduce.

The genomic region containing HBA1 and HBA2 is highly homologous, making copy number detection and accurate read alignment difficult for gene sequencing systems. So Illumina scientists developed the research-use-only HBA targeted caller for DRAGEN 4.2, which estimates HBA copy number genotype based on several nearby regions that are not homologous. They tested the caller on hundreds of samples from The 1000 Genomes Project and found that it accurately detected 14 copy number genotypes, which are variations based on exactly which region of the HBA genes was deleted.
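One generic way to turn read data from a non-homologous region into a copy number is to compare its sequencing depth with the depth of regions known to be present in two copies. The Python sketch below illustrates that read-depth idea only; it is an assumption for illustration and is not the algorithm used by the DRAGEN HBA caller. The function name `estimate_copy_number`, its parameters, and the depth values are all hypothetical.

```python
from statistics import median


def estimate_copy_number(target_depths: list[float],
                         baseline_depths: list[float],
                         baseline_copies: int = 2) -> int:
    """Estimate a region's copy number from depth relative to a known baseline.

    `baseline_depths` come from regions assumed to be present in
    `baseline_copies` copies (two in a typical diploid genome).
    """
    baseline = median(baseline_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    ratio = median(target_depths) / baseline
    return round(ratio * baseline_copies)


# Hypothetical depths: the unique region near the genes is covered about half
# as deeply as the baseline, which suggests one copy of it is deleted.
nearby_region = [14, 16, 15, 15]
diploid_baseline = [30, 31, 29, 30]
print(estimate_copy_number(nearby_region, diploid_baseline))
```

Because reads from a non-homologous region map unambiguously, their depth is a more trustworthy signal than the depth inside the homologous genes themselves. Combining several such regions could, in principle, indicate which stretch of the locus has been deleted.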

In the article linked below, these scientists report being confident that this research tool will be able to aid large-scale population studies, and in turn “help guide decisions about how to best deploy carrier and newborn screening tests.”

To learn about how the HBA caller works and the methods used to test it in greater detail, read this article by Shunhua Han, Vitor Onuchic, Massimiliano Rossi, Eric Roller, and Daniel Cameron on Illumina’s Genomics Research Hub.

The KIV-2 region of the genome is related to lower cholesterol buildup in the arteries; most people have six copies of it, but some have 50 or more. Illustration: Shutterstock

KIV-2 and atherosclerosis

We humans require cholesterol as a vital component of our cell membranes, and low-density lipoproteins (LDL) are the main vehicle our bodies use to transport cholesterol-containing fats. However, elevated LDL levels are a major factor in atherosclerosis (a buildup of fats along arterial walls) and subsequently of cardiovascular disease (CVD). , nearly a third of all deaths are due to CVD, 85% of which involve heart attack and stroke.

Lipoprotein a (LPA) is one kind of LDL. The concentration of LPA in a person’s blood is highly heritable from parent to child and varies widely from one population to the next; for instance, estimates that 20% of individuals of European ancestry have elevated LPA levels. These elevated levels can be traced to a genetic cause: the number of copies of the kringle-IV 2 domain (KIV-2) in the LPA protein.

Named after the twisted Danish pastry, kringle domains are sections of a protein that fold together into loops and help it bind to other proteins. KIV-2 has an astounding range of occurrence: human reference genomes usually record fewer than six copies of it in the LPA gene, but some people have 50 copies or more.

Why does this region get copied so much? For now, scientists are unsure. The more copies of KIV-2 a person has, the longer their LPA proteins are. But here’s the catch: The longer these proteins are, the longer they take to synthesize, so, counterintuitively, a person with more copies of KIV-2 actually has lower levels of LPA in their blood.

Regardless, the extremely variable copy number of this region makes it a huge challenge for a gene sequencer to accurately describe and quantify.

, Illumina scientists developed a targeted caller that generates a highly accurate copy number measurement for this difficult region. They tested the research-use-only caller on the genomic data of 120 Mendelian trios (a child and both their parents) recorded in The 1000 Genomes Project, and the results correlated extremely closely with those generated by other methods. In the article linked below, the Illumina team presents these results as evidence that the LPA targeted caller in DRAGEN software “will provide a valuable tool for enhanced LPA and CVD research.”

To learn about how the KIV-2 caller works and the methods used to test it in greater detail, read this article by Jonathan Belyeu, Vitor Onuchic, and Mitchell Bekritsky on Illumina’s Genomics Research Hub.
