Witnesses to a Sea Change in Sequencing Capability
Introduction
Sequencing technologies have far surpassed the expectations of Drs. Carlos Bustamante, Stephen Kingsmore, and John Mattick. Had you asked them at the beginning of their careers if one day we could sequence a whole human genome in a day, their responses would have been, respectively: “Crazy talk!”, “Absolutely not,” and “Not in my wildest dreams.”
Although the pace of sequencing innovations surprised them, each was quick to adopt next-generation sequencing (NGS), and now population sequencing, to advance their research and translational efforts. As Professor of Genetics and Biomedical Data Science, and founding Director of the Stanford Center for Computational, Evolutionary and Human Genomics, Dr. Bustamante is using population sequencing to understand genetic variances in ancient and ethnic subpopulations. In his new role as President and CEO of the Rady Children’s Institute for Genomic Medicine, Dr. Kingsmore is using it to develop the evidence base for genomic medicine in children. As the Executive Director of the Garvan Institute of Medical Research, Dr. Mattick is leading efforts to leverage population sequencing data for research and clinical applications.
iCommunity spoke with Drs. Bustamante, Kingsmore, and Mattick about how their teams are using high-throughput whole human genome and population sequencing to advance research and translational studies, the need for databases that merge “omics” and phenotypic data, and the challenges of transforming this information into a format that’s useful in a clinical environment.
Q: What was sequencing technology like when you first became a scientist?
John Mattick (JM): My first memories of sequencing are peering at bands on autoradiograms. It was the early days of molecular biology. We were cloning and sequencing genes. We thought we were hotshots. We could only read a couple of hundred bases from the gels before the bands were too tight to distinguish. We would assemble a sequence that was 1–2 kilobases long and each would be a separate paper. Looking back, it seems so primitive.
Stephen Kingsmore (SK): My sequencing experience began with radioactive 32P labeling, and agarose and polyacrylamide gels. A great sequencing reaction was 150 nucleotides and that took most of the day to do.
Carlos Bustamante (CB): I became a scientist as automated sequencers were being developed, so I performed a little manual sequencing and then a large amount of sequencing on first-generation sequencers. My first experience was as an intern at The Smithsonian where they had just set up the Laboratory of Molecular Systematics. At the time, sequencing a couple of genes from multiple individuals was a huge deal.
Q: How has your approach to sequencing changed as the tools improved?
CB: In the beginning, we treated every piece of data as if it was precious. When Celera began performing early exome sequencing, they performed PCR on 200,000 samples, and sequenced 39 people across 20,000 genes. I thought, “This is a data set! We’ve waited a long time for this.” We stopped what we were doing and spent 4–5 years studying the 39 exomes, and wrote 8–9 papers analyzing the data in different ways. That mindset has been flipped on its head. We’re now generating data quickly and continually with NGS, and then worrying about what it means.
“The only way to have accurate variant information is for hundreds of thousands of genomes to be available so that we can assess the frequency of every variant that we see.”
Q: When next-generation sequencing (NGS) tools were introduced, how quickly did you incorporate them into your research studies?
CB: NGS quickly became a critical tool for our studies. We were part of the macaque and orangutan genome projects, where we analyzed polymorphism data. We were also one of the original analysis groups for the 1000 Genomes Project, designing the sampling in the Americas, determining the value of 2×–4× sequencing, and the bounds of variance frequencies.
SK: We began using NGS systems soon after they were on the market. Those were exciting days. We converted our mail room into an NGS lab. Not much was known about the human genome, so we were discovering new things in every study we performed.
JM: I’ve been an early adopter of new genomics technologies for many years. Along with Craig Venter, I was one of the first customers for the Molecular Dynamics Megabase sequencer. The Garvan Institute was one of the first three institutions to acquire a HiSeq X Ten System.
Q: How did your early sequencing work inform the focus of your current studies?
CB: Early on, we saw polymorphism and variation in genes of interest. In my PhD thesis, I analyzed the largest genome data set at that time, which consisted of 25 Drosophila genes sequenced across multiple individuals and 15 Arabidopsis genes sequenced across multiple plants. We were looking at amino acid differences and the accumulation of good and harmful mutations. From that moment on, I started thinking about creating a large data set of human sequences so that we could analyze it in the same way.
SK: At the National Center for Genome Resources, we used early NGS to sequence transcriptomes and then the genomes of plants and pathogens, and then began sequencing human samples. Several of us realized that the studies we were performing in a research setting would soon begin to impact medical care. After looking around the country, three of us moved to Children’s Mercy Hospital in Kansas City to establish one of the first pediatric genomic medicine centers and began performing translational research. I’m now at the Rady Children’s Institute for Genomic Medicine where we’re taking that a step further, focusing on implementation of genomic systems medicine at scale in the largest children’s hospital in California.
JM: High-throughput sequencing had a huge impact on the appreciation of the transcriptional complexity of the human genome. NGS accelerated our ability to dive into the transcriptome, enabling us to explore the extraordinary world of non-protein coding transcripts, which pour out from the genome in precise patterns in different cells and tissues during development. I now think of the human genome as the .ZIP file extraordinaire. The transcriptional complexity of the human genome is at least an order of magnitude more complex than the genome itself, and it can be unzipped in different ways, with different expression and splice patterns of coding and noncoding RNAs in different cells at different times. We would have had no way to explore this world without high-throughput sequencing.
“In the new world of genomics, every student, postdoc, laboratory, and department will need to have the ability to handle and analyze Big Data.”
Q: How are you using NGS today?
CB: NGS has opened up new avenues in population genomics. I remember being at a Cold Spring Harbor meeting and realizing that the 1000 Genomes Project should include admixed genomes. People questioned it, but I believed that to analyze and perform transethnic and multiethnic studies we needed to figure out how to make sense of an admixed genome.
One of the reasons we became involved in the Clinical Genome Resource (ClinGen) Consortium was to aggregate clinical genetic testing data and chip away at the variant of uncertain significance (VUS) rate, which is higher in certain ethnic minority groups simply because there haven’t been as many of these sequences analyzed. NGS made it inexpensive and easy to follow up on these genome-wide association study (GWAS) hits. Each amino acid change we found was a smoking gun. It became clear that we needed to broaden ethnic representation in human DNA studies if we really wanted to develop genomic medicine that benefitted everybody.
SK: We’re focusing on whole-genome sequencing (WGS) because it’s the ultimate molecular test. WGS is also faster, and we’ve worked with Illumina to develop a method that allows us to decode and analyze an entire human genome in 26 hours.1 It’s our plan to offer rapid WGS to every undiagnosed child in our neonatal and pediatric intensive care units (NICU and PICU) by the middle of next year, and to perform clinical research studies to define clinical utility and cost-effectiveness of genomic medicine in pediatric inpatient and outpatient settings.
Q: What are the HiSeq X Systems enabling you to study?
CB: Population sequencing is the culmination of what I’ve always wanted to do: analyze many human genomes. We’re performing large population sequencing studies, using them as the baseline to answer important population genetic questions, and analyzing the results to inform new approaches to clinical medicine. For example, we’re conducting a preeclampsia study in Peru using a mixture of large-scale genotyping and sequencing, looking at altitude adaptation as it’s linked to preeclampsia.
SK: Using the HiSeq X Systems, genomes are much less expensive so we can sequence many more trios. There are 8000 named genetic diseases and we and others feel strongly that NGS is going to transform our ability to identify them. We hope to use the HiSeq X and Illumina SeqLab infrastructure to gradually develop the evidence base to support that.
JM: The Garvan Institute was one of the first institutes to put genomics at the center of its research endeavor, rather than as an extension of conventional molecular biology. With the extraordinary advances in genome sequencing and concomitant cost reductions, it has become feasible economically to leverage population sequencing and put genomics at the center of both research and the clinic.
It’s extraordinary how the HiSeq X Systems are enabling translational and research endeavors to merge. We’ve been collaborating with researchers throughout the world. The HiSeq X Ten Systems are working beautifully.
In addition to studying monogenic diseases, we are using population sequencing for major research programs in cancer, diabetes, osteoporosis, immunological diseases, neurodegenerative and neuropsychiatric diseases, and aging. We’re performing cancer stratification studies as part of the International Cancer Genome Consortium (ICGC), and using NGS to decipher the cancer genome and assess the inherited components of familial cancer risk. We are sequencing people with type 1 diabetes to discover genetic differences between those with the condition who do well through life, and those who suffer severe complications later in life, such as renal failure. In our aging studies, we’re using population sequencing to study several thousand individuals who have reached old age without any sign of cardiovascular disease, cancer, cognitive decline, or neurodegenerative disease. We’re developing a risk-depleted cohort that we can use as a control for studies of populations that do suffer such diseases. Other programs underway using the HiSeq X Ten sequencing capacity are to study populations with cardiac, mitochondrial, and Alzheimer’s diseases.
“Our biggest challenge is learning how to share population sequencing data.”
Q: What are the challenges in sharing population sequencing data?
CB: Our biggest challenge is learning how to share population sequencing data. The NIH and other organizations now mandate that researchers share their data. Unfortunately, this is not true for clinical data. Most hospitals have no real tenet to share data. We also live in a world that is interconnected, and that is making patients uncomfortable in sharing information. That’s where the efforts of the Global Alliance for Genomics and Health and other entities will be valuable in developing forward-looking consent, privacy procedures, and best practices in data governance and transparency.
SK: Before we can sequence a genome at Rady Children’s Hospital, parents have to give informed consent. Part of that consent process is an agreement for us to be able to post the genome. We de-identify it so there’s no information that can tie the genome back to the child or parent, then the information is made available on the National Center for Biotechnology Information (NCBI) database of Genotypes and Phenotypes (dbGaP), a private database. Researchers can obtain access to the data only after applying to NIH and providing a good reason why they need to access the information for their research. It seems to provide a good balance between privacy concerns and the benefit of other researchers being able to study public genomes.
It’s unfortunate that not all hospitals have a genome sharing informed consent process in place. Clinical researchers need human whole genome sequence information for benchmarking. They want to see how common a variant is in a genome. The only way to have accurate variant information is for hundreds of thousands of genomes to be available so that we can assess the frequency of every variant that we see.
Q: What is the value in integrating WGS, epigenome, transcriptome, and other genomic and phenotypic data to obtain different genomic snapshots?
CB: There’s significant value in performing all kinds of omics profiling, RNA-Seq, methylome sequencing, etc. We still don’t understand the regulatory network of the human body. Are we performing and integrating omics data today? I think it’s happening slowly and part of that is because it’s much easier to sequence than to interpret.
SK: There is definitely value in panomics, where we’re taking whole-genome data and bringing it together with deep phenome, epigenetic, gene expression, metabolomic, and proteomic data. Sequencing the genome is not the end of the game, but it’s a great start. We’re starting to understand what we need to deliver precision medicine. For example, we don’t know what most of the variants that we see in genomes mean functionally. Therefore, we can’t give a confident assessment of whether they could produce a change in a human being. It’s clear that we need additional types of data to be able to make those assessments at scale.
JM: The future of clinical research and medicine will revolve around the integration of Big Data sets. It’s more than just individual and amalgamated genomic data sets. Increasingly, these will become merged with transcriptomic, epigenomic, proteomic, and most importantly, phenotypic data to create highly connected, information-rich data sets. Medicine is heading quickly towards Big Data and the acquisition of tens and hundreds of thousands of genome sequences will accelerate this. It’s going to change everything.
“There is definitely value in panomics, where we’re taking whole-genome data and bringing it together with deep phenome, epigenetic, gene expression, metabolomic, and proteomic data.”
Q: How important will bioinformatics and databases be in gaining the full value of population sequencing?
CB: From the beginning, it was clear that we would have to marry sequencing with analysis tools to make sense of all the data. By linking and analyzing phenotypic and genotypic information, we can begin to unravel patterns that we can’t see from static data. There’s an optimism that if we measure phenotypes and exposures in much more rigorous ways, we could collect vast amounts of data to help us nail genetic associations.
JM: I think the bioinformatics framework and databases are central to the whole endeavor. They will integrate genomic data with orthogonal data sets to extract valuable information. The genetic patterns we identify will help inform individual circumstances in the clinic, and through the analysis of the metadata, entire health systems in terms of patterns of disease, co-morbidities, etc.
Population sequencing isn’t for the faint-hearted. We’ve invested about $10 million over the last 1–2 years into building the computational pipelines. We have a growing team of 60 people working on the entire assembly pipeline, performing sequencing, assembling data, calling variants and variant differences between populations, and connecting the data with phenotypic data.
In the new world of genomics, every student, postdoc, laboratory, and department will need to have the ability to handle and analyze Big Data. It’s not something for specialists at the end of the corridor. It’s central to the entire endeavor of research and medicine. It’s a data-driven world and we’re charging into it.
SK: We recognized the value of bioinformatics in a recent study that compared the effectiveness of WGS and traditional genetic testing to identify Mendelian disorders in critically ill newborns.2 To analyze the data, we developed several novel bioinformatics tools. The paper demonstrates the usefulness of genome sequencing, but we need further evidence of the clinical value of genomics. We’ll also need a streamlined method for informing clinicians of the results, not just for diagnosis, but also for how NGS data can inform treatment decisions.
Q: What kinds of databases will be required?
JM: We need national-level genotype/phenotype correlation databases that are maintained by health authorities and can be queried by accredited researchers and clinicians. They’ll have to be national databases because there are legal and other contextual requirements that are idiosyncratic to each jurisdiction. Somehow they need to be linked into one global database so that data generated in one country can be used elsewhere and explored in multidimensional ways to advance our understanding of human biology and disease.
“I think population-scale sequencing in the broadest sense will begin with children, possibly at birth to replace the present Guthrie test.”
Q: How long will it take to create these databases?
JM: We can’t sequence everyone in the world overnight, but I’m convinced that within a decade we’ll have large genomic databases. Genomic data will increasingly become a standard part of medical records. Ideally, we’ll have well-curated, evidence-based genotype/phenotype correlation databases in the cloud that are maintained and continuously updated national resources.
The initial use will be sequencing individuals with serious genetic disability, because we can diagnose the causative mutation in about half of such cases very quickly. Cancer stratification will be an important area, enabling physicians to determine the molecular basis of the disease and consequently treat the disease more effectively. The third area will be to detect the genetic markers of adverse drug reactions because that’s a huge burden on the hospital systems in every country. We’ll be able to predict and avoid a high proportion of those adverse reactions through genomic information.
We’re proposing that the Australian health system sequence everyone with developmental and/or intellectual disabilities as a first-line diagnostic. I expect that will become routine over the next 2–5 years. I think population-scale sequencing in the broadest sense will begin with children, possibly at birth to replace the present Guthrie test. The next generation of kids will be the genome generation, with genome sequencing and analysis applied selectively and then more widely as the technology and the value of the information improves.
Q: Do you think WGS will become a routine clinical test?
JM: We’re close to sequencing being used routinely as part of a medical examination. The cost of sequencing will continue to decrease, making it feasible to perform reanalysis to improve the accuracy of someone’s primary genome data, to incorporate epigenomic and transcriptomic data, or to look at somatic variations. The value of sequencing will go up as we get more information about what variation in the genome means in biology and medicine. Higher use of sequencing in medicine is now limited by the richness and quality of the databases that sit behind the analysis of that information.
It’s worth noting that the American College of Medical Genetics and Genomics (ACMG) has mandated reporting on 56 genes because it can have a significant bearing on a patient’s future health. We’ll start to see well-validated collections of genes that will be either mandated to report or that organizations working in this space will be confident to report back to clinicians and patients, with the list expanding over time.
SK: We have a rich tradition of newborn screening programs where each baby at birth has a heel stick that’s tested for 29 conditions. Several groups around the US are starting to investigate what additional information would be provided if we could replace the heel stick with genome sequencing. We don’t know yet.
“Population sequencing will enable us to uncover and characterize global allele frequencies of clinically actionable variants involved in adverse reactions.”
Q: Is human whole genome data already moving us closer to personalized medicine?
CB: I think genome sequencing is going to end up being a part of routine care and a component of people’s electronic health records. It’s an interesting time because we’re in a bit of a transition phase. Sequencing technology has matured and people are implementing high-throughput sequencing and soon will be performing population sequencing routinely.
We need to come up with a concerted plan for aggregating these data, analyzing them, and translating them into health benefits as quickly as we can. Ultimately, we need to provide the public a good return on the investment.
SK: In the future, sequencing results will inform treatment changes. Traditionally, the diagnostic sphere has been the home of the pathologist and the laboratorian, while medical implementation has been the role of the physician and clinician. In genomic medicine, those two will be fused. That’s going to be a challenge because neither side is used to having the other side involved in those tasks or information.
JM: I think the problem is that our understanding of the genome is still limited. Today, we can only accurately report on the impact of some variations in protein-coding sequences. It’s a huge effort to assemble enough evidence and data from the literature to confidently call mutations or variations in other parts of the genome that might have medical significance. Large global databases created through population sequencing will support this effort. These databases will contain sequences that reflect a spectrum of mutations and phenotypic characteristics, and will enable queries to determine if a new sample reflects the symptoms and mutations of those already in the databases.
Q: How will the data from population sequencing transform medicine?
JM: Population sequencing will have a profound impact on medicine, changing it from the art of crisis management to the science of good health. We now understand that individual genomic variation and our genetic idiosyncrasies affect our present health and contribute to the risk of future disease, whether it’s type 2 diabetes, cancer, rheumatoid arthritis, or Alzheimer’s disease. In many cases, forewarned is forearmed, enabling clinicians and patients to implement strategies to reduce, avoid, or prepare for these eventualities.
SK: I study rare genetic diseases in children, which are simple genetically. We now have the ability to make rapid diagnoses and so, for the first time, those conditions can be a cost-effective place to develop and manufacture a drug. Our hope is that genomes will increasingly be as valuable for diagnosing complex disease as they are for single-gene disorders. It’s going to take a couple of decades to catch up, and population studies will be very important in closing the gap. One of the things that is exciting about population studies is that we’re starting to redefine how we describe diseases based on genetics, rather than based on symptoms.
JM: Population studies will inform the development of therapeutics, especially in identifying the genetics of adverse reactions. There are 100,000 deaths a year in the United States from adverse drug reactions to prescription drugs.3 In Australia, at least 2–3% of all hospital admissions are due to adverse reactions to prescribed drugs.4
CB: For example, Abacavir is an important HIV drug and researchers have identified an HLA variant involved in Abacavir hypersensitivity. Prevalence of the variant is low in Africans and Europeans, but there is a 20% frequency of the mutation in certain populations in India and Asia.5 If a patient with the variant is given Abacavir once, they become very sick. If they are given it twice, they die. Population sequencing will enable us to uncover and characterize global allele frequencies of clinically actionable variants involved in adverse reactions. The bottleneck is going to be making drug metabolism information understandable by the physicians so they’ll know to pick drug A vs. drug B, or to give half the dose or double the dose of a drug.
JM: Drug companies are also beginning to use population sequencing to identify exceptional responders in past drug trials. If they can stratify the population and identify the particular genetic background of responders, they can analyze the biochemical pathways involved. They’re not only rescuing failed drugs, they’re rescuing responder patients for effective, potentially life-saving treatment.
“Particularly in the US, we need population sequencing of ethnic populations that have the worst health outcomes so the negative gap in their care doesn’t increase.”
Q: How important will it be to sequence ethnic subpopulations?
CB: The fact that we have the technology to perform population sequencing is awesome. However, we need a concerted effort so that research continues on ethnic subpopulations. Without one, the focus will remain on sequencing large homogenous populations, like the Finns or Icelanders. Although those efforts are important, their benefits don’t translate into all populations. Particularly in the US, we need population sequencing of ethnic populations that have the worst health outcomes so the negative gap in their care doesn’t increase. This presents a challenge because there’s no high-level initiative to fund these efforts. The US government’s Precision Medicine Initiative is a great effort; however, it doesn’t compare to what the UK and other countries are doing, particularly China, which sees genomics as one of the major planks of its development program.
Q: What has been, or will be, the impact of the $1,000 genome?
SK: The good news is that the $1,000 genome exists for population sequencing. What we need in clinical care is for the cost of rapid genome sequencing to decrease to the $1,000 genome level, and that hasn’t happened yet.
JM: The $1,000 genome was a practical and psychological tipping point. It’s changed the way we think about technology and what we believe is possible. It’s sparked the integration of clinical and research endeavors in a way that we never anticipated or thought would be possible. People now recognize that we’re close to shifting from genomics being used as a research tool, to it becoming an everyday clinical analysis tool.
“Ultimately, there will be automatic reporting of genomic information into the cloud to and from smart devices. It’s going to take us places we haven’t even dreamt of.”
Q: When you first became a scientist, did you believe there would be a day when human whole-genome sequencing could be performed in a day?
CB: I would have said that it was impossible. Crazy talk!
SK: Absolutely not. Even if you took me back to when I was sequencing with my first Solexa System, I couldn’t have anticipated that we would be churning out genomes as quickly as we are.
JM: Not in my wildest dreams. In the second half of the 20th century, we were just cutting our teeth in understanding what DNA looked like, what a gene looked like, and developing primitive genomic analysis tools. At the time, everything we were doing was considered to be leading edge, and it was. Now we are moving at warp speed. The 21st century will be the century of biology and medicine. The integration of NGS with Big Data is still unfolding and will be for the foreseeable future. Ultimately, there will be automatic reporting of genomic information into the cloud to and from smart devices. It’s going to take us places we haven’t even dreamt of. It’s a wonderful and exciting time. We are grateful that companies like Illumina have led the way technologically.
References
1. Miller NA, Farrow EG, Gibson M, et al. Genome Medicine. 2015; 7(1): 100. doi: 10.1186/s13073-015-0221-8.
2. Willig LK, Petrikin JE, Saunders CJ, et al. Lancet Respir Med. 2015; 3(5): 377–387.
3. Preventable Adverse Drug Reactions: A Focus on Drug Interactions. U.S. Food and Drug Administration. Accessed May 16, 2016.
4. Roughead L, Semple S, Rosenfield E. Literature Review: Medication Safety in Australia. Published August 2013. Accessed May 16, 2016.
5. Puthanakit T, Bunupuradah T, Kosalaraksa P, et al. Pediatr Infect Dis J. 2013; 32(3): 252–253.