Illumina

Genetic & rare diseases, Precision health, Population genomics

Podcast: Genomic AI millions of years in the making

Illumina VP and Distinguished Scientist Kyle Farh spoke with Mendelspod about using natural selection to train gene-identifying algorithms

Podcast host Theral Timpson and Illumina Distinguished Scientist Kyle Farh. Photos courtesy of their subjects
April 3, 2024

The podcast Mendelspod covers the latest developments in biotech and precision medicine. For 13 seasons and more than 10 years, host Theral Timpson has conducted longform interviews with scientists, executives, and journalists about the biggest ideas and newest technology driving these fields.

Illumina is proud to share that, in partnership with GenomeWeb, our Vice President and Distinguished Scientist Dr. Kyle Kai-How Farh appeared as Timpson’s guest on the most recent episode.

Farh earned his MD from Harvard Medical School and his PhD from MIT, and his dedication to genetic science has been the throughline of his career, from postdoctoral medical and population genetics studies at the Broad Institute to his residency in the Clinical Genetics department of Boston Children’s Hospital.

For the past seven years, Farh has led Illumina’s Artificial Intelligence Laboratory for Genome Interpretation, and he has contributed to 10 articles in Cell, Science, Nature Genetics, and other prestigious journals. The most recent of these covered his lab’s development of PrimateAI-3D, an algorithm that’s effectively trained by millions of years of natural selection to identify potentially pathogenic gene variants.

Studying our relatives to discover ourselves

In the podcast, Farh discusses genomic AI in detail. Whereas the most well-known AI programs can draw from a wealth of published works to make their predictions (in ChatGPT’s case, predicting which word is most likely to come next in a sentence), AI built to identify previously unknown effects of gene variants has no such luxury.

“If you say, ‘In the human genome, base 2,000,000,006, A turned to C. Does that cause disease or not?’ There’s no human who can actually really tell you that,” Farh explains. “There’s vast amounts of data, but for the most part, there [are] no labels for what the correct answer should be.”

So to make the best predictions for human health, his team instead sequenced over 800 individuals from 233 species of nonhuman primates. They used this data to train their algorithm to identify variants that are common across our closest relatives in the tree of life. If those variants have survived millions of years of natural selection, it’s safe to classify them as benign. And once benign variants are identified, the process of elimination makes it easier to find potentially pathogenic ones.
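The process of elimination described above can be illustrated with a toy sketch. This is not PrimateAI-3D itself, which is a deep-learning model; the function names, the variant identifiers, and the "seen in at least two species" threshold are all assumptions made up for illustration.

```python
from collections import defaultdict


def benign_candidates(primate_observations, min_species=2):
    """Collect variants observed in at least `min_species` distinct species.

    `primate_observations` is an iterable of (species, variant) pairs.
    The threshold is a hypothetical stand-in for "common across primates".
    """
    species_seen = defaultdict(set)
    for species, variant in primate_observations:
        species_seen[variant].add(species)
    return {v for v, seen in species_seen.items() if len(seen) >= min_species}


def triage(human_variants, benign):
    """Split human variants into likely-benign and still-unresolved lists."""
    likely_benign = [v for v in human_variants if v in benign]
    unresolved = [v for v in human_variants if v not in benign]
    return likely_benign, unresolved


# Made-up observations: (species, variant identifier).
observations = [
    ("chimpanzee", "GENE1:A10C"),
    ("macaque", "GENE1:A10C"),
    ("gorilla", "GENE1:G25T"),
    ("macaque", "GENE2:C7G"),
    ("baboon", "GENE2:C7G"),
    ("baboon", "GENE2:C7G"),  # repeat sightings in one species count once
]
benign = benign_candidates(observations)
likely_benign, unresolved = triage(
    ["GENE1:A10C", "GENE1:G25T", "GENE3:T4A"], benign
)
print(likely_benign)
print(unresolved)
```

Variants that end up in the unresolved list are not necessarily pathogenic; they are simply the ones that still need interpretation once the likely benign ones are set aside.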

Farh notes that ClinVar, a widely used public database, previously covered about 70,000 gene variants. This method of machine learning expands the number of variants interpreted to almost 4.5 million.

Breakthroughs don’t come easy

Farh credits their ability to conduct this research to many factors unique to Illumina: The company had existing relationships with over 70 academic authors around the world (“Most of them are people who actually went out to the jungle and caught the monkeys to provide the samples”). Illumina has the technology to generate vast quantities of sequencing data from those samples, and it has funding for the computing power necessary to interpret that data. “Deep-learning compute has really become very unaffordable for many labs in academia, at least at the scale needed to get the most powerful, most effective models,” he says.

One of PrimateAI-3D’s other breakthroughs is in its name: Farh’s team can use it to superimpose newly identified benign variants from the DNA sequence onto the three-dimensional protein structures they code for. “From that, we can figure out which are the pockets where the pathogenic variants are,” he explains. “A lot of times, these pockets of parts of the protein are very obvious in 3D space, whereas they’re quite disconnected in linear space in the genome.”
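Farh’s point about 3D space versus linear space can be shown with a small geometric sketch. The residue positions, coordinates, and both thresholds below are invented for illustration and do not come from any real protein structure or from PrimateAI-3D.

```python
import math
from itertools import combinations


def spatial_neighbors(residues, max_distance=8.0, min_separation=20):
    """Find residue pairs far apart in sequence but close in 3D space.

    `residues` maps a sequence position to hypothetical (x, y, z) coordinates.
    Both thresholds are illustrative assumptions, not published values.
    """
    pairs = []
    for (i, a), (j, b) in combinations(sorted(residues.items()), 2):
        far_in_sequence = j - i >= min_separation
        close_in_space = math.dist(a, b) <= max_distance
        if far_in_sequence and close_in_space:
            pairs.append((i, j))
    return pairs


# Made-up coordinates for four residues of an imaginary protein.
residues = {
    12: (0.0, 0.0, 0.0),
    15: (3.0, 0.0, 0.0),
    140: (0.0, 4.0, 3.0),
    300: (40.0, 40.0, 40.0),
}
print(spatial_neighbors(residues))
```

In this toy example, residue 140 sits more than a hundred positions away from residues 12 and 15 in the sequence, yet folds back to lie next to them in space, which is the kind of pocket that a purely linear view of the gene would miss.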

The Illumina AI lab has already demonstrated strong correlations between PrimateAI-3D’s predictions and real-world effects. “You can show that someone’s blood cholesterol level is very well predicted by their PrimateAI-3D score for the variant that they carry in the LDL gene, or in the PCSK9 gene,” he says. “This allows you to quite accurately predict who will be at risk for diseases like dyslipidemia or type 2 diabetes,” as well as for less common variants.

PrimateAI-3D is now or will soon be available across Illumina’s software products. Farh specifically mentions DRAGEN Secondary Analysis, Illumina Connected Analytics, and Emedgene variant interpretation software for genetic disease applications.

Predictions of a more speculative nature

When asked about what’s next for his laboratory, Farh offers a few promising leads: His team is experimenting with “perturb-seq,” a gene-editing technique that introduces different mutations to individual cells and then measures how those mutations affect cell function. (“That’s super fun,” he notes.)

They’re also working on algorithms to identify pathogenic variants in the parts of the genome that don’t code for proteins. Timpson suggests that this could move the needle on rare disease diagnosis, and Farh agrees: “Right now, the truth is, our diagnosis rate with exome [sequencing] is 30%. And that’s very unsatisfying, I think, for patients and physicians.”

Over the next five to 10 years, Farh sees a new era of precision medicine coming, facilitated by efforts currently underway to sequence cohorts of unprecedented scale. He points out that “people who carry natural mutations that break the PCSK9 gene have naturally very low levels of cholesterol and are protected from heart attacks. So very quickly, pharma was able to make antibodies which inhibit PCSK9, and now that’s a fantastic drug. So I think that there [are] many, many other genes like this out there where natural-occurring mutations are beneficial and are great potential therapeutic targets for all kinds of diseases. And ultimately, AI is actually also central for identifying those.”

To learn more about how PrimateAI-3D powers genomic variant annotation in Illumina Connected Annotations, read this article on Illumina’s Genomics Research Hub.
