GenSLM

Extant and emerging pathogens can become global health crises, so novel methods are needed to address these threats proactively, before they reach pandemic scale. Recent advances in machine learning and artificial intelligence, specifically large language models (LLMs), provide powerful tools for predictive modeling and monitoring of pathogens of concern. The team’s prior work developing Genome-scale Language Models (GenSLMs) demonstrated the potential for LLMs to predict future SARS-CoV-2 variants of concern before their emergence by modeling the evolutionary process. In this project, the team builds on that work by scaling GenSLMs beyond the relatively simple SARS-CoV-2 genome to multi-segmented viruses and comparatively enormous bacterial genomes, and further still to more complex eukaryotic organisms, including yeast and humans.