Y1000+ Project

For more than 400 million years, yeasts of the ancient fungal subphylum Saccharomycotina have evolved to inhabit every continent and every major aquatic and terrestrial habitat, yet little is known about their diversity and ecology. A key factor in the ecological versatility of yeasts is their impressive diversity of resource-management strategies, especially for carbon and energy metabolism.

Among the yeasts, the bread, beer, and wine yeast Saccharomyces cerevisiae is the best known and a chief model for genetic research. In contrast to this inveterate fermenter, the metabolisms of the more than 1,000 other known species in this subphylum vary widely, and together these species harbor as much genetic diversity as the entire animal kingdom. Some can produce oils, such as Yarrowia lipolytica and Lipomyces starkeyi; some can ferment xylose, a common plant cell wall sugar, such as Spathaspora passalidarum and Scheffersomyces (Pichia) stipitis; several are opportunistic pathogens, such as Candida albicans and Candida auris, which the World Health Organization places in its critical priority group of fungal pathogens; and most actually prefer cellular respiration to fermentation.

With core funding from the National Science Foundation, the Y1000+ Project aims to sequence and analyze the genomes of every known yeast species, reorganize their taxonomy in a phylogenomic context, reconstruct their genotype-phenotype map with rich metabolic data, and determine how this map has evolved over deep time.

Building on a qualitative phenotypic dataset and a genus-level dataset of 332 genome sequences, the Y1000+ Project has now published genome sequences of nearly every known species, conducted high-throughput quantitative phenotyping in 24 growth conditions, and created a hierarchical ontology of isolation environments. The current dataset includes 1,154 yeast strains that are now grouped into 12 recently described taxonomic orders.

Some traits have evolved only a handful of times, while many have evolved independently dozens of times, providing ideal systems for studying historical contingency, biological innovation, and convergent evolution. Combined with this unprecedented dataset, these repeated evolutionary events give statistical, phylogenetic, and machine learning approaches the power to reconstruct evolutionary events and identify their genetic foundations. Key findings of the recent core manuscript include a lack of evidence for trade-offs between carbon niche breadth and growth rate; a major role for intrinsic genetic factors in determining whether a yeast is a carbon generalist or specialist; evidence that generalists lose fewer traits (and even gain more); and specific genetic pathways that predict whether a yeast will be a generalist or a specialist. In addition to the core publications, team members have made several remarkable and unexpected discoveries that illustrate new principles of biology. We anticipate that, for years to come, many research groups will use this rich dataset and analytical framework to connect DNA to diversity.