Funding:
Project Acronym: RESONANCE
Project Name: RESONANCE: Promoting large-scale use of genomics in understanding bacterial pathogen dynamics and evolution
Activity years: 2025 - 2027
Funding: Scientific Research and Technological Development (IC&DT)
Reference: MPr-2023-12-SACCCT
Host Institution: Consortium Universidade de Lisboa / Instituto Politécnico de Lisboa
Beneficiary Entity: Instituto Superior de Engenharia de Lisboa
Budget: 249.480,00 €
Researchers (ISEL):
Cátia Vaz ORCID: 0000-0001-6074-3074
Description:
Genomics-based gene-by-gene methods, such as whole- or core-genome multilocus sequence typing (wg/cgMLST), have been adopted by leading international agencies, such as the European Food Safety Authority (EFSA), the European Centre for Disease Prevention and Control (ECDC) and the United States Centers for Disease Control and Prevention (CDC), for the surveillance of several important bacterial pathogens[1]. However, genomic surveillance systems are useful beyond transmission and outbreak identification, offering insights into the ecology and evolution of a given pathogen. Genomic information can also predict important phenotypic characteristics such as virulence, serotype, and antimicrobial resistance. It also makes it possible to track, at the population level, variation in the pathogen targets used in vaccines, which may change in response to the selective pressure of the vaccine-induced host immune response[2]. Using wg/cgMLST in these ways in real time requires richly annotated schemas that include loci encoding relevant pathogen antigens, virulence factors and resistance determinants, which are frequently present in the accessory genome. Despite the success of cgMLST typing[3], existing schemas are frequently based on a limited number of genes and may therefore not be suitable for the investigation of short-term outbreaks or for the broader population surveillance goals outlined above. Moreover, currently used schemas were frequently defined based on either a single genome (https://www.cgmlst.org/ncs/) or a few of the limited number of genomes available at the time of schema creation, and so may not represent the diversity of the microbial population of the species they are meant to survey. A transition to broader wg/cgMLST schemas could thus offer greater discrimination and a more complete view of the genetic potential available to a species. While the adoption of new, more representative schemas could be desirable, the human effort required is a barrier to their creation.
On the other hand, tools that allow the mapping of old schemas onto new ones, either by identifying common loci (and rescuing any locus-associated information) or by assessing their clustering congruence, would allow the reuse of existing information and a smoother transition of ongoing surveillance efforts onto newer schemas. These aspects are important when considering the need to track emerging bacterial variants, since optimized subtyping is key to outbreak investigations, as illustrated by the German O104:H4 outbreak in 2011. Understanding the short-term evolution of these emerging strains also benefits from expanded schemas that encompass the potentially novel genetic elements associated with these new variants. While sequence variation has been used to establish strain relationships, chromosome topology information has been largely ignored[4]. Nevertheless, synteny has been shown to be useful in identifying horizontal gene transfer (HGT) events, in identifying genetic modules for further study and in providing information on the similarity of distantly and closely related organisms. From an evolutionary standpoint, both large-scale and short-range changes in synteny have been shown to be important for overall strain fitness and virulence, further supporting the need to incorporate synteny into pathogen surveillance and evolutionary studies in a more systematic way. The goal of this project is to create software tools that together provide a framework for the genomic analysis of a user’s pathogen of interest. These tools will allow strains to be tracked with higher resolution than is currently possible, by making use of previously unexplored features of the genomic data, while allowing the real-time exploration of aspects relevant to pathogen evolution. The tools will be able to manage the large amounts of data currently available and those anticipated to be generated in the future.
Having such a complete and accessible toolbox will lower the barrier to entry for applying genomics to the surveillance of a wider array of bacterial pathogens. By leveraging typing information to obtain insights into the population biology and evolution of bacterial pathogens, the software created will contribute to monitoring pathogen adaptation to the human host. The knowledge generated is expected to inform pathogen containment measures, identify potential vaccine targets or changes in vaccine targets that could compromise their efficacy, monitor the emergence of novel strains of particular concern and contribute to the design of rapid diagnostic tests to identify them. To achieve these goals we have divided the work into five scientific tasks. In task 1 we will improve the efficiency of our current wg/cgMLST software, chewBBACA, focusing not only on the computational architecture but also on the conceptual approach to allele identification. We will increase the accuracy of the existing implementation while exploiting more of the genomic variation. In task 2 the focus will be on lowering the barrier to creating new schemas and expanding existing ones, and on facilitating their annotation, so that the information collected can be used in real time to help monitor properties relevant to host-pathogen interaction and to pathogen control. In task 3, graphical user interfaces (GUIs) will be created, allowing domain experts to explore the data using visual analytics principles. The goal is to promote the integration of the data derived from these analyses with the available biological and epidemiological knowledge, contributing to a more informed understanding of pathogen spread, dynamics and evolution. In task 4 we aim to make use of the genomic information beyond sequence variation.
Although there have been studies on synteny variation in bacteria and how it can be used to discriminate strains, as well as on the phenotypic implications of synteny variations, no systematic studies have been performed. By integrating genomic structural variation with surveillance activities we aim to expand the current knowledge of the diversity of genetic arrangements within pathogenic bacterial species, leverage it to refine strain identification, help identify HGT events and assist in pinpointing potential regions of interest for further studies. Task 5 will focus on creating a tool for large-scale phylogenetic analysis, addressing the current challenges of dataset size and algorithmic complexity. Following a modular structure, the tool will be flexible enough to be integrated into pipelines and to have its modules reused by other tools. The team assembled to advance the project aims is ideally suited to tackle the challenges posed by the tasks. iMM’s experience in pathogen surveillance, genomics and evolution is complemented by the strong knowledge of algorithms for phylogenetic and population structure evaluation and the expertise in data models of INESC-ID and ISEL, which also bring strong know-how in software engineering.