Department of Pathology

Dr Andrew Firth

Research description

RNA viruses have compact multifunctional genomes. During the course of infection, the genome or its derivatives must direct translation of virus proteins, genome replication and genome packaging. To realize these multiple roles, RNA virus genomes commonly have many overlapping coding and non-coding functional elements. Overlapping functional elements often escape detection because it can be difficult to disentangle the multiple roles of the constituent nucleotides via, for example, mutational analysis. Systematic synonymous-site mutational analyses are resource-intensive and can miss functional elements that are only required in vivo. Meanwhile, high-throughput techniques such as SHAPE and ribosome profiling often have difficulty distinguishing functional elements from incidental features. Comparative computational analyses offer a way forward. By studying patterns of nucleotide substitutions across sequence alignments, it is often possible to predict novel functional elements and gain insight into their function. We are using comparative genomic approaches to identify new features in both plant and animal RNA viruses, and guide follow-up experimental analyses.

Virus comparative genomics database.

Fig 1. Above - map of the poliovirus genome showing the polyprotein-encoding sequence (blue), UTRs (black), and known functional RNA elements (Goodfellow et al 2000, PMID 10775595; Liu et al 2009, PMID 19781674; Song et al 2012, PMID 22886087; Burrill et al 2013, PMID 23966409). Below - conservation at synonymous sites in an alignment of 50 enterovirus sequences: synonymous substitution rate relative to the genome average (brown) and statistical significance (red). All known functional RNA elements embedded within the polyprotein-encoding sequence are easily detected, as are a few additional elements in the region encoding 3D.

One particularly powerful approach is to analyze the rate of nucleotide substitutions at synonymous sites in alignments of related virus coding sequences (e.g. Choi et al 2001, PMID 11338395; Simmonds et al 2008, PMID 18319285). A statistically significant reduction in variability at synonymous sites indicates an overlapping functional element, such as an overlapping gene or a functional RNA structure. Fig. 1 shows an example of our own synonymous-site conservation algorithm applied to an alignment of enterovirus sequences. We are applying this and other comparative genomic techniques to all sequenced RNA viruses (as well as some cellular organisms) to identify novel features, both coding and non-coding. By mapping all the functional elements in the genomes of economically and medically important RNA viruses, we hope to provide a better platform on which to build future research.
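
The general idea of measuring variability at synonymous sites can be illustrated with a minimal Python sketch. This is not the algorithm used to produce Fig. 1; it is a simplified stand-in that assumes an in-frame, gap-aligned set of coding sequences of equal length, ignores phylogeny, and performs no significance test. The function names `synonymous_diversity` and `relative_rate` are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_diversity(alignment):
    """Per-codon fraction of sequence pairs that differ synonymously.

    Only pairs encoding the same amino acid at a codon position are compared.
    Codons with gaps or ambiguous bases are skipped. Returns None for a
    position with no comparable pairs. (Simplified illustration only.)
    """
    length = len(alignment[0])
    if length % 3 or any(len(seq) != length for seq in alignment):
        raise ValueError("expected in-frame sequences of equal length")
    scores = []
    for start in range(0, length, 3):
        codons = [seq[start:start + 3].upper().replace("U", "T")
                  for seq in alignment]
        comparable = differing = 0
        for a, b in combinations(codons, 2):
            aa_a, aa_b = CODON_TABLE.get(a), CODON_TABLE.get(b)
            if aa_a is None or aa_a != aa_b:
                continue  # unknown codon or non-synonymous pair
            comparable += 1
            differing += a != b
        scores.append(differing / comparable if comparable else None)
    return scores


def relative_rate(scores, window):
    """Sliding-window mean of the scores, divided by the overall mean."""
    known = [s for s in scores if s is not None]
    overall = sum(known) / len(known) if known else 0.0
    result = []
    for i in range(len(scores) - window + 1):
        values = [s for s in scores[i:i + window] if s is not None]
        if not values or overall == 0:
            result.append(None)
        else:
            result.append(sum(values) / len(values) / overall)
    return result


if __name__ == "__main__":
    toy_alignment = ["ATGGCTAAA", "ATGGCCAAA", "ATGGCGAAG"]  # made-up data
    per_codon = synonymous_diversity(toy_alignment)
    print(per_codon)
    print(relative_rate(per_codon, window=1))
```

In this sketch, windows whose relative rate falls well below 1 would be candidates for overlapping functional elements. A real analysis would need a model that accounts for the phylogenetic relationships between the sequences and a statistical test of the kind summarized by the significance track in Fig. 1.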

Fig 2. Influenza A virus genome map and synonymous site conservation analysis. Conservation peaks correspond to terminal packaging signals, splicing enhancer sequences, and the PA/X and NS1/NEP dual-coding regions.

Many 'hidden' genes are translated via non-canonical mechanisms such as programmed ribosomal frameshifting, non-AUG initiation, and internal ribosome entry sites (IRESes). Since viruses use the host cell's ribosomes and many other components of the protein synthesis machinery, these unusual translation mechanisms are potentially also relevant to cellular gene expression and genome annotation. Indeed, programmed -1 ribosomal frameshifting, stop codon readthrough and internal ribosome entry, all now known to be important for the expression of certain cellular genes, were first identified and studied in viruses. We are interested in finding new non-canonical translation mechanisms (e.g. -2 frameshifting; Fang et al 2012, PMID 23043113) in viruses, characterizing these mechanisms and their associated sequence motifs, and then searching for related instances in cellular genes.

Funding for our research comes from the Wellcome Trust and the BBSRC.