Computational tool ESPRESSO discovers and quantifies RNA isoforms

Written by Aisha Al-Janabi (Assistant Editor)

ESPRESSO overcomes the limitations of error-prone long-read RNA sequencing, providing a useful resource to study transcriptome variation.

Long-read RNA sequencing platforms could reveal variations in the transcriptome of rare genetic diseases; however, long-read sequencing is less accurate than the more widely used short-read RNA sequencing. To address this limitation, researchers at the Children’s Hospital of Philadelphia (PA, USA) have developed a new computational tool called ESPRESSO to accurately quantify RNA molecules from error-prone long-read RNA sequences. They hope ESPRESSO will be a useful tool for studying RNA in biomedical and clinical settings.

Alternative splicing can cause nascent RNA molecules to be cut and joined in different ways before they are translated into a protein, which means a single gene can encode a variety of different proteins. This occurs in a range of biological processes; however, it can become dysregulated in certain diseases, so studying the transcriptome is important for understanding the underlying cause of some conditions.

RNA molecules can be thousands of bases long, making it challenging to read entire RNA molecules and, therefore, study the transcriptome. To work around this, researchers typically rely on short-read RNA sequencing, which reads short fragments of each molecule, and use computer programs to reconstruct the full RNA sequence. This approach has a low per-base error rate of approximately 0.1% (one incorrect base for every 1,000 bases). Despite its accuracy, short-read RNA sequencing is limited in its ability to discover RNA isoforms, the different RNA molecules produced from the same gene.

Long-read sequencing can read RNA molecules that are over 10,000 bases in length without breaking up the molecule. However, the per-base error rate of these methods is between 5% and 20%, which has limited the widespread use of long-read sequencing. This error rate also makes it challenging to determine the validity of previously unknown RNA molecules.
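
To put those error rates in perspective, the short Python sketch below works through the arithmetic for 10,000 bases of sequence. It is only an illustration: the function names are made up for this example, and the second function assumes that errors occur independently at each base, which is a simplification that real sequencing platforms do not strictly follow.

```python
def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of incorrect bases: sequence length times per-base error rate."""
    return read_length * error_rate


def chance_error_free(read_length: int, error_rate: float) -> float:
    """Probability that every base is correct, ASSUMING independent errors per base."""
    return (1.0 - error_rate) ** read_length


if __name__ == "__main__":
    read_length = 10_000  # bases, matching the long-read length quoted in the article
    rates = [
        ("short-read rate, 0.1%", 0.001),
        ("long-read rate, 5%", 0.05),
        ("long-read rate, 20%", 0.20),
    ]
    for label, rate in rates:
        errors = expected_errors(read_length, rate)
        print(f"{label}: about {errors:.0f} expected errors in {read_length} bases")
```

Over 10,000 bases, a 0.1% error rate corresponds to roughly 10 incorrect bases, whereas rates of 5% and 20% correspond to roughly 500 and 2,000 incorrect bases, respectively.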


Now, the researchers of this study have developed ESPRESSO (Error Statistics PRomoted Evaluator of Splice Site Options), which can more accurately discover and quantify different RNA molecules from long-read RNA sequencing data.

ESPRESSO first compares all long-read RNA sequences of a given gene to its corresponding genomic DNA and then uses the error patterns of individual long-read sequences to identify the splice junctions and the corresponding full-length RNA isoforms, including those that have not been documented in existing databases.
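
The general problem can be illustrated with a toy Python sketch. This is not ESPRESSO's algorithm, which the researchers describe as drawing on the error patterns of individual reads; it is a much simpler majority-vote approach, shown only to make the idea of reconciling noisy splice junctions concrete. The function name, the (donor, acceptor) coordinate pairs and the five-base tolerance are all hypothetical choices made for this example.

```python
from collections import Counter


def consensus_junctions(observed, tolerance=5):
    """Merge nearby splice junctions by majority vote (toy example, not ESPRESSO).

    observed  -- list of (donor, acceptor) genomic coordinates, one per read junction
    tolerance -- hypothetical maximum shift, in bases, blamed on sequencing error
    Returns a dict mapping each consensus junction to its number of supporting reads.
    """
    counts = Counter(observed)
    consensus = {}
    # Visit junctions from most to least supported, so that well-supported
    # coordinates become the seeds that absorb their noisy neighbours.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for junction, support in ranked:
        for seed in consensus:
            close_donor = abs(seed[0] - junction[0]) <= tolerance
            close_acceptor = abs(seed[1] - junction[1]) <= tolerance
            if close_donor and close_acceptor:
                consensus[seed] += support  # treat as a noisy copy of the seed
                break
        else:
            consensus[junction] = support  # no seed nearby: start a new group
    return consensus


if __name__ == "__main__":
    # Made-up coordinates: two true junctions plus a few slightly shifted copies.
    reads = [(100, 200)] * 5 + [(101, 200), (100, 198)]
    reads += [(500, 900), (500, 900), (502, 901)]
    for junction, support in consensus_junctions(reads).items():
        print(junction, support)
```

In this made-up input, the slightly shifted junctions are folded into their best-supported neighbours, leaving two consensus junctions. A fixed tolerance of this kind cannot distinguish a genuine alternative splice site a few bases away from a sequencing error, which is the sort of ambiguity that a more careful, error-aware method has to resolve.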

Using simulated data and data from biological samples, the researchers evaluated ESPRESSO’s performance and found it could discover and quantify RNA isoforms better than many tools that are currently available. The researchers generated and analyzed more than 1 billion long-read RNA sequences from 30 human tissue types and three human cell lines to create a resource for studying human transcriptome variation at the resolution of full-length RNA isoforms.

“Long-read RNA sequencing is a powerful technology that will allow us to uncover RNA variation in rare genetic diseases and other conditions, like cancer,” commented Yi Xing, the senior author of the study. “We are probably at an inflection point in how we discover and analyze RNA molecules. The transition from short-read to long-read RNA sequencing represents an exciting technological transformation and computational tools that reliably interpret long-read RNA sequencing data are urgently needed.”