Using NGS for infectious disease diagnosis and public health surveillance

Written by Tecan

Michael Weinstein is a senior bioinformatics scientist at Tecan (Männedorf, Switzerland) based in Denver (CO, USA). In this interview, Michael discusses how next-generation sequencing (NGS) can be used for infectious disease diagnosis, the role it played in the COVID-19 pandemic, the main challenges faced when diagnosing infectious diseases with NGS and how he thinks the field will develop.

How can NGS be used for infectious disease diagnosis?

All pathogens other than prions carry their genetic information in the form of nucleic acids: DNA or RNA. Genetic material can be extracted from a sample and, often after some preparation steps, sequenced using NGS technologies. Once the sequences are read, they can be filtered, analyzed and compared to reference genomic sequences from the host, common commensal/environmental species and known species of interest for the sample.
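The comparison step described above can be sketched in a few lines of Python. This is only a toy illustration, assuming exact k-mer matching against a handful of short reference sequences; the function names, the k-mer length and the 50% threshold are all assumptions made for this example, and production classifiers use far larger indexes and more careful statistics.

```python
def kmers(seq, k=8):
    """Return the set of all k-length substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=8):
    """Map each reference label (e.g. host, commensal, pathogen) to its k-mers."""
    return {label: kmers(seq, k) for label, seq in references.items()}


def classify_read(read, index, k=8, min_fraction=0.5):
    """Assign a read to the reference that shares the most k-mers with it.

    Returns "unclassified" when the best reference shares fewer than
    min_fraction of the read's k-mers (hypothetical threshold).
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return "unclassified"
    best_label, best_shared = "unclassified", 0
    for label, ref_kmers in index.items():
        shared = len(read_kmers & ref_kmers)
        if shared > best_shared:
            best_label, best_shared = label, shared
    if best_shared / len(read_kmers) < min_fraction:
        return "unclassified"
    return best_label


# Made-up sequences for illustration only; these are not real genomes.
references = {
    "host": "ACGTACGTTTGACCAGTAGGCATCGATCGGATCCA",
    "pathogen": "TTGACGGCTAGCTAACGGTTAACCGGATATCGCGA",
}
index = build_index(references)
print(classify_read(references["pathogen"][6:24], index))
```

Reads assigned to the host label would typically be filtered out before any further analysis, leaving the remaining reads to be interpreted against the commensal and pathogen references.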

From the thermal vents at the bottom of the ocean to the dirt under our feet to our bodies and all the way to the International Space Station, there are microbes that are expected to be present in certain relative abundances under normal conditions. Major aberrations in this balance of microbes can signify an unhealthy environmental condition or an active infectious disease process. Unhealthy microbial states can be described as either microbes out of place or microbes out of balance. Some microbes, such as Neisseria gonorrhoeae, are identifiers of disease nearly anywhere they are observed. Others, such as common gut microbes, are expected and normal in some locations, like fecal and wastewater samples, but indicate an unhealthy state when detected in others, for example cerebrospinal fluid and food supply.

NGS can identify sequences specific to these out-of-place microbes, revealing the cause of an infection or contamination and guiding remediation. NGS pipelines can often return a definitive result faster than traditional culture-based methods and can detect microbes that do not currently have established culture conditions. Identification of microbes out of balance is enabled by the quantitative nature of NGS and its ability to provide relative, if not absolute, quantification of a microbe, for example detecting commensal Pseudomonas species overgrowing and causing an opportunistic infection. Additionally, the use of microbiome and metagenome analyses in contexts not traditionally thought of as infection, such as metabolic, neoplastic and neurodevelopmental disorders, is an area of active research, as microbes may be acting in an indirectly protective or harmful fashion, often in conjunction with genetic and environmental factors.
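The distinction between microbes out of place and microbes out of balance can be expressed as a simple check on relative abundances. The sketch below assumes that per-taxon read counts are already available and that an expected abundance range exists for each taxon at the sampled site; the taxa, counts and ranges are invented for illustration and are not clinical reference values.

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts into fractions of all classified reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no classified reads")
    return {taxon: count / total for taxon, count in read_counts.items()}


def flag_taxa(read_counts, expected_ranges):
    """Label taxa that are unexpected at this site or outside their usual range.

    expected_ranges maps taxon -> (low, high) relative abundance; these are
    hypothetical baselines supplied by the caller.
    """
    flags = {}
    for taxon, fraction in relative_abundance(read_counts).items():
        if taxon not in expected_ranges:
            flags[taxon] = "out of place"
        else:
            low, high = expected_ranges[taxon]
            if not low <= fraction <= high:
                flags[taxon] = "out of balance"
    return flags


# Illustrative numbers only.
counts = {"Bacteroides": 600, "Pseudomonas": 350, "Lactobacillus": 50}
baseline = {
    "Bacteroides": (0.40, 0.80),
    "Pseudomonas": (0.00, 0.05),
    "Lactobacillus": (0.01, 0.20),
}
print(flag_taxa(counts, baseline))
```

Because these are relative abundances, a flagged taxon may reflect either its own overgrowth or the loss of other community members; absolute quantification would be needed to tell the two apart.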

How was NGS used in the COVID-19 pandemic?

The use of NGS for identification of environmental, human, animal and plant pathogens is a rapidly growing field. Many laboratories have been developing or using NGS-based techniques to identify microbial pathogens, with some laboratories having validated pipelines to use these techniques in the diagnosis and treatment of diseases. Nowhere has this application been more visible than the tracking of SARS-CoV-2 during the recent pandemic. The initial identification of a novel coronaviral respiratory pathogen and the rapid release of its full genomic sequence enabled pharmaceutical companies to develop an RNA vaccine at a speed that would have been unthinkable a decade ago. Tecan itself had a hand in the early identification and confirmation of this virus with our single primer isothermal amplification (SPIA) technology that currently powers our Revelo® High-Sensitivity RNA-Seq library preparation kit. Once common sequencing and data aggregation protocols were set, hardly a season went by where the world was not made aware of old SARS-CoV-2 strains fading away, new strains emerging and which emerging strains were of highest concern. Additionally, there is an ongoing effort to identify fragments of SARS-CoV-2 RNA in wastewater and attribute them to emerging strains in order to gain insight into new and potentially dangerous outbreaks before their effects are seen in clinics.

Another visible application is the sequencing of enteric pathogens in suspected foodborne illness cases. In this application, the pathogen species is already identified through culture, but its exact genetic fingerprint can be determined to compare it to similar cases in other regions and potentially even environmental samples taken from food or food-processing facilities. Often this only requires simple library preparation techniques that can even be fully automated on a small device such as Tecan’s MagicPrep™ NGS to free laboratory technicians for other tasks. Comparing these fingerprints allows epidemiologists to identify whether distant cases of the same foodborne pathogen species are likely from the same source or distinct outbreaks from different sources and even what the potential source(s) might be. As a result, recalls of potentially contaminated food can be executed more quickly and precisely, and the source of contamination can be determined with higher certainty. This application also provides public health-oriented labs with a great way to get started with NGS applications.
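One common way to compare such genetic fingerprints is to count the single-nucleotide differences between isolates. The sketch below assumes the isolate sequences have already been aligned to the same coordinates and are of equal length; the sequences, isolate names and the two-SNP threshold are invented for this example, and real outbreak investigations choose thresholds per organism and combine them with epidemiological evidence.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ, ignoring N calls."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and a != "N" and b != "N"
    )


def possibly_linked(isolates, max_snps=2):
    """Return (name, name, distance) for isolate pairs within max_snps.

    max_snps is a hypothetical cutoff chosen for this toy example.
    """
    linked = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(isolates.items(), 2):
        distance = snp_distance(seq_a, seq_b)
        if distance <= max_snps:
            linked.append((name_a, name_b, distance))
    return linked


# Made-up aligned fragments standing in for whole-genome alignments.
isolates = {
    "case_A": "ACGTACGTAC",
    "case_B": "ACGTACGTAT",
    "facility": "ACGTACGTAC",
    "case_C": "TCGAACCTGC",
}
for pair in possibly_linked(isolates):
    print(pair)
```

In this toy data, the two closely matching cases and the facility sample fall within the threshold of one another, while the divergent case does not, which is the pattern that would suggest a shared source for the first group and a separate source for the last.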

Which NGS methods are most suitable for infectious disease diagnosis?

The most common NGS methods likely to be employed in microbe identification are targeted amplicon sequencing, such as 16S rRNA gene sequencing, and total genomic sequencing. Targeted amplicon sequencing is more limited in power but requires fewer computing and sequencing resources than full genome analysis. By far, the most common target for this method is the bacterial 16S rRNA gene. Its targeted nature means that a higher signal-to-noise ratio can be achieved, with more reads providing information on the microbial population of a sample and fewer reads originating from the host or being uninformative. Because this method only analyzes one specific region of the bacterial genome – one that is responsible for basic life functions – more detailed information on the microbe, such as virulence factors and antimicrobial resistance genes, remains hidden.

Between single-amplicon-targeted sequencing and total genome analysis lie amplicon panels. An example of this would include the many amplicon panels created to selectively amplify and sequence the SARS-CoV-2 genome from both patient and environmental samples. These panels were used extensively during the COVID-19 pandemic and were involved in the majority of sequencing efforts to track viral strains.

Full genomic sequencing is the most powerful technique and provides the most complete information on both the overall population of microbes in a sample and the specific properties of the different microbes within it. Full genomic sequencing can be divided into long- and short-read sequencing applications. Long-read sequencing, which is generally carried out using Pacific Biosciences (CA, USA) or Oxford Nanopore Technologies (Oxford, UK) equipment, can facilitate the assembly of full genomes from complex microbial samples, such as feces. Short-read sequencing, which is generally carried out using Illumina (CA, USA) or Ion Torrent (Thermo Fisher Scientific, MA, USA) equipment, is often better suited to analyzing the composition of the sample through the attribution of individual reads to their likely source. The additional information given by full genomic sequencing can be used both to quantify different organisms across multiple domains and to provide information on the capabilities of these organisms, including their ability to evade antimicrobial therapy.

What are the main challenges when diagnosing infectious disease with NGS?

While there are ‘easy’ cases for this method, such as performing a targeted bacterial analysis on material obtained from an obvious site of bacterial infection in a patient, many of these applications have challenges at multiple steps. Automation of bioinformatic pipelines through software engineering and automation of wet-lab work through robotic liquid handling ranging from low to high throughput, such as Tecan’s MagicPrep NGS or DreamPrep® NGS, which take advantage of our cutting-edge reagents and technologies, can help to simplify and scale this process.

Starting with sample collection, there is the challenge of bioburden. Few items in the world are truly sterile, and few of those are without some trace of microbial DNA, even if the source microbe is no longer viable. This is a relatively minor problem in samples where the mass of microbes of interest is expected to be high, such as a stool sample, as the microbial ‘signal’ from these samples is likely to overwhelm any microbial ‘noise’ from a contaminated swab, tube or reagent. Bioburden becomes a more significant issue when quantifying and/or identifying microbes from a lower-input sample such as a skin swab.
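One way laboratories commonly address bioburden is to process a blank (no-sample) control alongside the real samples and compare the two. The sketch below is a crude heuristic under that assumption: it flags any taxon seen in the blank unless it is much more abundant in the sample. The function name, the tenfold ratio and the read counts are assumptions made for this example, and the taxon names are purely illustrative.

```python
def flag_likely_contaminants(sample_counts, blank_counts, min_ratio=10.0):
    """Flag taxa that are not clearly enriched in the sample over the blank.

    A taxon present in the blank is flagged unless its relative abundance in
    the sample is at least min_ratio times its relative abundance in the
    blank. min_ratio is a hypothetical cutoff.
    """
    sample_total = sum(sample_counts.values())
    blank_total = sum(blank_counts.values())
    if sample_total == 0 or blank_total == 0:
        raise ValueError("both sample and blank need at least one read")
    flagged = []
    for taxon, count in sample_counts.items():
        sample_fraction = count / sample_total
        blank_fraction = blank_counts.get(taxon, 0) / blank_total
        if blank_fraction > 0 and sample_fraction < min_ratio * blank_fraction:
            flagged.append(taxon)
    return sorted(flagged)


# Illustrative counts for a low-input swab and its blank control.
swab = {"Staphylococcus": 900, "Ralstonia": 100}
blank = {"Ralstonia": 95, "Bradyrhizobium": 5}
print(flag_likely_contaminants(swab, blank))
```

A relative-abundance comparison like this is most informative for low-input samples; in a high-biomass sample such as stool, the same contaminant reads would be a negligible fraction of the total.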

Following sample collection, preservation becomes a concern because the samples are alive. These living samples, now moved to a new environment with new surfaces, chemicals, atmosphere and temperature, are likely to be under selection for different microbes than they were in their original environment. This results in some microbes dying, others failing to multiply and some that were previously in the minority suddenly finding more favorable conditions for their growth and ‘blooming’ during sample transport and storage; often the blooming microbes do so by consuming those that die.

All cells must have some form of membrane containment and many microbial cells have some form of cell wall. In order to analyze microbial genetic material, one must first extract it from its cell or capsid while keeping it sufficiently intact for sequencing. Differences in how well microbes resist physical, chemical and thermal lysis can skew results towards over-representation of easy-to-lyse microbes and under-representation of those that are not completely lysed. This is particularly important for long-read sequencing, as the genetic material needs to be extracted using a protocol harsh enough to lyse tough microbial cell walls but gentle enough to minimize the shearing of long nucleic acid fragments. Failure to balance these two opposing concerns can result in either heavy bias towards easy-to-lyse microbes or the disappointing use of long-read sequencing on short nucleic acid fragments.

Analysis also presents challenges. For example, targeted amplicon sequencing attempts to identify and make conclusions about a microbe from only a small segment of its genome. This means two closely related microbes with important differences could be confused and information on antimicrobial resistance is at best an inference from the data.

For full genome sequencing, the first challenge during analysis is dealing with the large amount of host genome likely to be present in the sample. Detecting the rare microbial sequences in such a sample often requires generating a large number of NGS reads, which increases the amount of data generated, computational requirements and associated costs. Even after removing all the host reads, there are bacterial reads that are likely to map to mobile or shared regions of bacterial genomes. Many of the virulence factors and antimicrobial resistance genes that cause pathogens to become more dangerous arrived in their current bacterial host through a virus or some other form of gene transfer and may be present in other species as well, leading to ambiguous or erroneous identifications.
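The effect of host background on sequencing depth is simple arithmetic, sketched below. The model assumes that reads are sampled in proportion to the nucleic acid present, so that the expected fraction of reads from a target microbe is the non-host fraction multiplied by the target's share of the non-host material. The function and parameter names are hypothetical, and real libraries deviate from this because of extraction and amplification bias.

```python
import math


def expected_target_reads(total_reads, host_fraction, target_share):
    """Expected number of reads from the target microbe.

    host_fraction: fraction of all reads derived from the host (0 to 1).
    target_share: target's share of the non-host reads (0 to 1).
    """
    return total_reads * (1 - host_fraction) * target_share


def reads_required(min_target_reads, host_fraction, target_share):
    """Total reads needed so the expected target read count reaches the minimum."""
    target_fraction = (1 - host_fraction) * target_share
    if target_fraction <= 0:
        raise ValueError("target fraction must be greater than zero")
    return math.ceil(min_target_reads / target_fraction)


# Illustrative scenario: half of the reads are host, and the target makes up
# a quarter of what remains.
print(reads_required(100, host_fraction=0.5, target_share=0.25))
print(expected_target_reads(800, host_fraction=0.5, target_share=0.25))
```

Under this model, raising the host fraction from 50% to 99% multiplies the required depth fiftyfold for the same target, which is why host depletion or targeted enrichment is often considered before simply sequencing deeper.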

In the case of wastewater analysis for epidemiological purposes, the sample matrix itself provides a challenge. The more concentrated, ‘sludge-like’ wastewater contains a high concentration of target microbes as well as a high concentration of chemical contaminants, which can degrade the sample or inhibit reactions. The less concentrated, more ‘watery’ wastewater samples, on the other hand, simply have very little genetic material per unit of volume and require extreme efforts to concentrate the sample sufficiently to detect anything.

Tecan possesses in-house expertise to provide solutions across many disciplines of life sciences, ranging from precision oncology to microbial analysis and beyond. Our NGS automated library prep solutions encompass reagents, automation, consumables, application support and even bioinformatic analysis. Our commitment to customer success establishes us as a reliable partner for researchers aiming to achieve more with fewer resources.

How do you think this field will develop?

Many of the techniques necessary for this field to have clinical and public health applications have already been developed and are already deployed in these areas. Presently, some of the biggest challenges to its deployment clinically are more administrative in nature. One such challenge is the difficulty of validating and troubleshooting these pipelines, which require understanding of both molecular biology and bioinformatics. An additional administrative challenge to deployment of these pipelines is fitting them into the current insurance reimbursement frameworks that are often not designed to handle tests capable of identifying conditions that were not initially being considered, for example the identification of an asymptomatic sexually transmitted infection while analyzing a reported urinary tract infection. Additionally, there are questions about how these incidental findings of reportable diseases are to be handled with regard to their reporting to public health agencies and the potential for increased spurious detections of reportable diseases.

About the author

Michael Weinstein is a senior bioinformatics scientist at Tecan based in Denver. His focuses include developing NGS library preparation methods and supporting the success of Tecan’s customers in NGS applications through his knowledge of both wet- and dry-lab techniques. His areas of interest include microbiome analysis, cancer and rare disease genomics, and public health/infectious disease research. In addition to his role at Tecan, he is also an adjunct faculty member at the University of California, Los Angeles (CA, USA) where he teaches bioinformatics workshops that include the introduction to NGS analysis, advanced Python programming, microbiome analysis and bioinformatic software design practices.