The latest in de novo assembly


Jonathan Coker, CTO of OmniTier Inc, provides an overview of de novo assembly, highlighting its use in personalized medicine and looking to the future.

Jonathan Coker is chief technology officer of OmniTier Inc (MN, USA), a bioinformatics solutions company. He has led advanced research groups at the Mayo Clinic (MN, USA), IBM (NY, USA) and HGST (now Western Digital, CA, USA) and holds 68 granted patents in the US alone. A globally recognized lecturer at industry gatherings and academic seminars, he holds a PhD in electrical engineering and mathematics.

Please could you define de novo assembly for us? 

Bioinformatics experts use this phrase to describe methods by which biological information (such as DNA or RNA) is assembled into a whole unit from measurements of its fragments. The Latin de novo, or ‘from new’, indicates that such algorithms use little or no previously known information about the problem. While de novo assembly is once again a hot topic in bioinformatics R&D, the general idea has been used since the beginning of the bioinformatics discipline.
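
As an editorial illustration of assembling a whole from its fragments, here is a minimal Python sketch of a greedy overlap merge. It is a toy under stated assumptions, not the interviewee's or OmniTier's method: the function names `overlap` and `greedy_assemble`, the `min_len` threshold and the example reads are all hypothetical, the reads are assumed to be error-free and from a single strand, and production assemblers use far more sophisticated graph-based approaches.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    No reference sequence is consulted: only the reads themselves are used.
    Returns the list of sequences left when no pair overlaps by `min_len`.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical error-free reads that overlap by four bases each.
example_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(example_reads)
print(contigs)
```

With these three made-up reads, the sketch merges them into a single sequence without ever looking at a reference. Real data adds sequencing errors, repeats and both DNA strands, which is where the computational cost discussed later in the interview comes from.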

Can you summarize some of the main DNA scientific advances of late, and how de novo assembly fits into them?  

Advances in DNA technology can be grouped into three broad categories:

  1. Primary technology addresses the basic sequencing measurement processes and devices, as driven by Illumina, PacBio, Complete Genomics and other companies. Current advances include longer reads, various kinds of proximity information designed to extend the utility of shorter reads, and cost reductions.
  2. Secondary analysis involves extracting the maximum information from measurements produced by primary technology, up to and including identification of the genome under test. Assembly technology fits in this category, and is expected to be a major workhorse across all of the primary advances.
  3. Tertiary analysis connects secondary output with certain medically relevant phenomena in the patient, which powers personalized medicine.

How do these advance personalized medicine?

Although every human is unique, more than 99% of your DNA is identical to that of all other humans. Variation in the small remainder is what makes you, you – including your need for and reaction to medicines, your predisposition to disease, and other medically important phenomena.

As a result, there’s tremendous interest in research to accurately identify those variations, and then to match this personalized information with the most effective personalized treatment plan. Assembly-based variant calling addresses that first step, identification.

What kind of labs are using de novo assembly techniques today? 

Currently, assembly technology is used occasionally in research settings to identify DNA patterns in very specialized problems that alignment technology cannot serve well. Examples include studies in immunoglobulins, proteogenomics, and cancer. Assembly technology is not yet widely used in a clinical setting.

What advantages are they seeing?

Researchers who have integrated de novo assembly have been able to resolve novel DNA information that alignment technology cannot provide.

What’s limiting more widespread adoption?

The availability of appropriate and easy-to-use software has constrained the use of de novo assembly. Typically, researchers need to string a long set of software tools together to solve their particular problem. In addition, as problem size increases up to the whole genome, computational complexity, resources, and cost become problematic.

What developments can we expect to see in 2019?

Expect to see more commercially available, integrated software solutions that employ de novo assembly capability. It’s a tremendously exciting time to be in this sector. We’re learning to assemble a jigsaw puzzle even without prior knowledge of the completed picture – by letting advanced algorithms help us solve for the exact shape of the variable pieces. Advances in several domains, including assembly-based variant calling, are enabling that to take place faster and more easily than ever.