ChromoGen: the AI tool predicting 3D genomic structures in minutes

Written by Annie Coulson (Digital Editor)

A novel, freely available AI tool has cut the time it takes to determine chromatin structure from days to minutes.

The 3D structure of DNA is important for controlling cell-specific gene expression patterns, but determining that structure can be labor-intensive. Now, researchers from the Massachusetts Institute of Technology (MA, USA) led by Bin Zhang have developed an AI model that can accurately predict 3D genomic structure in minutes rather than days. The tool can be used to explore how genomic structure affects gene expression in health and disease.

Inside the cell nucleus, DNA and proteins form a complex called chromatin. Long strands of DNA wind around proteins, condensing 2 meters of DNA into a nucleus just one-hundredth of a millimeter in diameter. Epigenetic modifications influence the folding of chromatin, which impacts gene accessibility and plays a crucial role in regulating gene expression. Scientists have developed experimental techniques, like Hi-C, for determining chromatin structures; however, these techniques are labor-intensive, and it can take a week to generate data from one cell.


To create a more rapid technique, the researchers turned to AI. They developed a model called ChromoGen, which has two components. The first is a deep-learning model that reads the genome, analyzing both the DNA sequence and chromatin accessibility data. The second is a generative AI model trained on more than 11 million chromatin conformations, enabling it to predict physically accurate chromatin conformations.

When integrated, the deep-learning model informs the generative model about how the specific cell type’s environment influences the formation of various chromatin structures. This approach effectively captures the relationship between sequence and structure. As DNA is a highly disordered molecule, a single DNA sequence can give rise to many different possible conformations, so the model generates various potential structures for each sequence.
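To make the two-component idea concrete, here is a toy sketch in Python. It is not ChromoGen's code or architecture: the function names, the two-number "embedding" and the random-walk "generator" are all hypothetical stand-ins, chosen only to show how one input region can yield many different 3D conformations once a sequence-reading step conditions a sampling step.

```python
import math
import random


def encode_region(sequence, accessibility):
    """Hypothetical stand-in for the sequence-reading component.

    Summarizes a region as (GC fraction, mean accessibility). A real
    model would learn a much richer representation.
    """
    gc_fraction = sum(base in "GC" for base in sequence) / len(sequence)
    mean_accessibility = sum(accessibility) / len(accessibility)
    return gc_fraction, mean_accessibility


def sample_conformation(embedding, n_beads, rng):
    """Hypothetical stand-in for the generative component.

    Draws one random 3D chain of beads. Assumption for illustration only:
    higher accessibility gives a longer step, i.e. a more open chain.
    The GC fraction is carried in the embedding but not used here.
    """
    _gc_fraction, mean_accessibility = embedding
    step = 1.0 + mean_accessibility
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        # Pick a uniformly random direction on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + step * r * math.cos(phi),
                       y0 + step * r * math.sin(phi),
                       z0 + step * z))
    return coords


def generate_ensemble(sequence, accessibility, n_structures=5, n_beads=10, seed=0):
    """Encode the region once, then sample several conformations from it."""
    rng = random.Random(seed)
    embedding = encode_region(sequence, accessibility)
    return [sample_conformation(embedding, n_beads, rng)
            for _ in range(n_structures)]
```

Calling `generate_ensemble("ACGTGGCC", [0.2, 0.8], n_structures=3)` returns three different bead chains for the same input, which mirrors the point that the target is a distribution of structures rather than a single answer.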

“A major complicating factor of predicting the structure of the genome is that there isn’t a single solution that we’re aiming for. There’s a distribution of structures, no matter what portion of the genome you’re looking at. Predicting that very complicated, high-dimensional statistical distribution is something that is incredibly challenging to do,” explained first author Greg Schuette.

Once trained, ChromoGen can generate predictions on a much faster timescale than experimental techniques like Hi-C. “Whereas you might spend six months running experiments to get a few dozen structures in a given cell type, you can generate a thousand structures in a particular region with our model in 20 minutes on just one GPU [graphics processing unit],” commented Schuette.
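The figures in that quote can be turned into a rough per-structure comparison. The sketch below takes the 1000 structures in 20 minutes directly from the quote; the experimental side is an assumption, since "a few dozen" and "six months" are not exact, so 36 structures over 180 days are used purely for illustration.

```python
# Model side: taken from the quote (1000 structures in 20 minutes on one GPU).
MODEL_STRUCTURES = 1000
MODEL_MINUTES = 20

# Experimental side: illustrative assumptions, not reported values.
ASSUMED_EXPERIMENTAL_STRUCTURES = 36   # "a few dozen"
ASSUMED_EXPERIMENTAL_DAYS = 180        # "six months"

SECONDS_PER_DAY = 24 * 60 * 60


def model_seconds_per_structure():
    """Average model time per generated structure, in seconds."""
    return MODEL_MINUTES * 60 / MODEL_STRUCTURES


def experimental_seconds_per_structure():
    """Average experimental time per structure under the assumptions above."""
    days_each = ASSUMED_EXPERIMENTAL_DAYS / ASSUMED_EXPERIMENTAL_STRUCTURES
    return days_each * SECONDS_PER_DAY


def approximate_speedup():
    """Ratio of experimental to model time per structure."""
    return experimental_seconds_per_structure() / model_seconds_per_structure()


if __name__ == "__main__":
    print(f"Model: about {model_seconds_per_structure():.1f} s per structure")
    print(f"Assumed experiment: about "
          f"{experimental_seconds_per_structure() / SECONDS_PER_DAY:.0f} days per structure")
    print(f"Approximate speedup: {approximate_speedup():,.0f}x")
```

Under these assumptions the model needs roughly a second per structure against several days experimentally. The exact ratio depends entirely on the assumed experimental numbers.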

The researchers used the model to generate structure predictions for more than 2000 DNA sequences and compared them with the experimentally determined structures, finding the two to be the same or very similar.
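The article does not say which similarity measure the researchers used. As one illustrative possibility, two 3D structures can be compared through their pairwise distance maps, which do not change when a structure is rotated or shifted. The sketch below, with hypothetical function names, correlates the distance maps of two bead chains of equal length.

```python
import math


def distance_map(coords):
    """Matrix of Euclidean distances between every pair of beads."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def upper_triangle(matrix):
    """Flatten the entries above the diagonal (each bead pair once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists.

    Raises ZeroDivisionError if either list is constant.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def structure_similarity(coords_a, coords_b):
    """Correlation between the distance maps of two structures."""
    return pearson(upper_triangle(distance_map(coords_a)),
                   upper_triangle(distance_map(coords_b)))


if __name__ == "__main__":
    extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 1, 0)]
    shifted = [(x + 5, y - 2, z + 1) for x, y, z in extended]
    compact = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(structure_similarity(extended, shifted))  # same shape, moved in space
    print(structure_similarity(extended, compact))  # different shape
```

A score near 1 means the two structures have closely matching internal distances. Because a model like this produces many conformations per region, a real comparison would more likely be made between ensembles than between single structures.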

The model can make accurate chromatin structure predictions for other cell types, not just the one it was trained on, suggesting it could be used to explore how chromatin structures differ between cell types and how those differences affect function. The model could also be used to investigate how DNA mutations affect chromatin structure and whether this influences disease states.

The model is available on GitHub.