Just fold it: AI breakthrough in RNA structure prediction

New deep learning model predicts 3D structures of RNA with unprecedented accuracy

Semen Yesylevskyy
Receptor.AI
4 min read · Sep 4, 2021


Image from: https://www.news-medical.net/life-sciences/Single-Cell-RNA-Sequencing-(Portuguese).aspx

RNA is a unique polymer that is able to hold information in its nucleotide sequence and to fold into three-dimensional shapes, which are determined by this sequence. According to the widely accepted concept of the “RNA world”, life on Earth started from a primordial soup of self-replicating RNA molecules. Proteins came into play much later to provide more efficient catalysis, but, in principle, RNA is able to sustain primitive life by itself.

Determining the 3D shape of RNA is notoriously complex. Although there are only four basic nucleotides, which are paired according to very simple complementarity rules, the variety of RNA shapes is huge. The problem of RNA folding is considered even more complex than protein folding.

The number of RNA sequences transcribed from DNA is ~30 times larger than the number of known proteins. The vast majority of these RNAs are never translated into proteins and perform functions that are still poorly understood. The 3D structures are known for less than 1% of functional RNAs, which makes them the most enigmatic component of the cell's machinery. Thus, predicting the 3D structures of RNA is of great importance for modern life sciences.

Since the problem of predicting protein 3D structure was recently largely solved by machine learning algorithms, it is tempting to use similar techniques for RNA folding. However, there are several obstacles to directly replicating the famous AlphaFold and similar algorithms for RNA:

  1. The number of known 3D structures of RNA is small.
  2. Evolutionary relationships provide less information about tertiary contacts in RNAs.
  3. Characteristics of energetically favorable RNA structures are not sufficiently well understood.

In a recent paper published in Science, some of these issues were finally addressed. The authors designed a neural network called the Atomic Rotationally Equivariant Scorer (ARES), which scores 3D structural models of RNA by estimating their root mean square deviation (RMSD) from the unknown “true” structure. ARES is a deep neural network consisting of many processing layers, where the output of each layer is fed as input to the next one. The network has a specific architecture that enables it to learn directly from 3D structures and to learn effectively from a very small amount of experimental data.
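For reference, RMSD itself is simple to compute when the true structure is available; the hard part, which ARES addresses, is estimating it without that reference. The snippet below is a minimal sketch of the metric only, not of ARES. It assumes the two structures are already superimposed and list their atoms in the same order; the function name `rmsd` and the toy coordinates are illustrative and do not come from the paper.

```python
import math


def rmsd(model, reference):
    """Root mean square deviation between two lists of (x, y, z) atoms.

    Assumes both structures are already superimposed and that atoms
    appear in the same order; no alignment is performed here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("structures must be non-empty and equally sized")
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_r in zip(model, reference)
        for a, b in zip(atom_m, atom_r)
    )
    return math.sqrt(squared / len(model))


# Toy coordinates in angstroms, made up for illustration.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

# Every atom is displaced by exactly 1 angstrom along z.
print(rmsd(model, reference))
```

A lower RMSD means the model is closer to the reference, which is why a predicted RMSD can serve as a quality score for candidate structures.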

The model is “blind” in the sense that it makes no assumptions about which features of a structural model matter for accuracy; it has to learn them itself. Nothing in it is specific to RNA, so it is applicable, in principle, to any molecular system.

The general idea of the model is somewhat similar to the AlphaFold architecture. The initial layers of the network recognize local structural motifs, while the deeper layers work on the whole structure and predict its global properties and accuracy. The network is designed so that a rotation of the input coordinates leads to a corresponding rotation of its internal features, which makes it rotationally equivariant; the final accuracy score is therefore independent of how the molecule is oriented.
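The difference between a quantity that rotates together with the input (equivariant) and one that does not change under rotation (invariant) can be illustrated with plain geometry. The sketch below does not reproduce any ARES layer; it only assumes a toy set of points and uses hypothetical helpers (`rotate_z`, `centroid`, `pairwise_distances`) to show that a vector-valued feature such as the centroid rotates with the input, while scalar features such as interatomic distances stay the same.

```python
import math


def rotate_z(points, angle):
    """Rotate (x, y, z) points around the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def centroid(points):
    """Vector-valued feature: the geometric center of the points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def pairwise_distances(points):
    """Scalar features: distances between all pairs of points."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]


# Toy coordinates, made up for illustration.
points = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5), (-1.0, 1.0, 3.0)]
angle = 0.7
rotated = rotate_z(points, angle)

# Equivariance: the centroid of the rotated points equals the rotated centroid.
expected_centroid = rotate_z([centroid(points)], angle)[0]
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(centroid(rotated), expected_centroid)
)

# Invariance: pairwise distances are unchanged by the rotation.
invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(rotated), pairwise_distances(points))
)

print(equivariant, invariant)
```

A scoring network can carry equivariant features through its layers and reduce them to an invariant scalar at the end, so the predicted accuracy does not depend on the arbitrary orientation of the input model.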

The ARES model was trained on just 18 RNA molecules with known 3D structure. For each sequence, 1000 energy-minimized structures were generated blindly, without making use of the true structure. Then the parameters of the neural network were optimized to give the best score to the structures that are closest to the real one. A separate benchmark set of structures of various sizes and complexity was used to test the model performance.
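The training objective described above can be sketched as follows: every generated candidate is labeled with its true RMSD, the scorer is fitted so that its predictions match those labels, and at prediction time the candidate with the lowest predicted RMSD is selected. The code is an illustration under stated assumptions: the mean squared error used here is a stand-in, since the loss function actually used by the authors is not described in this article, and all names and numbers are hypothetical.

```python
def training_loss(predicted_rmsd, true_rmsd):
    """Mean squared error between predicted and true RMSD values.

    Illustrative choice only; the loss used in the paper may differ.
    """
    if len(predicted_rmsd) != len(true_rmsd) or not true_rmsd:
        raise ValueError("inputs must be non-empty and equally sized")
    return sum(
        (p - t) ** 2 for p, t in zip(predicted_rmsd, true_rmsd)
    ) / len(true_rmsd)


def select_best(candidates, predicted_rmsd):
    """Return the candidate with the lowest predicted RMSD."""
    best_index = min(range(len(candidates)), key=lambda i: predicted_rmsd[i])
    return candidates[best_index]


# Three hypothetical candidate models of one RNA (values in angstroms).
candidates = ["model_a", "model_b", "model_c"]
true_rmsd = [12.0, 3.5, 8.0]       # known only for training molecules
predicted_rmsd = [11.0, 4.5, 8.0]  # output of a hypothetical scorer

loss = training_loss(predicted_rmsd, true_rmsd)
best = select_best(candidates, predicted_rmsd)
print(loss, best)
```

During training the loss is minimized over the network parameters; for a new RNA only `select_best` is needed, because the true RMSD values are unknown.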

Scheme of the ARES neural network. Image from https://www.science.org/doi/abs/10.1126/science.abe5650.

The authors concluded that ARES substantially outperforms other methods of RNA structure prediction on both “easy” and “hard” sets of targets. Moreover, the network has learned essential structural features of RNA molecules without any a priori knowledge about them. For example, it learned about helical structures with optimal pairing of nucleotide bases, correctly deduced the hydrogen bonds, and optimized the number of complementary base pairs.

The outstanding performance of the ARES deep neural network allows the authors to claim that it could be used in other areas as well. They anticipate that the method could be applied to the molecular design of macromolecules (proteins and nucleic acids), classical small-molecule drug design, estimation of the properties of nanoparticles, and prediction of the mechanical properties of alloys and other materials.

This paper demonstrates the huge potential of deep learning techniques in various areas of the life sciences, including drug design. Receptor.AI utilizes the power of deep neural networks at several stages of its drug discovery workflow. We use deep learning architectures in our state-of-the-art molecule generation module and in the structure-based high-throughput screening module.


Semen Yesylevskyy
Receptor.AI

PhD, Doctor of Sciences, researcher in the area of molecular dynamics and drug discovery. CSO of Receptor.AI. https://t.me/semen_yesylevskyy