The application of deep learning ideas to computer vision, speech recognition, and medical imaging has been hugely successful. However, the data in these fields fit the differentiable transformations used in DL rather well. This is not the case for molecular data. This project focuses on developing a protein folding algorithm that uses recent ideas in DL. Over the course of the project we are constructing preprocessed datasets and a library of differentiable transformations of protein conformation descriptors to lower the entry barrier for ML researchers.

The library provides differentiable layers with a PyTorch interface, including:

  • Conversion from internal coordinates to Cartesian coordinates
  • Conversion from Cartesian coordinates to contact maps and distance maps
  • Conversion from Cartesian coordinates to volumetric atomic densities and atomic distance distributions
  • PDB loading and atom type assignment
  • and other layers

    The git repository: MILA:ProteinClassesLibrary
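To illustrate what a differentiable conversion from Cartesian coordinates to distance and contact maps can look like, here is a minimal PyTorch sketch. It is not the library's API: the function names `distance_map` and `soft_contact_map`, the 8 Å cutoff, and the sigmoid smoothing are all assumptions made for this example. A hard threshold has zero gradient almost everywhere, so the sketch uses a smooth contact indicator instead.

```python
import torch


def distance_map(coords: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Pairwise Euclidean distances for an (N, 3) tensor of atom coordinates."""
    diff = coords.unsqueeze(1) - coords.unsqueeze(0)  # (N, N, 3)
    # eps keeps the gradient of sqrt finite on the diagonal, where the distance is 0
    return torch.sqrt((diff ** 2).sum(dim=-1) + eps)


def soft_contact_map(coords: torch.Tensor, cutoff: float = 8.0,
                     sharpness: float = 1.0) -> torch.Tensor:
    """Smooth contact indicator: close to 1 below the cutoff, close to 0 above it."""
    return torch.sigmoid(sharpness * (cutoff - distance_map(coords)))


# Three atoms on a line; the first two are 3.8 angstroms apart, the third is far away.
coords = torch.tensor([[0.0, 0.0, 0.0],
                       [3.8, 0.0, 0.0],
                       [20.0, 0.0, 0.0]], requires_grad=True)
dmap = distance_map(coords)
cmap = soft_contact_map(coords)

# Gradients flow from the contact map back to the coordinates.
cmap.sum().backward()
```

Because both maps are built from ordinary tensor operations, any loss defined on them can be backpropagated to the coordinates, which is the property the layers above rely on.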

    The datasets:

  • Quality assessment CASP training dataset download
  • Quality assessment CASP11 Stage1 test dataset download
  • Quality assessment CASP11 Stage2 test dataset download
  • Quality assessment CASP12 test dataset download
  • Quality assessment CAMEO6 test dataset download

    3D CNN for ranking protein structures

    This project ranks candidate structures (decoys) for a given protein according to their closeness to the unknown experimental structure. It provides a proof of concept that we can learn to rank decoys relying only on the 3D atomic densities.
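    The idea of scoring a decoy from its 3D atomic density can be sketched as follows. This is a toy illustration under stated assumptions, not the project's model: the function `atomic_density`, the single density channel (no atom types), the 8 Å box, the Gaussian width, and the tiny untrained `scorer` network are all hypothetical choices made for the example.

```python
import torch


def atomic_density(coords, box_size=8.0, resolution=1.0, sigma=1.0):
    """Project (N, 3) atom coordinates onto a cubic grid as a sum of Gaussians."""
    n = int(round(box_size / resolution))
    # Voxel centers along one axis: 0.5, 1.5, ... for a 1-angstrom resolution
    centers = (torch.arange(n, dtype=coords.dtype) + 0.5) * resolution
    gx, gy, gz = torch.meshgrid(centers, centers, centers, indexing="ij")
    grid = torch.stack([gx, gy, gz], dim=-1)                # (n, n, n, 3)
    diff = grid.unsqueeze(0) - coords.view(-1, 1, 1, 1, 3)  # (N, n, n, n, 3)
    sq_dist = (diff ** 2).sum(dim=-1)
    return torch.exp(-sq_dist / (2.0 * sigma ** 2)).sum(dim=0)


# Two atoms placed exactly on voxel centers, 2 angstroms apart along x.
coords = torch.tensor([[4.5, 4.5, 4.5],
                       [2.5, 4.5, 4.5]], requires_grad=True)
density = atomic_density(coords)  # (8, 8, 8)

# A toy, untrained 3D CNN that maps a density grid to one scalar score.
scorer = torch.nn.Sequential(
    torch.nn.Conv3d(1, 4, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool3d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(4, 1),
)
score = scorer(density[None, None])  # add batch and channel dims -> (1, 1)

# The score is differentiable with respect to the atom coordinates.
score.sum().backward()
```

    Ranking a set of decoys would then amount to computing one such score per decoy and sorting; how the scorer is trained is outside the scope of this sketch.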

    Team:

  • Georgy Derevyanko

    Supervised by:

  • Guillaume Lamoureux
  • Sergei Grudinin
  • Yoshua Bengio

    Status: Manuscript submitted

    Models: Trained models and results

    Electrostatics in quality assessment

    Physical and chemical properties of the environment play a key role in protein folding. However, so far the methods that select candidate structures have not taken electrostatics directly into account.

    Team:

  • Georgy Derevyanko
  • Justin Whatley

    Supervised by:

  • Georgy Derevyanko
  • Guillaume Lamoureux

    Status: Started

    Equivariance and extremum points

    A rotated and translated structure of a protein is still the same structure. This project tries to implement equivariant deep learning architectures to rank decoys. In addition, it is well known that the native structures of many proteins correspond to minima of the free energy. This project therefore attempts to add penalties on the first and second derivatives of the model output, projected onto small random changes in the protein internal coordinates.
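    One possible reading of the derivative penalties described above is sketched below in PyTorch. The exact form of the penalty is an assumption: the hypothetical function `extremum_penalty` squares the first derivative projected on a small random direction and penalizes negative curvature along the same direction, so that it vanishes at a minimum. The quadratic "models" are toy stand-ins, not the project's networks.

```python
import torch


def extremum_penalty(model, angles, step_scale=0.01):
    """Penalize non-zero slope and negative curvature of model(angles)
    along one small random direction in internal-coordinate space."""
    angles = angles.detach().requires_grad_(True)
    direction = step_scale * torch.randn_like(angles)
    output = model(angles).sum()
    # First derivative projected on the random direction: g . v
    (grad,) = torch.autograd.grad(output, angles, create_graph=True)
    slope = (grad * direction).sum()
    # Second derivative projected on the same direction: v^T H v
    (hess_vec,) = torch.autograd.grad(slope, angles, create_graph=True)
    curvature = (hess_vec * direction).sum()
    # Assumed penalty: zero slope and non-negative curvature cost nothing
    return slope.pow(2) + torch.relu(-curvature)


torch.manual_seed(0)
native = torch.zeros(5)            # stand-in for native internal coordinates
bowl = lambda x: (x ** 2).sum()    # toy output with a minimum at `native`
dome = lambda x: -(x ** 2).sum()   # toy output with a maximum at `native`

p_min = extremum_penalty(bowl, native)        # at a minimum
p_max = extremum_penalty(dome, native)        # at a maximum
p_off = extremum_penalty(bowl, native + 1.0)  # away from the extremum
```

    The second call to `torch.autograd.grad` requires the projected slope to depend on the input, so a model used this way needs a non-trivial second derivative (for example, smooth activations).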

    Team:

  • Georgy Derevyanko

    Status: Concept

    Differentiable C-alpha protein model

    This project implements differentiable transitions between internal angles, coordinates, and various descriptors of protein conformations. This allows learning a direct functional correspondence between the sequence and the coordinates of the C-alpha atoms of proteins using deep learning methods. Such a correspondence lets us leverage the abundance of DNA sequencing data to train the model.
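    A differentiable transition from internal angles to C-alpha coordinates can be sketched as below. This is an illustrative reconstruction, not the project's code: it assumes a fixed 3.8 Å distance between consecutive C-alpha atoms, places each new atom from the previous three, and uses one particular sign convention for the dihedral angle. The names `place_next` and `angles_to_ca_trace` are hypothetical.

```python
import torch

CA_BOND = 3.8  # assumed fixed distance between consecutive C-alpha atoms (angstroms)


def place_next(a, b, c, theta, phi, bond=CA_BOND):
    """Place atom d after a, b, c with bond angle theta at c and dihedral phi."""
    bc = c - b
    bc = bc / bc.norm()
    n = torch.cross(b - a, bc, dim=-1)  # normal of the a-b-c plane
    n = n / n.norm()
    m = torch.cross(n, bc, dim=-1)      # completes the local orthonormal frame
    return (c - bond * torch.cos(theta) * bc
            + bond * torch.sin(theta) * torch.cos(phi) * m
            + bond * torch.sin(theta) * torch.sin(phi) * n)


def angles_to_ca_trace(thetas, phis):
    """thetas: (L-2,) bond angles, phis: (L-3,) dihedrals -> (L, 3) coordinates."""
    zero = thetas.new_zeros(())
    a = torch.stack([zero, zero, zero])
    b = torch.stack([zero + CA_BOND, zero, zero])
    # Third atom lies in the xy plane at bond angle thetas[0] from the first bond
    c = b + CA_BOND * torch.stack([-torch.cos(thetas[0]), torch.sin(thetas[0]), zero])
    atoms = [a, b, c]
    for theta, phi in zip(thetas[1:], phis):
        atoms.append(place_next(atoms[-3], atoms[-2], atoms[-1], theta, phi))
    return torch.stack(atoms)


# A 6-residue trace with constant angles (values chosen only for illustration).
thetas = torch.full((4,), 1.6, requires_grad=True)
phis = torch.full((3,), 0.9, requires_grad=True)
coords = angles_to_ca_trace(thetas, phis)

# Any coordinate-based quantity, here the end-to-end distance, is differentiable
# with respect to the angles.
end_to_end = (coords[-1] - coords[0]).norm()
end_to_end.backward()
```

    In a sequence-to-structure model, the angles would be predicted by a network from the sequence, and a loss on the resulting coordinates would be backpropagated through this conversion.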

    Team:

  • Georgy Derevyanko
  • Stanislaw Jastrzebski

    Supervised by:

  • Guillaume Lamoureux
  • Yoshua Bengio

    Status: Preliminary results

    Reinforcement learning to sample conformations

    This project attempts to use deep reinforcement learning to optimize the full-atom representation of a protein. The reward function uses the results of the quality assessment sub-projects.

    Team: None

    Status: None

    Folding as a diffusion in the conformational space

    This project attempts to apply new unsupervised deep learning methods to learn the diffusion kernel of the protein folding process.

    Team: None

    Status: None

    Protein structure prediction is a GAN

    The workflow of protein structure prediction breaks into a generator part (conformation sampling) and a discriminator part (quality assessment). Previous algorithms do not propagate information from the quality assessment algorithms back to the generators. This project aims to train such workflows as end-to-end models.

    Team: None

    Status: None

    The next CASP competition will start at the end of May 2018