Machine Learning for Quantum-Accurate Biomolecular Simulation

The holy grail of quantum chemistry and physics is ab initio simulation of biological molecules. I am using machine learning to address this challenge directly. I am developing a model that uses Euclidean Neural Networks to predict the electron density of molecular systems. This approach addresses two key challenges of large-scale ab initio simulation.

1. The training data dilemma

All machine learning methods need training data. Quantum chemistry reference calculations for an entire protein, however, are prohibitively expensive on classical computers, so a machine learning (ML) approach has to train on much smaller systems. This poses a fundamental problem for most ML quantum chemistry models: long-range forces are known to be important in biomolecules, but they cannot be represented in a training set of small systems.

To solve this problem, my model learns the electron density and then uses the Hellmann-Feynman theorem to compute forces directly from that density. In this way we can train on small calculations and extrapolate accurately to larger systems: global, long-range forces are obtained from short-range densities.
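
As a rough illustration of how a density yields forces, the sketch below evaluates the electrostatic (Hellmann-Feynman) force on a nucleus in atomic units: the attraction toward the electron cloud, Z_A times the integral of rho(r) (r - R_A) / |r - R_A|^3, plus the Coulomb repulsion from the other nuclei. Everything here is an assumption made for illustration and not the model's actual implementation: the density is a single normalized Gaussian rather than a predicted one, the integral is a plain sum over a uniform grid, and the function names are hypothetical.

```python
import math

def gaussian_density(point, center, alpha):
    """Normalized spherical Gaussian that integrates to one electron (toy density)."""
    r2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return (alpha / math.pi) ** 1.5 * math.exp(-alpha * r2)

def cubic_grid(center, half_width, spacing):
    """Uniform grid points in a cube around `center`, plus the volume per point."""
    n = int(round(2 * half_width / spacing)) + 1
    axes = [[c - half_width + i * spacing for i in range(n)] for c in center]
    points = [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]
    return points, spacing ** 3

def hellmann_feynman_force(index, positions, charges, grid, density, weight):
    """Electrostatic force on nucleus `index` from the density and the other nuclei."""
    r_a, z_a = positions[index], charges[index]
    force = [0.0, 0.0, 0.0]
    # Attraction toward the electron cloud, integrated numerically over the grid.
    for point, rho in zip(grid, density):
        d = [p - a for p, a in zip(point, r_a)]
        dist = math.sqrt(sum(x * x for x in d))
        if dist < 1e-8:
            continue  # skip a grid point sitting exactly on the nucleus
        scale = z_a * rho * weight / dist ** 3
        for k in range(3):
            force[k] += scale * d[k]
    # Coulomb repulsion from every other nucleus.
    for j, (r_b, z_b) in enumerate(zip(positions, charges)):
        if j == index:
            continue
        d = [a - b for a, b in zip(r_a, r_b)]
        dist = math.sqrt(sum(x * x for x in d))
        scale = z_a * z_b / dist ** 3
        for k in range(3):
            force[k] += scale * d[k]
    return force

# Toy system: a proton at the origin and a hydrogen-like atom 2 bohr away on the x axis.
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
charges = [1.0, 1.0]
grid, weight = cubic_grid(positions[1], half_width=1.5, spacing=0.1)
density = [gaussian_density(p, positions[1], alpha=4.0) for p in grid]

electrons = sum(density) * weight  # numerical check of the normalization
# Force on the proton from the electron cloud alone, then with the second nucleus included.
force_bare = hellmann_feynman_force(0, positions[:1], charges[:1], grid, density, weight)
force_screened = hellmann_feynman_force(0, positions, charges, grid, density, weight)
print(electrons, force_bare, force_screened)
```

In this toy setup the cloud alone pulls the proton along +x with a force close to that of a unit point charge 2 bohr away, and adding the second nucleus cancels it almost entirely, as expected for a distant neutral atom. The point of the sketch is that the force is a Coulomb integral over the whole density, so a density assembled from local pieces still carries long-range information.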

2. The 3D space problem

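The electron density is an object in 3D space. This means that it transforms in a well-defined way under translation or rotation of the inputs: moving or rotating the molecule moves or rotates the density with it, and the density is otherwise unchanged. Most standard ML methods are not built to respect this symmetry, so they are poorly suited to this learning task.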
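
The symmetry can be made concrete with a small numerical check. The sketch below uses a toy density (a sum of atom-centered Gaussians with made-up positions, electron counts, and widths, all hypothetical) and confirms that rotating and translating the atoms together with the evaluation point leaves the density value unchanged, while evaluating the moved molecule at the old point does not.

```python
import math

def rotation_about_axis(axis, angle):
    """3x3 rotation matrix from Rodrigues' formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]

def transform(matrix, shift, vector):
    """Apply the rotation `matrix` and then the translation `shift` to a 3D point."""
    return tuple(
        sum(matrix[i][j] * vector[j] for j in range(3)) + shift[i] for i in range(3)
    )

def toy_density(point, atoms):
    """Sum of normalized atom-centered Gaussians; a stand-in for a real density."""
    total = 0.0
    for center, n_electrons, alpha in atoms:
        r2 = sum((p - c) ** 2 for p, c in zip(point, center))
        total += n_electrons * (alpha / math.pi) ** 1.5 * math.exp(-alpha * r2)
    return total

# Hypothetical water-like geometry: (position, electrons, Gaussian exponent).
atoms = [
    ((0.0, 0.0, 0.0), 8.0, 1.0),
    ((0.96, 0.0, 0.0), 1.0, 2.0),
    ((-0.24, 0.93, 0.0), 1.0, 2.0),
]
point = (0.5, 0.2, 0.1)

rotation = rotation_about_axis((1.0, 1.0, 0.0), 0.7)
shift = (3.0, -1.0, 2.0)
moved_atoms = [(transform(rotation, shift, c), n, a) for c, n, a in atoms]
moved_point = transform(rotation, shift, point)

original_value = toy_density(point, atoms)
moved_value = toy_density(moved_point, moved_atoms)    # atoms and point moved together
fixed_point_value = toy_density(point, moved_atoms)    # atoms moved, point left behind
print(original_value, moved_value, fixed_point_value)
```

A model that predicts the density should reproduce the first equality for every rotation and translation without having to learn it from data, which is the property the next paragraph relies on.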

To solve this problem, we use a new class of machine learning models called Euclidean Neural Networks, available in the e3nn framework (https://github.com/e3nn/e3nn). This model encodes the symmetries of 3D space naturally, using a convolutional graph neural network with spherical harmonic features. This allows us to learn the quantum electron density with orders of magnitude less training data than comparable models.
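
To give a feel for why spherical harmonic features respect 3D symmetry, the sketch below builds features for one atom by summing radially weighted real spherical harmonics (l = 0, 1, 2) over the displacement vectors to its neighbors. It is hand-written standard-library Python and does not use e3nn or reproduce its API; the neighbor positions, the Gaussian radial weight, and all function names are assumptions for illustration. Under a rotation of the neighbors, the l = 0 feature stays fixed, the l = 1 feature rotates like an ordinary vector, and the norm of every l block is preserved.

```python
import math

def rotation_about_axis(axis, angle):
    """3x3 rotation matrix from Rodrigues' formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]

def rotate(matrix, vector):
    """Apply a 3x3 rotation matrix to a 3D vector."""
    return tuple(sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3))

def real_spherical_harmonics(vector):
    """Real spherical harmonics of the direction of `vector`, ordered m = -l..l."""
    r = math.sqrt(sum(v * v for v in vector))
    x, y, z = (v / r for v in vector)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = math.sqrt(15.0 / (4.0 * math.pi))
    c20 = math.sqrt(5.0 / (16.0 * math.pi))
    return {
        0: [math.sqrt(1.0 / (4.0 * math.pi))],
        1: [c1 * y, c1 * z, c1 * x],
        2: [c2 * x * y, c2 * y * z, c20 * (3.0 * z * z - 1.0),
            c2 * x * z, 0.5 * c2 * (x * x - y * y)],
    }

def neighborhood_features(neighbors):
    """Sum radially weighted spherical harmonics over neighbor displacement vectors."""
    features = {0: [0.0], 1: [0.0] * 3, 2: [0.0] * 5}
    for vector in neighbors:
        r = math.sqrt(sum(v * v for v in vector))
        radial = math.exp(-r * r)  # hypothetical radial weight; depends on distance only
        for l, values in real_spherical_harmonics(vector).items():
            for m, value in enumerate(values):
                features[l][m] += radial * value
    return features

def block_norm(block):
    """Euclidean norm of one l block of features."""
    return math.sqrt(sum(v * v for v in block))

# Hypothetical neighbor displacements around a central atom.
neighbors = [(0.9, 0.1, -0.2), (-0.3, 0.8, 0.4), (0.2, -0.5, 1.1)]
rotation = rotation_about_axis((0.3, -1.0, 0.5), 1.2)
rotated_neighbors = [rotate(rotation, v) for v in neighbors]

before = neighborhood_features(neighbors)
after = neighborhood_features(rotated_neighbors)

# The l = 1 block is stored as (y, z, x); reorder to (x, y, z) before comparing.
vector_before = (before[1][2], before[1][0], before[1][1])
vector_after = (after[1][2], after[1][0], after[1][1])
expected_vector = rotate(rotation, vector_before)
print({l: (block_norm(before[l]), block_norm(after[l])) for l in before})
```

Because each block transforms in a known way, a network built from such features does not need to see rotated copies of the same molecule during training, which is one plausible reading of where the data efficiency described above comes from.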

Josh Rackers

Joshua A. Rackers
