MSc. Thesis Presentation - Nikhil Shenoy

Name: Nikhil Shenoy

Date: Thursday, June 27, 2024

Time: 9:00am to 11:00am

Location: ICCS 204

Zoom: https://us04web.zoom.us/j/78104746757?pwd=yQwbhtdMLC0gwEWNAUUVTsNYeFgLkq.1

Title of Thesis: Investigating ML Potentials and Deep Generative Models for Efficient Conformational Sampling

Abstract:

Efficiently sampling the landscape of molecular conformations is an important task in computational drug discovery. Simulation approaches like Molecular Dynamics (MD) require an energy function that is fast, accurate, transferable, and scalable. Traditional approaches like Force Fields are fast but inaccurate, while Quantum Mechanical (QM) methods are accurate but not scalable. Recently, Machine Learning (ML) potentials trained on datasets labeled with QM methods have been proposed as a solution. However, generating QM datasets is a cost-intensive exercise, and design choices such as conformational and structural diversity during the generation process can introduce biases into the data.

In the first part of the thesis, we explore the intricate relationship between dataset biases, specifically conformational and structural diversity, and ML potential generalization. We investigate these dynamics through two distinct experiments: a fixed-budget experiment, in which the dataset size remains constant, and a fixed-molecular-set experiment, which holds structural diversity fixed while varying conformational diversity. Our results reveal the critical need for balanced structural and conformational diversity in QM datasets for optimal generalization, which current datasets lack. We believe these findings can inform future data generation and the development of ML potentials that generalize beyond their training data.

An alternative approach is to directly sample conformations given the molecular graph using deep generative models such as diffusion models. Existing competitive approaches either use expensive local structure methods or rely on large architectures without inductive biases. In the second part of the thesis, we challenge the status quo and develop a simple and scalable deep generative method, Equivariant Transformer Flow (ET-Flow), which incorporates flow matching, an SO(3)-equivariant transformer, a harmonic prior, and an approximate optimal transport alignment. We achieve state-of-the-art performance on several molecular conformer generation benchmarks with significantly fewer parameters and inference steps than existing methods, highlighting the importance of inductive biases and well-informed modelling choices.
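For readers unfamiliar with flow matching, a minimal sketch of the standard conditional flow-matching objective with a linear interpolant is given below; the notation and the choice of interpolant are illustrative assumptions here, not details taken from the abstract, and the exact training loss used in ET-Flow may differ.

\[
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
\mathcal{L}(\theta) = \mathbb{E}_{t,\; x_0 \sim p_0,\; x_1 \sim p_1}\,\big\lVert v_\theta(x_t, t) - (x_1 - x_0) \big\rVert^2,
\]

where \(p_0\) is the prior over coordinates (in ET-Flow, a harmonic prior), \(p_1\) is the data distribution of conformations, and \(v_\theta\) is the learned vector field; at inference time, samples are generated by integrating \(v_\theta\) from a prior draw at \(t = 0\) to \(t = 1\).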