An SE(3)-invariant autoencoder for generating the periodic structure of materials

Related tags

Deep Learningcdvae
Overview

Crystal Diffusion Variational AutoEncoder

This software implementes Crystal Diffusion Variational AutoEncoder (CDVAE), which generates the periodic structure of materials.

It has several main functionalities:

  • Generate novel, stable materials by learning from a dataset containing existing material structures.
  • Generate materials by optimizing a specific property in the latent space, i.e. inverse design.

[Paper] [Datasets]

Table of Contents

Installation

The easiest way to install prerequisites is via conda.

Pre-install step

Install conda-merge:

pip install conda-merge

Check that you can invoke conda-merge by running conda-merge -h.

GPU machines

Run the following command to install the environment:

conda-merge env.common.yml env.gpu.yml > env.yml
conda env create -f env.yml

Activate the conda environment with conda activate cdvae.

Install this package with pip install -e ..

CPU-only machines

conda-merge env.common.yml env.cpu.yml > env.yml
conda env create -f env.yml
conda activate cdvae
pip install -e .

Setting up environment variables

Make a copy of the .env.template file and rename it to .env. Modify the following environment variables in .env.

  • PROJECT_ROOT: path to the folder that contains this repo
  • HYDRA_JOBS: path to a folder to store hydra outputs
  • WABDB: path to a folder to store wabdb outputs

Datasets

All datasets are directly available on data/ with train/valication/test splits. You don't need to download them again. If you use these datasets, please consider to cite the original papers from which we curate these datasets.

Find more about these datasets by going to our Datasets page.

Training CDVAE

Training without a property predictor

To train a CDVAE, run the following command:

python cdvae/run.py data=perov expname=perov

To use other datasets, use data=carbon and data=mp_20 instead. CDVAE uses hydra to configure hyperparameters, and users can modify them with the command line or configure files in conf/ folder.

After training, model checkpoints can be found in $HYDRA_JOBS/singlerun/YYYY-MM-DD/expname.

Training with a property predictor

Users can also additionally train an MLP property predictor on the latent space, which is needed for the property optimization task:

python cdvae/run.py data=perov expname=perov model.predict_property=True

The name of the predicted propery is defined in data.prop, as in conf/data/perov.yaml for Perov-5.

Generating materials

To generate materials, run the following command:

python scripts/evaluate.py --model_path MODEL_PATH --tasks recon gen opt

MODEL_PATH will be the path to the trained model. Users can choose one or several of the 3 tasks:

  • recon: reconstruction, reconstructs all materials in the test data. Outputs can be found in eval_recon.ptl
  • gen: generate new material structures by sampling from the latent space. Outputs can be found in eval_gen.pt.
  • opt: generate new material strucutre by minimizing the trained property in the latent space (requires model.predict_property=True). Outputs can be found in eval_opt.pt.

eval_recon.pt, eval_gen.pt, eval_opt.pt are pytorch pickles files containing multiple tensors that describes the structures of M materials batched together. Each material can have different number of atoms, and we assume there are in total N atoms. num_evals denote the number of Langevin dynamics we perform for each material.

  • frac_coords: fractional coordinates of each atom, shape (num_evals, N, 3)
  • atom_types: atomic number of each atom, shape (num_evals, N)
  • lengths: the lengths of the lattice, shape (num_evals, M, 3)
  • angles: the angles of the lattice, shape (num_evals, M, 3)
  • num_atoms: the number of atoms in each material, shape (num_evals, M)

Evaluating model

To compute evaluation metrics, run the following command:

python scripts/compute_metrics.py --root_path MODEL_PATH --tasks recon gen opt

MODEL_PATH will be the path to the trained model. All evaluation metrics will be saved in eval_metrics.json.

Authors and acknowledgements

The software is primary written by Tian Xie, with signficant contributions from Xiang Fu.

The GNN codebase and many utility functions are adapted from the ocp-models by the Open Catalyst Project. Especially, the GNN implementations of DimeNet++ and GemNet are used.

The main structure of the codebase is built from NN Template.

For the datasets, Perov-5 is curated from Perovksite water-splitting, Carbon-24 is curated from AIRSS data for carbon at 10GPa, MP-20 is curated from Materials Project.

Citation

Please consider citing the following paper if you find our code & data useful.

@article{xie2021crystal,
  title={Crystal Diffusion Variational Autoencoder for Periodic Material Generation},
  author={Xie, Tian and Fu, Xiang and Ganea, Octavian-Eugen and Barzilay, Regina and Jaakkola, Tommi},
  journal={arXiv preprint arXiv:2110.06197},
  year={2021}
}

Contact

Please leave an issue or reach out to Tian Xie (txie AT csail DOT mit DOT edu) if you have any questions.

Owner
Tian Xie
Postdoc at MIT CSAIL. Machine learning algorithms for materials, drugs, and beyond.
Tian Xie
SGoLAM - Simultaneous Goal Localization and Mapping

SGoLAM - Simultaneous Goal Localization and Mapping PyTorch implementation of the MultiON runner-up entry, SGoLAM: Simultaneous Goal Localization and

10 Jan 05, 2023
Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

Jonas Köhler 893 Dec 28, 2022
Official implementation of the Implicit Behavioral Cloning (IBC) algorithm

Implicit Behavioral Cloning This codebase contains the official implementation of the Implicit Behavioral Cloning (IBC) algorithm from our paper: Impl

Google Research 210 Dec 09, 2022
Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.

FENSE The metric, Fluency ENhanced Sentence-bert Evaluation (FENSE), for audio caption evaluation, proposed in the paper "Can Audio Captions Be Evalua

Zhiling Zhang 13 Dec 23, 2022
PAIRED in PyTorch 🔥

PAIRED This codebase provides a PyTorch implementation of Protagonist Antagonist Induced Regret Environment Design (PAIRED), which was first introduce

UCL DARK Lab 46 Dec 12, 2022
Generative Art Using Neural Visual Grammars and Dual Encoders

Generative Art Using Neural Visual Grammars and Dual Encoders Arnheim 1 The original algorithm from the paper Generative Art Using Neural Visual Gramm

DeepMind 231 Jan 05, 2023
A Protein-RNA Interface Predictor Based on Semantics of Sequences

PRIP PRIP:A Protein-RNA Interface Predictor Based on Semantics of Sequences installation gensim==3.8.3 matplotlib==3.1.3 xgboost==1.3.3 prettytable==2

李优 0 Mar 25, 2022
OpenL3: Open-source deep audio and image embeddings

OpenL3 OpenL3 is an open-source Python library for computing deep audio and image embeddings. Please refer to the documentation for detailed instructi

Music and Audio Research Laboratory - NYU 326 Jan 02, 2023
[CVPR 2021] NormalFusion: Real-Time Acquisition of Surface Normals for High-Resolution RGB-D Scanning

NormalFusion: Real-Time Acquisition of Surface Normals for High-Resolution RGB-D Scanning Project Page | Paper | Supplemental material #1 | Supplement

KAIST VCLAB 49 Nov 24, 2022
Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion

CSF Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion Tips: For testing: CUDA_VISIBLE_DEVICES=0 python main.py For trai

Han Xu 14 Oct 31, 2022
Python code to fuse multiple RGB-D images into a TSDF voxel volume.

Volumetric TSDF Fusion of RGB-D Images in Python This is a lightweight python script that fuses multiple registered color and depth images into a proj

Andy Zeng 845 Jan 03, 2023
source code of Adversarial Feedback Loop Paper

Adversarial Feedback Loop [ArXiv] [project page] Official repository of Adversarial Feedback Loop paper Firas Shama, Roey Mechrez, Alon Shoshan, Lihi

17 Jul 20, 2022
Developed an optimized algorithm which finds the most optimal path between 2 points in a 3D Maze using various AI search techniques like BFS, DFS, UCS, Greedy BFS and A*

Developed an optimized algorithm which finds the most optimal path between 2 points in a 3D Maze using various AI search techniques like BFS, DFS, UCS, Greedy BFS and A*. The algorithm was extremely

1 Mar 28, 2022
Library for 8-bit optimizers and quantization routines.

bitsandbytes Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions. Paper -- V

Facebook Research 687 Jan 04, 2023
Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency

Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency This is a official implementation of the CycleContrast introduced in

13 Nov 14, 2022
YoloAll is a collection of yolo all versions. you you use YoloAll to test yolov3/yolov5/yolox/yolo_fastest

官方讨论群 QQ群:552703875 微信群:15158106211(先加作者微信,再邀请入群) YoloAll项目简介 YoloAll是一个将当前主流Yolo版本集成到同一个UI界面下的推理预测工具。可以迅速切换不同的yolo版本,并且可以针对图片,视频,摄像头码流进行实时推理,可以很方便,直观

DL-Practise 244 Jan 01, 2023
Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch

Lie Transformer - Pytorch (wip) Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch. Only the SE3 version will be present in thi

Phil Wang 78 Oct 26, 2022
TensorFlow (Python) implementation of DeepTCN model for multivariate time series forecasting.

DeepTCN TensorFlow TensorFlow (Python) implementation of multivariate time series forecasting model introduced in Chen, Y., Kang, Y., Chen, Y., & Wang

Flavia Giammarino 21 Dec 19, 2022
MediaPipe is a an open-source framework from Google for building multimodal

MediaPipe is a an open-source framework from Google for building multimodal (eg. video, audio, any time series data), cross platform (i.e Android, iOS, web, edge devices) applied ML pipelines. It is

Bhavishya Pandit 3 Sep 30, 2022
A Python package for generating concise, high-quality summaries of a probability distribution

GoodPoints A Python package for generating concise, high-quality summaries of a probability distribution GoodPoints is a collection of tools for compr

Microsoft 28 Oct 10, 2022