Twin-deep neural network for semi-supervised learning of materials properties

Related tags

Deep Learningtsdnn
Overview

Deep Semi-Supervised Teacher-Student Material Synthesizability Prediction

Citation:

Semi-supervised teacher-student deep neural network for materials discovery” by Daniel Gleaves, Edirisuriya M. Dilanga Siriwardane,Yong Zhao, and JianjunHu.

Machine learning and evolution laboratory

Department of Computer Science and Engineering

University of South Carolina


This software package implements the Meta Pseudo Labels (MPL) semi-supervised learning method with Crystal Graph Convolutional Neural Networks (CGCNN) with that takes an arbitary crystal structure to predict material synthesizability and whether it has positive or negative formation energy

The package provides two major functions:

  • Train a semi-supervised TSDNN classification model with a customized dataset.
  • Predict material synthesizability and formation energy of new crystals with a pre-trained TSDNN model.

The following paper describes the details of the CGCNN architecture, a graph neural network model for materials property prediction: CGCNN paper

The following paper describes the details of the semi-supervised learning framework that we used in our model: Meta Pseudo Labels

Table of Contents

Prerequisites

This package requires:

If you are new to Python, the easiest way of installing the prerequisites is via conda. After installing conda, run the following command to create a new environment named cgcnn and install all prerequisites:

conda upgrade conda
conda create -n tsdnn python=3 scikit-learn pytorch torchvision pymatgen -c pytorch -c conda-forge

*Note: this code is tested for PyTorch v1.0.0+ and is not compatible with versions below v0.4.0 due to some breaking changes.

This creates a conda environment for running TSDNN. Before using TSDNN, activate the environment by:

conda activate tsdnn

Usage

Define a customized dataset

To input crystal structures to TSDNN, you will need to define a customized dataset. Note that this is required for both training and predicting.

Before defining a customized dataset, you will need:

  • CIF files recording the structure of the crystals that you are interested in
  • The target label for each crystal (not needed for predicting, but you need to put some random numbers in data_test.csv)

You can create a customized dataset by creating a directory root_dir with the following files:

  1. data_labeled.csv: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column recodes the known value of the target label.

  2. data_unlabeled.csv: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column can be filled with alternating 1 and 0 (the second column is still needed).

  3. atom_init.json: a JSON file that stores the initialization vector for each element. An example of atom_init.json is data/sample-regression/atom_init.json, which should be good for most applications.

  4. ID.cif: a CIF file that recodes the crystal structure, where ID is the unique ID for the crystal.

(4.) data_predict: a CSV file with two columns. The first column recodes a unique ID for each crystal, and the second column can be filled with alternating 1 and 0 (the second column is still needed). This is the file that will be used if you want to classify materials with predict.py.

The structure of the root_dir should be:

root_dir
├── data_labeled.csv
├── data_unlabeled.csv
├── data_test.csv
├── data_positive.csv (optional- for positive and unlabeled dataset generation)
├── data_unlabeled_full.csv (optional- for positive and unlabeled dataset generation, data_unlabeled.csv will be overwritten)
├── atom_init.json
├── id0.cif
├── id1.cif
├── ...

There is an example of customized a dataset in: data/example.

Train a TSDNN model

Before training a new TSDNN model, you will need to:

Then, in directory synth-tsdnn, you can train a TSDNN model for your customized dataset by:

python main.py root_dir

If you want to use the PU learning dataset generation, you can train a model using the --uds flag with the number of PU iterations to perform.

python main.py --uds 5 root_dir

You can set the number of training, validation, and test data with labels --train-size, --val-size, and --test-size. Alternatively, you may use the flags --train-ratio, --val-ratio, --test-ratio instead. Note that the ratio flags cannot be used with the size flags simultaneously. For instance, data/example has 10 data points in total. You can train a model by:

python main.py --train-size 6 --val-size 2 --test-size 2 data/example

or alternatively

python main.py --train-ratio 0.6 --val-ratio 0.2 --test-ratio 0.2 data/example

After training, you will get 5 files in synth-tsdnn directory.

  • checkpoints/teacher_best.pth.tar: stores the TSDNN teacher model with the best validation accuracy.
  • checkpoints/student_best.pth.tar" stores the TSDNN student model with the best validation accuracy.
  • checkpoints/t_checkpoint.pth.tar: stores the TSDNN teacher model at the last epoch.
  • checkpoints/s_checkpoint.pth.tar: stores the TSDNN student model at the last epoch.
  • results/validation/test_results.csv: stores the ID and predicted value for each crystal in training set.

Predict material properties with a pre-trained TSDNN model

Before predicting the material properties, you will need to:

  • Define a customized dataset at root_dir for all the crystal structures that you want to predict.
  • Obtain a pre-trained TSDNN model (example found in checkpoints/pre-trained/pre-train.pth.tar).

Then, in directory synth-tsdnn, you can predict the properties of the crystals in root_dir:

python predict.py checkpoints/pre-trained/pre-trained.pth.tar data/root_dir

After predicting, you will get one file in synth-tsdnn directory:

  • predictions.csv: stores the ID and predicted value for each crystal in test set.

Data

To reproduce our paper, you can download the corresponding datasets following the instruction. Each dataset discussed can be found in data/datasets/

Authors

This software was primarily written by Daniel Gleaves who was advised by Prof. Jianjun Hu. This software builds upon work by Tian Xie, Hieu Pham, and Jungdae Kim.

Acknowledgements

Research reported in this work was supported in part by NSF under grants 1940099 and 1905775. The views, perspective,and content do not necessarily represent the official views of NSF. This work was supported in part by the South Carolina Honors College Research Program. This work is partially supported by a grant from the University of South Carolina Magellan Scholar Program.

License

TSDNN is released under the MIT License.

Owner
MLEG
MLEG
A collection of random and hastily hacked together scripts for investigating EU-DCC

A collection of random and hastily hacked together scripts for investigating EU-DCC

Ryan Barrett 8 Mar 01, 2022
Normal Learning in Videos with Attention Prototype Network

Codes_APN Official codes of CVPR21 paper: Normal Learning in Videos with Attention Prototype Network (https://arxiv.org/abs/2108.11055) Overview of ou

11 Dec 13, 2022
SE3 Pose Interp - Interpolate camera pose or trajectory in SE3, pose interpolation, trajectory interpolation

SE3 Pose Interpolation Pose estimated from SLAM system are always discrete, and

Ran Cheng 4 Dec 15, 2022
Training data extraction on GPT-2

Training data extraction from GPT-2 This repository contains code for extracting training data from GPT-2, following the approach outlined in the foll

Florian Tramer 62 Dec 07, 2022
A collection of resources and papers on Diffusion Models, a darkhorse in the field of Generative Models

This repository contains a collection of resources and papers on Diffusion Models and Score-based Models. If there are any missing valuable resources

5.1k Jan 08, 2023
Implementation of the algorithm shown in the article "Modelo de Predicción de Éxito de Canciones Basado en Descriptores de Audio"

Success Predictor Implementation of the algorithm shown in the article "Modelo de Predicción de Éxito de Canciones Basado en Descriptores de Audio". B

Rodrigo Nazar Meier 4 Mar 17, 2022
Self-supervised learning (SSL) is a method of machine learning

Self-supervised learning (SSL) is a method of machine learning. It learns from unlabeled sample data. It can be regarded as an intermediate form between supervised and unsupervised learning.

Ashish Patel 4 May 26, 2022
Codebase for Amodal Segmentation through Out-of-Task andOut-of-Distribution Generalization with a Bayesian Model

Codebase for Amodal Segmentation through Out-of-Task andOut-of-Distribution Generalization with a Bayesian Model

Yihong Sun 12 Nov 15, 2022
Code for paper 'Hand-Object Contact Consistency Reasoning for Human Grasps Generation' at ICCV 2021

GraspTTA Hand-Object Contact Consistency Reasoning for Human Grasps Generation (ICCV 2021). Project Page with Videos Demo Quick Results Visualization

Hanwen Jiang 47 Dec 09, 2022
Official implementation of AAAI-21 paper "Label Confusion Learning to Enhance Text Classification Models"

Description: This is the official implementation of our AAAI-21 accepted paper Label Confusion Learning to Enhance Text Classification Models. The str

101 Nov 25, 2022
"SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image", Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Humphrey Shi, Zhangyang Wang

SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image [Paper] [Website] Pipeline Code Environment pip install -r requirements

VITA 250 Jan 05, 2023
Code release of paper "Deep Multi-View Stereo gone wild"

Deep MVS gone wild Pytorch implementation of "Deep MVS gone wild" (Paper | website) This repository provides the code to reproduce the experiments of

François Darmon 53 Dec 24, 2022
Neuron class provides LNU (Linear Neural Unit), QNU (Quadratic Neural Unit), RBF (Radial Basis Function), MLP (Multi Layer Perceptron), MLP-ELM (Multi Layer Perceptron - Extreme Learning Machine) neurons learned with Gradient descent or LeLevenberg–Marquardt algorithm

Neuron class provides LNU (Linear Neural Unit), QNU (Quadratic Neural Unit), RBF (Radial Basis Function), MLP (Multi Layer Perceptron), MLP-ELM (Multi Layer Perceptron - Extreme Learning Machine) neu

Filip Molcik 38 Dec 17, 2022
PyTorch implementation of Tacotron speech synthesis model.

tacotron_pytorch PyTorch implementation of Tacotron speech synthesis model. Inspired from keithito/tacotron. Currently not as much good speech quality

Ryuichi Yamamoto 279 Dec 09, 2022
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set —— PyTorch implementation This is an unofficial offici

Sicheng Xu 833 Dec 28, 2022
Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions

Aquarius Aquarius - Enabling Fast, Scalable, Data-Driven Virtual Network Functions NOTE: We are currently going through the open-source process requir

Zhiyuan YAO 0 Jun 02, 2022
PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

Non-Autoregressive Transformer Code release for Non-Autoregressive Neural Machine Translation by Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K.

Salesforce 261 Nov 12, 2022
Code, final versions, and information on the Sparkfun Graphical Datasheets

Graphical Datasheets Code, final versions, and information on the SparkFun Graphical Datasheets. Generated Cells After Running Script Example Complete

SparkFun Electronics 102 Jan 05, 2023
A PyTorch Implementation of FaceBoxes

FaceBoxes in PyTorch By Zisian Wong, Shifeng Zhang A PyTorch implementation of FaceBoxes: A CPU Real-time Face Detector with High Accuracy. The offici

Zi Sian Wong 797 Dec 17, 2022