Publication describing 3 ML examples at NSLS-II and interfacing into Bluesky

Overview

Machine learning enabling high-throughput and remote operations at large-scale user facilities.

Overview

This repository contains the source code and examples for recreating the publication at arXiv:2201.03550.

Abstract

Imaging, scattering, and spectroscopy are fundamental in understanding and discovering new functional materials. Contemporary innovations in automation and experimental techniques have led to these measurements being performed much faster and with higher resolution, thus producing vast amounts of data for analysis. These innovations are particularly pronounced at user facilities and synchrotron light sources. Machine learning (ML) methods are regularly developed to process and interpret large datasets in real-time with measurements. However, there remain conceptual barriers to entry for the facility general user community, whom often lack expertise in ML, and technical barriers for deploying ML models. Herein, we demonstrate a variety of archetypal ML models for on-the-fly analysis at multiple beamlines at the National Synchrotron Light Source II (NSLS-II). We describe these examples instructively, with a focus on integrating the models into existing experimental workflows, such that the reader can easily include their own ML techniques into experiments at NSLS-II or facilities with a common infrastructure. The framework presented here shows how with little effort, diverse ML models operate in conjunction with feedback loops via integration into the existing Bluesky Suite for experimental orchestration and data management.

Explanation of Examples

As with all things at a user facility, each model is trained or set-up according to the needs of the user and their science. What is consistent across all AI agents, is their final communication paradigm. The agent loads and stores the model and/or necessary data, and has at minimum the following methods.

  • tell : tell the agent about some new data
  • report : construct a report (message, visualization, etc.) about the data
  • ask : ask the agent what to do next (for more see bluesky-adaptive)

Unsupervised learning (Non-negative matrix factorization)

The NMF companion agent keeps a constant cache of data to perform the reduction on. We treat these data as dependent variables, with independent variables coming fom the experiment. In the case study presented, the independent variables are temperature measurements, and the dependent variables are the 1-d spectra. Each call to report updates the decomposition using the full dataset, and updates the plots in the visualization.

The NMF companion agent is wrapped in a filesystem watcher, DirectoryAgent, which monitors a directory periodically. If there is new data in the target directory, the DirectoryAgent tells the NMF companion about the new data, and triggers a new report.

The construction of these objects, training, and visualization are all contained in the run_unsupervised file and mirrored in the corresponding notebook.

Anomaly detection

The model attributes a new observation to either normal or anomalous time series by comparing it to a large courpus of data collected at the beamline over an extended period of time. The development and updating of the model is done offline. Due to the nature of exparimental measurements, anomalous observatons may constitute a sizable portion of data withing a single collection period. Thus, a labeling of the data is required prior to model training. Once the model is trained it is saved as a binary file and loaded each time when AnomalyAgent is initialized.

A set of features devired from the original raw data, allowing the model to process time series of arbitary length.

The training can be found at run_anomaly.py with example deployment infrastructure at deploy_anomaly.py.

Supervised learning (Failure Classification)

The classifications of failures involves training the models entirely offline. This allows for robust model selection and specific deployment. A suite of models from scikit-learn are trained and tested, with the most promising model chosen to deploy. Since the models are lightweight, we re-train them at each instantiation during deployment with the most current dataset. For deep learning models, it would be appropriate to save and version the weights of a model, can construct the model at instantiation and load the weights.

The training can be found at run_supervised.py with example deployment infrastructure at deploy_supervised.py. How this is implemented at the BMM beamline can be found concisely here, where a wrapper agent does pointwise evaluation on UIDs of a document stream, using the ClassificationAgent's tell--report interface.

System Requirements

Hardware Requirements

Software Requirements

OS Requirements

This package has been tested exclusively on Linux operating systems.

  • RHEL 8.3
  • Ubuntu 18.04
  • PopOS 20.04

Python dependencies

  • numpy
  • matplotlib
  • scikit-learn
  • ipython

Getting Started

Installation guide

Install from github:

$ python3 -m venv pub_env
$ source pub_env/bin/activate
Owner
BNL
Brookhaven National Laboratory
BNL
A large dataset of 100k Google Satellite and matching Map images, resembling pix2pix's Google Maps dataset.

Larger Google Sat2Map dataset This dataset extends the aerial ⟷ Maps dataset used in pix2pix (Isola et al., CVPR17). The provide script download_sat2m

34 Dec 28, 2022
OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

OpenFace 2.2.0: a facial behavior analysis toolkit Over the past few years, there has been an increased interest in automatic facial behavior analysis

Tadas Baltrusaitis 5.8k Dec 31, 2022
Sum-Product Probabilistic Language

Sum-Product Probabilistic Language SPPL is a probabilistic programming language that delivers exact solutions to a broad range of probabilistic infere

MIT Probabilistic Computing Project 57 Nov 17, 2022
🚀 PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models(v-diffusion)"

PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models(v-diffusion)" Unofficial PyTorch Implementation of Progressi

Vitaliy Hramchenko 58 Dec 19, 2022
Tiny-NewsRec: Efficient and Effective PLM-based News Recommendation

Tiny-NewsRec The source codes for our paper "Tiny-NewsRec: Efficient and Effective PLM-based News Recommendation". Requirements PyTorch == 1.6.0 Tensor

Yang Yu 3 Dec 07, 2022
Implement object segmentation on images using HOG algorithm proposed in CVPR 2005

HOG Algorithm Implementation Description HOG (Histograms of Oriented Gradients) Algorithm is an algorithm aiming to realize object segmentation (edge

Leo Hsieh 2 Mar 12, 2022
clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation

README clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation CVPR 2021 Authors: Suprosanna Shit and Johannes C. Paetzo

110 Dec 29, 2022
Company clustering with K-means/GMM and visualization with PCA, t-SNE, using SSAN relation extraction

RE results graph visualization and company clustering Installation pip install -r requirements.txt python -m nltk.downloader stopwords python3.7 main.

Jieun Han 1 Oct 06, 2022
Contour-guided image completion with perceptual grouping (BMVC 2021 publication)

Contour-guided Image Completion with Perceptual Grouping Authors Morteza Rezanejad*, Sidharth Gupta*, Chandra Gummaluru, Ryan Marten, John Wilder, Mic

Sid Gupta 6 Dec 27, 2022
Mememoji - A facial expression classification system that recognizes 6 basic emotions: happy, sad, surprise, fear, anger and neutral.

a project built with deep convolutional neural network and ❤️ Table of Contents Motivation The Database The Model 3.1 Input Layer 3.2 Convolutional La

Jostine Ho 761 Dec 05, 2022
HyperaPy: An automatic hyperparameter optimization framework ⚡🚀

hyperpy HyperPy: An automatic hyperparameter optimization framework Description HyperPy: Library for automatic hyperparameter optimization. Build on t

Sergio Mora 7 Sep 06, 2022
This is a work in progress reimplementation of Instant Neural Graphics Primitives

Neural Hash Encoding This is a work in progress reimplementation of Instant Neural Graphics Primitives Currently this can train an implicit representa

Penn 79 Sep 01, 2022
Mercer Gaussian Process (MGP) and Fourier Gaussian Process (FGP) Regression

Mercer Gaussian Process (MGP) and Fourier Gaussian Process (FGP) Regression We provide the code used in our paper "How Good are Low-Rank Approximation

Aristeidis (Ares) Panos 0 Dec 13, 2021
Code for our paper A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization,

FSRA This repository contains the dataset link and the code for our paper A Transformer-Based Feature Segmentation and Region Alignment Method For UAV

Dmmm 32 Dec 18, 2022
The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"

Swin-Unet The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"(https://arxiv.org/abs/2105.05537). A validatio

869 Jan 07, 2023
Convnext-tf - Unofficial tensorflow keras implementation of ConvNeXt

ConvNeXt Tensorflow This is unofficial tensorflow keras implementation of ConvNe

29 Oct 06, 2022
Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained models

Clara Meister 50 Nov 12, 2022
Source code for "OmniPhotos: Casual 360° VR Photography"

OmniPhotos: Casual 360° VR Photography Project Page | Video | Paper | Demo | Data This repository contains the source code for creating and viewing Om

Christian Richardt 144 Dec 30, 2022
Generate images from texts. In Russian. In PaddlePaddle

ruDALL-E PaddlePaddle ruDALL-E in PaddlePaddle. Install: pip install rudalle_paddle==0.0.1rc1 Run with free v100 on AI Studio. Original Pytorch versi

AgentMaker 20 Oct 18, 2022
3rd Place Solution of the Traffic4Cast Core Challenge @ NeurIPS 2021

3rd Place Solution of Traffic4Cast 2021 Core Challenge This is the code for our solution to the NeurIPS 2021 Traffic4Cast Core Challenge. Paper Our so

7 Jul 25, 2022