This module is used to create Convolutional AutoEncoders for Variational Data Assimilation

Overview

VarDACAE

This module is used to create Convolutional AutoEncoders for Variational Data Assimilation. A user can define, create and train an AE for Data Assimilation with just a few lines of code. It is the accompanying code to the paper here, published in Computer Methods in Applied Mechanics and Engineering.

Introduction

Data Assimilation (DA) is an uncertainty quantification technique used to reduce the error in predictions by combining forecasting data with observation of the state. The most common techniques for DA are Variational approaches and Kalman Filters.

In this work, we propose a method of using Autoencoders to model the Background error covariance matrix, to greatly reduce the computational cost of solving 3D Variational DA while increasing the quality of the Data Assimilation.

Data

The data used in this paper is owned by the Data Science Institute, Imperial College, London. If you do not have access to this data, please see the section below on training a model with your own data.

Installation

  1. Install vtk by navigating to this link and installing the version applicable to your system.

  2. Navigate to the base directory and run:

    pip install -e .
  3. Run pytest from the home directory to ensure correct installation.

Tests

From the project home directory run pytest.

Getting Started

To train and evaluate a Tucodec model on Fluidity data:

from VarDACAE import TrainAE, BatchDA
from VarDACAE.settings.models.CLIC import CLIC

model_kwargs = {"model_name": "Tucodec", "block_type": "NeXt", "Cstd": 64}

settings = CLIC(**model_kwargs)    # settings describing experimental setup
expdir = "experiments/expt1/"      # dir to save results data and models

trainer = TrainAE(settings, expdir, batch_sz=16)
model = trainer.train(num_epochs=150)   # this will take approximately 8 hrs on a K80

# evaluate DA on the test set:
results_df = BatchDA(settings, AEModel=model).run()

Settings Instance

The API is based around a monolithic settings object that is used to define all configuration parameters, from the model definition to the seed. This single point of truth is used so that, an experiment can be repeated exactly by simply loading a pickled settings object. All key classes like TrainAE and BatchDA require a settings object at initialisation.

Train a model on your own data

To train a model on your own 3D data you must do the following:

  • Override the default get_X(...) method in the GetData loader class:
from VarDACAE import GetData

class NewLoaderClass(GetData):
    def get_X(self, settings):
        "Arguments:
               settings: (A settings.Config class)
        returns:
            np.array of dimensions B x nx x ny x nz "

        # ... calculate / load or download X
        # For an example see VarDACAE.data.load.GetData.get_X"""
        return X
  • Create a new settings class that inherits from your desired model's settings class (e.g. VarDACAE.settings.models.CLIC.CLIC) and update the data dimensions:
from VarDACAE.settings.models.CLIC import CLIC

class NewConfig(CLIC):
    def __init__(self, CLIC_kwargs, opt_kwargs):
        super(CLIC, self).__init__(**CLIC_kwargs)
        self.n3d = (100, 200, 300)  # Define input domain size
                                    # This is used by ConvScheduler
        self.X_FP = "SET_IF_REQ_BY_get_X"
        # ... use opt_kwargs as desired

CLIC_kwargs =  {"model_name": "Tucodec", "block_type": "NeXt",
                "Cstd": 64, "loader": NewLoaderClass}
                # NOTE: do not initialize NewLoaderClass

settings = NewConfig(CLIC_kwargs, opt_kwargs)

This settings object can now be used to train a model with the TrainAE method as shown above.

Owner
Julian Mack
Data Scientist at Accelex
Julian Mack
A computer algebra system written in pure Python

SymPy See the AUTHORS file for the list of authors. And many more people helped on the SymPy mailing list, reported bugs, helped organize SymPy's part

SymPy 9.9k Dec 31, 2022
A tax calculator for stocks and dividends activities.

Revolut Stocks calculator for Bulgarian National Revenue Agency Information Processing and calculating the required information about stock possession

Doino Gretchenliev 200 Oct 25, 2022
MDAnalysis is a Python library to analyze molecular dynamics simulations.

MDAnalysis Repository README [*] MDAnalysis is a Python library for the analysis of computer simulations of many-body systems at the molecular scale,

MDAnalysis 933 Dec 28, 2022
Collections of pydantic models

pydantic-collections The pydantic-collections package provides BaseCollectionModel class that allows you to manipulate collections of pydantic models

Roman Snegirev 20 Dec 26, 2022
Analyzing Covid-19 Outbreaks in Ontario

My group and I took Covid-19 outbreak statistics from ontario, and analyzed them to find different patterns and future predictions for the virus

Vishwaajeeth Kamalakkannan 0 Jan 20, 2022
Supply a wrapper ``StockDataFrame`` based on the ``pandas.DataFrame`` with inline stock statistics/indicators support.

Stock Statistics/Indicators Calculation Helper VERSION: 0.3.2 Introduction Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline s

Cedric Zhuang 1.1k Dec 28, 2022
Instant search for and access to many datasets in Pyspark.

SparkDataset Provides instant access to many datasets right from Pyspark (in Spark DataFrame structure). Drop a star if you like the project. 😃 Motiv

Souvik Pratiher 31 Dec 16, 2022
A simplified prototype for an as-built tracking database with API

Asbuilt_Trax A simplified prototype for an as-built tracking database with API The purpose of this project is to: Model a database that tracks constru

Ryan Pemberton 1 Jan 31, 2022
ForecastGA is a Python tool to forecast Google Analytics data using several popular time series models.

ForecastGA is a tool that combines a couple of popular libraries, Atspy and googleanalytics, with a few enhancements.

JR Oakes 36 Jan 03, 2023
A set of tools to analyse the output from TraDIS analyses

QuaTradis (Quadram TraDis) A set of tools to analyse the output from TraDIS analyses Contents Introduction Installation Required dependencies Bioconda

Quadram Institute Bioscience 2 Feb 16, 2022
Create HTML profiling reports from pandas DataFrame objects

Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great

10k Jan 01, 2023
A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Realtime Financial Market Data Visualization and Analysis Introduction This repo shows my project about real-time stock data pipeline. All the code is

6 Sep 07, 2022
Wafer Fault Detection - Wafer circleci with python

Wafer Fault Detection Problem Statement: Wafer (In electronics), also called a slice or substrate, is a thin slice of semiconductor, such as a crystal

Avnish Yadav 14 Nov 21, 2022
Falcon: Interactive Visual Analysis for Big Data

Falcon: Interactive Visual Analysis for Big Data Crossfilter millions of records without latencies. This project is work in progress and not documente

Vega 803 Dec 27, 2022
Gathering data of likes on Tinder within the past 7 days

tinder_likes_data Gathering data of Likes Sent on Tinder within the past 7 days. Versions November 25th, 2021 - Functionality to get the name and age

Alex Carter 12 Jan 05, 2023
Describing statistical models in Python using symbolic formulas

Patsy is a Python library for describing statistical models (especially linear models, or models that have a linear component) and building design mat

Python for Data 866 Dec 16, 2022
Shot notebooks resuming the main functions of GeoPandas

Shot notebooks resuming the main functions of GeoPandas, 2 notebooks written as Exercises to apply these functions.

1 Jan 12, 2022
A notebook to analyze Amazon Recommendation Review Dataset.

Amazon Recommendation Review Dataset Analyzer A notebook to analyze Amazon Recommendation Review Dataset. Features Calculates distinct user count, dis

isleki 3 Aug 22, 2022
songplays datamart provide details about the musical taste of our customers and can help us to improve our recomendation system

Songplays User activity datamart The following document describes the model used to build the songplays datamart table and the respective ETL process.

Leandro Kellermann de Oliveira 1 Jul 13, 2021
A set of procedures that can realize covid19 virus detection based on blood.

A set of procedures that can realize covid19 virus detection based on blood.

Nuyoah-xlh 3 Mar 07, 2022