Big Data & Cloud Computing for Oceanography

Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilis

4.7k Jan 09, 2023

Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences

Synthetic data need to preserve the statistical properties of real data in terms of their individual behavior and (inter-)dependences. Copula and functional Principle Component Analysis (fPCA) are st

32 Dec 20, 2022

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

1.8k Jan 09, 2023

Pizza Orders Data Pipeline Usecase Solved by SQL, Sqoop, HDFS, Hive, Airflow.

PizzaOrders_DataPipeline There is a Tony who is owning a New Pizza shop. He knew that pizza alone was not going to help him get seed funding to expand

4 Jun 05, 2022

This module is used to create Convolutional AutoEncoders for Variational Data Assimilation

VarDACAE This module is used to create Convolutional AutoEncoders for Variational Data Assimilation. A user can define, create and train an AE for Dat

23 Dec 16, 2022

Mining the Stack Overflow Developer Survey

Mining the Stack Overflow Developer Survey A prototype data mining application to compare the accuracy of decision tree and random forest regression m

1 Nov 16, 2021

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather

791 Jan 04, 2023

peptides.py is a pure-Python package to compute common descriptors for protein sequences

peptides.py Physicochemical properties and indices for amino-acid sequences. 🗺️ Overview peptides.py is a pure-Python package to compute common descr

32 Dec 31, 2022

nrgpy is the Python package for processing NRG Data Files

nrgpy nrgpy is the Python package for processing NRG Data Files Website and source: https://github.com/nrgpy/nrgpy Documentation: https://nrgpy.github

23 Dec 08, 2022

Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).

PandasVault ⁠— Advanced Pandas Functions and Code Snippets The only Pandas utility package you would ever need. It has no exotic external dependencies

374 Jan 07, 2023

Working Time Statistics of working hours and working conditions by industry and company

88 Nov 04, 2022

An experimental project I'm undertaking for the sole purpose of increasing my Python knowledge

5ePy is an experimental project I'm undertaking for the sole purpose of increasing my Python knowledge. #Goals Goal: Create a working, albeit lightwei

1 Nov 24, 2021

Py-price-monitoring - A Python price monitor

A Python price monitor This project was focused on Brazil, so the monitoring is

1 Jan 04, 2022

MotorcycleParts DataAnalysis python

We work with the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area.

1 Jan 12, 2022

PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is a state-of-the-art platform for statistical modeling and high-

229 Dec 29, 2022

ICLR 2022 Paper submission trend analysis

Visualize ICLR 2022 OpenReview Data

75 Dec 06, 2022

Making the DAEN information accessible.

The purpose of this repository is to make the information on Australian COVID-19 adverse events accessible. The Therapeutics Goods Administration (TGA) keeps a database of adverse reactions to medica

10 May 10, 2022

BasstatPL is a package for performing different tabulations and calculations for descriptive statistics.

BasstatPL is a package for performing different tabulations and calculations for descriptive statistics. It provides: Frequency table constr

1 Oct 31, 2021

Very useful and necessary functions that simplify working with data

Additional-function-for-pandas Very useful and necessary functions that simplify working with data random_fill_nan(module_name, nan) - Replaces all sp

2 Dec 02, 2021

This is a python script to navigate and extract the FSD50K dataset

FSD50K navigator This is a script I use to navigate the sound dataset from FSK50K.

2 Nov 23, 2021