API>local_db>AWS_RDS - Disclaimer! All data used is for educational purposes only.

Overview

API>local_db>AWS_RDS

Disclaimer! All data used is for educational purposes only.

alt text
ETL pipeline diagram.

Aim of project

By creating a fully working pipeline:

  • Familiarise with ETL
  • Improve python (API,pandas), SQL (triggers&procedures) knowledge
  • Work with a cloud storage service

What does it do?

The data used is electricity prices and weather conditions. The pipeline is fully autonomous: scheduled daily (crontab), electricity price data (.xls) is dowloaded, weather data fetched via an API, and inserted into a local database (Postgres). It is then cleaned and transferred (PL/pgSQL) into 3NF-tables (see ERDs below). Lastly, the clean useful data is migrated to Amazon Web Services' RDS remote database via the foreign-fata wrapper in PL/pgSQL.

alt text
Price data ERD.

alt text
Weather data ERD

Further improvements/learnings

  • Switch from time to event-based triggers
  • Upload data in batches, not 'for each row'
  • Prevent SQL injection
Useful tool for inserting DataFrames into the Excel sheet.

PyCellFrame Insert Pandas DataFrames into the Excel sheet with a bunch of conditions Install pip install pycellframe Usage Examples Let's suppose that

Luka Sosiashvili 1 Feb 16, 2022
Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.

Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.

Damien Farrell 81 Dec 26, 2022
Automated Exploration Data Analysis on a financial dataset

Automated EDA on financial dataset Just a simple way to get automated Exploration Data Analysis from financial dataset (OHLCV) using Streamlit and ta.

Darío López Padial 28 Nov 27, 2022
4CAT: Capture and Analysis Toolkit

4CAT: Capture and Analysis Toolkit 4CAT is a research tool that can be used to analyse and process data from online social platforms. Its goal is to m

Digital Methods Initiative 147 Dec 20, 2022
Projeto para realizar o RPA Challenge . Utilizando Python e as bibliotecas Selenium e Pandas.

RPA Challenge in Python Projeto para realizar o RPA Challenge (www.rpachallenge.com), utilizando Python. O objetivo deste desafio é criar um fluxo de

Henrique A. Lourenço 1 Apr 12, 2022
Tools for working with MARC data in Catalogue Bridge.

catbridge_tools Tools for working with MARC data in Catalogue Bridge. Borrows heavily from PyMarc

1 Nov 11, 2021
This module is used to create Convolutional AutoEncoders for Variational Data Assimilation

VarDACAE This module is used to create Convolutional AutoEncoders for Variational Data Assimilation. A user can define, create and train an AE for Dat

Julian Mack 23 Dec 16, 2022
small package with utility functions for analyzing (fly) calcium imaging data

fly2p Tools for analyzing two-photon (2p) imaging data collected with Vidrio Scanimage software and micromanger. Loading scanimage data relies on scan

Hannah Haberkern 3 Dec 14, 2022
pyETT: Python library for Eleven VR Table Tennis data

pyETT: Python library for Eleven VR Table Tennis data Documentation Documentation for pyETT is located at https://pyett.readthedocs.io/. Installation

Tharsis Souza 5 Nov 19, 2022
Supply a wrapper ``StockDataFrame`` based on the ``pandas.DataFrame`` with inline stock statistics/indicators support.

Stock Statistics/Indicators Calculation Helper VERSION: 0.3.2 Introduction Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline s

Cedric Zhuang 1.1k Dec 28, 2022
Data cleaning tools for Business analysis

Datacleaning datacleaning tools for Business analysis This program is made for Vicky's work. You can use it, too. 数据清洗 该数据清洗工具是为了商业分析 这个程序是为了Vicky的工作而

Lin Jian 3 Nov 16, 2021
Open-source Laplacian Eigenmaps for dimensionality reduction of large data in python.

Fast Laplacian Eigenmaps in python Open-source Laplacian Eigenmaps for dimensionality reduction of large data in python. Comes with an wrapper for NMS

17 Jul 09, 2022
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Amundsen 3.7k Jan 03, 2023
Desafio 1 ~ Bantotal

Challenge 01 | Bantotal Please read the instructions for the challenge by selecting your preferred language below: Español Português License Copyright

Maratona Behind the Code 44 Sep 28, 2022
This is a tool for speculation of ancestral allel, calculation of sfs and drawing its bar plot.

superSFS This is a tool for speculation of ancestral allel, calculation of sfs and drawing its bar plot. It is easy-to-use and runing fast. What you s

3 Dec 16, 2022
A Streamlit web-app for a data-science project that aims to evaluate if the answer to a question is helpful.

How useful is the aswer? A Streamlit web-app for a data-science project that aims to evaluate if the answer to a question is helpful. If you want to l

1 Dec 17, 2021
Churn prediction with PySpark

It is expected to develop a machine learning model that can predict customers who will leave the company.

3 Aug 13, 2021
t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology.

tree-SNE t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology. Building on recent advances in s

Isaac Robinson 61 Nov 21, 2022
Stock Analysis dashboard Using Streamlit and Python

StDashApp Stock Analysis Dashboard Using Streamlit and Python If you found the content useful and want to support my work, you can buy me a coffee! Th

StreamAlpha 27 Dec 09, 2022
A 2-dimensional physics engine written in Cairo

A 2-dimensional physics engine written in Cairo

Topology 38 Nov 16, 2022