Integrate bus data from a variety of sources (batch and real-time processing).

Last update: Nov 25, 2021

Overview

Purpose: Integrate bus data from a variety of sources, such as CSV files, JSON APIs and sensor data, into a relational database using both batch and real-time processing.

Technique:

Language: Python
Applications: Kafka, MQTT Explorer, Grafana, InfluxDB, Microsoft Visual Studio 2019, MS SQL Server, Power BI Desktop
Libraries: kafka-python, numpy, paho-mqtt, pandas, pyodbc, pyspark
Database: SQL (MS SQL Server must be installed)
Environment: Windows 10 (64-bit)
Terminal: cmd (Windows Command Prompt)

Workflow:

Import raw offline data from CSV and TXT files into a data lake (stored in MS SQL Server) with Python, then ETL (extract, transform, load) the data from the data lake into a data warehouse (DWH) with SSIS (SQL Server Integration Services).
Set up a schedule for the ETL pipeline.
Model and visualize the data in the DWH.
Crawl the online General Transit Feed Specification (GTFS) feed, convert it from Protobuf to JSON or CSV, then save it to the database with Python and Kafka streaming. Source: https://developer.nationaltransport.ie/
Stream sensor data with paho-mqtt (or kafka-python) and plot it on a Grafana dashboard to show performance.
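
The step that lands CSV/TXT files in the SQL Server data lake with Python could look roughly like the sketch below. It is a minimal sketch, not the project's own code: it assumes a DB-API 2.0 connection such as the one returned by `pyodbc.connect(...)`, and because pyodbc and Python's built-in `sqlite3` both use `?` placeholders, the demo runs against an in-memory SQLite database standing in for SQL Server. The function name, table name and sample columns are hypothetical.

```python
import csv
import io
import re
import sqlite3

# Allow only plain identifiers, since table/column names cannot be bound as parameters.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def load_csv_to_staging(conn, csv_file, table):
    """Hypothetical helper: land a CSV file as-is in a new raw (data lake) table.

    conn is any DB-API 2.0 connection using the "?" paramstyle
    (pyodbc for SQL Server, or sqlite3 for this demo). Every column is
    stored as text; typing and cleaning are left to the later ETL step.
    Raises ValueError for unsafe identifiers; CREATE TABLE fails if the
    table already exists.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    for name in [table, *header]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ", ".join(f'"{c}" NVARCHAR(4000)' for c in header)
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip blank lines
    cur = conn.cursor()
    cur.execute(f'CREATE TABLE "{table}" ({columns})')
    if rows:  # some drivers reject executemany with an empty sequence
        cur.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Demo: an in-memory SQLite database stands in for SQL Server.
demo_conn = sqlite3.connect(":memory:")
sample_csv = io.StringIO("stop_id,stop_name\nS1,Stop A\nS2,Stop B\n")
print(load_csv_to_staging(demo_conn, sample_csv, "raw_stops"))  # 2
```

Storing every column as text keeps the landing step lossless; type conversion and cleaning then belong to the SSIS ETL into the warehouse.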
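
For the GTFS crawling step, the feed arrives as Protobuf (a GTFS-Realtime `FeedMessage`). This sketch assumes it has already been decoded into a dict, for example with `FeedMessage.ParseFromString` from the `gtfs-realtime-bindings` package followed by `google.protobuf.json_format.MessageToDict`, which produces lowerCamelCase keys such as `tripUpdate`. It flattens TripUpdate entities into one row per stop-time update, writes CSV text, and encodes a row as JSON bytes usable as a Kafka message value. The function names, output column names and sample feed are hypothetical.

```python
import csv
import io
import json


def flatten_trip_updates(feed):
    """Hypothetical helper: one flat row per stop-time update in a decoded GTFS-R feed dict."""
    rows = []
    for entity in feed.get("entity", []):
        tu = entity.get("tripUpdate")
        if tu is None:  # skip vehicle positions and alerts
            continue
        trip = tu.get("trip", {})
        for stu in tu.get("stopTimeUpdate", []):
            rows.append({
                "entity_id": entity.get("id"),
                "trip_id": trip.get("tripId"),
                "route_id": trip.get("routeId"),
                "start_date": trip.get("startDate"),
                "stop_sequence": stu.get("stopSequence"),
                "stop_id": stu.get("stopId"),
                "arrival_delay": stu.get("arrival", {}).get("delay"),
                "departure_delay": stu.get("departure", {}).get("delay"),
            })
    return rows


def rows_to_csv(rows):
    """Serialize flattened rows to CSV text (empty string if there are no rows)."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buf.getvalue()


def to_kafka_value(row):
    """Encode one row as UTF-8 JSON bytes for use as a Kafka message value."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


# Hypothetical sample shaped like MessageToDict output for a TripUpdates feed.
sample = {
    "header": {"gtfsRealtimeVersion": "2.0", "timestamp": "0"},
    "entity": [{
        "id": "e1",
        "tripUpdate": {
            "trip": {"tripId": "T1", "routeId": "R1", "startDate": "20211125"},
            "stopTimeUpdate": [
                {"stopSequence": 1, "stopId": "S1", "arrival": {"delay": 60}},
                {"stopSequence": 2, "stopId": "S2", "departure": {"delay": 0}},
            ],
        },
    }],
}
print(rows_to_csv(flatten_trip_updates(sample)))
```

With kafka-python, the bytes returned by `to_kafka_value` could be passed to `KafkaProducer(bootstrap_servers=...).send(topic, value)`; the broker address and topic name depend on the deployment.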
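
For the real-time dashboard, one common path (an assumption here; the source only names paho-mqtt, InfluxDB and Grafana) is to subscribe to sensor topics with paho-mqtt, convert each JSON payload to InfluxDB line protocol, write the lines to InfluxDB, and point Grafana at InfluxDB. The sketch covers only the conversion inside a paho-mqtt `on_message(client, userdata, msg)` callback, which would be attached with `client.on_message = on_message`. The payload keys (`bus_id`, `speed`, `ts`) and the measurement name are hypothetical, and the actual write to InfluxDB is left out.

```python
import json
from types import SimpleNamespace


def _escape_key(value):
    # Tag keys, tag values and field keys: escape commas, equals signs and spaces.
    return str(value).replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value):
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'  # string fields are double-quoted


def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Build one InfluxDB line-protocol record (tags and fields sorted by key)."""
    if not fields:
        raise ValueError("line protocol needs at least one field")
    head = str(measurement).replace(",", r"\,").replace(" ", r"\ ")
    for key in sorted(tags):
        if tags[key] not in (None, ""):  # empty tag values are not allowed
            head += f",{_escape_key(key)}={_escape_key(tags[key])}"
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in sorted(fields.items()))
    line = f"{head} {body}"
    if ts_ns is not None:
        line += f" {int(ts_ns)}"
    return line


lines_out = []  # stand-in sink; a real pipeline would write these lines to InfluxDB


def on_message(client, userdata, msg):
    """paho-mqtt on_message callback; payload keys bus_id/speed/ts are hypothetical."""
    data = json.loads(msg.payload.decode("utf-8"))
    lines_out.append(to_line_protocol(
        "bus_sensor",  # hypothetical measurement name
        tags={"bus_id": data["bus_id"], "topic": msg.topic},
        fields={"speed": float(data["speed"])},
        ts_ns=int(data["ts"]) * 1_000_000_000,  # assumes ts is in epoch seconds
    ))


# Demo: SimpleNamespace stands in for paho-mqtt's MQTTMessage (only .topic and .payload are read).
on_message(None, None, SimpleNamespace(
    topic="buses/B1", payload=b'{"bus_id": "B1", "speed": 42.5, "ts": 1637800000}'))
print(lines_out[-1])  # bus_sensor,bus_id=B1,topic=buses/B1 speed=42.5 1637800000000000000
```

Keeping the callback free of network I/O (it only appends to a buffer) makes it easy to test and lets a separate writer batch lines to InfluxDB.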

Output:

A data pipeline from the data sources into the target data store.
Data stored in the data warehouse for analysis.
Raw data crawled from the online GTFS feed.
A real-time dashboard with stream processing.

Next steps:

Analyze the data in the DWH.
Build a real-time dashboard for the raw data crawled from the online GTFS feed.
