A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Last update: Sep 07, 2022

Overview

Realtime Financial Market Data Visualization and Analysis

Introduction

This repo contains my project on a real-time stock data pipeline. All the code is written in Python. In this project, I experiment with various data engineering frameworks to build a financial data processing and visualization platform using Apache Kafka, Apache Cassandra, and Bokeh. I used Kafka to stream real-time stock prices and market news, Cassandra to warehouse historical and real-time stock data, and Bokeh to render visualizations in web browsers. I also wrote a web crawler to scrape companies' financial statements and basic information from Yahoo Finance, and experimented with various economic data APIs.
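The README does not show the crawler itself. As a minimal sketch of the scraping step, the standard-library parser below turns a financial-statement HTML table into a nested dict keyed by line item and then by period. It assumes the statement arrives as a plain `<table>` whose first row holds the period labels. Yahoo Finance's real markup is different and changes over time, so `TableParser`, `parse_statement`, and `to_number` are hypothetical helpers, and the sample table and figures are made up.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def to_number(text):
    """Convert '1,234.5' to 1234.5; empty cells and '-' become None."""
    cleaned = text.replace(",", "")
    if cleaned in ("", "-"):
        return None
    return float(cleaned)


def parse_statement(html):
    """Hypothetical helper: {line_item: {period: value}} from a statement table."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    periods = header[1:]
    return {row[0]: dict(zip(periods, map(to_number, row[1:]))) for row in body}


# Made-up sample table for illustration only.
sample = """
<table>
  <tr><th>Breakdown</th><th>2021</th><th>2020</th></tr>
  <tr><td>Total Revenue</td><td>1,200.5</td><td>1,000</td></tr>
  <tr><td>Net Income</td><td>150</td><td>-</td></tr>
</table>
"""
statement = parse_statement(sample)  # statement["Total Revenue"]["2021"] == 1200.5
```

Using only `html.parser` keeps the sketch dependency-free. A real crawler would also need an HTTP client and would have to follow whatever structure the target page actually uses.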

Architecture

There are currently three tabs on the web page:

Stock: Streaming & Fundamental
- Candlestick plot of a single stock, plus basic company and financial information
- Real-time S&P 500 price during trading hours (fake data during non-trading hours)
Stock: Comparison
- Prices of 2 user-selected stocks, with their statistical summary and correlation
- 5-, 10-, and 30-day moving averages of the adjusted close price
Economy
- Geomap of various economic data by state
- 4 nationwide economic indicators for comparison
- The most recent market news
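The comparison tab's calculations can be sketched with the standard library alone. Below, `moving_average` computes a trailing simple moving average, `summarize` gives a basic statistical summary, and `pearson` computes the Pearson correlation coefficient of two equal-length series. The README does not say whether the original uses a simple or weighted moving average, so "simple" is an assumption. All function names and the sample prices are illustrative, not taken from the project.

```python
from statistics import mean, stdev


def moving_average(prices, window):
    """Trailing simple moving average; None until `window` prices are available."""
    return [
        mean(prices[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(prices))
    ]


def summarize(prices):
    """Basic statistical summary of one price series (needs at least 2 points)."""
    return {
        "mean": mean(prices),
        "std": stdev(prices),  # sample standard deviation
        "min": min(prices),
        "max": max(prices),
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (undefined if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up adjusted close prices for two stocks.
stock_a = [10, 11, 12, 13, 14, 15]
stock_b = [20, 22, 24, 26, 28, 30]
ma_5 = moving_average(stock_a, 5)  # [None, None, None, None, 12, 13]
corr = pearson(stock_a, stock_b)   # 1.0, since stock_b is exactly 2 * stock_a
```

The 5-, 10-, and 30-day lines on the tab correspond to calling `moving_average` with `window` set to 5, 10, and 30. Correlating daily returns instead of raw price levels is a common alternative, since two steadily trending price series can show a high correlation even when their day-to-day moves are unrelated.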

Here is the architecture of the platform.

How Stock Data is Streamed via Kafka to Cassandra:
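The Kafka-to-Cassandra flow can be sketched without a running broker or cluster. In the sketch below, a `deque` stands in for the Kafka topic and a plain callable stands in for the Cassandra session, so the code runs on its own. The keyspace `stocks`, the table `realtime_quotes`, its columns, the tick fields, and the partitioning by `(symbol, trade_date)` are all assumptions, not the project's actual schema, and the sample tick is made up. The README does not name its client libraries. With the DataStax Python driver, `execute` would be `session.execute`, which accepts a query with `%s` placeholders plus a parameter tuple. With kafka-python, `serialize` and `deserialize` could be passed as `value_serializer` and `value_deserializer`.

```python
import json
from collections import deque
from datetime import datetime, timezone

# Hypothetical schema (would be run once via the session): one partition per
# symbol per trading day, rows clustered by timestamp.
SCHEMA_CQL = """
CREATE TABLE IF NOT EXISTS stocks.realtime_quotes (
    symbol text, trade_date date, ts timestamp, price double, volume bigint,
    PRIMARY KEY ((symbol, trade_date), ts)
)"""

INSERT_CQL = (
    "INSERT INTO stocks.realtime_quotes (symbol, trade_date, ts, price, volume) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def serialize(tick):
    """Producer side: Kafka message values are raw bytes, so encode as UTF-8 JSON."""
    return json.dumps(tick).encode("utf-8")


def deserialize(raw):
    """Consumer side: decode the bytes back into a dict."""
    return json.loads(raw.decode("utf-8"))


def to_insert(tick):
    """Map a decoded tick to (query, params); `ts` is assumed to be Unix seconds.

    trade_date is taken in UTC for simplicity; the exchange's local date may fit better.
    """
    ts = datetime.fromtimestamp(tick["ts"], tz=timezone.utc)
    return INSERT_CQL, (tick["symbol"], ts.date(), ts, tick["price"], tick["volume"])


def consume(topic, execute):
    """Drain the stand-in topic, writing one row per message; returns the row count."""
    written = 0
    while topic:
        query, params = to_insert(deserialize(topic.popleft()))
        execute(query, params)
        written += 1
    return written


# Usage: in-memory topic, plus a fake `execute` that just records the parameters.
topic = deque()
topic.append(serialize({"symbol": "AAPL", "ts": 1662566400, "price": 155.0, "volume": 100}))
rows = []
count = consume(topic, lambda query, params: rows.append(params))
```

Partitioning by symbol and day keeps each partition bounded in size while letting a single query fetch one stock's intraday ticks in time order. Whether the project chose this layout is not stated in the README.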

Please check each tab's screenshot:

Tab 1:

Tab 2:

Tab 3:
