Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Get your Pixiv token (for running upbit/pixivpy)

gppt: get-pixivpy-token Get your Pixiv token (for running upbit/pixivpy) Refine pixiv_auth.py + its fork Install ❭ pip install gppt Run Note: In advan

haruna 58 Jan 04, 2023
Terraform Cloud CLI for Managing Workspace Terraform Versions

Terraform Cloud Version Manager This tiny script makes it easy to update the Terraform Version on all of the Workspaces inside Terraform Cloud. It wil

Robert Hafner 1 Jan 07, 2022
This bot will automatically like and follow users that post under a specified hashtag

Instagram-bot This bot will automatically like and follow users that post under a specified hashtag Dependencies Java JDK Selenium Updated version of

Makana Edwards 1 Nov 04, 2021
Pack up to 3MB of data into a tweetable PNG polyglot file.

tweetable-polyglot-png Pack up to 3MB of data into a tweetable PNG polyglot file. See it in action here: https://twitter.com/David3141593/status/13719

David Buchanan 2.4k Dec 29, 2022
a simple floating window for watch cryptocurrency price

floating-monitor with cryptocurrency 浮動視窗虛擬貨幣價格監控 a floating monitor window to show price of cryptocurrency. use binance api to get price 半透明的浮動視窗讓你方便

Lin_Yi_Shen 1 Oct 22, 2021
A discord bot consuming Notion API to add, retrieve data to Notion databases.

Notion-DiscordBot A discord bot consuming Notion API to add and retrieve data from Notion databases. Instructions to use the bot: Pre-Requisites: a)In

Servatom 57 Dec 29, 2022
The Discord bot framework for Python

Pycordia ⚠️ Note! As of now, this package is under early development so functionalities are bound to change drastically. We don't recommend you curren

Ángel Carias 24 Jan 01, 2023
AWS Lambda - Parsing Cloudwatch Data and sending the response via email.

AWS Lambda - Parsing Cloudwatch Data and sending the response via email. Author: Evan Erickson Language: Python Backend: AWS / Serverless / AWS Lambda

Evan Scott Erickson 1 Nov 14, 2021
A bot that updates about the most subscribed artist' channels on YouTube

A bot that updates about the most subscribed artist' channels on YouTube. A weekly top chart report is provided every Monday. It posts updates on Twitter

Marco Fantauzzo 5 Dec 14, 2022
Simple Python Auto Follow Bot

Instagram-Auto-Follow-Bot Description Một IG BOT đơn giản. Tự động follow những người mà bạn muốn cướp follow. Tự động unfollow. Tự động đăng nhập vào

CodingLinhTinh 3 Aug 27, 2022
API Wrapper for seedr.cc

Seedr Python Client Seedr API built with 💛 by Souvik Pratiher Hit that Star button if you like this kind of SDKs and wants more of similar SDKs for o

Souvik Pratiher 2 Oct 24, 2021
A Bot To Find Telegram User ID Easily

Telegram ID Bot 🤖 A Bot To Find Telegram User ID Easily Made with Python3 (C) @BXBotz Copyright permission under MIT License License - https://githu

MuFaz-TG 6 Nov 21, 2022
Morpy Bot Linux - Morpy Bot Linux With Python

Morpy_Bot_Linux Guide to using the robot : 🔸 Lsmod = to identify admins and st

2 Jan 20, 2022
A Discord bot to play bluffing games like Dobbins or Bobbins

Usage: pip install -r requirements.txt python3 bot.py DISCORD_BOT_TOKEN Gameplay: All commands are case-insensitive, with trailing punctuation and spa

4 May 27, 2022
Work with the AWS IP address ranges in native Python.

Amazon Web Services (AWS) publishes its current IP address ranges in JSON format. Python v3 provides an ipaddress module in the standard library that allows you to create, manipulate, and perform ope

AWS Samples 9 Aug 25, 2022
Upvotes and karma for Discord: Heart 💗 or Crush 💔 a comment to give points to an user, or Star ⭐ it to add it to the Best Of!

🤖 Reto Reto is a community-oriented Discord bot, featuring a karma system, a way to reward the best comments, leaderboards, and so much more! React t

Erik Bianco Vera 3 May 07, 2022
A suite of utilities for AWS Lambda Functions that makes tracing with AWS X-Ray, structured logging and creating custom metrics asynchronously easier

A suite of utilities for AWS Lambda Functions that makes tracing with AWS X-Ray, structured logging and creating custom metrics asynchronously easier

Amazon Web Services - Labs 1.9k Jan 07, 2023
Twitter-redesign - Twitter Redesign With Django

Twitter Redesign A project that tests Django and React knowledge through a twitt

Mark Jumba 1 Jun 01, 2022
A bot written in Python to automate attending classes on MyClass (Codetantra).

codetantrabot This is python program to attend class on myclass(codetantra) Prerequisites You should have Python3 and Pip installed on your system Run

Aniket Kumar 1 Feb 08, 2022
This bot is made with Python and it is running using Docker container and is concentrated on heroku.

This bot is made with Python and it is running using Docker container and is concentrated on heroku.

Movindu Bandara 1 Nov 16, 2021