Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Simple Telegram AI Chat bot made using OpenAI and Luna API

Yui Yui, is a simple telegram chat bot made using OpenAI and Luna Chat bot Deployment 👀 Deploying is easy 🤫 ! You can deploy this bot in Heroku or i

I'm Not A Bot #Left_TG 21 Dec 29, 2022
An advanced QR Code telegram bot with more features.

QR Code Bot A telegram qr code encode and decode bot Advanced Features 1. Database ( MongoDB ) Support 2. Broadcast Support 3. Status Command 4. Setti

Fayas Noushad 16 Nov 12, 2022
🎥 Stream your favorite movie from the terminal!

Stream-Cli stream-cli is a Python scrapping CLI that combine scrapy and webtorrent in one command for streaming movies from your terminal. Installatio

R E D O N E 379 Dec 24, 2022
Binance leverage futures Hook

Simple binance futures Attention Just use leverage. The fee difference between futures and spot is not considered. For example, funding rate, etc. Onl

Adriance 26 Aug 27, 2022
This is a starter template of discord.py project

Template Discord.py This is a starter template of discord.py project (Supports Slash commands!). 👀 Getting Started First, you need to install Python

1 Dec 22, 2021
A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

1 Jan 19, 2022
PlaylistAudioBot - Telegram playlist download bot with ytdl

Telegram PlaylistAudioBot PlaylistAudioBot: 🇬🇧 Telegram playlist download bot

Hüzünlü Artemis [HuzunluArtemis] 14 Jul 22, 2022
ByDiego Token Grabber is a Discord Stealer

ByDiego Token Grabber is a Discord Stealer. This way you can get too much information from x person if you pass it on and open it

zByDiegoM.T 4 Mar 11, 2022
Reads and prints information from the website MalAPI.io

MalAPIReader Reads and prints information from the website MalAPI.io optional arguments:

Squiblydoo 16 Nov 10, 2022
Shuffle and add items from jellyfin to mpd (use in tandem with jellyfin-mopidy and mpd-mopidy). Similar to ncmpcpp's "Add random" feature..

jellyshuf Essentially implements ncmpcpp's add random feature (default hotkey: `) through a script which grabs info from jellyfin api itself. jellyfin

Ethan Djeric 2 Dec 14, 2021
A pdisk uploader bot written in Python

Pdisk Uploader Bot 🔥 Upload on Pdisk by Url, File and also by direct forward post from other channel... Features Post to Post Conversion Url Upload D

Paritosh Kumar 33 Oct 21, 2022
修改自SharpNoPSExec的基于python的横移工具 A Lateral Movement Tool Learned From SharpNoPSExec -- Twitter: @juliourena

PyNoPSExec A Lateral Movement Tool Learned From SharpNoPSExec -- Twitter: @juliourena 根据@juliourena大神的SharpNOPsExec项目改写的横向移动工具 Platform(平台): Windows 1

<a href=[email protected]"> 23 Nov 09, 2022
DeFi wallet on Chia Network.

DeFi wallet on Chia Network.

GobyWallet 21 Aug 12, 2022
A simple telegram bot that takes a list of files sent by the user and returns them 7zipped

A simple telegram bot that takes a list of files sent by the user and returns them 7zipped

1 Oct 28, 2022
A chatbot that helps you set price alerts for your amazon products.

Amazon Price Alert Bot Description A Telegram chatbot that helps you set price alerts for amazon products. The bot checks the price of your watchliste

Rittik Basu 24 Dec 29, 2022
A Discord bot to allow people to create lists of random characters, with limit reroll options.

Mugen Bot A small bot I made to practice python and allow people to publically select random characters on a discord server. Uses py-cord, as that is

Haley 2 Feb 06, 2022
OpenSource bot for control groups ...

⭕️ کمک به افراد برای اداره هرچه فان تره گروه 📟 همه گروه های بزرگ نیاز به یه بات خفن دارن تا از گروه مراقبت کنه این بات کارش همینه سعی کرده فیچر خیلی

Mehran Alam Beigi 2 Nov 26, 2021
Access LeetCode problems via id

LCid - access LeetCode problems via id Introduction As a world's leading online programming learning platform, LeetCode is quite popular among program

bunnyxt 14 Oct 08, 2022
Recommendation systems are among most widely preffered marketing strategies.

Recommendation systems are among most widely preffered marketing strategies. Their popularity comes from close prediction scores obtained from relationships of users and items. In this project, two r

Sübeyte 8 Oct 06, 2021
vk.com API python wrapper

Python vk.com API wrapper This is a vk.com (the largest Russian social network) python API wrapper. The goal is to support all API methods (current an

Dmitry Voronin 371 Dec 29, 2022