Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

A tool to build scripts to toggle between minimal & default services in Windows based on user defined lists.

A tool to build scripts to toggle between minimal & default services in Windows based on user defined lists.

AMIT 29 Jan 01, 2023
Throttle and debounce add-on for Pyrogram

pyrothrottle Throttle and debounce add-on for Pyrogram Quickstart implementation on decorators from pyrogram import Client, filters from pyrogram.type

7 Oct 01, 2022
Um simples bot escrito em Python usando a lib pyTelegramBotAPI

Telegram Bot Python Um simples bot escrito em Python usando a lib pyTelegramBotAPI Instalação Windows: Download do Python 3 Aqui Download do ZIP do Có

Sr_Yuu 1 May 07, 2022
Tamil Voicechat UserBot. Powerd By TamilBots. Https://T.me/TamilSupport

Tamil Voicechat UserBot A Telegram UserBot to Play music 🎶 in Voice Chats. It's recommended to use an USA number.(if your real number is suspended I'

Tamil Bots 78 Nov 01, 2022
TFT Bot that automatically surrenders and allows finishing TFT Passes easily.

Image Based TFT Bot TFT Bot that automatically surrenders and allows finishing TFT Passes easily. Please read full file! You can check new releases he

1 Feb 06, 2022
WILSON Cloud Respwnder is a Web Interaction Logger Sending Out Notifications with the ability to serve custom content in order to appropriately respond to client-issued requests.

WILSON Cloud Respwnder What is this? WILSON Cloud Respwnder is a Web Interaction Logger Sending Out Notifications (WILSON) with the ability to serve c

48 Oct 31, 2022
A Python library for the Discourse API

pydiscourse A Python library for working with Discourse. This is a fork of the original Tindie version. It was forked to include fixes, additional fun

Ben Lopatin 72 Oct 14, 2022
Vhook: A Discord webhook spammer / deleter open source coded by vesper

Vhook_Spammer Vhook is a advanced Discord webhook spammer / deleter with embeds,

Vesper 17 Nov 13, 2022
The best discord.py template with a changeable prefix

Discord.py Bot Template By noma4321#0035 With A Custom Prefix To Every Guild Function Features Has a custom prefix that is changeable for every guild

Noma4321 5 Nov 24, 2022
Proxy-Bot - Python proxy bot for telegram

Proxy-Bot 🤖 Proxy bot between the main chat and a newcomer, allows all particip

Anton Shumakov 3 Apr 01, 2022
The aim is to contain multiple models for materials discovery under a common interface

Aviary The aviary contains: - roost, - wren, cgcnn. The aim is to contain multiple models for materials discovery under a common interface Environment

Rhys Goodall 20 Jan 06, 2023
A discord bot that can detect Nitro Scam Links and delete them to protect other users

A discord bot that can detect Nitro Scam Links and delete them to protect other users. Add it to your server from here.

Kanak Mittal 9 Oct 20, 2022
OGE-2022-na-Python - Solving problems in python for the OGE 2022

OGE-2022-na-Python Решение задачек на питоне для ОГЭ 2022 Тут разобраны разные в

Slava 0 Oct 14, 2022
Brute force instagram account / actonetor, 2021

Brute force instagram account / actonetor, 2021

actonetor 6 Nov 16, 2022
An advanced Filter Bot with nearly unlimitted filters!

Unlimited Filter Bot ㅤㅤㅤㅤㅤㅤㅤ ㅤㅤㅤㅤㅤㅤㅤ An advanced Filter Bot with nearly unlimitted filters! Features Nearly unlimited filters Supports all type of fil

TroJanzHEX 445 Jan 03, 2023
Automatically kick deleted accounts

AntiDeletedAccountsBot (ADAB) Automatically kick deleted accounts Based on uniborg, a pluggable asyncio Telegram userbot based on Telethon. Installati

Qwerty-Space 34 Jan 02, 2023
批量下载抖音无水印视频

版权说明 本项目fork自Johnserf-Seed TikTokDownload。目的是为了增加个性化的功能,若想体验更多完善的功能请支持原作者的项目。 免责声明 本代码仅用于学习,下载后请勿用于商业用途。 环境要求 请检查宿主机,是否安装了python环境,并且配置了环境变量 pytho

Zhiwu Mao 44 Dec 28, 2022
Tools ini hanya bisa digunakan untuk menyerang website atau http/s

☢️ Tawkun DoS ☢️ Tools ini hanya bisa digunakan untuk menyerang website atau http/s FITUR: [ ☯️ ] Proxy Mode [ 🔥 ] SOCKS Mode | Kadang Eror [ ☢️ ] Ht

Bandhitawkunthi 9 Jul 19, 2022
A Discord token stealer app written in Python 3.

Discord Token Stealer A Discord token stealer app written in Python 3. This version of the grabber only supports Windows. Features No local caching Tr

cankat 45 Jan 03, 2023