Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for the video demonstration detailed in the post, Building a Simple Data Lake on AWS. Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+---------------------------------------------------------------------------+
| Prefix      | Description                                                               |
+-------------+---------------------------------------------------------------------------+
| _source     | Data Source metadata only (originally called _raw in video)               |
| _raw        | Raw/Bronze data from data sources (originally called _converted in video) |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied         |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data                     |
+-------------+---------------------------------------------------------------------------+

AWS CLI Commands

Two small changes were made to the source code, compared with the video demonstration, to clarify the flow of data. The prefix for the seven data source AWS Glue Data Catalog tables was switched from raw_ to source_. Also, the prefix for the seven Raw/Bronze AWS Glue Data Catalog tables was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table
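
The start-crawler command returns as soon as the crawl is requested, so a table listing run immediately afterwards may not yet show the source_ tables. A minimal sketch of one way to wait, assuming the crawler names used above: the helper wait_for_crawler and its default 15-second polling interval are hypothetical and not part of the original source.

```shell
# Hypothetical helper (not in the original source): block until a Glue
# crawler reports the READY state, which it returns to after a crawl ends.
wait_for_crawler() {
  local name="$1"
  local interval="${2:-15}"   # seconds between polls (assumed default)
  local state
  while true; do
    # get-crawler exposes the current state: READY, RUNNING, or STOPPING
    state="$(aws glue get-crawler --name "${name}" \
      --query "Crawler.State" --output text)" || return 1
    echo "${name}: ${state}"
    [ "${state}" = "READY" ] && return 0
    sleep "${interval}"
  done
}

# Example usage (uncomment to run against your account):
# for crawler in tickit_postgresql tickit_mysql tickit_mssql; do
#   wait_for_crawler "${crawler}"
# done
```

The function only polls; it does not check whether the last crawl succeeded, so a failed crawl will still return once the crawler is READY again.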

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine
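
The fourteen job names above follow a single pattern, tickit_public_<table>_<stage>, so they can also be started in a loop. The following is a sketch under that assumption; the variable TICKIT_TABLES and the function start_stage_jobs are hypothetical names introduced for illustration. Each start-job-run call returns immediately with a run ID, and the refine jobs presumably read the output of the raw jobs, so the raw runs should have completed before the refine stage is started.

```shell
# Hypothetical names (not in the original source): the seven TICKIT tables
# and a helper that starts one Glue job per table for a given stage.
TICKIT_TABLES="category date event listing sales users venue"

start_stage_jobs() {
  local stage="$1"   # "raw" or "refine", matching the job names above
  local table
  for table in ${TICKIT_TABLES}; do
    # Prints the JobRunId of each run that was started
    aws glue start-job-run \
      --job-name "tickit_public_${table}_${stage}" \
      --query "JobRunId" --output text
  done
}

# Example usage (uncomment to run against your account):
# start_stage_jobs raw
# ...wait for the raw runs to finish, then:
# start_stage_jobs refine
```

The status of an individual run can be checked with aws glue get-job-run, passing the job name and the run ID printed by the loop.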

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket "${DATA_LAKE_BUCKET}" \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer