Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
Autocut the Twitch VODs based on Marker

Markut Given the VOD of the stream and the markers that are exported as a CSV file, generate a video using ffmpeg that cuts out part of the VOD accord

Tsoding 18 Dec 19, 2022
Lightweight, zero-dependency proxy and storage RTSP server

python-rtsp-server Python-rtsp-server is a lightweight, zero-dependency proxy and storage server for several IP-cameras and multiple clients. Features

Vlad 39 Nov 23, 2022
VIT - VideoInTerminal. A quick piece of code to play videos in your terminal using python

VIT VIT - VideoInTerminal. A quick piece of code to play videos in your terminal using python.

ShellTear 3 Mar 03, 2022
A python generator that converts youtube videos to ascii art in your console.

Video To ASCII A python generator that converts youtube videos to ascii art in your console. This has not been tested for windows! Example Normal mode

Julian Jones 24 Nov 02, 2022
A simple Telegram bot to extract hard-coded subtitle from videos using FFmpeg & Tesseract.

Video Subtitle Extractor Bot A simple Telegram bot to extract hard-coded subtitle from videos using FFmpeg & Tesseract. Note that the accuracy of reco

14 Oct 28, 2022
Tiny python video cutter

tiny_python_video_cutter Source code based on a discussion in StackOverflow Setup project in Pycharm: Configure virtual env in Pycharm. You are done w

Truong 2 May 28, 2022
Video Editor for Linux

Project on break until late March. NEW RELEASE 2.8 IS OUT NOW. INSTALLING: see here. RELEASE NOTES AVAILABLE here. Introduction Features Releases Inst

1.9k Jan 07, 2023
A tool to fuck a video/audio quality using FFmpeg

Media quality fucker A tool to fuck a video/audio quality using FFmpeg How to use Download the source Download Python Extract FFmpeg Put what you want

Maizena 8 Jan 25, 2022
A python program which converts images and video into excel spreadsheets.

image2excel A program which converts images and video into Excel spreadsheets. Usage examples can be found in examples Videos can take a long time to

Oscar Peace 2 Aug 09, 2021
Search a video semantically with AI.

Which Frame? Search a video semantically with AI. For example, try a natural language search query like "a person with sunglasses". You can also searc

David Chuan-En Lin 1 Nov 06, 2021
A web RTSP play platform based on websocket and tornado, websocket use blob binaryType read as ArrayBuffer

A web RTSP play platform based on websocket and tornado, websocket use blob binaryType read as ArrayBuffer

2 Feb 25, 2022
Cvplayer - A simple video player written in python using ffpyplayer and OpenCV

Video Player cvplayer is a minimal wrapper around the ffpyplayer.MediaPlayer cla

ADI 7 Dec 19, 2022
Adblocker for movie subtitles.

SubAdBlock Adblocker for movie subtitles. Usage Place "main.py" and "config.conf" in directory with subtitles and run "main.py". It will automatically

Marko Živić 1 Jan 09, 2022
Program to play videos with props in Apex Legends

R5Fresh A video player for the Apex Legends mod R5Reloaded

9 Nov 13, 2022
A youtube video link or id to video thumbnail python package.

Youtube-Video-Thumbnail A youtube video link or id to video thumbnail python package. Made with Python3

Fayas Noushad 10 Oct 21, 2022
Add filters (background blur, etc) to your webcam on Linux.

Add filters (background blur, etc) to your webcam on Linux.

Jashandeep Sohi 480 Dec 14, 2022
Telegram Video Chat Video Streaming bot 🇱🇰

🧪 Get SESSION_NAME from below: Pyrogram 🎭 Preview ✨ Features Music & Video stream support MultiChat support Playlist & Queue support Skip, Pause, Re

DOOZY YEZ 5 Jun 26, 2022
Python based script to operate FFMPEG.

FMPConvert Python based script to operate FFMPEG. Ver 1.0 -- 2022.02.08 Feature ✅ Maximum compatibility: Third-party dependency libraries unused ✅ Che

cybern000b 1 Feb 28, 2022
A Simple Telegram Bot By @Tellybots to add Subtitle Files in Video

Video-subtitle-merger A Simple Telegram Bot By @Tellybots to add Subtitle Files in Video Features Force Sub Button Added Soon Support Media Type Such

6 Dec 31, 2021
A GUI based datamoshing apllication for everyone! Apply this glitch to your videos and gifs. Supports all video formats!

A GUI based datamoshing apllication for everyone! Apply this glitch to your videos and gifs. Supports all video formats!

Akascape 131 Dec 31, 2022