Free Data Engineering course!

Overview

Data Engineering Zoomcamp

Syllabus

Taking the course

Self-paced mode

All the materials of the course are freely available, so you can take the course at your own pace

  • Follow the suggested syllabus (see below) week by week
  • You don't need to fill in the registration form. Just start watching the videos and join Slack
  • Check FAQ if you have problems
  • If you can't find a solution to your problem in FAQ, ask for help in Slack

2022 Cohort

Asking for help in Slack

The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering channel.

To make discussions in Slack more organized:

Syllabus

Week 1: Introduction & Prerequisites

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

More details

Week 2: Data ingestion

  • Data Lake
  • Workflow orchestration
  • Setting up Airflow locally
  • Ingesting data to GCP with Airflow
  • Ingesting data to local Postgres with Airflow
  • Moving data from AWS to GCP (Transfer service)
  • Homework

More details

Week 3: Data Warehouse

  • Data Warehouse
  • BigQuery
  • Partitoning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • Integrating BigQuery with Airflow
  • BigQuery Machine Learning

More details

Week 4: Analytics engineering

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualising the data with google data studio and metabase

More details

Week 5: Batch processing

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins

More details

Week 6: Streaming

  • Introduction to Kafka
  • Schemas (avro)
  • Kafka Streams
  • Kafka Connect and KSQL

More details

Week 7, 8 & 9: Project

Putting everything we learned to practice

  • Week 7 and 8: working on your own project
  • Week 9: reviewing your peers

More details

Overview

Architecture diagram

Technologies

  • Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
    • Google Cloud Storage (GCS): Data Lake
    • BigQuery: Data Warehouse
  • Terraform: Infrastructure-as-Code (IaC)
  • Docker: Containerization
  • SQL: Data Analysis & Exploration
  • Airflow: Pipeline Orchestration
  • dbt: Data Transformation
  • Spark: Distributed Processing
  • Kafka: Streaming

Prerequisites

To get most out of this course, you should feel comfortable with coding and command line, and know the basics of SQL. Prior experience with Python will be helpful, but you can pick Python relatively fast if you have experience with other programming languages.

Prior experience with data engineering is not required.

Instructors

Tools

For this course you'll need to have the following software installed on your computer:

  • Docker and Docker-Compose
  • Python 3 (e.g. via Anaconda)
  • Google Cloud SDK
  • Terraform

See Week 1 for more details about installing these tools

FAQ

  • Q: I registered, but haven't received a confirmation email. Is it normal? A: Yes, it's normal. It's not automated. But you will receive an email eventually
  • Q: At what time of the day will it happen? A: Office hours will happen on Mondays at 17:00 CET. But everything will be recorded, so you can watch it whenever it's convenient for you
  • Q: Will there be a certificate? A: Yes, if you complete the project
  • Q: I'm 100% not sure I'll be able to attend. Can I still sign up? A: Yes, please do! You'll receive all the updates and then you can watch the course at your own pace.
  • Q: Do you plan to run a ML engineering course as well? A: Glad you asked. We do :)
  • Q: I'm stuck! I've got a technical question! A: Ask on Slack! And check out the student FAQ; many common issues have been answered already. If your issue is solved, please add how you solved it to the document. Thanks!

Our friends

Big thanks to other communities for helping us spread the word about the course:

Check them out - they are cool!

Owner
DataTalksClub
The place to talk about data
DataTalksClub
Module-based cryptographic tool

Cryptosploit A decryption/decoding/cracking tool using various modules. To use it, you need to have basic knowledge of cryptography. Table of Contents

/SNESE_AR\ 33 Nov 27, 2022
Vita Specific Patches and Application for Doki Doki Literature Club (Steam Version) using Ren'Py PSVita

Doki-Doki-Literature-Club-Vita Vita Specific Patches and Application for Doki Doki Literature Club (Steam Version) using Ren'Py PSVita Contains: Modif

Jaylon Gowie 25 Dec 30, 2022
A Python tool to check ASS subtitles for common mistakes and errors.

A Python tool to check ASS subtitles for common mistakes and errors.

1 Dec 18, 2021
Expression interpreter written in Python

Calc Interpreter An interpreter modeled after a calculator implemented in Python 3. The program currently only supports basic mathematical expressions

1 Oct 17, 2021
Chemical equation balancer

Chemical equation balancer Balance your chemical equations with ease! Installation $ git clone

Marijan Smetko 4 Nov 26, 2022
Absolute solvation free energy calculations with OpenFF and OpenMM

ABsolute SOLVantion Free Energy Calculations The absolv framework aims to offer a simple API for computing the change in free energy when transferring

7 Dec 07, 2022
this is a basic python project that I made using python

this is a basic python project that I made using python. This project is only for practice because my python skills are still newbie.

Elvira Firmansyah 2 Dec 14, 2022
Usando Multi Player Perceptron e Regressão Logistica para classificação de SPAM

Relatório dos procedimentos executados e resultados obtidos. Objetivos Treinar um modelo para classificação de SPAM usando o dataset train_data. Class

André Mediote 1 Feb 02, 2022
Objetivo: de forma colaborativa pasar de nodos de Dynamo a Python.

ITTI_Ed01_De-nodos-a-python ITTI. EXPERT TRAINING EN AUTOMATIZACIÓN DE PROCESOS BIM: OFFICIAL DE AUTODESK. Edición 1 Enlace al Master Enunciado: Traba

1 Jun 06, 2022
Basic Hspice runner with Python

HSpicePy Bilgisayarınıza PATH değişkenlerine eklediğiniz HSPICE programını python ile çalıştırmanızı sağlayan basit bir araç. A simple tool that allow

1 Nov 16, 2021
Automatically give thanks to Pypi packages you use in your project!

Automatically give thanks to Pypi packages you use in your project!

Ward 25 Dec 20, 2021
A simple interface to help lazy people like me to shutdown/reboot/sleep their computer remotely.

🦥 Lazy Helper ! A simple interface to help lazy people like me to shut down/reboot/sleep/lock/etc. their computer remotely. - USAGE If you're a lazy

MeHDI Rh 117 Nov 30, 2022
Materials for the Introduction in Python , Linux , Git and Github

This repository contains all the materials of the presentation on the introduction of python, linux, git and Github.

AMMI 3 Aug 28, 2022
Batch obfuscator based on the obfuscation method used by the trick bot launcher

Batch obfuscator based on the obfuscation method used by the trick bot launcher

SlizBinksman 2 Mar 19, 2022
reproduces experiments from

Installation To enable importing of modules, from the parent directory execute: pip install -e . To install requirements: python -m pip install requir

Meta Research 15 Aug 11, 2022
Margin Calculator - Personally tailored investment tool

Margin Calculator - Personally tailored investment tool

1 Jul 19, 2022
The presented desktop application was made to solve 1d schrodinger eqation

schrodinger_equation_1d_solver The presented desktop application was made to solve 1d schrodinger eqation. It implements Numerov's algorithm (step by

Artem Kashapov 2 Dec 29, 2021
LAPS module for CrackMapExec

Crackmapexec-LAPS LAPS module for CrackMapExec Make sure to point to the DC Specify the full domain name Be careful the rid 500 might not be "Administ

28 Oct 05, 2022
Python project that aims to discover CDP neighbors and map their Layer-2 topology within a shareable medium like Visio or Draw.io.

Python project that aims to discover CDP neighbors and map their Layer-2 topology within a shareable medium like Visio or Draw.io.

3 Feb 11, 2022
More routines for operating on iterables, beyond itertools

More Itertools Python's itertools library is a gem - you can compose elegant solutions for a variety of problems with the functions it provides. In mo

2.8k Jan 02, 2023