whylogs Workshop

Code from the whylogs workshop held at DataTalks.Club on 29 March 2022.

whylogs - The open source standard for data logging (Don't forget to give it a star!)

Workshop

In this hands-on workshop, you’ll learn how to set up a system for monitoring your data pipelines, ensuring data quality, and detecting changes in your data.

Without data monitoring, you can’t guarantee to your stakeholders that the data behind their analytics and machine learning use cases is trustworthy. A data observability system gives you visibility into the health of your data pipelines, which in turn builds your customers’ trust in your work.

We’ll cover the following:

Introduction to data observability and monitoring
whylogs - the open source standard for data logging
How to monitor batch Python or Spark data pipelines with whylogs
How to monitor Kafka streaming pipelines with whylogs

By the end of this workshop, you’ll be able to set up such a system yourself.
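
To make the idea of data logging more concrete, here is a minimal sketch in plain Python of what profiling a batch can look like: each batch is reduced to per-column summary statistics, and two profiles are compared to flag a change. This is an illustration only and does not use the whylogs API; the function names `profile_batch` and `null_fraction_drift`, the statistics chosen, and the `threshold` default are all assumptions made for this example. whylogs itself collects far richer profiles.

```python
from statistics import mean


def profile_batch(rows):
    """Summarize a batch of records (a list of dicts) into per-column statistics.

    Hypothetical helper for illustration; not part of whylogs.
    """
    collected = {}
    for row in rows:
        for column, value in row.items():
            stats = collected.setdefault(
                column, {"count": 0, "nulls": 0, "values": []}
            )
            stats["count"] += 1
            if value is None:
                stats["nulls"] += 1
            elif isinstance(value, (int, float)):
                # Only numeric values contribute to min/max/mean.
                stats["values"].append(value)

    summary = {}
    for column, stats in collected.items():
        values = stats["values"]
        summary[column] = {
            "count": stats["count"],
            "null_fraction": stats["nulls"] / stats["count"],
            "min": min(values) if values else None,
            "max": max(values) if values else None,
            "mean": mean(values) if values else None,
        }
    return summary


def null_fraction_drift(reference, current, threshold=0.1):
    """Return the columns whose null fraction moved by more than `threshold`.

    Hypothetical check for illustration; the threshold is an arbitrary default.
    """
    return sorted(
        column
        for column in reference
        if column in current
        and abs(
            current[column]["null_fraction"] - reference[column]["null_fraction"]
        )
        > threshold
    )
```

In a pipeline, a profile like this would be computed for every batch and stored alongside it, so that today's profile can be compared against a reference profile instead of re-reading the raw data.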

Code

This repository contains the files needed for the workshop:

ccloud_lib.py - file for connecting to Confluent Cloud
confluent_credentials.txt - configuration template (put your credentials there, but don't commit them!)
producer.py - code that sends events to Kafka
requirements.txt - all the dependencies for the workshop

Confluent Cloud

For this workshop, you'll need:

A Deepnote account
A Confluent Cloud account (instructions)
