PyTorch Kafka Dataset: A definition of a dataset to get training data from Kafka.

Last update: Aug 01, 2022

Objectives

The main objective of this library is to build a PyTorch Dataset from training data stored in Kafka. This is useful when the data is distributed in Kafka and we want to train a PyTorch model on it. Messages in Kafka are expected to be key:value pairs, where the key is the label and the value is the input.

Usage

To use this library, create a TrainingKafkaDataset with a ControlMessage, bootstrap_servers, and a group_id. Once the object has been created and the data has been fetched from Kafka, it can be used like any other PyTorch Dataset; for example, it can be iterated with a DataLoader.

ControlMessage is a dictionary whose main keys are topic and input_config.

In topic, provide a comma-separated string of entries, each giving the topic, partition, start offset and end offset separated by colons, as in Kafka (for example, pytorch_mnist_test:0:0:20000). In input_config, specify the data types and shapes of the data fetched from Kafka. This is needed because Kafka stores messages as raw bytes, which must be decoded back into the inputs of our model.
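The topic string format described above can be illustrated with a small parser. This is only a sketch of the format, not the library's internal code: the names TopicRange and parse_topic_spec are hypothetical, and whether the end offset is inclusive is not stated here, so the sketch only splits the fields.

```python
from typing import List, NamedTuple


class TopicRange(NamedTuple):
    """One entry of the 'topic' string (hypothetical helper type)."""
    topic: str
    partition: int
    start_offset: int
    end_offset: int


def parse_topic_spec(spec: str) -> List[TopicRange]:
    """Split 'topic:partition:start:end,topic:partition:start:end,...' into fields."""
    ranges = []
    for entry in spec.split(","):
        # Each entry has exactly four colon-separated fields.
        topic, partition, start, end = entry.strip().split(":")
        ranges.append(TopicRange(topic, int(partition), int(start), int(end)))
    return ranges


if __name__ == "__main__":
    spec = "pytorch_mnist_test:0:0:20000,pytorch_mnist_test:0:120000:140000"
    for topic_range in parse_topic_spec(spec):
        print(topic_range)
```

Each parsed entry identifies one contiguous range of offsets in a single partition, so the same topic can appear several times with different ranges.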

bootstrap_servers and group_id are common KafkaConsumer parameters. These parameters are passed directly to the KafkaConsumers inside the object.

Here is an example of creating a TrainingKafkaDataset:

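# ToTensor is assumed to come from torchvision; TrainingKafkaDataset is provided by this library.
from torchvision.transforms import ToTensor

# Each topic entry is topic:partition:start_offset:end_offset.
kafkaControlMessage = {
    'topic': 'pytorch_mnist_test:0:0:20000,pytorch:0:20000:50000,pytorch_mnist_test:0:120000:140000',
    'input_config': {
        'data_type': 'uint8',
        'label_type': 'uint8',
        'data_reshape': '28 28',
        'label_reshape': '',
    },
}
bootstrap_server = ["localhost:9094"]
group_id = 'sink'
df = TrainingKafkaDataset(kafkaControlMessage, bootstrap_server, group_id, ToTensor())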
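To make the role of input_config more concrete, the following sketch shows one way raw Kafka bytes could be decoded with NumPy using the values from the example above. It is not the library's implementation: the helper names parse_shape and decode are hypothetical, and treating an empty reshape string as "leave the array flat" is an assumption.

```python
import numpy as np

# Same values as the input_config in the example above.
input_config = {
    "data_type": "uint8",
    "label_type": "uint8",
    "data_reshape": "28 28",
    "label_reshape": "",
}


def parse_shape(reshape: str) -> tuple:
    """Turn a space-separated string such as '28 28' into (28, 28)."""
    return tuple(int(dim) for dim in reshape.split())


def decode(raw: bytes, dtype: str, reshape: str) -> np.ndarray:
    """Decode raw bytes into an array (hypothetical helper)."""
    array = np.frombuffer(raw, dtype=dtype)
    shape = parse_shape(reshape)
    # Assumption: an empty reshape string means the array is left flat.
    return array.reshape(shape) if shape else array


if __name__ == "__main__":
    # Simulated message: the value is a 28x28 image, the key is its label.
    value = np.zeros((28, 28), dtype="uint8").tobytes()
    key = np.array([7], dtype="uint8").tobytes()
    image = decode(value, input_config["data_type"], input_config["data_reshape"])
    label = decode(key, input_config["label_type"], input_config["label_reshape"])
    print(image.shape, label)
```

Without the dtype and shape, the 784 bytes of an MNIST image cannot be told apart from any other byte string of that length, which is why the configuration has to travel with the topic description.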

Examples

The repository includes a folder with a full example of data fetching and model training on the MNIST dataset.

Owner

ERTIS Research Group
