Data pipelines, 2021.

Last update: May 19, 2022

Overview

This repo illustrates a simple process for automating data transformation and modeling through a pipeline built with Luigi.

Main stack

Python 3.7+
Streamlit
Scikit-learn
Pandas
Luigi

Idea

The full process is described in an interactive app that you can find in the script app.py. See the section on running the scripts for details on how to launch the app.

Setup

Create a virtual environment (I recommend using conda):
```
conda create --name data-pipes python=3.7
```
Activate the virtual environment:
```
conda activate data-pipes
```
Install requirements:
```
pip install -r requirements.txt
```

Running the scripts

Interactive app

To run the interactive app, run the Streamlit command with the virtual environment activated:

(data-pipes) streamlit run app.py

This starts a local server at http://localhost:8501.

Data pipeline

To run a specific task, say TareaX defined in the script tareas.py, run the command:

PYTHONPATH=. luigi --module tareas TareaX --local-scheduler

You can extend the code and add any tasks you like!


Owner

Rodolfo Ferro
