DLFlow is a deep learning framework.

Overview

contributions license python

DLFlow - A Deep Learning WorkFlow

DLFlow概述

DLFlow是一套深度学习pipeline,它结合了Spark的大规模特征处理能力和Tensorflow模型构建能力。利用DLFlow可以快速处理原始特征、训练模型并进行大规模分布式预测,十分适合离线环境下的生产任务。利用DLFlow,用户只需专注于模型开发,而无需关心原始特征处理、pipeline构建、生产部署等工作。

功能支持

配置驱动: DLFlow通过配置驱动,修改配置可以快速更换特征、模型超参数、任务流程等等,极大提高工作效率。

模块化结构: 任务和模型以插件形式存在,便于使用和开发,用户可以可以轻地将自定义任务和模型注册到框架内使用。

任务自组织: 通过内置的Workflow框架,根据任务的产出标记自动解决任务依赖,轻松构建深度学习pipeline。

最佳实践: 融入滴滴用户画像团队深度学习离线任务的最佳实践,有效应对离线生产中的多种问题。将Tensorflow和Spark进行合理结合,更适合离线深度学习任务。

快速开始

环境准备

首先请确保环境中已经安装和配置Hadoop和Spark,并设置好了基本的环境变量。

  • Tensorflow访问HDFS

为了能够使用让Tensorflow访问HDFS,需要确保如下环境变量生效:

# 确保libjvm.so被添加到LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${JAVA_HOME}/jre/lib/amd64/server

# 确保hadoop jars被添加到CLASSPATH
export CLASSPATH=${CLASSPATH}:$(hadoop classpath --glob)

关于Tensorflow访问HDFS更多内容请参见 TensorFlow on Hadoop

  • Spark读写TFReocrds
# Clone tensorflow/ecosystem项目
git clone https://github.com/tensorflow/ecosystem.git

cd ecosystem/spark/spark-tensorflow-connector/

# 构建spark-tensorflow-connector
mvn versions:set -DnewVersion=1.14.0
mvn clean install

项目构建后生成 target/spark-tensorflow-connector_2.11-1.14.0.jar,后续需要确保该jar被添加到 spark.jars 中。 关于Spark读写TFRecoreds更多内容请参见 spark-tensorflow-connector

安装

通过pip安装:

pip install dlflow

通过源代码安装:

git clone  https://github.com/didi/dlflow.git
cd dlflow
python setup.py install

使用

  • 配置文件

运行配置可参考 conf 目录中的配置。 关于配置详情请参考 配置说明

  • 以模块运行
python -m dlflow.main --config <CONFIGURATION FILE>.conf
  • 以脚本运行

确保python环境的 bin 目录已经被添加到环境变量 PATH

export PATH=$PATH:/usr/local/python/bin

之后通过如下命令运行

dlflow --config .conf

更详细的使用参见 使用说明

预定义任务

预定义任务 描述
Merge 特征融合任务,请参见 特征融合
Encode 解析原始特征,对特征进行编码和预处理,生成能够直接输入模型的特征
Train 模型训练任务
Evaluate 模型评估任务
Predict 模型预测任务,使用Spark进行分布式预测,具备处理大规模数据能力

手册目录

技术方案

DLFlow整体架构

整体架构

DLFLow pipeline

Pipeline

Contributing

欢迎使用并参与到本项目的建设中,详细内容请参见 Contribution Guide

License

DLFlow 基于Apache-2.0协议进行分发和使用,更多信息参见 LICENSE

Owner
DiDi
滴滴出行
DiDi
Finite-temperature variational Monte Carlo calculation of uniform electron gas using neural canonical transformation.

CoulombGas This code implements the neural canonical transformation approach to the thermodynamic properties of uniform electron gas. Building on JAX,

FermiFlow 9 Mar 03, 2022
Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019) We propose Disentangled Audio-Visual System (DAVS) to ad

Hang_Zhou 750 Dec 23, 2022
MediaPipe Kullanarak İleri Seviye Bilgisayarla Görü

MediaPipe Kullanarak İleri Seviye Bilgisayarla Görü

Burak Bagatarhan 12 Mar 29, 2022
Real-Time Semantic Segmentation in Mobile device

Real-Time Semantic Segmentation in Mobile device This project is an example project of semantic segmentation for mobile real-time app. The architectur

708 Jan 01, 2023
An Open-Source Package for Information Retrieval.

OpenMatch An Open-Source Package for Information Retrieval. 😃 What's New Top Spot on TREC-COVID Challenge (May 2020, Round2) The twin goals of the ch

THUNLP 439 Dec 27, 2022
2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

Aigege 8 Mar 31, 2022
Traductor de lengua de señas al español basado en Python con Opencv y MedaiPipe

Traductor de señas Traductor de lengua de señas al español basado en Python con Opencv y MedaiPipe Requerimientos 🔧 Python 3.8 o inferior para evitar

Jahaziel Hernandez Hoyos 3 Nov 12, 2022
FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

HKBU High Performance Machine Learning Lab 6 Nov 18, 2022
Code for training and evaluation of the model from "Language Generation with Recurrent Generative Adversarial Networks without Pre-training"

Language Generation with Recurrent Generative Adversarial Networks without Pre-training Code for training and evaluation of the model from "Language G

Amir Bar 253 Sep 14, 2022
An expansion for RDKit to read all types of files in one line

RDMolReader An expansion for RDKit to read all types of files in one line How to use? Add this single .py file to your project and import MolFromFile(

Ali Khodabandehlou 1 Dec 18, 2021
A fast and easy to use, moddable, Python based Minecraft server!

PyMine PyMine - The fastest, easiest to use, Python-based Minecraft Server! Features Note: This list is not always up to date, and doesn't contain all

PyMine 144 Dec 30, 2022
Code for "Unsupervised Source Separation via Bayesian inference in the latent domain"

LQVAE-separation Code for "Unsupervised Source Separation via Bayesian inference in the latent domain" Paper Samples GT Compressed Separated Drums GT

Michele Mancusi 30 Oct 25, 2022
Trajectory Extraction of road users via Traffic Camera

Traffic Monitoring Citation The associated paper for this project will be published here as soon as possible. When using this software, please cite th

Julian Strosahl 14 Dec 17, 2022
Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

**Codebase and data are uploaded in progress. ** VOLT(-py) is a vocabulary learning codebase that allows researchers and developers to automaticaly ge

416 Jan 09, 2023
用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本和PARL(paddle)版本

用强化学习玩合成大西瓜 代码地址:https://github.com/Sharpiless/play-daxigua-using-Reinforcement-Learning 用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本、PARL(paddle)版本和pytorch版本

72 Dec 17, 2022
SCU OlympicsRunning Baseline

Competition 1v1 running Environment check details in Jidi Competition RLChina2021智能体竞赛 做出的修改: 奖励重塑:修改了环境,重新设置了奖励的分配,使得奖励组成不只有零和博弈,还有探索环境的奖励。 算法微调:修改了官

ZiSeoi Wong 2 Nov 23, 2021
KaziText is a tool for modelling common human errors.

KaziText KaziText is a tool for modelling common human errors. It estimates probabilities of individual error types (so called aspects) from grammatic

ÚFAL 3 Nov 24, 2022
[RSS 2021] An End-to-End Differentiable Framework for Contact-Aware Robot Design

DiffHand This repository contains the implementation for the paper An End-to-End Differentiable Framework for Contact-Aware Robot Design (RSS 2021). I

Jie Xu 60 Jan 04, 2023
A minimal implementation of face-detection models using flask, gunicorn, nginx, docker, and docker-compose

Face-Detection-flask-gunicorn-nginx-docker This is a simple implementation of dockerized face-detection restful-API implemented with flask, Nginx, and

Pooya-Mohammadi 30 Dec 17, 2022
QTool: A Low-bit Quantization Toolbox for Deep Neural Networks in Computer Vision

This project provides abundant choices of quantization strategies (such as the quantization algorithms, training schedules and empirical tricks) for quantizing the deep neural networks into low-bit c

Monash Green AI Lab 51 Dec 10, 2022