DLFlow is a deep learning framework.

Overview

contributions license python

DLFlow - A Deep Learning WorkFlow

DLFlow概述

DLFlow是一套深度学习pipeline,它结合了Spark的大规模特征处理能力和Tensorflow模型构建能力。利用DLFlow可以快速处理原始特征、训练模型并进行大规模分布式预测,十分适合离线环境下的生产任务。利用DLFlow,用户只需专注于模型开发,而无需关心原始特征处理、pipeline构建、生产部署等工作。

功能支持

配置驱动: DLFlow通过配置驱动,修改配置可以快速更换特征、模型超参数、任务流程等等,极大提高工作效率。

模块化结构: 任务和模型以插件形式存在,便于使用和开发,用户可以可以轻地将自定义任务和模型注册到框架内使用。

任务自组织: 通过内置的Workflow框架,根据任务的产出标记自动解决任务依赖,轻松构建深度学习pipeline。

最佳实践: 融入滴滴用户画像团队深度学习离线任务的最佳实践,有效应对离线生产中的多种问题。将Tensorflow和Spark进行合理结合,更适合离线深度学习任务。

快速开始

环境准备

首先请确保环境中已经安装和配置Hadoop和Spark,并设置好了基本的环境变量。

  • Tensorflow访问HDFS

为了能够使用让Tensorflow访问HDFS,需要确保如下环境变量生效:

# 确保libjvm.so被添加到LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${JAVA_HOME}/jre/lib/amd64/server

# 确保hadoop jars被添加到CLASSPATH
export CLASSPATH=${CLASSPATH}:$(hadoop classpath --glob)

关于Tensorflow访问HDFS更多内容请参见 TensorFlow on Hadoop

  • Spark读写TFReocrds
# Clone tensorflow/ecosystem项目
git clone https://github.com/tensorflow/ecosystem.git

cd ecosystem/spark/spark-tensorflow-connector/

# 构建spark-tensorflow-connector
mvn versions:set -DnewVersion=1.14.0
mvn clean install

项目构建后生成 target/spark-tensorflow-connector_2.11-1.14.0.jar,后续需要确保该jar被添加到 spark.jars 中。 关于Spark读写TFRecoreds更多内容请参见 spark-tensorflow-connector

安装

通过pip安装:

pip install dlflow

通过源代码安装:

git clone  https://github.com/didi/dlflow.git
cd dlflow
python setup.py install

使用

  • 配置文件

运行配置可参考 conf 目录中的配置。 关于配置详情请参考 配置说明

  • 以模块运行
python -m dlflow.main --config <CONFIGURATION FILE>.conf
  • 以脚本运行

确保python环境的 bin 目录已经被添加到环境变量 PATH

export PATH=$PATH:/usr/local/python/bin

之后通过如下命令运行

dlflow --config .conf

更详细的使用参见 使用说明

预定义任务

预定义任务 描述
Merge 特征融合任务,请参见 特征融合
Encode 解析原始特征,对特征进行编码和预处理,生成能够直接输入模型的特征
Train 模型训练任务
Evaluate 模型评估任务
Predict 模型预测任务,使用Spark进行分布式预测,具备处理大规模数据能力

手册目录

技术方案

DLFlow整体架构

整体架构

DLFLow pipeline

Pipeline

Contributing

欢迎使用并参与到本项目的建设中,详细内容请参见 Contribution Guide

License

DLFlow 基于Apache-2.0协议进行分发和使用,更多信息参见 LICENSE

Owner
DiDi
滴滴出行
DiDi
GDSC-ML Team Interview Task

GDSC-ML-Team---Interview-Task Task 1 : Clean or Messy room In this task we have to classify the given test images as clean or messy. - Link for datase

Aayush. 1 Jan 19, 2022
Fantasy Points Prediction and Dream Team Formation

Fantasy-Points-Prediction-and-Dream-Team-Formation Collected Data from open source resources that have over 100 Parameters for predicting cricket play

Akarsh Singh 2 Sep 13, 2022
A curated list of awesome Deep Learning tutorials, projects and communities.

Awesome Deep Learning Table of Contents Books Courses Videos and Lectures Papers Tutorials Researchers Websites Datasets Conferences Frameworks Tools

Christos 20k Jan 05, 2023
OptaPlanner wrappers for Python. Currently significantly slower than OptaPlanner in Java or Kotlin.

OptaPy is an AI constraint solver for Python to optimize the Vehicle Routing Problem, Employee Rostering, Maintenance Scheduling, Task Assignment, School Timetabling, Cloud Optimization, Conference S

OptaPy 211 Jan 02, 2023
Code for CPM-2 Pre-Train

CPM-2 Pre-Train Pre-train CPM-2 此分支为110亿非 MoE 模型的预训练代码,MoE 模型的预训练代码请切换到 moe 分支 CPM-2技术报告请参考link。 0 模型下载 请在智源资源下载页面进行申请,文件介绍如下: 文件名 描述 参数大小 100000.tar

Tsinghua AI 136 Dec 28, 2022
A best practice for tensorflow project template architecture.

A best practice for tensorflow project template architecture.

Mahmoud Gamal Salem 3.6k Dec 22, 2022
Replication of Pix2Seq with Pretrained Model

Pretrained-Pix2Seq We provide the pre-trained model of Pix2Seq. This version contains new data augmentation. The model is trained for 300 epochs and c

peng gao 51 Nov 22, 2022
Uni-Fold: Training your own deep protein-folding models

Uni-Fold: Training your own deep protein-folding models. This package provides an implementation of a trainable, Transformer-based deep protein foldin

DP Technology 187 Jan 04, 2023
ZeroGen: Efficient Zero-shot Learning via Dataset Generation

ZEROGEN This repository contains the code for our paper “ZeroGen: Efficient Zero

Jiacheng Ye 31 Dec 30, 2022
This is the official implementation code repository of Underwater Light Field Retention : Neural Rendering for Underwater Imaging (Accepted by CVPR Workshop2022 NTIRE)

Underwater Light Field Retention : Neural Rendering for Underwater Imaging (UWNR) (Accepted by CVPR Workshop2022 NTIRE) Authors: Tian Ye†, Sixiang Che

jmucsx 17 Dec 14, 2022
Fusion-DHL: WiFi, IMU, and Floorplan Fusion for Dense History of Locations in Indoor Environments

Fusion-DHL: WiFi, IMU, and Floorplan Fusion for Dense History of Locations in Indoor Environments Paper: arXiv (ICRA 2021) Video : https://youtu.be/CC

Sachini Herath 68 Jan 03, 2023
Code for our paper "Interactive Analysis of CNN Robustness"

Perturber Code for our paper "Interactive Analysis of CNN Robustness" Datasets Feature visualizations: Google Drive Fine-tuning checkpoints as saved m

Stefan Sietzen 0 Aug 17, 2021
Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.

Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.

keven 198 Dec 20, 2022
Streamlit App For Product Analysis - Streamlit App For Product Analysis

Streamlit_App_For_Product_Analysis Здравствуйте! Перед вами дашборд, позволяющий

Grigory Sirotkin 1 Jan 10, 2022
Implementation of Memory-Efficient Neural Networks with Multi-Level Generation, ICCV 2021

Memory-Efficient Multi-Level In-Situ Generation (MLG) By Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen and David Z. Pan

Jiaqi Gu 2 Jan 04, 2022
Optimize Trading Strategies Using Freqtrade

Optimize trading strategy using Freqtrade Short demo on building, testing and optimizing a trading strategy using Freqtrade. The DevBootstrap YouTube

DevBootstrap 139 Jan 01, 2023
Redash reset for python

redash-reset This will use a default REDASH_SECRET_KEY key of c292a0a3aa32397cdb050e233733900f this allows you to reset the password of the user ID bu

Robert Wiggins 5 Nov 14, 2022
Bayesian Image Reconstruction using Deep Generative Models

Bayesian Image Reconstruction using Deep Generative Models R. Marinescu, D. Moyer, P. Golland For technical inquiries, please create a Github issue. F

Razvan Valentin Marinescu 51 Nov 23, 2022
Yolov5-opencv-cpp-python - Example of using ultralytics YOLO V5 with OpenCV 4.5.4, C++ and Python

yolov5-opencv-cpp-python Example of performing inference with ultralytics YOLO V

183 Jan 09, 2023
Lucid library adapted for PyTorch

Lucent PyTorch + Lucid = Lucent The wonderful Lucid library adapted for the wonderful PyTorch! Lucent is not affiliated with Lucid or OpenAI's Clarity

Lim Swee Kiat 520 Dec 26, 2022