SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Last update: Dec 16, 2022

Related tags

Overview

SLIDE

The SLIDE package contains the source code for reproducing the main experiments in this paper.

Dataset

The Datasets can be downloaded in Amazon-670K. Note that the data is sorted by labels so please shuffle at least the validation/testing data.

TensorFlow Baselines

We suggest directly get TensorFlow docker image to install TensorFlow-GPU. For TensorFlow-CPU compiled with AVX2, we recommend using this precompiled build.

Also there is a TensorFlow docker image specifically built for CPUs with AVX-512 instructions, to get it use:

docker pull clearlinux/stacks-dlrs_2-mkl

config.py controls the parameters of TensorFlow training like learning rate. example_full_softmax.py, example_sampled_softmax.py are example files for Amazon-670K dataset with full softmax and sampled softmax respectively.

Build/Run on Intel platform

Prerequisites:

CMake >= 3.0 Intel Compiler (ICC) >= 19

Build with ICC compiler

source /opt/intel/compilers_and_libraries/linux/bin/compilervars.sh -arch intel64 -platform linux
cd /path/to/slide-root
mkdir -p bin && cd bin 
# BDW (AVX2)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc
# SKX/CLX (AVX512)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1
# CPX (AVX512 + BF16)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1 -DOPT_AVX512_BF16=1
make -j

Run on Intel SKX/CLX/CPX

cd bin
OMP_NUM_THREADS= KMP_HW_SUBSET=s,c,t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv
For example, on CLX8280 2Sx28c:
OMP_NUM_THREADS=112 KMP_HW_SUBSET=2s,28c,2t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv

For best performance please set Batchsize=multiple-of-logic-core-number from SLIDE/Config_amz.csv.

Results can be checked from the log file under dataset:

tail -f dataset/log.txt

SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Related tags

Overview

SLIDE

Dataset

TensorFlow Baselines

Build/Run on Intel platform

Prerequisites:

Build with ICC compiler

Run on Intel SKX/CLX/CPX

Owner

Intel Labs

Source code for the Paper: CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints}

codebase for "A Theory of the Inductive Bias and Generalization of Kernel Regression and Wide Neural Networks"

Official code repository for the EMNLP 2021 paper

VOS: Learning What You Don’t Know by Virtual Outlier Synthesis

Implementation of Google Brain's WaveGrad high-fidelity vocoder

Dirty Pixels: Towards End-to-End Image Processing and Perception

A FAIR dataset of TCV experimental results for validating edge/divertor turbulence models.

SOLOv2 on onnx & tensorRT

An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

TalkingHead-1KH is a talking-head dataset consisting of YouTube videos

Weakly Supervised Segmentation by Tensorflow.

An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testingAn image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing

This is an example implementation of the paper "Cross Domain Robot Imitation with Invariant Representation".

Improving XGBoost survival analysis with embeddings and debiased estimators

UNet model with VGG11 encoder pre-trained on Kaggle Carvana dataset

A Dying Light 2 (DL2) PAKFile Utility for Modders and Mod Makers.

Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)

A PyTorch implementation of "Signed Graph Convolutional Network" (ICDM 2018).

Reference code for the paper CAMS: Color-Aware Multi-Style Transfer.

an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch