Efficient Deep Learning Systems

This repository contains materials for the Efficient Deep Learning Systems course taught at the Faculty of Computer Science of HSE University and Yandex School of Data Analysis.

Syllabus

Week 1: Introduction
- Lecture: Course overview and organizational details. Core concepts of the GPU architecture and CUDA API.
- Seminar: CUDA operations in PyTorch. Introduction to benchmarking.
Week 2: Basics of distributed ML
- Lecture: Introduction to distributed training. Process-based communication. Parameter Server architecture.
- Seminar: Multiprocessing basics. Parallel GloVe training.
Week 3: Data-parallel training and All-Reduce
- Lecture: Data-parallel training of neural networks. All-Reduce and its efficient implementations.
- Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.
Week 4: Memory-efficient and model-parallel training
Week 5: Profiling DL code, training-time optimizations
Week 6: Basics of Python application deployment
Week 7: Software for serving neural networks
Week 8: Optimizing models for faster inference
Week 9: Experiment tracking, model and data versioning
Week 10: Testing, debugging and monitoring of models

Grading

There are 4 homework assignments in total, some of which span several weeks. The final grade is a weighted sum of the per-assignment grades. Please refer to the course page of your institution for details.
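
As a rough illustration of what "weighted sum" means here, the sketch below combines four assignment grades into one final grade. The function name `final_grade`, the 10-point scale, and the equal weights are all hypothetical and are not the course's actual grading scheme; the real weights are listed on your institution's course page.

```python
def final_grade(grades, weights):
    """Return the weighted sum of per-assignment grades."""
    if len(grades) != len(weights):
        raise ValueError("need exactly one weight per assignment")
    return sum(g * w for g, w in zip(grades, weights))


# Hypothetical example: four assignments on an assumed 10-point scale,
# with assumed equal weights (NOT the official course weights).
example_grades = [8.0, 10.0, 6.0, 9.0]
example_weights = [0.25, 0.25, 0.25, 0.25]
print(final_grade(example_grades, example_weights))  # prints 8.25
```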

Owner

Max Ryabinin
