Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.

Last update: Nov 18, 2022

Overview

opt-einsum-torch

There are many implementations of Einstein summation. NumPy's numpy.einsum is the least efficient of them, as it runs in a single thread on the CPU. PyTorch's torch.einsum works for both CPU and CUDA tensors. However, since there is no virtual CUDA memory, torch.einsum can run out of CUDA memory when the tensors are large.

This package aims to implement a memory-efficient einsum function using PyTorch as the backend. It also uses the opt_einsum package to optimize the contraction path so that the contraction needs as few FLOPs as possible.
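To see why the contraction path matters, here is a small pure-Python sketch that counts multiply-adds for two different pairwise orders of the same three-tensor contraction. The helper `path_cost` and the index sizes are hypothetical and only for illustration. The cost model (product of the sizes of all indices involved in each pairwise step) is a common rough estimate and not necessarily the exact one opt_einsum uses.

```python
from math import prod


def path_cost(operands, output, path, sizes):
    """Estimate multiply-adds for contracting `operands` pairwise along `path`.

    `operands` are index strings such as "ij"; `path` is a list of position
    pairs into the current operand list, with each intermediate result
    appended at the end (hypothetical helper, illustration only).
    """
    operands = list(operands)
    total = 0
    for x, y in path:
        # Pop the higher position first so the lower one stays valid.
        b = operands.pop(max(x, y))
        a = operands.pop(min(x, y))
        # Indices that must survive: those in the output or in remaining operands.
        needed = set(output).union(*operands)
        involved = set(a) | set(b)
        total += prod(sizes[i] for i in involved)
        operands.append("".join(sorted(involved & needed)))
    return total


# Made-up sizes for the chain 'ij,jk,kl->il'.
sizes = {"i": 10, "j": 1000, "k": 10, "l": 1000}
operands = ["ij", "jk", "kl"]

left_first = path_cost(operands, "il", [(0, 1), (0, 1)], sizes)   # (A·B)·C
right_first = path_cost(operands, "il", [(1, 2), (0, 1)], sizes)  # A·(B·C)
print(left_first, right_first)
```

With these sizes, contracting the first two tensors first is about 100 times cheaper than starting with the last two, which is the kind of difference a path optimizer is meant to find.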

Usage

from opt_einsum_torch import EinsumPlanner
import torch

# Some huge tensors
arr1, arr2 = ..., ...
ee = EinsumPlanner(torch.device('cuda:0'), cuda_mem_limit=0.9)
result = ee.einsum('ijk,jkl->il', arr1, arr2)

The resulting tensor, result, is a PyTorch CPU tensor. You can convert it to a NumPy array by calling result.numpy().
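One general way to keep device memory bounded is to contract a large operand slice by slice and assemble the result on the CPU. The sketch below illustrates that idea with plain torch.einsum. It is not necessarily how EinsumPlanner works internally. The function `chunked_einsum` and its `chunk_size` parameter are hypothetical, and it assumes the first index of the first operand is also the first index of the output and does not appear in the second operand (as with 'ijk,jkl->il').

```python
import torch


def chunked_einsum(equation, a, b, chunk_size, device="cpu"):
    """Contract `a` and `b` one slice of `a` at a time (hypothetical helper).

    Assumes the leading index of `a` is the leading index of the output
    and is absent from `b`, so partial results can be concatenated.
    """
    b_dev = b.to(device)  # the second operand is moved to the device once
    pieces = []
    for a_chunk in torch.split(a, chunk_size, dim=0):
        partial = torch.einsum(equation, a_chunk.to(device), b_dev)
        pieces.append(partial.cpu())  # keep finished slices in host memory
    return torch.cat(pieces, dim=0)


torch.manual_seed(0)
arr1 = torch.randn(7, 3, 4)
arr2 = torch.randn(3, 4, 5)

chunked = chunked_einsum("ijk,jkl->il", arr1, arr2, chunk_size=2)
reference = torch.einsum("ijk,jkl->il", arr1, arr2)
print(chunked.shape, torch.allclose(chunked, reference, atol=1e-5))
```

Passing device="cuda:0" would run each slice on the GPU while only one slice of the first operand is resident there at a time.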

Future work

Support multiple GPUs.
Memory efficient einsum kernels.
CUDA data transfer profilers.

Releases (0.1.0)

0.1.0 (Dec 30, 2021)

Initial release of the package.

Owner

Haoyan Huo
