TensorFlow Metal Backend on Apple Silicon Experiments (just for fun)

Overview

tf-metal-experiments

TensorFlow Metal Backend on Apple Silicon Experiments (just for fun)

Setup

This is tested on M1 series Apple Silicon SOC only.

TensorFlow 2.x

  1. Follow the official instructions from Apple here
  2. Test that your Metal GPU is working by running tf.config.list_physical_devices("GPU"), you should see 1 GPU present (it is not named). Later when you actually use the GPU, there will be a more informative printout that says Metal device set to: Apple M1 Max or similar.
  3. Now you should be ready to run any TF code that doesn't require external libraries.

HuggingFace Transformers library

If you want to play around with Transformer models (with TF Metal backend of course), you will need to install the HuggingFace Transformers library.

  1. Install the regex library (I don't know why it has to be like this, but yeah): python3 -m pip install --upgrade regex --no-use-pep517. You might need do xcode-select --install if the above command doesn't work.
  2. pip install transfomers ipywidgets

Experiments and Benchmarks

After some trial and error, some initial benchmarks for what should be the approx best capability of the M1 Max. For all the cases here, increasing batch size does not seem to increase the throughput.

Power draw also doesn't seem to be able to exceed 40W. Power draw from the GPU (averaged over 1 second) can be measured with sudo powermetrics --samplers gpu_power -i1000 -n1.

Model GPU BatchSize Throughput Power Memory
ResNet50 M1 Max 32c 64 135 img/sec 40W 13 GB
MobileNetV2 M1 Max 32c 128 352 img/sec 37W 15 GB
DistilBERT M1 Max 32c 64 120 seq/sec 35W 9 GB
BERTLarge M1 Max 32c 32 18 seq/sec 36W 14 GB

The benchmark scripts used are included in this repo.

Reference Benchmarks from RTX 3090

Model GPU BatchSize Throughput Power
ResNet50 3090 64 957 img/sec 300W
MobileNetV2 3090 128 1927 img/sec 310W
DistilBERT 3090 64 1040 seq/sec 310W
BERTLarge 3090 32 164 seq/sec 320W

For 3090, same script is used, but additional optimization that leverage hardware (Tensor Core) and software (XLA compiler) not present/working on M1 is added. This corresponds to the following code segment added:

from tensorflow.keras import mixed_precision
tf.config.optimizer.set_jit(True)
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)
physical_devices = tf.config.list_physical_devices('GPU')

Also note that the 3090 is likely to perform better at larger batch sizes.

Measuring Achievable TFLOPS

We can use TF to write a matrix multiplication benchmark to try and estimate what is the max compute performance we can get out of a M1 Max. It seems we can get around ~8 TFLOPS for large enough problem (GEMM) sizes.

The plot can be generated using tflops_sweep.py.

Note that FP64 and FP16 performance appears to be non-existent. (the code automatically runs on CPU if FP64 or FP16 is specified as data type)

Owner
Timothy Liu
Deep Learning stuff and Open Source Enthusiast @OpenSUTD
Timothy Liu
Source code of "Hold me tight! Influence of discriminative features on deep network boundaries"

Hold me tight! Influence of discriminative features on deep network boundaries This is the source code to reproduce the experiments of the NeurIPS 202

EPFL LTS4 19 Dec 10, 2021
百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline

项目说明: 百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline 比赛链接:https://aistudio.baidu.com/aistudio/competition/detail/66?isFromLuge=true 官方的baseline版本是基于paddlepadd

周俊贤 54 Nov 23, 2022
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

peng gao 42 Nov 26, 2022
Python Auto-ML Package for Tabular Datasets

Tabular-AutoML AutoML Package for tabular datasets Tabular dataset tuning is now hassle free! Run one liner command and get best tuning and processed

Sagnik Roy 18 Nov 20, 2022
Prototypical Networks for Few shot Learning in PyTorch

Prototypical Networks for Few shot Learning in PyTorch Simple alternative Implementation of Prototypical Networks for Few Shot Learning (paper, code)

Orobix 835 Jan 08, 2023
Fast convergence of detr with spatially modulated co-attention

Fast convergence of detr with spatially modulated co-attention Usage There are no extra compiled components in SMCA DETR and package dependencies are

peng gao 135 Dec 07, 2022
NEO: Non Equilibrium Sampling on the orbit of a deterministic transform

NEO: Non Equilibrium Sampling on the orbit of a deterministic transform Description of the code This repo describes the NEO estimator described in the

0 Dec 01, 2021
Learning Neural Painters Fast! using PyTorch and Fast.ai

The Joy of Neural Painting Learning Neural Painters Fast! using PyTorch and Fast.ai Blogpost with more details: The Joy of Neural Painting The impleme

Libre AI 72 Nov 10, 2022
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frede

Edresson Casanova 92 Dec 09, 2022
Code for the paper "Combining Textual Features for the Detection of Hateful and Offensive Language"

The repository provides the source code for the paper "Combining Textual Features for the Detection of Hateful and Offensive Language" submitted to HA

Sherzod Hakimov 3 Aug 04, 2022
PyTorch implementation of Algorithm 1 of "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"

Code for On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models This repository will reproduce the main results from our pape

Mitch Hill 32 Nov 25, 2022
Music source separation is a task to separate audio recordings into individual sources

Music Source Separation Music source separation is a task to separate audio recordings into individual sources. This repository is an PyTorch implmeme

Bytedance Inc. 958 Jan 03, 2023
Predict Breast Cancer Wisconsin (Diagnostic) using Naive Bayes

Naive-Bayes Predict Breast Cancer Wisconsin (Diagnostic) using Naive Bayes Downloading Data Set Use our Breast Cancer Wisconsin Data Set Also you can

Faeze Habibi 0 Apr 06, 2022
The official PyTorch implementation for NCSNv2 (NeurIPS 2020)

Improved Techniques for Training Score-Based Generative Models This repo contains the official implementation for the paper Improved Techniques for Tr

174 Dec 26, 2022
Scribble-Supervised LiDAR Semantic Segmentation, CVPR 2022 (ORAL)

Scribble-Supervised LiDAR Semantic Segmentation Dataset and code release for the paper Scribble-Supervised LiDAR Semantic Segmentation, CVPR 2022 (ORA

102 Dec 25, 2022
Transformers based fully on MLPs

Awesome MLP-based Transformers papers An up-to-date list of Transformers based fully on MLPs without attention! Why this repo? After transformers and

Fawaz Sammani 35 Dec 30, 2022
Official implementation of NeurIPS'21: Implicit SVD for Graph Representation Learning

isvd Official implementation of NeurIPS'21: Implicit SVD for Graph Representation Learning If you find this code useful, you may cite us as: @inprocee

Sami Abu-El-Haija 16 Jan 08, 2023
Kohei's 5th place solution for xview3 challenge

xview3-kohei-solution Usage This repository assumes that the given data set is stored in the following locations: $ ls data/input/xview3/*.csv data/in

Kohei Ozaki 2 Jan 17, 2022
TSIT: A Simple and Versatile Framework for Image-to-Image Translation

TSIT: A Simple and Versatile Framework for Image-to-Image Translation This repository provides the official PyTorch implementation for the following p

Liming Jiang 255 Nov 23, 2022
Easy-to-use library to boost AI inference leveraging state-of-the-art optimization techniques.

NEW RELEASE How Nebullvm Works • Tutorials • Benchmarks • Installation • Get Started • Optimization Examples Discord | Website | LinkedIn | Twitter Ne

Nebuly 1.7k Dec 31, 2022