PlaidML is a framework for making deep learning work everywhere.


Documentation | Installation Instructions | Building PlaidML | Contributing | Troubleshooting | Reporting Issues


To Our Users

First off, we’d like to thank you for choosing PlaidML. Whether you’re a new user or a multi-year veteran, we greatly appreciate you for the time you’ve spent tinkering around with our source code, sending us feedback, and improving our codebase. PlaidML would truly not be the same without you.

The feedback we have received from our users indicates an ever-increasing need for performance, programmability, and portability. During the past few months, we have been restructuring PlaidML to address those needs. Below is a summary of the biggest changes:

  • We’ve adopted MLIR, an extensible compiler infrastructure that has gained industry-wide adoption since its release in early 2019. MLIR makes it easier to integrate new software and hardware into our compiler stack, as well as making it easier to write optimizations for our compiler.
  • We’ve worked extensively on Stripe, our low-level intermediate representation within PlaidML. Stripe contains optimizations that greatly improve the performance of our compiler. While our work on Stripe began before we decided to use MLIR, we are in the process of fully integrating Stripe into MLIR.
  • We created our C++/Python embedded domain-specific language (EDSL) to improve the programmability of PlaidML.

Today, we’re announcing a new branch of PlaidML — plaidml-v1. This will act as our development branch going forward and will allow us to more rapidly prototype the changes we’re making without breaking our existing user base. As a precaution, please note that certain features, tests, and hardware targets may be broken in plaidml-v1.

You can continue to use code on the master branch or from our releases on PyPI. For your convenience, the contents of our master branch will be released as version 0.7.0. We are keeping the master branch of PlaidML stable and maintaining it until plaidml-v1 is ready for production.
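
For reference, here is a minimal sketch of both options (the package names, repository URL, and branch name are the ones mentioned in this README; build steps for the source tree are covered in Building PlaidML):

# Stable: the 0.7.0 release line from PyPI
pip install plaidml-keras plaidbench

# Development: check out the plaidml-v1 branch and build from source
git clone https://github.com/plaidml/plaidml.git
cd plaidml
git checkout plaidml-v1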

If you’d like to try out some of PlaidML’s newer performance improvements, you can try running PlaidML with the environment variable PLAIDML_USE_STRIPE=1. This will act as a precursor to the changes you’ll be seeing in plaidml-v1, and we’re excited to hear your feedback on Stripe.
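
For example, on Linux or macOS you can enable Stripe for a single run by setting the variable on the command line; the plaidbench invocation is the same one used in the Quick Start below:

PLAIDML_USE_STRIPE=1 plaidbench keras mobilenet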

Your support means a lot to us. Thank you for being understanding of our new development process during this new and exciting time for deep learning compilers.


PlaidML is an advanced and portable tensor compiler for enabling deep learning on laptops, embedded devices, or other devices where the available computing hardware is not well supported or the available software stack contains unpalatable license restrictions.

PlaidML sits underneath common machine learning frameworks, enabling users to access any hardware supported by PlaidML. PlaidML supports Keras, ONNX, and nGraph.

As a component within the nGraph Compiler stack, PlaidML further extends the capabilities of specialized deep-learning hardware (especially GPUs), and makes it both easier and faster to access or make use of subgraph-level optimizations that would otherwise be bounded by the compute limitations of the device.

As a component under Keras, PlaidML can accelerate training workloads with customized or automatically-generated Tile code. It works especially well on GPUs, and it doesn't require use of CUDA/cuDNN on Nvidia hardware, while achieving comparable performance.
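
Concretely, pointing Keras at PlaidML takes one of two forms, both of which appear in the examples and issues later in this document; a minimal sketch:

import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"  # must be set before importing keras
import keras

# Alternatively:
# import plaidml.keras
# plaidml.keras.install_backend()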

PlaidML works on all major operating systems: Linux, macOS, and Windows.

If you are using a hardware target not supported by PlaidML by default, such as Clover, check out the instructions at building PlaidML to build a custom configuration to support your hardware.

Prerequisites

  • Python (v2 supported, v3 recommended)
  • OpenCL 1.2 or greater (a quick check is shown below)
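
To verify the OpenCL requirement before installing, you can run clinfo, the same diagnostic tool referenced in several of the issues below. A sketch for Debian/Ubuntu; the package name and install command vary by platform:

sudo apt-get install clinfo
clinfo    # should report at least one OpenCL platform and device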

Quick Start

See the troubleshooting section for solutions to common issues.

virtualenv plaidml
source plaidml/bin/activate
pip install plaidml-keras plaidbench

Choose which accelerator you'd like to use (many computers, especially laptops, have multiple):

plaidml-setup

Next, try benchmarking MobileNet inference performance:

plaidbench keras mobilenet

Or, try training MobileNet:

plaidbench --batch-size 16 keras --train mobilenet

Installation Instructions

We support a variety of operating systems and installation methods.

Demos and Related Projects

Plaidbench

Plaidbench is a performance testing suite designed to help users compare the performance of different cards and different frameworks.
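
To compare accelerators, re-run plaidml-setup to choose a different device and repeat the same benchmark; the commands below are the ones from the Quick Start above:

plaidml-setup
plaidbench keras mobilenet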

Hello VGG

One of the great things about Keras is how easy it is to play with state-of-the-art networks. Here's all the code you need to run VGG-19:

#!/usr/bin/env python

import numpy as np
import os
import time

os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

import keras
import keras.applications as kapp
from keras.datasets import cifar10

(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()
batch_size = 8
x_train = x_train[:batch_size]
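# Upscale the 32x32 CIFAR-10 images to 224x224, the input size VGG19 expects.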
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)
model = kapp.VGG19()
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)

# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
print("Ran in {} seconds".format(time.time() - start))

Reporting Issues

Either open a ticket on GitHub or join our Slack channel (#plaidml).

CI & Validation

Validated Hardware

A comprehensive set of tests for each release is run against the hardware targets listed below.

  • AMD

    • R9 Nano
    • RX 480
    • Vega 10
  • Intel

    • HD4000
    • HD Graphics 505
  • NVIDIA

    • K80
    • GT 640M
    • GTX 1050
    • GTX 1070

Validated Networks

We support all of the Keras application networks from current versions of Keras 2.x. Validated networks are tested for performance and correctness as part of our continuous integration system.

  • CNNs

    • Inception v3
    • ResNet50
    • VGG19
    • Xception
    • MobileNet
    • DenseNet
    • ShuffleNet
  • LSTM

    • examples/imdb_lstm.py (from keras)
Comments
  • [macOS] model.fit() loss: nan

    Ran mnist_cnn.py from keras/examples after adding plaidml as the backend. This issue affects many others, but this is the simplest example.

    Will run fine for a while, then loss will hit nan and acc will plummet until it hits 0, where it stays.

    Andys-iMac-2:examples andy$ python mnist_cnn.py
    x_train shape: (60000, 28, 28, 1)
    60000 train samples
    10000 test samples
    INFO:plaidml:Opening device "amd_radeon_pro_580_compute_engine.0"
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/12
    59776/60000 [============================>.] - ETA: 0s - loss: 0.3177 - acc: 0.9025
    INFO:plaidml:Analyzing Ops: 85 of 285 operations complete
    60000/60000 [==============================] - 27s - loss: 0.3172 - acc: 0.9026 - val_loss: 0.2699 - val_acc: 0.9217
    Epoch 2/12
    60000/60000 [==============================] - 18s - loss: 0.1104 - acc: 0.9666 - val_loss: 0.2247 - val_acc: 0.9308
    Epoch 3/12
    60000/60000 [==============================] - 19s - loss: nan - acc: 0.5408 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 4/12
    60000/60000 [==============================] - 19s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 5/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 6/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 7/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 8/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 9/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 10/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 11/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 12/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Test loss: nan
    Test accuracy: 0.0

    opened by andyoneal 28
  • trying to implement ReflectionPadding2D

    finally I implemented it in one op for B,H,W,C

    class ReflectionPadding2D(PMLTile.Operation):
        def __init__(self, input, h_pad, w_pad):
            if K.image_data_format() == 'channels_last':
                if input.shape.ndims == 4:
                    H, W = input.shape.dims[1:3]
                    if (type(H) == int and h_pad >= H) or \
                       (type(W) == int and w_pad >= W):
                        raise ValueError("Paddings must be less than dimensions.")
                    c = """ function (I[B, H, W, C] ) -> (O) {{
                            WE = W + {w_pad}*2;
                            HE = H + {h_pad}*2;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    if w_pad > 0:
                        c += """
                            LEFT_PAD [b, h, w , c : B, H, WE, C ] = =(I[b, h, {w_pad}-w,            c]), w < {w_pad} ;
                            HCENTER  [b, h, w , c : B, H, WE, C ] = =(I[b, h, w-{w_pad},            c]), w < W+{w_pad}-1 ;
                            RIGHT_PAD[b, h, w , c : B, H, WE, C ] = =(I[b, h, 2*W - (w-{w_pad}) -2, c]);
                            LCR = LEFT_PAD+HCENTER+RIGHT_PAD;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    else:
                        c += "LCR = I;"
                    if h_pad > 0:
                        c += """
                            TOP_PAD   [b, h, w , c : B, HE, WE, C ] = =(LCR[b, {h_pad}-h,            w, c]), h < {h_pad};
                            VCENTER   [b, h, w , c : B, HE, WE, C ] = =(LCR[b, h-{h_pad},            w, c]), h < H+{h_pad}-1 ;
                            BOTTOM_PAD[b, h, w , c : B, HE, WE, C ] = =(LCR[b, 2*H - (h-{h_pad}) -2, w, c]);
                            TVB = TOP_PAD+VCENTER+BOTTOM_PAD;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    else:
                        c += "TVB = LCR;"
                    c += "O = TVB; }"
                    inp_dims = input.shape.dims
                    out_dims = (inp_dims[0], inp_dims[1]+h_pad*2, inp_dims[2]+w_pad*2, inp_dims[3])
                else:
                    raise NotImplementedError
            else:
                raise NotImplementedError
            super(ReflectionPadding2D, self).__init__(c, [('I', input) ],
                    [('O', PMLTile.Shape(input.shape.dtype, out_dims ) )])
    

    also I implemented it via slice and concat but I suppose it will consume more VRAM for this? or am I wrong??

    class ReflectionPadding2D():
        def __init__(self, h_pad, w_pad):
            self.h_pad, self.w_pad = h_pad, w_pad
        def __call__(self, inp):
            h_pad, w_pad = self.h_pad, self.w_pad
            if K.image_data_format() == 'channels_last':
                if inp.shape.ndims == 4:
                    w = K.concatenate ([ inp[:,:,w_pad:0:-1,:],
                                         inp,
                                         inp[:,:,-2:-w_pad-2:-1,:] ], axis=2 )
                    h = K.concatenate ([ w[:,h_pad:0:-1,:,:],
                                         w,
                                         w[:,-2:-h_pad-2:-1,:,:] ], axis=1 )
                    return h
                else:
                    raise NotImplementedError
            else:
                raise NotImplementedError
    
    needs integration 
    opened by iperov 27
  • plaidml.exceptions.PlaidMLError: Could not find PlaidML configuration file: "experimental.json".

    Traceback (most recent call last):
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\Scripts\plaidml-setup.exe\__main__.py", line 5, in <module>
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\__init__.py", line 50, in <module>
        import plaidml.settings
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\settings.py", line 33, in <module>
        _setup_config('PLAIDML_EXPERIMENTAL_CONFIG', 'experimental.json')
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\settings.py", line 30, in _setup_config
        'Could not find PlaidML configuration file: "{}".'.format(filename))
    plaidml.exceptions.PlaidMLError: Could not find PlaidML configuration file: "experimental.json".

    opened by Duddino 26
  • Memory error on Vega 10

    Hi I am trying plaid ml on AMD Vega 10 : gfx900

    I get the following error:

    [email protected]:~/biswa/plaidbench$ python plaidbench.py mobilenet
    Using PlaidML backend.
    INFO:plaidml:Initializing device gfx900.0: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.1: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.2: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.3: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Opening device "gfx900.3": "Advanced Micro Devices, Inc. gfx900"

    Model loaded. Compiling and running initial batch, batch_size=1
    Warmup
    Memory access fault by GPU node-7 on address 0x4408bd6000. Reason: Page not present or supervisor privilege.
    Aborted (core dumped)

    Any idea how to resolve this?

    Thanks, Biswa

    opened by biswagsingh 26
  • "CL_OUT_OF_HOST_MEMORY" error when command "plaidml-setup"

    Hello again, I'm experiencing a new issue with the 0.6.0 rc1 version of the plaidml. Using 0.5 led to this issue: https://github.com/plaidml/plaidml/issues/73. Any luck of solving it?

    opened by iamkucuk 23
  • Feature request - port to Python 3.6

    I've got PlaidML running on my AMD Bonaire on Arch Linux with Python 2.7 in a Conda environment. Every other Python package I have runs with 3.6 and my goal is to keep it that way. ;-)

    There doesn't seem to even be a pip package for 3.6, so the pip install -U plaidml-keras fails with Python 3.6. If you can post build-from-GitHub-source instructions, I can make a local package and install it.

    P.S.: Let me know if you want Arch setup instructions for AMD GPUs. Most of it is on the Arch User Repository wiki but I've got some scripts that do the work.

    P.P.S.: Benchmark results

    Using PlaidML backend.
    INFO:plaidml:Initializing device bonaire.0: "Bonaire", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Opening device "bonaire.0": "Advanced Micro Devices, Inc. Bonaire"
    Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
    16793600/17225924 [============================>.] - ETA: 0s 
    Model loaded.
    Compiling and running initial batch, batch_size=1
    Warmup
    Doing the main timing
    Example finished, elapsed: 6.821215868 (compile), 15.0223557949 (execution)
    
    opened by znmeb 21
  • Mac+AMD: AMD not detected and Intel uses too high of a work group

    iMac 2017 with a Radeon Pro 580 and a Core i5-7600K. Compiled and installed PlaidML from source. Installed via the pip wheel.

    Ran plaidml-setup:

    PlaidML Setup (0.0.0.dev0)

    Thanks for using PlaidML!

    Some Notes:

    • Bugs and other issues: https://github.com/plaidml/plaidml
    • Questions: https://stackoverflow.com/questions/tagged/plaidml
    • Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
    • PlaidML is licensed under the GNU AGPLv3

    Default Config Devices: No devices.

    Experimental Config Devices: intel(r)_core(tm)i5-7600k_cpu@_3.80ghz.0 : Intel Intel(R) Core(TM) i5-7600K CPU @ 3.80GHz

    Using experimental devices can cause poor performance, crashes, and other nastiness. Enable experimental device support? (y,n)[n]:y

    PlaidML sends anonymous usage statistics to help guide improvements. We'd love your help making it better.

    Enable telemetry reporting? (y,n)[y]:y

    Almost done. Multiplying some matrices...
    Tile code:
      function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
    ERROR:plaidml:OpenCL: [CL_INVALID_WORK_GROUP_SIZE] : OpenCL Error : clEnqueueNDRangeKernel failed: total work group size (32) is greater than the device can support (1) (cb=12)
    Whew. That worked.

    Save settings to /Users/andy/.plaidml? (y,n)[y]:y
    Success!

    Should a gpu be detected at this point? Is there somewhere I can lower total work group size manually?

    New to submitting git issues. Sorry if I'm missing anything.

    opened by andyoneal 19
  • PlaidML Setup Issue Windows

    Hi, Running plaidml-setup gives me the following:

    PlaidML Setup (0.3.5)

    Thanks for using PlaidML!

    Some Notes:

    • Bugs and other issues: https://github.com/plaidml/plaidml
    • Questions: https://stackoverflow.com/questions/tagged/plaidml
    • Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
    • PlaidML is licensed under the GNU AGPLv3

    No OpenCL devices found. Check driver installation. Read the helpful, easy driver installation instructions from our README: http://github.com/plaidml/plaidml

    This is the output from clinfo: Number of platforms: 1 Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.1 AMD-APP (2766.5) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices

    Platform Name: AMD Accelerated Parallel Processing Number of devices: 1 Device Type: CL_DEVICE_TYPE_GPU Vendor ID: 1002h Board name: Radeon RX 580 Series Device Topology: PCI[ B#1, D#0, F#0 ] Max compute units: 36 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 256 Preferred vector width char: 4 Preferred vector width short: 2 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Native vector width char: 4 Native vector width short: 2 Native vector width int: 1 Native vector width long: 1 Native vector width float: 1 Native vector width double: 1 Max clock frequency: 1340Mhz Address bits: 64 Max memory allocation: 4244635648 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 64 Max image 2D width: 16384 Max image 2D height: 16384 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 2048 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: Read/Write Cache line size: 64 Cache size: 16384 Global memory size: 8589934592 Constant buffer size: 4244635648 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Max pipe arguments: 16 Max pipe active reservations: 16 Max pipe packet size: 4244635648 Max global variable size: 3820172032 Max global variable preferred total size: 8589934592 Max read/write image args: 64 Max on device events: 1024 Queue on device max size: 8388608 Max on device queues: 1 Queue on device preferred size: 262144 SVM capabilities: Coarse grain buffer: Yes Fine grain buffer: Yes Fine grain system: No Atomics: No Preferred platform atomic alignment: 0 Preferred global atomic alignment: 0 Preferred local atomic alignment: 0 Kernel Preferred work group size multiple: 64 Error correction support: 0 Unified memory for Host and Device: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue on Host properties: Out-of-Order: No Profiling : Yes Queue on Device properties: Out-of-Order: Yes Profiling : Yes Platform ID: 00007FFEC2C66FD0 Name: Ellesmere Vendor: Advanced Micro Devices, Inc. Device OpenCL C version: OpenCL C 2.0 Driver version: 2766.5 Profile: FULL_PROFILE Version: OpenCL 2.0 AMD-APP (2766.5) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_planar_yuv

    Shouldn't it be working? I just switched to a new computer, so I used to use NVIDIA with CUDA. Any help is appreciated!

    Note: I do have the most recent AMD driver installed.

    opened by YutaTakano 16
  • could not broadcast input array from shape (3,2048) into shape (6144)

    I just installed plaidml and i tried to run this example:

    #!/usr/bin/env python
    
    import plaidml.keras
    plaidml.keras.install_backend() 
    
    import numpy as np
    import matplotlib.pyplot as plt
    from keras.models import Sequential
    from keras.layers.core import Dense, Activation, Dropout
    from keras.datasets import mnist
    from keras.utils import np_utils
    
    # fix a random seed for reproducibility
    np.random.seed(9)
    
    # user inputs
    nb_epoch = 25
    num_classes = 10
    batch_size = 128
    train_size = 60000
    test_size = 10000
    v_length = 784
    
    # split the mnist data into train and test
    (trainData, trainLabels), (testData, testLabels) = mnist.load_data()
    
    
    # reshape the dataset
    trainData = trainData.reshape(train_size, v_length)
    testData = testData.reshape(test_size, v_length)
    trainData = trainData.astype("float32")
    testData = testData.astype("float32")
    trainData /= 255
    testData /= 255
    
    
    # convert class vectors to binary class matrices --> one-hot encoding
    mTrainLabels = np_utils.to_categorical(trainLabels, num_classes)
    mTestLabels = np_utils.to_categorical(testLabels, num_classes)
    
    # create the model
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation("relu"))
    model.add(Dropout(0.2))
    model.add(Dense(256))
    model.add(Activation("relu"))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes))
    model.add(Activation("softmax"))
    
    # summarize the model
    model.summary()
    
    # compile the model
    model.compile(loss="categorical_crossentropy",
    			  optimizer="adam",
    			  metrics=["accuracy"])
    
    # fit the model
    history = model.fit(trainData, 
    				 	mTrainLabels,
    					validation_data=(testData, mTestLabels),
    					batch_size=batch_size,
    					nb_epoch=nb_epoch,
    					verbose=2)
    
    # print the history keys
    
    
    # evaluate the model
    scores = model.evaluate(testData, mTestLabels, verbose=0)
    
    # history plot for accuracy
    plt.plot(history.history["acc"])
    plt.plot(history.history["val_acc"])
    plt.title("Model Accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend(["train", "test"], loc="upper left")
    plt.show()
    
    # history plot for accuracy
    plt.plot(history.history["loss"])
    plt.plot(history.history["val_loss"])
    plt.title("Model Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(["train", "test"], loc="upper left")
    plt.show()
    
    

    and I got this error

    could not broadcast input array from shape (3,2048) into shape (6144)

    Then I tried running Hello VGG example from plaidml github page and I got the same error.

    I am using plaidml 0.3.4 on ubuntu in virtualenv and I am trying to run this code on rx 480.

    Tnx for help.

    opened by leon3428 16
  • plaidml.exceptions.Unknown: Duplicate updates

    Setup:

    sudo apt-get install clinfo
    clinfo [sees 1080ti]
    sudo pip install -U plaidml-keras
    plaidml-setup
    [insert before keras import:]
    import plaidml.keras
    plaidml.keras.install_backend()
    

    But, intermediate problem:

     ImportError: No module named plaidml.keras
    $ which python
    /home/phobrain/anaconda2/bin//python
    

    Fix:

    sys.path.append('/usr/local/lib/python2.7/dist-packages/')
    import plaidml.keras
    plaidml.keras.install_backend()
    

    'Real' issue being reported:

    File "siaconv.py", line 919, in doit epochs=epochs) File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper return func(*args, **kwargs) File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1926, in fit_generator self._make_train_function() File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 967, in _make_train_function **self._function_kwargs) File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 1718, in function return _Function(inputs, outputs, updates, name) File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 931, in init c.add_update(_plaidml_val(var), _plaidml_val(newval)) File "/usr/local/lib/python2.7/dist-packages/plaidml/init.py", line 1289, in add_update _lib().plaidml_add_composer_update(self, dest, src) File "/usr/local/lib/python2.7/dist-packages/plaidml/init.py", line 674, in _check_err self.raise_last_status() File "/usr/local/lib/python2.7/dist-packages/plaidml/library.py", line 136, in raise_last_status raise self.last_status() plaidml.exceptions.Unknown: Duplicate updates

    model.fit_generator(
            myGen('data', tr_pairs, tr_y, batch_size, True),
            (len(tr_pairs)-1) / batch_size,
            validation_data=myGen('valid', te_pairs, te_y, batch_size, True),
            validation_steps=1,
            max_queue_size=2,
            workers=1,
            epochs=epochs)
    

    Net:

    KERNEL_INIT = 'glorot_normal'
    
        seq.add(Dense(dense_size, input_shape=input_shape,
                    activation='relu', kernel_initializer=KERNEL_INIT))
        seq.add(BatchNormalization())
        seq.add(Dense((dense_size*2)/3,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dropout(0.1, seed=SEED))
        seq.add(Dense(dense_size/4,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense((dense_size*2)/3,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense(dense_size,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense(512,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(256,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(128,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(256,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(128,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
    
    opened by phobrain 16
  • Plaidml not detecting Mali-T628 on ARM

    Hi,

    I've build plaidml 0.3.5 to use on Odroid XU4 with Mali-T628 GPU with debian stretch. I manage to install the wheel, when I run plaidml-setup, I get:

    "No supported devices found. Run 'clinfo' and file an issue containing the full output."

    However, with plaidml 0.3.0rc1 latest available with pip install plaidml, my devices can be configured and I have 2 mali-t628 reported. "experimental.json" appears quite similar in both cases.

    Any clue what I may have done wrong building plaidml (used bazel 0.18.1 with --config linux_arm_32v7), or what change might explain 0.3.5 not recognizing my devices where 0.3.0rc1 did?

    Thanks

    Here's my clinfo report:

    Number of platforms 1 Platform Name ARM Platform Platform Vendor ARM Platform Version OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Platform Profile FULL_PROFILE Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory Platform Extensions function suffix ARM

    Platform Name ARM Platform Number of devices 2 Device Name Mali-T628 Device Vendor ARM Device Vendor ID 0x6200010 Device Version OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Driver Version 1.2 Device OpenCL C Version OpenCL C 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Device Type GPU Device Profile FULL_PROFILE Max compute units 4 Max clock frequency 600MHz Device Partition (core) Max number of sub-devices 0 Supported partition types None Max work item dimensions 3 Max work item sizes 256x256x256 Max work group size 256 Preferred work group size multiple 4 Preferred / native vector sizes
    char 16 / 16
    short 8 / 8
    int 4 / 4
    long 2 / 2
    half 8 / 8 (cl_khr_fp16) float 4 / 4
    double 2 / 2 (cl_khr_fp64) Half-precision Floating-point support (cl_khr_fp16) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Single-precision Floating-point support (core) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Double-precision Floating-point support (cl_khr_fp64) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Address bits 64, Little-Endian Global memory size 2090405888 (1.947GiB) Error Correction support No Max memory allocation 522601472 (498.4MiB) Unified memory for Host and Device Yes Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Global Memory cache type Read/Write Global Memory cache size <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30> Global Memory cache line 64 bytes Image support Yes Max number of samplers per kernel 16 Max size for 1D images from buffer 65536 pixels Max 1D or 2D image array size 2048 images Max 2D image size 65536x65536 pixels Max 3D image size 65536x65536x65536 pixels Max number of read image args 128 Max number of write image args 8 Local memory type Global Local memory size 32768 (32KiB) Max constant buffer size 65536 (64KiB) Max number of constant args 8 Max size of kernel argument 1024 Queue properties
    Out-of-order execution Yes Profiling Yes Prefer user sync for interop No Profiling timer resolution 1000ns Execution capabilities
    Run OpenCL kernels Yes Run native kernels No printf() buffer size 1048576 (1024KiB) Built-in kernels
    Device Available Yes Compiler Available Yes Linker Available Yes Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

    Device Name Mali-T628 Device Vendor ARM Device Vendor ID 0x6200010 Device Version OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Driver Version 1.2 Device OpenCL C Version OpenCL C 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Device Type GPU Device Profile FULL_PROFILE Max compute units 2 Max clock frequency 600MHz Device Partition (core) Max number of sub-devices 0 Supported partition types None Max work item dimensions 3 Max work item sizes 256x256x256 Max work group size 256 Preferred work group size multiple 4 Preferred / native vector sizes
    char 16 / 16
    short 8 / 8
    int 4 / 4
    long 2 / 2
    half 8 / 8 (cl_khr_fp16) float 4 / 4
    double 2 / 2 (cl_khr_fp64) Half-precision Floating-point support (cl_khr_fp16) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Single-precision Floating-point support (core) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Double-precision Floating-point support (cl_khr_fp64) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Address bits 64, Little-Endian Global memory size 2090405888 (1.947GiB) Error Correction support No Max memory allocation 522601472 (498.4MiB) Unified memory for Host and Device Yes Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Global Memory cache type Read/Write Global Memory cache size <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30> Global Memory cache line 64 bytes Image support Yes Max number of samplers per kernel 16 Max size for 1D images from buffer 65536 pixels Max 1D or 2D image array size 2048 images Max 2D image size 65536x65536 pixels Max 3D image size 65536x65536x65536 pixels Max number of read image args 128 Max number of write image args 8 Local memory type Global Local memory size 32768 (32KiB) Max constant buffer size 65536 (64KiB) Max number of constant args 8 Max size of kernel argument 1024 Queue properties
    Out-of-order execution Yes Profiling Yes Prefer user sync for interop No Profiling timer resolution 1000ns Execution capabilities
    Run OpenCL kernels Yes Run native kernels No printf() buffer size 1048576 (1024KiB) Built-in kernels
    Device Available Yes Compiler Available Yes Linker Available Yes Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

    NULL platform behavior clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) ARM Platform clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [ARM] clCreateContext(NULL, ...) [default] Success [ARM] clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (2) Platform Name ARM Platform Device Name Mali-T628 Device Name Mali-T628 clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (2) Platform Name ARM Platform Device Name Mali-T628 Device Name Mali-T628

    ICD loader properties ICD loader Name OpenCL ICD Loader ICD loader Vendor OCL Icd free software ICD loader Version 2.2.11 ICD loader Profile OpenCL 2.1

    opened by nitescuc 15
  • Tile language vs EDSL

    In the last publicly available version (0.7.0) of PlaidML the Tile language was used to write operators. The latest documentation talks about a C++/Python EDSL being developed as well, however, documentation there is a bit lacking, and only C++ is presented, not any Python version. I was wondering whether that is meant to substitute the Tile language or will both of them be kept in the future? I used to have some test project built using the Tile language from Python, which used to work well for me, and I'd like to improve that, but was wondering if I should port it to the EDSL approach, or is it okay to stay with the Tile language approach.

    Also, I was wondering if the v1 branch still supports the Tile language, and will it continue to do so when it is finally released? If I take the path of compiling it from source, and use that instead of the pip installable v0.7, will my project that uses it break or continue working?

    Is there any idea of a time-line for the next public release? I have been following this promising project for a while, hoping that a new release will come sooner or later.

    opened by gyenesvi 9
  • Capture affine.store op

    Hello.

    I try to perform stencil on the following code

    affine.store %1, %arg2[%arg3, %arg4, %arg5, %arg6] : memref<?x?x?x?xi64>
    affine.yield %arg2 : memref<?x?x?x?xi64>
    

    by using the matchPattern function:

    matchPattern(yield, m_Op<AffineYieldOp>(m_Capture(&store, m_Op<AffineStoreOp>(m_Any(), m_Any())))
    

    But it seems m_Capture function and m_Op function used in existing examples, such as StencilGEMM, can not be used to capture operation without a return val, like affine.store here. Can I just use existing structure to match this pattern and capture the affine.store op ?

    opened by IsolatedMy 4
  • Stenciling of MAX/ADD for RN50

    This patch fixes the pass "--x86-stencil-tpp-unary" so that all the reduce patterns in RN50 get stenciled with correct TPP parameters and unary flags.

    opened by ZhangMZh 0
  • Batch parallelization and allocs to alloca changes

    This WIP patch converts scoped allocs to allocas using PromoteBuffersToStackPass as well as the pxa localization pass. As of now, the first pass does not seem to be scoping allocs other than weights, while the pxa localization pass throws a segfault at runtime for threads > 1. This patch also parallelizes layers along the batch dimension, except those that don't have the batch dimension as the outer loop's induction variable.

    wip 
    opened by KavithaTipturMadhu 0
  • TPSS: parallelization directives

    This patch adds support for parallelization directives specified in a file named by the environment variable PLAIDML_PARALLELIZATION_CONFIG_FILE. It adds a rule parser that matches convolution shapes against the equalities/inequalities in the config file and applies the rules that follow.
    It is important to note that the collapse directive also adds a parallelize directive by default and can only be applied to two loop levels corresponding to a perfect loop nest (the validity of reordering loops to support the requested order is not verified).

    wip 
    opened by KavithaTipturMadhu 0
  • how to fix "cannot import name 'Iterable' from 'collections'" when running test code from main page

    Hey all,

    I've decided to try some ML projects, and since I have an AMD GPU (5700 XT) I decided to use PlaidML. On the main website there's test code for VGG-19, and I'm trying to run it right now, but I run into the error attached in the screenshot. I tried to simply change collections to collections.abc, but it looks like Python 3.10 already does that? I'm pretty stuck; any help would be appreciated. Thanks! Screenshot from 2022-06-11 22-03-38

    opened by KSTRTK 3