PlaidML is a framework for making deep learning work everywhere.


Documentation | Installation Instructions | Building PlaidML | Contributing | Troubleshooting | Reporting Issues


To Our Users

First off, we’d like to thank you for choosing PlaidML. Whether you’re a new user or a multi-year veteran, we greatly appreciate you for the time you’ve spent tinkering around with our source code, sending us feedback, and improving our codebase. PlaidML would truly not be the same without you.

The feedback we have received from our users indicates an ever-increasing need for performance, programmability, and portability. During the past few months, we have been restructuring PlaidML to address those needs. Below is a summary of the biggest changes:

  • We’ve adopted MLIR, an extensible compiler infrastructure that has gained industry-wide adoption since its release in early 2019. MLIR makes it easier to integrate new software and hardware into our compiler stack, as well as making it easier to write optimizations for our compiler.
  • We’ve worked extensively on Stripe, our low-level intermediate representation within PlaidML. Stripe contains optimizations that greatly improve the performance of our compiler. While our work on Stripe began before we decided to use MLIR, we are in the process of fully integrating Stripe into MLIR.
  • We created our C++/Python embedded domain-specific language (EDSL) to improve the programmability of PlaidML.

Today, we’re announcing a new branch of PlaidML — plaidml-v1. This will act as our development branch going forward and will allow us to more rapidly prototype the changes we’re making without breaking our existing user base. As a precaution, please note that certain features, tests, and hardware targets may be broken in plaidml-v1.

You can continue to use code on the master branch or from our releases on PyPI. For your convenience, the contents of our master branch will be released as version 0.7.0. We are keeping the master branch of PlaidML stable and maintaining it until plaidml-v1 is ready for production.
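
For reference, here is a minimal sketch of both options (the package names, repository URL, and branch name are the ones mentioned in this README; build steps for the source tree are covered in Building PlaidML):

# Stable: the 0.7.0 release line from PyPI
pip install plaidml-keras plaidbench

# Development: check out the plaidml-v1 branch and build from source
git clone https://github.com/plaidml/plaidml.git
cd plaidml
git checkout plaidml-v1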

If you’d like to try out some of PlaidML’s newer performance improvements, you can try running PlaidML with the environment variable PLAIDML_USE_STRIPE=1. This will act as a precursor to the changes you’ll be seeing in plaidml-v1, and we’re excited to hear your feedback on Stripe.
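
For example, on Linux or macOS you can enable Stripe for a single run by setting the variable on the command line; the plaidbench invocation is the same one used in the Quick Start below:

PLAIDML_USE_STRIPE=1 plaidbench keras mobilenet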

Your support means a lot to us. Thank you for being understanding of our new development process during this new and exciting time for deep learning compilers.


PlaidML is an advanced and portable tensor compiler for enabling deep learning on laptops, embedded devices, or other devices where the available computing hardware is not well supported or the available software stack contains unpalatable license restrictions.

PlaidML sits underneath common machine learning frameworks, enabling users to access any hardware supported by PlaidML. PlaidML supports Keras, ONNX, and nGraph.

As a component within the nGraph Compiler stack, PlaidML further extends the capabilities of specialized deep-learning hardware (especially GPUs), and makes it both easier and faster to access or make use of subgraph-level optimizations that would otherwise be bounded by the compute limitations of the device.

As a component under Keras, PlaidML can accelerate training workloads with customized or automatically-generated Tile code. It works especially well on GPUs, and it doesn't require use of CUDA/cuDNN on Nvidia hardware, while achieving comparable performance.
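
Concretely, pointing Keras at PlaidML takes one of two forms, both of which appear in the examples and issues later in this document; a minimal sketch:

import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"  # must be set before importing keras
import keras

# Alternatively:
# import plaidml.keras
# plaidml.keras.install_backend()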

PlaidML works on all major operating systems: Linux, macOS, and Windows.

If you are using a hardware target not supported by PlaidML by default, such as Clover, check out the instructions at building PlaidML to build a custom configuration to support your hardware.

Prerequisites

  • Python (v2 supported, v3 recommended)
  • OpenCL 1.2 or greater (a quick check is shown below)
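
To verify the OpenCL requirement before installing, you can run clinfo, the same diagnostic tool referenced in several of the issues below. A sketch for Debian/Ubuntu; the package name and install command vary by platform:

sudo apt-get install clinfo
clinfo    # should report at least one OpenCL platform and device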

Quick Start

See the troubleshooting section for solutions to common issues.

virtualenv plaidml
source plaidml/bin/activate
pip install plaidml-keras plaidbench

Choose which accelerator you'd like to use (many computers, especially laptops, have multiple):

plaidml-setup

Next, try benchmarking MobileNet inference performance:

plaidbench keras mobilenet

Or, try training MobileNet:

plaidbench --batch-size 16 keras --train mobilenet

Installation Instructions

We support a variety of operating systems and installation methods.

Demos and Related Projects

Plaidbench

Plaidbench is a performance testing suite designed to help users compare the performance of different cards and different frameworks.
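
To compare accelerators, re-run plaidml-setup to choose a different device and repeat the same benchmark; the commands below are the ones from the Quick Start above:

plaidml-setup
plaidbench keras mobilenet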

Hello VGG

One of the great things about Keras is how easy it is to play with state-of-the-art networks. Here's all the code you need to run VGG-19:

#!/usr/bin/env python

import numpy as np
import os
import time

os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

import keras
import keras.applications as kapp
from keras.datasets import cifar10

(x_train, y_train_cats), (x_test, y_test_cats) = cifar10.load_data()
batch_size = 8
x_train = x_train[:batch_size]
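# Upscale the 32x32 CIFAR-10 images to 224x224, the input size VGG19 expects.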
x_train = np.repeat(np.repeat(x_train, 7, axis=1), 7, axis=2)
model = kapp.VGG19()
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Running initial batch (compiling tile program)")
y = model.predict(x=x_train, batch_size=batch_size)

# Now start the clock and run 10 batches
print("Timing inference...")
start = time.time()
for i in range(10):
    y = model.predict(x=x_train, batch_size=batch_size)
print("Ran in {} seconds".format(time.time() - start))

Reporting Issues

Either open a ticket on GitHub or join our Slack channel (#plaidml).

CI & Validation

Validated Hardware

A comprehensive set of tests for each release is run against the hardware targets listed below.

  • AMD

    • R9 Nano
    • RX 480
    • Vega 10
  • Intel

    • HD4000
    • HD Graphics 505
  • NVIDIA

    • K80
    • GT 640M
    • GTX 1050
    • GTX 1070

Validated Networks

We support all of the Keras application networks from current versions of Keras 2.x. Validated networks are tested for performance and correctness as part of our continuous integration system.

  • CNNs

    • Inception v3
    • ResNet50
    • VGG19
    • Xception
    • MobileNet
    • DenseNet
    • ShuffleNet
  • LSTM

    • examples/imdb_lstm.py (from keras)
Comments
  • [macOS] model.fit() loss: nan

    Ran mnist_cnn.py from keras/examples after adding plaidml as the backend. This issue affects many others, but this is the simplest example.

    Will run fine for a while, then loss will hit nan and acc will plummet until it hits 0, where it stays.

    Andys-iMac-2:examples andy$ python mnist_cnn.py
    x_train shape: (60000, 28, 28, 1)
    60000 train samples
    10000 test samples
    INFO:plaidml:Opening device "amd_radeon_pro_580_compute_engine.0"
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/12
    59776/60000 [============================>.] - ETA: 0s - loss: 0.3177 - acc: 0.9025
    INFO:plaidml:Analyzing Ops: 85 of 285 operations complete
    60000/60000 [==============================] - 27s - loss: 0.3172 - acc: 0.9026 - val_loss: 0.2699 - val_acc: 0.9217
    Epoch 2/12
    60000/60000 [==============================] - 18s - loss: 0.1104 - acc: 0.9666 - val_loss: 0.2247 - val_acc: 0.9308
    Epoch 3/12
    60000/60000 [==============================] - 19s - loss: nan - acc: 0.5408 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 4/12
    60000/60000 [==============================] - 19s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 5/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 6/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 7/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 8/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 9/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 10/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 11/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Epoch 12/12
    60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
    Test loss: nan
    Test accuracy: 0.0

    opened by andyoneal 28
  • trying to implement ReflectionPadding2D

    finally I implemented it in one op for B,H,W,C

    class ReflectionPadding2D(PMLTile.Operation):
        def __init__(self, input, h_pad, w_pad):
            if K.image_data_format() == 'channels_last':
                if input.shape.ndims == 4:
                    H, W = input.shape.dims[1:3]
                    if (type(H) == int and h_pad >= H) or \
                       (type(W) == int and w_pad >= W):
                        raise ValueError("Paddings must be less than dimensions.")
                    c = """ function (I[B, H, W, C] ) -> (O) {{
                            WE = W + {w_pad}*2;
                            HE = H + {h_pad}*2;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    if w_pad > 0:
                        c += """
                            LEFT_PAD [b, h, w , c : B, H, WE, C ] = =(I[b, h, {w_pad}-w,            c]), w < {w_pad} ;
                            HCENTER  [b, h, w , c : B, H, WE, C ] = =(I[b, h, w-{w_pad},            c]), w < W+{w_pad}-1 ;
                            RIGHT_PAD[b, h, w , c : B, H, WE, C ] = =(I[b, h, 2*W - (w-{w_pad}) -2, c]);
                            LCR = LEFT_PAD+HCENTER+RIGHT_PAD;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    else:
                        c += "LCR = I;"
                    if h_pad > 0:
                        c += """
                            TOP_PAD   [b, h, w , c : B, HE, WE, C ] = =(LCR[b, {h_pad}-h,            w, c]), h < {h_pad};
                            VCENTER   [b, h, w , c : B, HE, WE, C ] = =(LCR[b, h-{h_pad},            w, c]), h < H+{h_pad}-1 ;
                            BOTTOM_PAD[b, h, w , c : B, HE, WE, C ] = =(LCR[b, 2*H - (h-{h_pad}) -2, w, c]);
                            TVB = TOP_PAD+VCENTER+BOTTOM_PAD;
                        """.format(h_pad=h_pad, w_pad=w_pad)
                    else:
                        c += "TVB = LCR;"
                    c += "O = TVB; }"
                    inp_dims = input.shape.dims
                    out_dims = (inp_dims[0], inp_dims[1]+h_pad*2, inp_dims[2]+w_pad*2, inp_dims[3])
                else:
                    raise NotImplementedError
            else:
                raise NotImplementedError
            super(ReflectionPadding2D, self).__init__(c, [('I', input) ],
                    [('O', PMLTile.Shape(input.shape.dtype, out_dims ) )])
    

    also I implemented it via slice and concat but I suppose it will consume more VRAM for this? or am I wrong??

    class ReflectionPadding2D():
        def __init__(self, h_pad, w_pad):
            self.h_pad, self.w_pad = h_pad, w_pad
        def __call__(self, inp):
            h_pad, w_pad = self.h_pad, self.w_pad
            if K.image_data_format() == 'channels_last':
                if inp.shape.ndims == 4:
                    w = K.concatenate ([ inp[:,:,w_pad:0:-1,:],
                                         inp,
                                         inp[:,:,-2:-w_pad-2:-1,:] ], axis=2 )
                    h = K.concatenate ([ w[:,h_pad:0:-1,:,:],
                                         w,
                                         w[:,-2:-h_pad-2:-1,:,:] ], axis=1 )
                    return h
                else:
                    raise NotImplementedError
            else:
                raise NotImplementedError
    
    needs integration 
    opened by iperov 27
  • plaidml.exceptions.PlaidMLError: Could not find PlaidML configuration file: "experimental.json".

    Traceback (most recent call last):
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1264.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\Scripts\plaidml-setup.exe\__main__.py", line 5, in <module>
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\__init__.py", line 50, in <module>
        import plaidml.settings
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\settings.py", line 33, in <module>
        _setup_config('PLAIDML_EXPERIMENTAL_CONFIG', 'experimental.json')
      File "C:\Users\andre\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\plaidml\settings.py", line 30, in _setup_config
        'Could not find PlaidML configuration file: "{}".'.format(filename))
    plaidml.exceptions.PlaidMLError: Could not find PlaidML configuration file: "experimental.json".

    opened by Duddino 26
  • Memory error on Vega 10

    Hi I am trying plaid ml on AMD Vega 10 : gfx900

    I get the following error:

    [email protected]:~/biswa/plaidbench$ python plaidbench.py mobilenet
    Using PlaidML backend.
    INFO:plaidml:Initializing device gfx900.0: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.1: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.2: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Initializing device gfx900.3: "gfx900", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Opening device "gfx900.3": "Advanced Micro Devices, Inc. gfx900"

    Model loaded. Compiling and running initial batch, batch_size=1
    Warmup
    Memory access fault by GPU node-7 on address 0x4408bd6000. Reason: Page not present or supervisor privilege.
    Aborted (core dumped)

    Any idea how to resolve this?

    Thanks, Biswa

    opened by biswagsingh 26
  • "CL_OUT_OF_HOST_MEMORY" error when command "plaidml-setup"

    Hello again, I'm experiencing a new issue with the 0.6.0 rc1 version of the plaidml. Using 0.5 led to this issue: https://github.com/plaidml/plaidml/issues/73. Any luck of solving it?

    opened by iamkucuk 23
  • Feature request - port to Python 3.6

    I've got PlaidML running on my AMD Bonaire on Arch Linux with Python 2.7 in a Conda environment. Every other Python package I have runs with 3.6 and my goal is to keep it that way. ;-)

    There doesn't seem to even be a pip package for 3.6, so the pip install -U plaidml-keras fails with Python 3.6. If you can post build-from-GitHub-source instructions, I can make a local package and install it.

    P.S.: Let me know if you want Arch setup instructions for AMD GPUs. Most of it is on the Arch User Repository wiki but I've got some scripts that do the work.

    P.P.S.: Benchmark results

    Using PlaidML backend.
    INFO:plaidml:Initializing device bonaire.0: "Bonaire", vendor "Advanced Micro Devices, Inc."
    INFO:plaidml:Opening device "bonaire.0": "Advanced Micro Devices, Inc. Bonaire"
    Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
    16793600/17225924 [============================>.] - ETA: 0s 
    Model loaded.
    Compiling and running initial batch, batch_size=1
    Warmup
    Doing the main timing
    Example finished, elapsed: 6.821215868 (compile), 15.0223557949 (execution)
    
    opened by znmeb 21
  • Mac+AMD: AMD not detected and Intel uses too high of a work group

    iMac 2017 with a Radeon Pro 580 and a Core i5-7600K. Compiled and installed PlaidML from source. Installed via the pip wheel.

    Ran plaidml-setup:

    PlaidML Setup (0.0.0.dev0)

    Thanks for using PlaidML!

    Some Notes:

    • Bugs and other issues: https://github.com/plaidml/plaidml
    • Questions: https://stackoverflow.com/questions/tagged/plaidml
    • Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
    • PlaidML is licensed under the GNU AGPLv3

    Default Config Devices: No devices.

    Experimental Config Devices: intel(r)_core(tm)i5-7600k_cpu@_3.80ghz.0 : Intel Intel(R) Core(TM) i5-7600K CPU @ 3.80GHz

    Using experimental devices can cause poor performance, crashes, and other nastiness. Enable experimental device support? (y,n)[n]:y

    PlaidML sends anonymous usage statistics to help guide improvements. We'd love your help making it better.

    Enable telemetry reporting? (y,n)[y]:y

    Almost done. Multiplying some matrices...
    Tile code:
      function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
    ERROR:plaidml:OpenCL: [CL_INVALID_WORK_GROUP_SIZE] : OpenCL Error : clEnqueueNDRangeKernel failed: total work group size (32) is greater than the device can support (1) (cb=12)
    Whew. That worked.

    Save settings to /Users/andy/.plaidml? (y,n)[y]:y
    Success!

    Should a gpu be detected at this point? Is there somewhere I can lower total work group size manually?

    New to submitting git issues. Sorry if I'm missing anything.

    opened by andyoneal 19
  • PlaidML Setup Issue Windows

    Hi, Running plaidml-setup gives me the following:

    PlaidML Setup (0.3.5)

    Thanks for using PlaidML!

    Some Notes:

    • Bugs and other issues: https://github.com/plaidml/plaidml
    • Questions: https://stackoverflow.com/questions/tagged/plaidml
    • Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
    • PlaidML is licensed under the GNU AGPLv3

    No OpenCL devices found. Check driver installation. Read the helpful, easy driver installation instructions from our README: http://github.com/plaidml/plaidml

    This is the output from clinfo: Number of platforms: 1 Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.1 AMD-APP (2766.5) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices

    Platform Name: AMD Accelerated Parallel Processing Number of devices: 1 Device Type: CL_DEVICE_TYPE_GPU Vendor ID: 1002h Board name: Radeon RX 580 Series Device Topology: PCI[ B#1, D#0, F#0 ] Max compute units: 36 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 256 Preferred vector width char: 4 Preferred vector width short: 2 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Native vector width char: 4 Native vector width short: 2 Native vector width int: 1 Native vector width long: 1 Native vector width float: 1 Native vector width double: 1 Max clock frequency: 1340Mhz Address bits: 64 Max memory allocation: 4244635648 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 64 Max image 2D width: 16384 Max image 2D height: 16384 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 1024 Alignment (bits) of base address: 2048 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: Read/Write Cache line size: 64 Cache size: 16384 Global memory size: 8589934592 Constant buffer size: 4244635648 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 32768 Max pipe arguments: 16 Max pipe active reservations: 16 Max pipe packet size: 4244635648 Max global variable size: 3820172032 Max global variable preferred total size: 8589934592 Max read/write image args: 64 Max on device events: 1024 Queue on device max size: 8388608 Max on device queues: 1 Queue on device preferred size: 262144 SVM capabilities: Coarse grain buffer: Yes Fine grain buffer: Yes Fine grain system: No Atomics: No Preferred platform atomic alignment: 0 Preferred global atomic alignment: 0 Preferred local atomic alignment: 0 Kernel Preferred work group size multiple: 64 Error correction support: 0 Unified memory for Host and Device: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue on Host properties: Out-of-Order: No Profiling : Yes Queue on Device properties: Out-of-Order: Yes Profiling : Yes Platform ID: 00007FFEC2C66FD0 Name: Ellesmere Vendor: Advanced Micro Devices, Inc. Device OpenCL C version: OpenCL C 2.0 Driver version: 2766.5 Profile: FULL_PROFILE Version: OpenCL 2.0 AMD-APP (2766.5) Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_planar_yuv

    Shouldn't it be working? I just switched to a new computer, so I used to use NVIDIA with CUDA. Any help is appreciated!

    Note: I do have the most recent AMD driver installed.

    opened by YutaTakano 16
  • could not broadcast input array from shape (3,2048) into shape (6144)

    I just installed plaidml and i tried to run this example:

    #!/usr/bin/env python
    
    import plaidml.keras
    plaidml.keras.install_backend() 
    
    import numpy as np
    import matplotlib.pyplot as plt
    from keras.models import Sequential
    from keras.layers.core import Dense, Activation, Dropout
    from keras.datasets import mnist
    from keras.utils import np_utils
    
    # fix a random seed for reproducibility
    np.random.seed(9)
    
    # user inputs
    nb_epoch = 25
    num_classes = 10
    batch_size = 128
    train_size = 60000
    test_size = 10000
    v_length = 784
    
    # split the mnist data into train and test
    (trainData, trainLabels), (testData, testLabels) = mnist.load_data()
    
    
    # reshape the dataset
    trainData = trainData.reshape(train_size, v_length)
    testData = testData.reshape(test_size, v_length)
    trainData = trainData.astype("float32")
    testData = testData.astype("float32")
    trainData /= 255
    testData /= 255
    
    
    # convert class vectors to binary class matrices --> one-hot encoding
    mTrainLabels = np_utils.to_categorical(trainLabels, num_classes)
    mTestLabels = np_utils.to_categorical(testLabels, num_classes)
    
    # create the model
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation("relu"))
    model.add(Dropout(0.2))
    model.add(Dense(256))
    model.add(Activation("relu"))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes))
    model.add(Activation("softmax"))
    
    # summarize the model
    model.summary()
    
    # compile the model
    model.compile(loss="categorical_crossentropy",
    			  optimizer="adam",
    			  metrics=["accuracy"])
    
    # fit the model
    history = model.fit(trainData, 
    				 	mTrainLabels,
    					validation_data=(testData, mTestLabels),
    					batch_size=batch_size,
    					nb_epoch=nb_epoch,
    					verbose=2)
    
    # print the history keys
    
    
    # evaluate the model
    scores = model.evaluate(testData, mTestLabels, verbose=0)
    
    # history plot for accuracy
    plt.plot(history.history["acc"])
    plt.plot(history.history["val_acc"])
    plt.title("Model Accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend(["train", "test"], loc="upper left")
    plt.show()
    
    # history plot for accuracy
    plt.plot(history.history["loss"])
    plt.plot(history.history["val_loss"])
    plt.title("Model Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(["train", "test"], loc="upper left")
    plt.show()
    
    

    and I got this error

    could not broadcast input array from shape (3,2048) into shape (6144)

    Then I tried running Hello VGG example from plaidml github page and I got the same error.

    I am using plaidml 0.3.4 on ubuntu in virtualenv and I am trying to run this code on rx 480.

    Tnx for help.

    opened by leon3428 16
  • plaidml.exceptions.Unknown: Duplicate updates

    Setup:

    sudo apt-get install clinfo
    clinfo [sees 1080ti]
    sudo pip install -U plaidml-keras
    plaidml-setup
    [insert before keras import:]
    import plaidml.keras
    plaidml.keras.install_backend()
    

    But, intermediate problem:

     ImportError: No module named plaidml.keras
    $ which python
    /home/phobrain/anaconda2/bin//python
    

    Fix:

    sys.path.append('/usr/local/lib/python2.7/dist-packages/')
    import plaidml.keras
    plaidml.keras.install_backend()
    

    'Real' issue being reported:

    File "siaconv.py", line 919, in doit epochs=epochs) File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper return func(*args, **kwargs) File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1926, in fit_generator self._make_train_function() File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 967, in _make_train_function **self._function_kwargs) File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 1718, in function return _Function(inputs, outputs, updates, name) File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 931, in init c.add_update(_plaidml_val(var), _plaidml_val(newval)) File "/usr/local/lib/python2.7/dist-packages/plaidml/init.py", line 1289, in add_update _lib().plaidml_add_composer_update(self, dest, src) File "/usr/local/lib/python2.7/dist-packages/plaidml/init.py", line 674, in _check_err self.raise_last_status() File "/usr/local/lib/python2.7/dist-packages/plaidml/library.py", line 136, in raise_last_status raise self.last_status() plaidml.exceptions.Unknown: Duplicate updates

    model.fit_generator(
            myGen('data', tr_pairs, tr_y, batch_size, True),
            (len(tr_pairs)-1) / batch_size,
            validation_data=myGen('valid', te_pairs, te_y, batch_size, True),
            validation_steps=1,
            max_queue_size=2,
            workers=1,
            epochs=epochs)
    

    Net:

    KERNEL_INIT = 'glorot_normal'
    
        seq.add(Dense(dense_size, input_shape=input_shape,
                    activation='relu', kernel_initializer=KERNEL_INIT))
        seq.add(BatchNormalization())
        seq.add(Dense((dense_size*2)/3,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dropout(0.1, seed=SEED))
        seq.add(Dense(dense_size/4,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense((dense_size*2)/3,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense(dense_size,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
        seq.add(Dense(512,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(256,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(128,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(256,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
        seq.add(Dense(128,
                    activation='relu',
                    kernel_initializer=KERNEL_INIT))
    
    opened by phobrain 16
  • Plaidml not detecting Mali-T628 on ARM

    Hi,

    I've build plaidml 0.3.5 to use on Odroid XU4 with Mali-T628 GPU with debian stretch. I manage to install the wheel, when I run plaidml-setup, I get:

    "No supported devices found. Run 'clinfo' and file an issue containing the full output."

    However, with plaidml 0.3.0rc1 latest available with pip install plaidml, my devices can be configured and I have 2 mali-t628 reported. "experimental.json" appears quite similar in both cases.

    Any clue what I may have done wrong building plaidml (used bazel 0.18.1 with --config linux_arm_32v7), or what change might explain 0.3.5 not recognizing my devices where 0.3.0rc1 did?

    Thanks

    Here's my clinfo report:

    Number of platforms 1 Platform Name ARM Platform Platform Vendor ARM Platform Version OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Platform Profile FULL_PROFILE Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory Platform Extensions function suffix ARM

    Platform Name ARM Platform Number of devices 2 Device Name Mali-T628 Device Vendor ARM Device Vendor ID 0x6200010 Device Version OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Driver Version 1.2 Device OpenCL C Version OpenCL C 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Device Type GPU Device Profile FULL_PROFILE Max compute units 4 Max clock frequency 600MHz Device Partition (core) Max number of sub-devices 0 Supported partition types None Max work item dimensions 3 Max work item sizes 256x256x256 Max work group size 256 Preferred work group size multiple 4 Preferred / native vector sizes
    char 16 / 16
    short 8 / 8
    int 4 / 4
    long 2 / 2
    half 8 / 8 (cl_khr_fp16) float 4 / 4
    double 2 / 2 (cl_khr_fp64) Half-precision Floating-point support (cl_khr_fp16) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Single-precision Floating-point support (core) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Double-precision Floating-point support (cl_khr_fp64) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Address bits 64, Little-Endian Global memory size 2090405888 (1.947GiB) Error Correction support No Max memory allocation 522601472 (498.4MiB) Unified memory for Host and Device Yes Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Global Memory cache type Read/Write Global Memory cache size <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30> Global Memory cache line 64 bytes Image support Yes Max number of samplers per kernel 16 Max size for 1D images from buffer 65536 pixels Max 1D or 2D image array size 2048 images Max 2D image size 65536x65536 pixels Max 3D image size 65536x65536x65536 pixels Max number of read image args 128 Max number of write image args 8 Local memory type Global Local memory size 32768 (32KiB) Max constant buffer size 65536 (64KiB) Max number of constant args 8 Max size of kernel argument 1024 Queue properties
    Out-of-order execution Yes Profiling Yes Prefer user sync for interop No Profiling timer resolution 1000ns Execution capabilities
    Run OpenCL kernels Yes Run native kernels No printf() buffer size 1048576 (1024KiB) Built-in kernels
    Device Available Yes Compiler Available Yes Linker Available Yes Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

    Device Name Mali-T628 Device Vendor ARM Device Vendor ID 0x6200010 Device Version OpenCL 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Driver Version 1.2 Device OpenCL C Version OpenCL C 1.2 v1.r12p0-04rel0.03af15950392f3702b248717f4938b82 Device Type GPU Device Profile FULL_PROFILE Max compute units 2 Max clock frequency 600MHz Device Partition (core) Max number of sub-devices 0 Supported partition types None Max work item dimensions 3 Max work item sizes 256x256x256 Max work group size 256 Preferred work group size multiple 4 Preferred / native vector sizes
    char 16 / 16
    short 8 / 8
    int 4 / 4
    long 2 / 2
    half 8 / 8 (cl_khr_fp16) float 4 / 4
    double 2 / 2 (cl_khr_fp64) Half-precision Floating-point support (cl_khr_fp16) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Single-precision Floating-point support (core) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Double-precision Floating-point support (cl_khr_fp64) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Address bits 64, Little-Endian Global memory size 2090405888 (1.947GiB) Error Correction support No Max memory allocation 522601472 (498.4MiB) Unified memory for Host and Device Yes Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Global Memory cache type Read/Write Global Memory cache size <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30> Global Memory cache line 64 bytes Image support Yes Max number of samplers per kernel 16 Max size for 1D images from buffer 65536 pixels Max 1D or 2D image array size 2048 images Max 2D image size 65536x65536 pixels Max 3D image size 65536x65536x65536 pixels Max number of read image args 128 Max number of write image args 8 Local memory type Global Local memory size 32768 (32KiB) Max constant buffer size 65536 (64KiB) Max number of constant args 8 Max size of kernel argument 1024 Queue properties
    Out-of-order execution Yes Profiling Yes Prefer user sync for interop No Profiling timer resolution 1000ns Execution capabilities
    Run OpenCL kernels Yes Run native kernels No printf() buffer size 1048576 (1024KiB) Built-in kernels
    Device Available Yes Compiler Available Yes Linker Available Yes Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_gl_sharing cl_khr_icd cl_khr_egl_event cl_khr_egl_image cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory

    NULL platform behavior clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) ARM Platform clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [ARM] clCreateContext(NULL, ...) [default] Success [ARM] clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (2) Platform Name ARM Platform Device Name Mali-T628 Device Name Mali-T628 clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (2) Platform Name ARM Platform Device Name Mali-T628 Device Name Mali-T628

    ICD loader properties ICD loader Name OpenCL ICD Loader ICD loader Vendor OCL Icd free software ICD loader Version 2.2.11 ICD loader Profile OpenCL 2.1

    opened by nitescuc 15
  • Tile language vs EDSL

    In the last publicly available version (0.7.0) of PlaidML the Tile language was used to write operators. The latest documentation talks about a C++/Python EDSL being developed as well, however, documentation there is a bit lacking, and only C++ is presented, not any Python version. I was wondering whether that is meant to substitute the Tile language or will both of them be kept in the future? I used to have some test project built using the Tile language from Python, which used to work well for me, and I'd like to improve that, but was wondering if I should port it to the EDSL approach, or is it okay to stay with the Tile language approach.

    Also, I was wondering if the v1 branch still supports the Tile language, and will it continue to do so when it is finally released? If I take the path of compiling it from source, and use that instead of the pip installable v0.7, will my project that uses it break or continue working?

    Is there any idea of a time-line for the next public release? I have been following this promising project for a while, hoping that a new release will come sooner or later.

    opened by gyenesvi 9
  • Capture affine.store op

    Hello.

    I try to perform stencil on the following code

    affine.store %1, %arg2[%arg3, %arg4, %arg5, %arg6] : memref<?x?x?x?xi64>
    affine.yield %arg2 : memref<?x?x?x?xi64>
    

    by using the matchPattern function:

    matchPattern(yield, m_Op<AffineYieldOp>(m_Capture(&store, m_Op<AffineStoreOp>(m_Any(), m_Any())))
    

    But it seems m_Capture function and m_Op function used in existing examples, such as StencilGEMM, can not be used to capture operation without a return val, like affine.store here. Can I just use existing structure to match this pattern and capture the affine.store op ?

    opened by IsolatedMy 4
  • Stenciling of MAX/ADD for RN50

    This patch fixes the pass "--x86-stencil-tpp-unary" so that all the reduce patterns in RN50 get stenciled with correct TPP parameters and unary flags.

    opened by ZhangMZh 0
  • Batch parallelization and allocs to alloca changes

    This WIP patch converts scoped allocs to allocas using PromoteBuffersToStackPass as well as the pxa localization pass. As of now, the first pass does not seem to be scoping allocs other than weights, while the pxa localization pass throws a segfault at runtime for threads > 1. This patch also parallelizes layers along the batch dimension, except those that don't have the batch dimension as the outer loop's induction variable.

    wip 
    opened by KavithaTipturMadhu 0
  • TPSS: parallelization directives

    This patch adds support for parallelization directives specified in a file named by the environment variable PLAIDML_PARALLELIZATION_CONFIG_FILE. It adds a rule parser that matches convolution shapes against the equalities/inequalities in the config file and applies the rules that follow.
    It is important to note that the collapse directive also adds a parallelize directive by default and can only be applied to two loop levels corresponding to a perfect loop nest (the validity of reordering loops to support the requested order is not verified).

    wip 
    opened by KavithaTipturMadhu 0
  • how to fix "cannot import name 'Iterable' from 'collections'" when running test code from main page

    Hey all,

    I've decided to try some ML projects, and since I have an AMD GPU (5700 XT) I decided to use PlaidML. On the main website there's test code for VGG-19, and I'm trying to run it right now, but I run into the error attached in the screenshot. I tried to simply change collections to collections.abc, but it looks like Python 3.10 already does that? I'm pretty stuck; any help would be appreciated. Thanks! Screenshot from 2022-06-11 22-03-38

    opened by KSTRTK 3