BigDL - Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems

Last update: Jan 06, 2022

Overview

Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems.

Introduction

BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.

Installation

Download the BigDL packages, or install BigDL with pip (for example, inside a conda environment).
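After installing, it can help to confirm that the package is visible to the Python interpreter that Spark will use. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and it assumes the top-level module is named `bigdl`.

```python
import importlib.util


def is_installed(module_name: str = "bigdl") -> bool:
    """Return True if this interpreter can locate the named top-level module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


# Report whether BigDL (assumed module name: "bigdl") is importable here.
print("bigdl importable:", is_installed())
```

If this prints False inside a conda environment, the package was most likely installed into a different environment than the one currently active.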

How to run a program on Spark

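Usage: spark-submit-with-bigdl.sh [options] file.py

Options:

--master MASTER_URL: where to run the program: a Spark standalone cluster (spark://host:port), YARN (yarn), Kubernetes (k8s://host:port), or the local machine (local).
local[k]: a master URL that runs Spark locally with k worker threads (ideally, k equals the number of logical cores on your machine).
file.py: the Python file containing the program to run.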
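The usage line above can be assembled programmatically, which is convenient when the same program is launched repeatedly with different masters. This is only a sketch: the helper `build_submit_command` and the script name `lstm_mnist.py` are hypothetical, and it assumes `spark-submit-with-bigdl.sh` accepts `--master` the way `spark-submit` does.

```python
import shlex


def build_submit_command(script, master="local[4]",
                         launcher="spark-submit-with-bigdl.sh"):
    """Return the launcher command as an argument list (nothing is executed)."""
    return [launcher, "--master", master, script]


# "lstm_mnist.py" is a placeholder name for the program file.
cmd = build_submit_command("lstm_mnist.py", master="local[4]")

# shlex.join quotes arguments such as local[4] so the line is safe to paste
# into a shell, where square brackets would otherwise be treated as a glob.
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run(cmd)` on a machine where the launcher script is on the PATH.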

System configuration

The program was run on a system with the following configuration:

System/Host Processor: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
CPU(s): 48
Core(s) per socket: 12
Socket(s): 2
Memory: 183 G (free)
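The figures above imply two hardware threads per physical core, which matters when choosing k for local[k]. A small sketch of that arithmetic follows; the helper `local_master` is hypothetical, and the list of thread counts is an illustrative choice, not the set used in the evaluation.

```python
# Values taken from the system configuration listed above.
sockets = 2
cores_per_socket = 12
logical_cpus = 48

# Physical cores across both sockets, and hardware threads per core.
physical_cores = sockets * cores_per_socket
threads_per_core = logical_cpus // physical_cores


def local_master(k):
    """Build a Spark master URL that runs locally with k worker threads."""
    return f"local[{k}]"


# Illustrative thread counts only: one thread, all physical cores,
# and all logical CPUs.
for k in (1, physical_cores, logical_cpus):
    print(local_master(k))
```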

Data Description and Run Model

MNIST is a dataset of small square 28×28 pixel grayscale images of handwritten single digits between 0 and 9. It is split into two parts: 60,000 training images and 10,000 test images.

For this evaluation, we use an LSTM model to classify the MNIST digits.
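An LSTM consumes sequences, so each image has to be presented as one. A common convention, assumed here because the text does not say how the images are fed to the model, is to treat each of the 28 rows as one time step with 28 features. The helper name `image_to_sequence` is hypothetical, and the sketch uses plain Python lists rather than any BigDL type.

```python
ROWS, COLS = 28, 28  # MNIST image height and width in pixels


def image_to_sequence(flat_pixels):
    """Split a flat list of 784 pixel values into 28 time steps of 28 features."""
    if len(flat_pixels) != ROWS * COLS:
        raise ValueError(f"expected {ROWS * COLS} pixels, got {len(flat_pixels)}")
    # Row r of the image becomes time step r of the sequence.
    return [flat_pixels[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


# A blank image stands in for a real MNIST sample.
blank_image = [0.0] * (ROWS * COLS)
sequence = image_to_sequence(blank_image)
print(len(sequence), "time steps of", len(sequence[0]), "features")
```

With this layout the LSTM reads the image from top to bottom, and its final hidden state would be passed to a 10-way classifier, one output per digit.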

BigDL Performance Evaluation

Execution running time

Computation Evaluation (SPEED UP)
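Speedup is conventionally defined as the baseline running time divided by the running time with more workers, and parallel efficiency as speedup divided by the number of workers. The sketch below assumes that standard definition; the function names are hypothetical and the timings are placeholder values, not measurements from this evaluation.

```python
def speedup(t_baseline, t_parallel):
    """Ratio of the baseline running time to the parallel running time."""
    return t_baseline / t_parallel


def efficiency(t_baseline, t_parallel, workers):
    """Speedup per worker; 1.0 would mean perfect linear scaling."""
    return speedup(t_baseline, t_parallel) / workers


# Placeholder timings in seconds, chosen only to illustrate the arithmetic.
t_one_worker = 100.0
t_eight_workers = 25.0

print("speedup:", speedup(t_one_worker, t_eight_workers))
print("efficiency:", efficiency(t_one_worker, t_eight_workers, workers=8))
```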

Owner

Vo Cong Thanh
