[DEPRECATED] TensorFlow wrapper for DataFrames on Apache Spark

Overview

TensorFrames (Deprecated)

Note: TensorFrames is deprecated. You can use pandas UDFs instead.
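
As a rough illustration of the suggested migration path, here is a minimal sketch of the "add 3 to a column" example from this README written as a scalar pandas UDF instead of TensorFrames. It assumes Spark 2.4+ with pyarrow installed; the session, column, and function names are illustrative only.

# A hedged sketch, not part of the TensorFrames API: the same computation
# expressed with a pandas UDF. Requires pyarrow.
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(float(x),) for x in range(10)], ["x"])

@pandas_udf("double")
def add_three(x):
    # Vectorized: x is a pandas Series covering a batch of rows.
    return x + 3.0

df.withColumn("z", add_three("x")).show()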

Experimental TensorFlow binding for Scala and Apache Spark.

TensorFrames (TensorFlow on Spark DataFrames) lets you manipulate Apache Spark's DataFrames with TensorFlow programs.

This package is experimental and is provided as a technical preview only. While the interfaces are all implemented and working, there are still some areas of low performance.

Supported platforms:

This package officially supports only Linux 64-bit platforms as a target. Contributions are welcome for other platforms.

See the file project/Dependencies.scala for adding your own platform.

Officially TensorFrames supports Spark 2.4+ and Scala 2.11.

See the user guide for extensive information about the API.

For questions, see the TensorFrames mailing list.

TensorFrames is available as a Spark package.

Requirements

  • A working version of Apache Spark (2.4 or greater)

  • Java 8+

  • (Optional) Python 2.7+/3.6+ if you want to use the Python interface.

  • (Optional) the Python TensorFlow package if you want to use the Python interface. See the official instructions on how to get the latest release of TensorFlow.

  • (Optional) pandas >= 0.19.1 if you want to use the Python interface.

Additionally, for development, you need the following dependencies:

  • protoc 3.x

  • nose >= 1.3

How to run in python

Assuming that SPARK_HOME is set, you can use PySpark like any other Spark package.

$SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.6.0-s_2.11
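
If you prefer to configure the package dependency from a standalone Python script instead of the pyspark launcher, a hedged sketch (assuming Spark 2.4+, network access to resolve the package, and that no SparkContext exists yet) is:

# Sketch only: resolve the TensorFrames package through Spark's own
# dependency mechanism when building the session programmatically.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tensorframes-demo")
         .config("spark.jars.packages", "databricks:tensorframes:0.6.0-s_2.11")
         .getOrCreate())
# spark.createDataFrame(...) can then stand in for sqlContext.createDataFrame(...)
# in the examples below.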

Here is a small program that uses TensorFlow to add 3 to an existing column.

import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row

data = [Row(x=float(x)) for x in range(10)]
df = sqlContext.createDataFrame(data)
with tf.Graph().as_default() as g:
    # The TensorFlow placeholder that corresponds to column 'x'.
    # The shape of the placeholder is automatically inferred from the DataFrame.
    x = tfs.block(df, "x")
    # The output that adds 3 to x
    z = tf.add(x, 3, name='z')
    # The resulting dataframe
    df2 = tfs.map_blocks(z, df)

# The transform is lazy as for most DataFrame operations. This will trigger it:
df2.collect()

# Notice that z is an extra column next to x

# [Row(z=3.0, x=0.0),
#  Row(z=4.0, x=1.0),
#  Row(z=5.0, x=2.0),
#  Row(z=6.0, x=3.0),
#  Row(z=7.0, x=4.0),
#  Row(z=8.0, x=5.0),
#  Row(z=9.0, x=6.0),
#  Row(z=10.0, x=7.0),
#  Row(z=11.0, x=8.0),
#  Row(z=12.0, x=9.0)]

The second example shows block-wise reducing operations: we compute the sum and the element-wise minimum of a field containing numeric vectors, working with blocks of rows for more efficient processing.

# Build a DataFrame of vectors
data = [Row(y=[float(y), float(-y)]) for y in range(10)]
df = sqlContext.createDataFrame(data)
# Because the dataframe contains vectors, we need to analyze it first to find the
# dimensions of the vectors.
df2 = tfs.analyze(df)

# The information gathered by TF can be printed to check the content:
tfs.print_schema(df2)
# root
#  |-- y: array (nullable = false) double[?,2]

# Let's use the analyzed dataframe to compute the sum and the elementwise minimum 
# of all the vectors:
# First, let's make a copy of the 'y' column. This will be very cheap in Spark 2.0+
df3 = df2.select(df2.y, df2.y.alias("z"))
with tf.Graph().as_default() as g:
    # The placeholders. Note the special names that end with '_input':
    y_input = tfs.block(df3, 'y', tf_name="y_input")
    z_input = tfs.block(df3, 'z', tf_name="z_input")
    y = tf.reduce_sum(y_input, [0], name='y')
    z = tf.reduce_min(z_input, [0], name='z')
    # The resulting dataframe
    (data_sum, data_min) = tfs.reduce_blocks([y, z], df3)

# The final results are numpy arrays:
print(data_sum)
# [45., -45.]
print(data_min)
# [0., -9.]

Notes

Note the scoping of the graphs above. This is important because TensorFrames finds which DataFrame column to feed to TensorFlow based on the placeholders of the graph. Also, it is good practice to keep graphs small when sending them to Spark.

For small tensors (scalars and vectors), TensorFrames usually infers the shapes of the tensors without requiring a preliminary analysis. If it cannot do it, an error message will indicate that you need to run the DataFrame through tfs.analyze() first.
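
For instance, here is a minimal sketch (reusing the pyspark session and imports from the examples above; the column name and computation are illustrative only) of the analyze-then-map pattern that this error message points to:

# Sketch: array columns need tfs.analyze() before their shapes are known.
data = [Row(v=[float(i), float(i) + 1.0]) for i in range(4)]
df = sqlContext.createDataFrame(data)
# Calling tfs.block(df, "v") directly would fail with an unknown-shape error,
# so analyze the DataFrame first:
df = tfs.analyze(df)
with tf.Graph().as_default() as g:
    v = tfs.block(df, "v")
    w = tf.reduce_sum(v, [1], name='w')  # per-row sum of each vector
    df2 = tfs.map_blocks(w, df)
df2.show()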

Look at the Python documentation of the TensorFrames package to see what methods are available.
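
If you just want to list what the package exposes from an interactive session, Python's built-in introspection works; this is a sketch only:

# Inspect the tensorframes Python API from a pyspark session.
import tensorframes as tfs

help(tfs)        # module-level documentation
print(dir(tfs))  # names exported by the package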

How to run in Scala

The Scala support is a bit more limited than Python. In Scala, operations can be loaded from an existing graph defined in the Protocol Buffers format, or built using a simple Scala DSL. The Scala DSL only features a subset of the TensorFlow transforms. It is easy to extend, though, so other transforms will be added without much effort in the future.

You simply use the published package:

$SPARK_HOME/bin/spark-shell --packages databricks:tensorframes:0.6.0-s_2.11

Here is the same program as before:

import org.tensorframes.{dsl => tf}
import org.tensorframes.dsl.Implicits._

val df = spark.createDataFrame(Seq(1.0->1.1, 2.0->2.2)).toDF("a", "b")

// As in Python, scoping is recommended to prevent name collisions.
val df2 = tf.withGraph {
    val a = df.block("a")
    // Unlike Python, the Scala syntax is more flexible:
    val out = a + 3.0 named "out"
    // The 'mapBlocks' method is added using implicits to dataframes.
    df.mapBlocks(out).select("a", "out")
}

// The transform is all lazy at this point, let's execute it with collect:
df2.collect()
// res0: Array[org.apache.spark.sql.Row] = Array([1.0,4.0], [2.0,5.0])   

How to compile and install for developers

It is recommended that you use a Conda environment to guarantee that the build environment can be reproduced. Once you have installed Conda, you can set up the environment from the root of the project:

conda create -q -n tensorframes-environment python=$PYTHON_VERSION

This will create an environment for your project. We recommend using Python version 3.7 or 2.7.13. After the environment is created, you can activate it and install all dependencies as follows:

conda activate tensorframes-environment
pip install --user -r python/requirements.txt
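
As a quick sanity check (a sketch only, run inside the activated environment), you can verify that the main Python dependencies import cleanly before building:

# Verify the Python-side dependencies inside tensorframes-environment.
import tensorflow as tf
import pandas as pd

print("tensorflow", tf.__version__)
print("pandas", pd.__version__)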

You also need to compile the Scala code. The recommended procedure is to build the assembly:

build/sbt tfs_testing/assembly
# Builds the spark package:
build/sbt distribution/spDist

Assuming that SPARK_HOME is set and that you are in the root directory of the project:

$SPARK_HOME/bin/spark-shell --jars $PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar

If you want to run the python version:

PYTHONPATH=$PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar \
$SPARK_HOME/bin/pyspark --jars $PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar

Acknowledgements

Before TensorFlow released its Java API, this project was built on the great javacpp project, which implements the low-level bindings between TensorFlow and the Java virtual machine.

Many thanks to Google for the release of TensorFlow.

Comments
  •  java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps

    I built the jar by following the readme, and then ran it in PyCharm: https://www.dropbox.com/s/qmrs72l0p8p4bc2/Screen%20Shot%202016-07-06%20at%2011.40.26%20PM.png?dl=0 I added the self-built jar as a content root; I guess that's what causes the error.

    line 11 is x = tfs.block(df, "x")

    code:

    import tensorflow as tf
    import tensorframes as tfs
    from pyspark.shell import sqlContext
    from pyspark.sql import Row
    
    data = [Row(x=float(x)) for x in range(10)]
    df = sqlContext.createDataFrame(data)
    
    with tf.Graph().as_default() as g:
        # The TensorFlow placeholder that corresponds to column 'x'.
        # The shape of the placeholder is automatically inferred from the DataFrame.
        x = tfs.block(df, "x")
        # The output that adds 3 to x
        z = tf.add(x, 3, name='z')
        # The resulting dataframe
        df2 = tfs.map_blocks(z, df)
    
    # The transform is lazy as for most DataFrame operations. This will trigger it:
    df2.collect()
    

    log

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    16/07/06 23:28:43 INFO SparkContext: Running Spark version 1.6.1
    16/07/06 23:28:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/07/06 23:28:44 INFO SecurityManager: Changing view acls to: julian_qian
    16/07/06 23:28:44 INFO SecurityManager: Changing modify acls to: julian_qian
    16/07/06 23:28:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(julian_qian); users with modify permissions: Set(julian_qian)
    16/07/06 23:28:44 INFO Utils: Successfully started service 'sparkDriver' on port 60597.
    16/07/06 23:28:45 INFO Slf4jLogger: Slf4jLogger started
    16/07/06 23:28:45 INFO Remoting: Starting remoting
    16/07/06 23:28:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:60598]
    16/07/06 23:28:45 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 60598.
    16/07/06 23:28:45 INFO SparkEnv: Registering MapOutputTracker
    16/07/06 23:28:45 INFO SparkEnv: Registering BlockManagerMaster
    16/07/06 23:28:45 INFO DiskBlockManager: Created local directory at /private/var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/blockmgr-5174cef3-29d9-4d2a-a84e-279a0e3d2f83
    16/07/06 23:28:45 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
    16/07/06 23:28:45 INFO SparkEnv: Registering OutputCommitCoordinator
    16/07/06 23:28:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    16/07/06 23:28:45 INFO Utils: Successfully started service 'SparkUI' on port 4041.
    16/07/06 23:28:45 INFO SparkUI: Started SparkUI at http://10.63.21.172:4041
    16/07/06 23:28:45 INFO Executor: Starting executor ID driver on host localhost
    16/07/06 23:28:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60599.
    16/07/06 23:28:45 INFO NettyBlockTransferService: Server created on 60599
    16/07/06 23:28:45 INFO BlockManagerMaster: Trying to register BlockManager
    16/07/06 23:28:45 INFO BlockManagerMasterEndpoint: Registering block manager localhost:60599 with 511.1 MB RAM, BlockManagerId(driver, localhost, 60599)
    16/07/06 23:28:45 INFO BlockManagerMaster: Registered BlockManager
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
          /_/
    
    Using Python version 2.7.10 (default, Dec  1 2015 20:00:13)
    SparkContext available as sc, HiveContext available as sqlContext.
    16/07/06 23:28:46 INFO HiveContext: Initializing execution hive, version 1.2.1
    16/07/06 23:28:46 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
    16/07/06 23:28:46 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
    16/07/06 23:28:46 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    16/07/06 23:28:46 INFO ObjectStore: ObjectStore, initialize called
    16/07/06 23:28:46 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    16/07/06 23:28:46 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    16/07/06 23:28:46 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:47 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:48 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    16/07/06 23:28:48 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:48 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    16/07/06 23:28:49 INFO ObjectStore: Initialized ObjectStore
    16/07/06 23:28:49 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/07/06 23:28:49 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    16/07/06 23:28:49 INFO HiveMetaStore: Added admin role in metastore
    16/07/06 23:28:49 INFO HiveMetaStore: Added public role in metastore
    16/07/06 23:28:49 INFO HiveMetaStore: No user is added in admin role, since config is empty
    16/07/06 23:28:49 INFO HiveMetaStore: 0: get_all_databases
    16/07/06 23:28:49 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_all_databases   
    16/07/06 23:28:49 INFO HiveMetaStore: 0: get_functions: db=default pat=*
    16/07/06 23:28:49 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62_resources
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62/_tmp_space.db
    16/07/06 23:28:49 INFO HiveContext: default warehouse location is /user/hive/warehouse
    16/07/06 23:28:49 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
    16/07/06 23:28:49 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
    16/07/06 23:28:49 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
    16/07/06 23:28:50 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    16/07/06 23:28:50 INFO ObjectStore: ObjectStore, initialize called
    16/07/06 23:28:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    16/07/06 23:28:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    16/07/06 23:28:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:51 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    16/07/06 23:28:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    16/07/06 23:28:52 INFO ObjectStore: Initialized ObjectStore
    16/07/06 23:28:52 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/07/06 23:28:52 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    16/07/06 23:28:52 INFO HiveMetaStore: Added admin role in metastore
    16/07/06 23:28:52 INFO HiveMetaStore: Added public role in metastore
    16/07/06 23:28:52 INFO HiveMetaStore: No user is added in admin role, since config is empty
    16/07/06 23:28:52 INFO HiveMetaStore: 0: get_all_databases
    16/07/06 23:28:52 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_all_databases   
    16/07/06 23:28:53 INFO HiveMetaStore: 0: get_functions: db=default pat=*
    16/07/06 23:28:53 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
    16/07/06 23:28:53 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:53 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/77eb618d-61cc-470e-abb4-18d356833efb_resources
    16/07/06 23:28:53 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb
    16/07/06 23:28:53 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb
    16/07/06 23:28:53 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb/_tmp_space.db
    

    error log:

    Traceback (most recent call last):
      File "/Users/julian_qian/PycharmProjects/tensorflow/tfs.py", line 11, in <module>
        x = tfs.block(df, "x")
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 315, in block
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 333, in _auto_placeholder
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 30, in _java_api
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o32.loadClass.
    : java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
    
    opened by jq 13
  • Updated to tensorflow 1.6 and spark 2.3.

    The current version is not compatible with graphs generated by TF 1.6, and it's preventing us from releasing dl-pipelines with TF 1.6 support.

    • updated protobuf files and regenerated their java sources.
    • a few minor changes related to Tensor taking a type parameter in TF 1.6.
    opened by tomasatdatabricks 8
  • tensorframes is not working with variables.

    data = [Row(x=float(x)) for x in range(5)]
    df = sqlContext.createDataFrame(data)
    with tf.Graph().as_default() as g:
        # The placeholder that corresponds to column 'x'
        x = tf.placeholder(tf.double, shape=[None], name="x")
        # The output that adds 3 to x
        b = tf.Variable(float(3), name='a', dtype=tf.double)
        z = tf.add(x, b, name='z')
        # with or without `sess.run(tf.global_variables_initializer())`, the following will fail
        
        df2 = tfs.map_blocks(z, df)
    
    df2.show()
    
    opened by yupbank 7
  • Does not work with Python3

    I just started using this with Python 3; these are the commands I ran and the output messages.

    $SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.2.3-s_2.10

    Python 3.4.3 (default, Mar 26 2015, 22:03:40)
    [GCC 4.9.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    Ivy Default Cache set to: /root/.ivy2/cache
    The jars for the packages stored in: /root/.ivy2/jars
    :: loading settings :: url = jar:file:/opt/spark-1.5.2/assembly/target/scala-2.10/spark-assembly-1.5.2-hadoop2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    databricks#tensorframes added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found databricks#tensorframes;0.2.3-s_2.10 in spark-packages
        found org.apache.commons#commons-lang3;3.4 in central
    :: resolution report :: resolve 98ms :: artifacts dl 4ms
        :: modules in use:
        databricks#tensorframes;0.2.3-s_2.10 from spark-packages in [default]
        org.apache.commons#commons-lang3;3.4 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
        ---------------------------------------------------------------------
    :: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 2 already retrieved (0kB/3ms)
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 1.5.2
          /_/

    Using Python version 3.4.3 (default, Mar 26 2015 22:03:40)
    SparkContext available as sc, SQLContext available as sqlContext.

    import tensorflow as tf
    import tensorframes as tfs

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/spark-349c9955-ccd8-4fcd-938a-7e719fc45653/userFiles-bb935142-224f-4238-a144-f1cece7a5aa2/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/__init__.py", line 36, in <module>
    ImportError: No module named 'core'

    opened by ushnish 6
  • Scala example does not work

    I'm having trouble running the provided Scala example in the spark shell.

    My local environment is:

    • Spark 2.1.0
    • Scala version 2.11.8
    • Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121

    I ran the spark-shell with: spark-shell --packages databricks:tensorframes:0.2.5-rc2-s_2.11

    I get the following stacktrace which shuts down my spark process:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007fff90451b52, pid=64869, tid=0x0000000000001c03
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode bsd-amd64 compressed oops)
    # Problematic frame:
    # C  [libsystem_c.dylib+0x1b52]  strlen+0x12
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /Users/ndrizard/projects/temps/hs_err_pid64869.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    

    Thanks for your help!

    opened by nicodri 5
  • Py4JError("Answer from Java side is empty") while testing

    Py4JError("Answer from Java side is empty") while testing

    I have been experimenting with TensorFrames for quite some days. I have spark-1.6.1 and openjdk7 installed on my Ubuntu 14.04 64-bit machine. I am using an IPython notebook for testing.

    The import tensorframes as tfs command works perfectly fine, but when I do tfs.print_schema(df), where df is a DataFrame, the error below pops up recursively until the maximum recursion depth is reached.

    ERROR:py4j.java_gateway:Error while sending or receiving.
    Traceback (most recent call last):
      File "/home/prakhar/utilities/spark-1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
        raise Py4JError("Answer from Java side is empty")
    Py4JError: Answer from Java side is empty

    opened by prakhar21 4
  • [ML-7986] Update tensorflow to 1.14.0

    • Update tensorflow version to 1.14.0 in environment.yml, project/Dependencies.scala, and python/requirements.txt
    • Auto-updated *.proto files with the script. All updates of this type come from tensorflow.
    opened by lu-wang-dl 3
  • Support Spark 2.3.1, TF 1.10.0 and drop Spark 2.1/2.2 (and hence Scala 2.10, Java 7)

    • Drop support for Spark 2.1 and 2.2 and hence scala 2.10 and java 7
    • Update TF to 1.10 release
    • Remove nix files, which are not used
    • Update README

    We will support Spark 2.4 once RC is released.

    opened by mengxr 3
  • Usage of tf.contrib.distributions.percentile fails

    Consider the following dummy example using tf.contrib.distributions.percentile:

    from pyspark.context import SparkContext
    from pyspark.conf import SparkConf
    import tensorflow as tf
    import tensorframes as tfs
    from pyspark import SQLContext
    from pyspark.sql import Row
    from pyspark.sql.functions import *
    
    conf = SparkConf().setAppName("repro")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    
    data = [Row(x=[1.111, 0.516, 12.759]), Row(x=[2.222, 1.516, 13.759]), Row(x=[3.333, 2.516, 14.759]), Row(x=[4.444, 3.516, 15.759])]
    df = tfs.analyze(sqlContext.createDataFrame(data))
    
    with tf.Graph().as_default() as g:
    	x = tfs.block(df, "x")
    	q = tf.constant(90, 'float64', name='Percentile')
    	qntl = tf.contrib.distributions.percentile(x, q, axis=1)
    	result = tfs.map_blocks(x, df)
    	
    

    This fails with

    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2752, in _as_graph_element_locked
        return op.outputs[out_n]
    IndexError: list index out of range
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 5, in <module>
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 312, in map_blocks
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 152, in _map
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 83, in _add_shapes
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2880, in get_tensor_by_name
        return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2708, in as_graph_element
        return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2757, in _as_graph_element_locked
        % (repr(name), repr(op_name), len(op.outputs)))
    KeyError: "The name 'percentile/assert_integer/statically_determined_was_integer:0' refers to a Tensor which does not exist. The operation, 'percentile/assert_integer/statically_determined_was_integer', exists but only has 0 outputs."
    
    opened by martinstuder 3
  • Readme Example throwing Py4J error

    I am using Spark 2.0.2, Python 2.7.12, iPython 5.1.0 on macOS 10.12.1.

    I am launching pyspark like this

    $SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.2.3-s_2.10

    From the demo, this block

    with tf.Graph().as_default() as g:
        x = tfs.block(df, "x")
        z = tf.add(x, 3, name='z')
        df2 = tfs.map_blocks(z, df)
    

    crashes with the following traceback:

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    <ipython-input-3-e7ae284146c3> in <module>()
          4     # The TensorFlow placeholder that corresponds to column 'x'.
          5     # The shape of the placeholder is automatically inferred from the DataFrame.
    ----> 6     x = tfs.block(df, "x")
          7     # The output that adds 3 to x
          8     z = tf.add(x, 3, name='z')
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in block(df, col_name, tf_name)
        313     :return: a TensorFlow placeholder.
        314     """
    --> 315     return _auto_placeholder(df, col_name, tf_name, block = True)
        316
        317 def row(df, col_name, tf_name = None):
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in _auto_placeholder(df, col_name, tf_name, block)
        331
        332 def _auto_placeholder(df, col_name, tf_name, block):
    --> 333     info = _java_api().extra_schema_info(df._jdf)
        334     col_shape = [x.shape() for x in info if x.fieldName() == col_name]
        335     if len(col_shape) == 0:
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in _java_api()
         28     # You cannot simply call the creation of the the class on the _jvm due to classloader issues
         29     # with Py4J.
    ---> 30     return _jvm.Thread.currentThread().getContextClassLoader().loadClass(javaClassName) \
         31         .newInstance()
         32
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py in __call__(self, *args)
       1131         answer = self.gateway_client.send_command(command)
       1132         return_value = get_return_value(
    -> 1133             answer, self.gateway_client, self.target_id, self.name)
       1134
       1135         for temp_arg in temp_args:
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
         61     def deco(*a, **kw):
         62         try:
    ---> 63             return f(*a, **kw)
         64         except py4j.protocol.Py4JJavaError as e:
         65             s = e.java_exception.toString()
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
        317                 raise Py4JJavaError(
        318                     "An error occurred while calling {0}{1}{2}.\n".
    --> 319                     format(target_id, ".", name), value)
        320             else:
        321                 raise Py4JError(
    
    Py4JJavaError: An error occurred while calling o47.loadClass.
    : java.lang.NoClassDefFoundError: org/apache/spark/Logging
    	at java.lang.ClassLoader.defineClass1(Native Method)
    	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:280)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:214)
    	at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	... 22 more
    
    opened by damienstanton 3
  • Spark 2.0.0 + ScalaTest 3.0.0 + updates sbt plugins

    The subject says it all.

    WARNING: doit works (since it disabled tests in assembly), but I could not get sbt test working. It fails with the following error, which is more about TensorFlow, about which I know nothing:

    ➜  tensorframes git:(spark-200-and-other-upgrades) sbt
    [info] Loading global plugins from /Users/jacek/.sbt/0.13/plugins
    [info] Loading project definition from /Users/jacek/dev/oss/tensorframes/project
    [info] Set current project to tensorframes (in build file:/Users/jacek/dev/oss/tensorframes/)
    > testOnly org.tensorframes.dsl.BasicOpsSuite
    16/08/04 23:52:22 DEBUG Paths$: Request for x -> 0
    16/08/04 23:52:22 DEBUG Paths$: Request for y -> 0
    16/08/04 23:52:22 DEBUG Paths$: Request for z -> 0
    
    import tensorflow as tf
    
    x = tf.constant(1, name='x')
    y = tf.constant(2, name='y')
    z = tf.add(x, y, name='z')
    
    g = tf.get_default_graph().as_graph_def()
    for n in g.node:
        print ">>>>>", str(n.name), "<<<<<<"
        print n
    
    [info] BasicOpsSuite:
    [info] - Add *** FAILED ***
    [info]   1 did not equal 0 (1,===========
    [info]   
    [info]   import tensorflow as tf
    [info]   
    [info]   x = tf.constant(1, name='x')
    [info]   y = tf.constant(2, name='y')
    [info]   z = tf.add(x, y, name='z')
    [info]         
    [info]   g = tf.get_default_graph().as_graph_def()
    [info]   for n in g.node:
    [info]       print ">>>>>", str(n.name), "<<<<<<"
    [info]       print n
    [info]          
    [info]   ===========) (ExtractNodes.scala:40)
    [info] Run completed in 1 second, 772 milliseconds.
    [info] Total number of tests run: 1
    [info] Suites: completed 1, aborted 0
    [info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
    [info] *** 1 TEST FAILED ***
    [error] Failed tests:
    [error]         org.tensorframes.dsl.BasicOpsSuite
    [error] (test:testOnly) sbt.TestsFailedException: Tests unsuccessful
    [error] Total time: 2 s, completed Aug 4, 2016 11:52:22 PM
    

    I'm proposing the PR hoping the issue is a minor one that could easily be fixed with enough guidance.

    opened by jaceklaskowski 3
  • Bump tensorflow from 1.15.0 to 2.9.3 in /python

    Bumps tensorflow from 1.15.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Support with deep learning model plugging

    Can you guys help to plug this https://github.com/hongzimao/decima-sim deep learning model into tensorframes? Is it possible to do? Any help will be highly appreciated.

    opened by jahidhasanlinix 0
  • Need help with enabling GPUs while predicting through fine-tuned BERT Tensorflow Model on Azure Databricks

    Hi, I am referring to this code (https://github.com/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb for classification) and running it on Azure Databricks Runtime 7.2 ML (includes Apache Spark 3.0.0, GPU, Scala 2.12). I was able to train a model. Although I am using a 4-GPU cluster for predictions, it is still taking a very long time. I suspect that my cluster is not fully utilized and in fact is still being used as CPU only... Is there anything I need to change to ensure that the GPU cluster is utilized and able to function in a distributed manner?

    I also referred to Databricks documentation (https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/train-model/tensorflow) and did install gpu enabled tensorflow mentioned as:

    %pip install https://databricks-prod-cloudfront.cloud.databricks.com/artifacts/tensorflow/runtime-7.x/tensorflow-1.15.3-cp37-cp37m-linux_x86_64.whl

    But even after that, print([tf.version, tf.test.is_gpu_available()]) still shows FALSE, and there is no improvement in my cluster utilization. Can anyone help with how I can enable full cluster utilization (on the worker nodes) for my predictions through the fine-tuned BERT model?

    I would really appreciate the help.

    opened by samvygupta 0
  • Having java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()

    Hi, I want to use DeepImageFeaturizer combined with Spark ML logistic regression in Spark (2.4.5) / Scala 2.11.12, but it's not working. I have been trying to resolve it for many days.

    I have this issue : java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()Lorg/tensorframes/protobuf3shade/ByteString;

    It seems a library is missing, but I think I've already referenced all the needed ones:

    delta-core_2.11-0.6.0.jar
    libtensorflow-1.15.0.jar
    libtensorflow_jni-1.15.0.jar
    libtensorflow_jni_gpu-1.15.0.jar
    proto-1.15.0.jar
    scala-logging-api_2.11-2.1.2.jar
    scala-logging-slf4j_2.11-2.1.2.jar
    scala-logging_2.11-3.9.2.jar
    spark-deep-learning-1.5.0-spark2.4-s_2.11.jar
    spark-sql-kafka-0-10_2.11-2.4.5.jar
    spark-tensorflow-connector_2.11-1.6.0.jar
    tensorflow-1.15.0.jar
    tensorflow-hadoop-1.15.0.jar
    tensorframes-0.8.2-s_2.11.jar
    

    Full trace :

    20/05/15 21:17:28 DEBUG impl.TensorFlowOps$: Outputs: Set(InceptionV3_sparkdl_output__)
    Exception in thread "main" java.lang.reflect.InvocationTargetException
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
    	at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
    Caused by: java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()Lorg/tensorframes/protobuf3shade/ByteString;
    	at org.tensorframes.impl.TensorFlowOps$.graphSerial(TensorFlowOps.scala:69)
    	at org.tensorframes.impl.TensorFlowOps$.analyzeGraphTF(TensorFlowOps.scala:114)
    	at org.tensorframes.impl.DebugRowOps.mapRows(DebugRowOps.scala:408)
    	at com.databricks.sparkdl.DeepImageFeaturizer.transform(DeepImageFeaturizer.scala:135)
    	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:161)
    	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    	at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
    	at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
    	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
    

    Can someone from the team tell me what is going wrong? Thanks for your support.

    opened by eleite77 0
  • Could not initialize class org.tensorframes.impl.SupportedOperations

    Py4JJavaError: An error occurred while calling o162.analyze. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 10.244.31.75, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.tensorframes.impl.SupportedOperations$ at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:148) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.Range.foreach(Range.scala:160) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:95) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203) at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210) at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:100) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:93) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

    Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.collect(RDD.scala:944) at org.tensorframes.ExtraOperations$.deepAnalyzeDataFrame(ExperimentalOperations.scala:113) at org.tensorframes.ExperimentalOperations$class.analyze(ExperimentalOperations.scala:41) at org.tensorframes.impl.DebugRowOps.analyze(DebugRowOps.scala:281) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.tensorframes.impl.SupportedOperations$ at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:148) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at 
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.Range.foreach(Range.scala:160) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:95) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203) at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210) at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:100) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:93) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more

    opened by lee2015new 2
Releases (v0.6.0)
  • v0.6.0(Nov 16, 2018)

  • v0.5.0(Aug 21, 2018)

  • v0.4.0(Jun 18, 2018)

  • v0.2.9(Sep 13, 2017)

    This is the final release for 0.2.9.

    Notable changes since 0.2.8:

    • Upgrades tensorflow dependency from version 1.1.0 to 1.3.0
    • map_blocks, map_row APIs now accept Pandas DataFrames as input
    • Adds support for tensorflow variables. Note that these variables cannot be shared between the worker nodes.
    Source code(tar.gz)
    Source code(zip)
  • v0.2.8(Apr 25, 2017)

    This is the final release for 0.2.8.

    Notable changes since 0.2.5:

    • uses the official java API for tensorflow
    • support for image ingest (see inception example)
    • support for multiple hardware platforms (CPU, GPU) and operating systems (linux, macos). Windows should also work but it has not been tested.
    • support for Spark 2.1.x and Spark 2.2.x
    • some usability and performance fixes, which should give a better experience for users
    • more flexible input names for mapRows.
    Source code(tar.gz)
    Source code(zip)
  • v0.2.8-rc0(Apr 24, 2017)

    This is the first release candidate for 0.2.8.

    Notable changes:

    • uses the official java API for tensorflow
    • support for image ingest (see inception example)
    • support for Spark 2.1.x
    • the same release should support both CPU and GPU clusters
    • some usability and performance fixes, which should give a better experience for users
    Source code(tar.gz)
    Source code(zip)