Debugging, monitoring and visualization for Python Machine Learning and Data Science

Overview

Welcome to TensorWatch

TensorWatch is a debugging and visualization tool designed for data science, deep learning and reinforcement learning from Microsoft Research. It works in Jupyter Notebook to show real-time visualizations of your machine learning training and perform several other key analysis tasks for your models and data.

TensorWatch is designed to be flexible and extensible so you can also build your own custom visualizations, UIs, and dashboards. Besides traditional "what-you-see-is-what-you-log" approach, it also has a unique capability to execute arbitrary queries against your live ML training process, return a stream as a result of the query and view this stream using your choice of a visualizer (we call this Lazy Logging Mode).

TensorWatch is under heavy development with a goal of providing a platform for debugging machine learning in one easy to use, extensible, and hackable package.

TensorWatch in Jupyter Notebook

How to Get It

pip install tensorwatch

TensorWatch supports Python 3.x and is tested with PyTorch 0.4-1.x. Most features should also work with TensorFlow eager tensors. TensorWatch uses graphviz to create network diagrams and depending on your platform sometime you might need to manually install it.

How to Use It

Quick Start

Here's simple code that logs an integer and its square as a tuple every second to TensorWatch:

import tensorwatch as tw
import time

# streams will be stored in test.log file
w = tw.Watcher(filename='test.log')

# create a stream for logging
s = w.create_stream(name='metric1')

# generate Jupyter Notebook to view real-time streams
w.make_notebook()

for i in range(1000):
    # write x,y pair we want to log
    s.write((i, i*i))

    time.sleep(1)

When you run this code, you will notice a Jupyter Notebook file test.ipynb gets created in your script folder. From a command prompt type jupyter notebook and select test.ipynb. Choose Cell > Run all in the menu to see the real-time line graph as values get written in your script.

Here's the output you will see in Jupyter Notebook:

TensorWatch in Jupyter Notebook

To dive deeper into the various other features, please see Tutorials and notebooks.

How does this work?

When you write to a TensorWatch stream, the values get serialized and sent to a TCP/IP socket as well as the file you specified. From Jupyter Notebook, we load the previously logged values from the file and then listen to that TCP/IP socket for any future values. The visualizer listens to the stream and renders the values as they arrive.

Ok, so that's a very simplified description. The TensorWatch architecture is actually much more powerful. Almost everything in TensorWatch is a stream. Files, sockets, consoles and even visualizers are streams themselves. A cool thing about TensorWatch streams is that they can listen to any other streams. This allows TensorWatch to create a data flow graph. This means that a visualizer can listen to many streams simultaneously, each of which could be a file, a socket or some other stream. You can recursively extend this to build arbitrary data flow graphs. TensorWatch decouples streams from how they get stored and how they get visualized.

Visualizations

In the above example, the line graph is used as the default visualization. However, TensorWatch supports many other diagram types including histograms, pie charts, scatter charts, bar charts and 3D versions of many of these plots. You can log your data, specify the chart type you want and let TensorWatch take care of the rest.

One of the significant strengths of TensorWatch is the ability to combine, compose, and create custom visualizations effortlessly. For example, you can choose to visualize an arbitrary number of streams in the same plot. Or you can visualize the same stream in many different plots simultaneously. Or you can place an arbitrary set of visualizations side-by-side. You can even create your own custom visualization widget simply by creating a new Python class, implementing a few methods.

Comparing Results of Multiple Runs

Each TensorWatch stream may contain a metric of your choice. By default, TensorWatch saves all streams in a single file, but you could also choose to save each stream in separate files or not to save them at all (for example, sending streams over sockets or into the console directly, zero hit to disk!). Later you can open these streams and direct them to one or more visualizations. This design allows you to quickly compare the results from your different experiments in your choice of visualizations easily.

Training within Jupyter Notebook

Often you might prefer to do data analysis, ML training, and testing - all from within Jupyter Notebook instead of from a separate script. TensorWatch can help you do sophisticated, real-time visualizations effortlessly from code that is run within a Jupyter Notebook end-to-end.

Lazy Logging Mode

A unique feature in TensorWatch is the ability to query the live running process, retrieve the result of this query as a stream and direct this stream to your preferred visualization(s). You don't need to log any data beforehand. We call this new way of debugging and visualization a lazy logging mode.

For example, as seen below, we visualize input and output image pairs, sampled randomly during the training of an autoencoder on a fruits dataset. These images were not logged beforehand in the script. Instead, the user sends query as a Python lambda expression which results in a stream of images that gets displayed in the Jupyter Notebook:

TensorWatch in Jupyter Notebook

See Lazy Logging Tutorial.

Pre-Training and Post-Training Tasks

TensorWatch leverages several excellent libraries including hiddenlayer, torchstat, Visual Attribution to allow performing the usual debugging and analysis activities in one consistent package and interface.

For example, you can view the model graph with tensor shapes with a one-liner:

Model graph for Alexnet

You can view statistics for different layers such as flops, number of parameters, etc:

Model statistics for Alexnet

See notebook.

You can view the dataset in a lower dimensional space using techniques such as t-SNE:

t-SNE visualization for MNIST

See notebook.

Prediction Explanations

We wish to provide various tools for explaining predictions to help debugging models. Currently, we offer several explainers for convolutional networks, including Lime. For example, the following highlights the areas that cause the Resnet50 model to make a prediction for class 240 for the Imagenet dataset:

CNN prediction explanation

See notebook.

Tutorials

Paper

More technical details are available in TensorWatch paper (EICS 2019 Conference). Please cite this as:

@inproceedings{tensorwatch2019eics,
  author    = {Shital Shah and Roland Fernandez and Steven M. Drucker},
  title     = {A system for real-time interactive analysis of deep learning training},
  booktitle = {Proceedings of the {ACM} {SIGCHI} Symposium on Engineering Interactive
               Computing Systems, {EICS} 2019, Valencia, Spain, June 18-21, 2019},
  pages     = {16:1--16:6},
  year      = {2019},
  crossref  = {DBLP:conf/eics/2019},
  url       = {https://arxiv.org/abs/2001.01215},
  doi       = {10.1145/3319499.3328231},
  timestamp = {Fri, 31 May 2019 08:40:31 +0200},
  biburl    = {https://dblp.org/rec/bib/conf/eics/ShahFD19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contribute

We would love your contributions, feedback, questions, and feature requests! Please file a Github issue or send us a pull request. Please review the Microsoft Code of Conduct and learn more.

Contact

Join the TensorWatch group on Facebook to stay up to date or ask any questions.

Credits

TensorWatch utilizes several open source libraries for many of its features. These include: hiddenlayer, torchstat, Visual-Attribution, pyzmq, receptivefield, nbformat. Please see install_requires section in setup.py for upto date list.

License

This project is released under the MIT License. Please review the License file for more details.

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
This tool is designed to help administrators get an overview of their Active Directory structure.

This tool is designed to help administrators get an overview of their Active Directory structure. In the group view you can see all elements of an AD (OU, USER, GROUPS, COMPUTERS etc.). In the user v

deexno 2 Oct 30, 2022
Movies-chart - A CLI app gets the top 250 movies of all time from imdb.com and the top 100 movies from rottentomatoes.com

movies-chart This CLI app gets the top 250 movies of all time from imdb.com and

3 Feb 17, 2022
A Python toolbox for gaining geometric insights into high-dimensional data

"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview

Contextual Dynamics Laboratory 1.8k Dec 29, 2022
Homework 2: Matplotlib and Data Visualization

Homework 2: Matplotlib and Data Visualization Overview These data visualizations were created for my introductory computer science course using Python

Sophia Huang 12 Oct 20, 2022
A collection of 100 Deep Learning images and visualizations

A collection of Deep Learning images and visualizations. The project has been developed by the AI Summer team and currently contains almost 100 images.

AI Summer 65 Sep 12, 2022
Visualise top-rated GitHub repositories in a barchart by keyword

This python script was written for simple purpose -- to visualise top-rated GitHub repositories in a barchart by keyword. Script generates html-page with barchart and information about repository own

Cur1iosity 2 Feb 07, 2022
Python library that makes it easy for data scientists to create charts.

Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l

Spotify 3.2k Jan 04, 2023
Fast 1D and 2D histogram functions in Python

About Sometimes you just want to compute simple 1D or 2D histograms with regular bins. Fast. No nonsense. Numpy's histogram functions are versatile, a

Thomas Robitaille 237 Dec 18, 2022
An interactive UMAP visualization of the MNIST data set.

Code for an interactive UMAP visualization of the MNIST data set. Demo at https://grantcuster.github.io/umap-explorer/. You can read more about the de

grant 70 Dec 27, 2022
Custom Plotly Dash components based on Mantine React Components library

Dash Mantine Components Dash Mantine Components is a Dash component library based on Mantine React Components Library. It makes it easier to create go

Snehil Vijay 239 Jan 08, 2023
100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)

100 pandas puzzles Puzzles notebook Solutions notebook Inspired by 100 Numpy exerises, here are 100* short puzzles for testing your knowledge of panda

Alex Riley 1.9k Jan 08, 2023
Automate the case review on legal case documents and find the most critical cases using network analysis

Automation on Legal Court Cases Review This project is to automate the case review on legal case documents and find the most critical cases using netw

Yi Yin 7 Dec 28, 2022
哔咔漫画window客户端,界面使用PySide2,已实现分类、搜索、收藏夹、下载、在线观看、waifu2x等功能。

picacomic-windows 哔咔漫画window客户端,界面使用PySide2,已实现分类、搜索、收藏夹、下载、在线观看等功能。 功能介绍 登陆分流,还原安卓端的三个分流入口 分类,搜索,排行,收藏夹使用同一的逻辑,滚轮下滑自动加载下一页,双击打开 漫画详情,章节列表和评论列表 下载功能,目

1.8k Dec 31, 2022
Create SVG drawings from vector geodata files (SHP, geojson, etc).

SVGIS Create SVG drawings from vector geodata files (SHP, geojson, etc). SVGIS is great for: creating small multiples, combining lots of datasets in a

Neil Freeman 78 Dec 09, 2022
Declarative statistical visualization library for Python

Altair http://altair-viz.github.io Altair is a declarative statistical visualization library for Python. With Altair, you can spend more time understa

Altair 8k Jan 05, 2023
a plottling library for python, based on D3

Hello August 2013 Hello! Maybe you're looking for a nice Python interface to build interactive, javascript based plots that look as nice as all those

Mike Dewar 1.4k Dec 28, 2022
Focus on Algorithm Design, Not on Data Wrangling

The dataTap Python library is the primary interface for using dataTap's rich data management tools. Create datasets, stream annotations, and analyze model performance all with one library.

Zensors 37 Nov 25, 2022
Piglet-shaders - PoC of custom shaders for Piglet

Piglet custom shader PoC This is a PoC for compiling Piglet fragment shaders usi

6 Mar 10, 2022
a python function to plot a geopandas dataframe

Pretty GeoDataFrame A minimum python function (~60 lines) to draw pretty geodataframe. Based on matplotlib, shapely, descartes. Installation just use

haoming 27 Dec 05, 2022
Simple plotting for Python. Python wrapper for D3xter - render charts in the browser with simple Python syntax.

PyDexter Simple plotting for Python. Python wrapper for D3xter - render charts in the browser with simple Python syntax. Setup $ pip install PyDexter

D3xter 31 Mar 06, 2021