Scenic: A Jax Library for Computer Vision and Beyond

Overview

Scenic is a codebase with a focus on research around attention-based models for computer vision. Scenic has been successfully used to develop classification, segmentation, and detection models for multiple modalities including images, video, audio, and multimodal combinations of them.

More precisely, Scenic is (i) a set of shared lightweight libraries solving commonly encountered tasks when training large-scale (i.e. multi-device, multi-host) vision models; and (ii) a number of projects containing fully fleshed out problem-specific training and evaluation loops using these libraries.

Scenic is developed in JAX and uses Flax.

What we offer

Among other things, Scenic provides:

  • Boilerplate code for launching experiments, summary writing, logging, profiling, etc;
  • Optimized training and evaluation loops, losses, metrics, bipartite matchers, etc;
  • Input-pipelines for popular vision datasets;
  • Baseline models, including strong non-attentional baselines.

Papers using Scenic

Scenic can be used to reproduce the results from the following papers, which were either developed using Scenic or have been reimplemented in it:

Philosophy

Scenic aims to facilitate rapid prototyping of large-scale vision models. To keep the code simple to understand and extend, we prefer forking and copy-pasting over adding complexity or increasing abstraction. Only when functionality proves to be widely useful across many models and tasks may it be upstreamed to Scenic's shared libraries.

Code structure

Shared libraries provided by Scenic are split into:

  • dataset_lib: Implements IO pipelines for loading and pre-processing data for common computer vision tasks and benchmarks. All pipelines are designed to be scalable and support multi-host and multi-device setups, taking care of dividing data among multiple hosts, incomplete batches, caching, pre-fetching, etc.
  • model_lib: Provides (i) several abstract model interfaces (e.g. ClassificationModel or SegmentationModel in model_lib.base_models) with task-specific losses and metrics; (ii) neural network layers in model_lib.layers, focusing on efficient implementations of attention and transformer layers; and (iii) accelerator-friendly implementations of bipartite matching algorithms in model_lib.matchers. A sketch of how a project model plugs into these interfaces follows this list.
  • train_lib: Provides tools for constructing training loops and implements several example trainers (classification trainer and segmentation trainer).
  • common_lib: Utilities that do not belong anywhere else.
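To make the split concrete, a project typically defines only the network itself and inherits task-specific losses and metrics from one of the abstract model interfaces. Below is a minimal sketch, assuming ClassificationModel expects subclasses to return a Flax module from a build_flax_model method and exposes the dataset metadata on the instance; the exact interface is defined in model_lib.base_models, so treat the names here as illustrative.

# Minimal sketch: a project-specific classification model built on Scenic's
# shared model interface. Names such as build_flax_model and dataset_meta_data
# are assumptions; verify them against model_lib.base_models.
import flax.linen as nn
import jax.numpy as jnp

from scenic.model_lib.base_models.classification_model import ClassificationModel


class SimpleMLP(nn.Module):
  """A toy Flax classifier, used only for illustration."""
  num_classes: int

  @nn.compact
  def __call__(self, x, *, train: bool = False):
    x = jnp.mean(x, axis=(1, 2))  # Global average pool over spatial dims.
    x = nn.Dense(256)(x)
    x = nn.relu(x)
    return nn.Dense(self.num_classes)(x)


class SimpleMLPClassificationModel(ClassificationModel):
  """Inherits classification losses and metrics from the shared library."""

  def build_flax_model(self) -> nn.Module:
    # dataset_meta_data is assumed to carry the number of classes.
    return SimpleMLP(num_classes=self.dataset_meta_data['num_classes'])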

Projects

Models built on top of Scenic exist as separate projects: model-specific code such as configs, layers, losses, network architectures, and training and evaluation loops lives within the corresponding project.

Common baselines such as a ResNet or a Vision Transformer (ViT) are implemented in the projects/baselines project. Forking this directory is a good starting point for new projects.

There is no one-size-fits-all recipe for how much code a project should re-use. Projects can fall anywhere on the wide spectrum of code re-use: from defining new configs for an existing model to redefining models, the training loop, logging, etc.
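At the lightest end of that spectrum, a new project may consist of nothing more than a config for an existing model. Scenic configs follow the ml_collections pattern of a get_config() function returning a ConfigDict; the sketch below is hypothetical, and the specific field names should be checked against an existing baseline config such as imagenet_vit_config.py.

# Hypothetical minimal config; field names are illustrative and should be
# checked against the configs in projects/baselines/configs.
import ml_collections


def get_config() -> ml_collections.ConfigDict:
  """Returns a config for a small classification experiment."""
  config = ml_collections.ConfigDict()
  config.experiment_name = 'my_small_experiment'
  config.model_name = 'simple_mlp'      # Which model to build.
  config.dataset_name = 'cifar10'       # Which input pipeline to use.
  config.batch_size = 128
  config.num_training_epochs = 90
  config.lr_configs = ml_collections.ConfigDict({'learning_rate': 1e-3})
  return config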

Getting started

  • See projects/baselines/README.md for a walk-through of the baseline models and instructions on how to run the code.
  • If you would like to contribute to Scenic, please check out the Philosophy, Code structure and Contributing sections. Should your contribution be a part of the shared libraries, please send us a pull request!

Quick start

Download the code from GitHub

git clone https://github.com/google-research/scenic.git
cd scenic
pip install .

and run training for ViT on ImageNet:

python main.py -- \
  --config=projects/baselines/configs/imagenet/imagenet_vit_config.py \
  --workdir=./

Disclaimer: This is not an official Google product.
