Label Studio is a multi-type data labeling and annotation tool with standardized output format

Overview

GitHub label-studio:build GitHub release

WebsiteDocsTwitterJoin Slack Community

What is Label Studio?

Label Studio is an open source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can be used to prepare raw data or improve existing training data to get more accurate ML models.

Gif of Label Studio annotating different types of data

Have a custom dataset? You can customize Label Studio to fit your needs. Read an introductory blog post to learn more.

Try out Label Studio

Install Label Studio locally, or deploy it in a cloud instance. Also you can try Label Studio Teams.

Install locally with Docker

Official Label Studio docker image is here and it can be downloaded with docker pull. Run Label Studio in a Docker container and access it at http://localhost:8080.

docker pull heartexlabs/label-studio:latest
docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest

You can find all the generated assets, including SQLite3 database storage label_studio.sqlite3 and uploaded files, in the ./mydata directory.

Override default Docker install

You can override the default launch command by appending the new arguments:

docker run -it -p 8080:8080 -v `pwd`/mydata:/label-studio/data heartexlabs/label-studio:latest label-studio --log-level DEBUG

Build a local image with Docker

If you want to build a local image, run:

docker build -t heartexlabs/label-studio:latest .

Run with Docker Compose

Docker compose script provides production-ready stack consisting of the following components:

  • Label Studio
  • Nginx - proxy web server used to load various static data, including uploaded audio, images, etc.
  • PostgreSQL - production-ready database that replaces less performant SQLite3.

To start using the app from http://localhost run this command:

docker-compose up

Install locally with pip

# Requires >=Python3.6, <3.9
pip install label-studio

# Start the server at http://localhost:8080
label-studio

Install locally with Anaconda

conda create --name label-studio python=3.8
conda activate label-studio
pip install label-studio

Install for local development

You can run the latest Label Studio version locally without installing the package with pip.

# Install all package dependencies
pip install -e .
# Run database migrations
python label_studio/manage.py migrate
# Start the server in development mode at http://localhost:8080
python label_studio/manage.py runserver

Deploy in a cloud instance

You can deploy Label Studio with one click in Heroku, Microsoft Azure, or Google Cloud Platform:

Apply frontend changes

The frontend part of Label Studio app lies in the frontend/ folder and written in React JSX. In case you've made some changes there, the following commands should be run before building / starting the instance:

cd label_studio/frontend/
npm ci
npx webpack
cd ../..
python label_studio/manage.py collectstatic --no-input

Troubleshoot installation

If you see any errors during installation, try to rerun the installation

pip install --ignore-installed label-studio

Install dependencies on Windows

To run Label Studio on Windows, download and install the following wheel packages from Gohlke builds to ensure you're using the correct version of Python:

# Upgrade pip 
pip install -U pip

# If you're running Win64 with Python 3.8, install the packages downloaded from Gohlke:
pip install lxml‑4.5.0‑cp38‑cp38‑win_amd64.whl

# Install label studio
pip install label-studio

What you get from Label Studio

Screenshot of Label Studio data manager grid view with images

  • Multi-user labeling sign up and login, when you create an annotation it's tied to your account.
  • Multiple projects to work on all your datasets in one instance.
  • Streamlined design helps you focus on your task, not how to use the software.
  • Configurable label formats let you customize the visual interface to meet your specific labeling needs.
  • Support for multiple data types including images, audio, text, HTML, time-series, and video.
  • Import from files or from cloud storage in Amazon AWS S3, Google Cloud Storage, or JSON, CSV, TSV, RAR, and ZIP archives.
  • Integration with machine learning models so that you can visualize and compare predictions from different models and perform pre-labeling.
  • Embed it in your data pipeline REST API makes it easy to make it a part of your pipeline

Included templates for labeling data in Label Studio

Label Studio includes a variety of templates to help you label your data, or you can create your own using specifically designed configuration language. The most common templates and use cases for labeling include the following cases:

Set up machine learning models with Label Studio

Connect your favorite machine learning model using the Label Studio Machine Learning SDK. Follow these steps:

  1. Start your own machine learning backend server. See more detailed instructions.
  2. Connect Label Studio to the server on the model page found in project settings.

This lets you:

  • Pre-label your data using model predictions.
  • Do online learning and retrain your model while new annotations are being created.
  • Do active learning by labeling only the most complex examples in your data.

Integrate Label Studio with your existing tools

You can use Label Studio as an independent part of your machine learning workflow or integrate the frontend or backend into your existing tools.

Ecosystem

Project Description
label-studio Server, distributed as a pip package
label-studio-frontend React and JavaScript frontend and can run standalone in a web browser or be embedded into your application.
data-manager React and JavaScript frontend for managing data. Includes the Label Studio Frontend. Relies on the label-studio server or a custom backend with the expected API methods.
label-studio-converter Encode labels in the format of your favorite machine learning library
label-studio-transformers Transformers library connected and configured for use with Label Studio

Roadmap

Want to use The Coolest Feature X but Label Studio doesn't support it? Check out our public roadmap!

Citation

@misc{Label Studio,
  title={{Label Studio}: Data labeling software},
  url={https://github.com/heartexlabs/label-studio},
  note={Open source software available from https://github.com/heartexlabs/label-studio},
  author={
    Maxim Tkachenko and
    Mikhail Malyuk and
    Nikita Shevchenko and
    Andrey Holmanyuk and
    Nikolai Liubimov},
  year={2020-2021},
}

License

This software is licensed under the Apache 2.0 LICENSE © Heartex. 2020-2021

Owner
Heartex
Data labeling and exploration tools for Machine Learning
Heartex
This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

haifeng xia 32 Oct 26, 2022
Implementation of Perceiver, General Perception with Iterative Attention in TensorFlow

Perceiver This Python package implements Perceiver: General Perception with Iterative Attention by Andrew Jaegle in TensorFlow. This model builds on t

Rishit Dagli 84 Oct 15, 2022
Repo for code associated with Modeling the Mitral Valve.

Project Title Mitral Valve Getting Started Repo for code associated with Modeling the Mitral Valve. See https://arxiv.org/abs/1902.00018 for preprint,

Alex Kaiser 1 May 17, 2022
Gesture recognition on Event Data

Event based Gesture Recognition Gesture recognition on Event Data usually involv

2 Feb 14, 2022
ML course - EPFL Machine Learning Course, Fall 2021

EPFL Machine Learning Course CS-433 Machine Learning Course, Fall 2021 Repository for all lecture notes, labs and projects - resources, code templates

EPFL Machine Learning and Optimization Laboratory 1k Jan 04, 2023
Texture mapping with variational auto-encoders

vae-textures This is an experiment with using variational autoencoders (VAEs) to perform mesh parameterization. This was also my first project using J

Alex Nichol 41 May 24, 2022
My 1st place solution at Kaggle Hotel-ID 2021

1st place solution at Kaggle Hotel-ID My 1st place solution at Kaggle Hotel-ID to Combat Human Trafficking 2021. https://www.kaggle.com/c/hotel-id-202

Kohei Ozaki 18 Aug 19, 2022
Code for Multiple Instance Active Learning for Object Detection, CVPR 2021

Language: 简体中文 | English Introduction This is the code for Multiple Instance Active Learning for Object Detection, CVPR 2021. Installation A Linux pla

Tianning Yuan 269 Dec 21, 2022
Official PyTorch implementation of Spatial Dependency Networks.

Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling Đorđe Miladinović   Aleksandar Stanić   Stefan Bauer   Jürgen Schmid

Djordje Miladinovic 34 Jan 19, 2022
[ICCV' 21] "Unsupervised Point Cloud Pre-training via Occlusion Completion"

OcCo: Unsupervised Point Cloud Pre-training via Occlusion Completion This repository is the official implementation of paper: "Unsupervised Point Clou

Hanchen 204 Dec 24, 2022
a short visualisation script for pyvideo data

PyVideo Speakers A CLI that visualises repeat speakers from events listed in https://github.com/pyvideo/data Not terribly efficient, but you know. Ins

Katie McLaughlin 3 Nov 24, 2021
Fuzzing tool (TFuzz): a fuzzing tool based on program transformation

T-Fuzz T-Fuzz consists of 2 components: Fuzzing tool (TFuzz): a fuzzing tool based on program transformation Crash Analyzer (CrashAnalyzer): a tool th

HexHive 244 Nov 09, 2022
ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs

(Comet-) ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs Paper Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sa

AI2 152 Dec 27, 2022
Convert scikit-learn models to PyTorch modules

sk2torch sk2torch converts scikit-learn models into PyTorch modules that can be tuned with backpropagation and even compiled as TorchScript. Problems

Alex Nichol 101 Dec 16, 2022
Denoising Diffusion Probabilistic Models

Denoising Diffusion Probabilistic Models This repo contains code for DDPM training. Based on Denoising Diffusion Probabilistic Models, Improved Denois

Alexander Markov 7 Dec 15, 2022
High-performance moving least squares material point method (MLS-MPM) solver.

High-Performance MLS-MPM Solver with Cutting and Coupling (CPIC) (MIT License) A Moving Least Squares Material Point Method with Displacement Disconti

Yuanming Hu 2.2k Dec 31, 2022
code for our BMVC 2021 paper "HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification"

HCV_IIRC code for our BMVC 2021 paper HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification by Kai Wang, Xialei Li

kai wang 13 Oct 03, 2022
Video-based open-world segmentation

UVO_Challenge Team Alpes_runner Solutions This is an official repo for our UVO Challenge solutions for Image/Video-based open-world segmentation. Our

Yuming Du 84 Dec 22, 2022
BigbrotherBENL - Face recognition on the Big Brother episodes in Belgium and the Netherlands.

BigbrotherBENL - Face recognition on the Big Brother episodes in Belgium and the Netherlands. Keeping statistics of whom are most visible and recognisable in the series and wether or not it has an im

Frederik 2 Jan 04, 2022
This code is for our paper "VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers"

ICCV Workshop 2021 VTGAN This code is for our paper "VTGAN: Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers"

Sharif Amit Kamran 25 Dec 08, 2022