COIN: the currently largest dataset for comprehensive instruction video analysis.

Overview

COIN Dataset

COIN is currently the largest dataset for comprehensive instruction video analysis. It contains 11,827 videos of 180 different tasks (e.g., car polishing, making French fries) related to 12 domains (e.g., vehicle, dish). All videos were collected from YouTube and annotated with an efficient toolbox.

Authors and Contributors

Yansong Tang*, Dajun Ding, Yongming Rao*, Yu Zheng*, Danyang Zhang*, Lili Zhao, Jiwen Lu*, Jie Zhou*, Yongxiang Lian*, Yao Li, Jiali Sun, Chang Liu, Dongge You, Zirun Yang, Jiaojiao Ge, Jiayun Wang*

  • *Tsinghua University
  • Meitu Inc.

Contact: [email protected]

License

You may use the code and files for research purposes only, including sharing and modifying the material. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Dataset and Annotation

Taxonomy

COIN is organized in a hierarchical structure with three levels: domain, task and step. The correspondence between these levels can be found at taxonomy [link]. We provide the taxonomy file of COIN in CSV format. Below, we show a small part of the taxonomy stored in taxonomy.xlsx, whose two sheets map domains to tasks ("targets") and tasks to steps ("actions"):

Sheet "domain_target_mapping":

Domains    Targets
...        ...
Vehicle    ChangeCarTire
Vehicle    InstallLicensePlateFrame
...        ...
Gadgets    ReplaceCDDriveWithSSD

Sheet "target_action_mapping":

Target Id  Target Label   Action Id  Action Label
...        ...            ...        ...
13         ChangeCarTire  259        unscrew the screw
13         ChangeCarTire  260        jack up the car
13         ChangeCarTire  261        remove the tire
13         ChangeCarTire  262        put on the tire
13         ChangeCarTire  263        tighten the screws
...        ...            ...        ...
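The two sheets can be joined into a single domain/task/step hierarchy. The sketch below assumes each sheet has been exported to CSV with exactly the column headers shown above (Domains, Targets; Target Id, Target Label, Action Id, Action Label). The function name build_hierarchy is hypothetical and not part of any official COIN tooling.

```python
import csv
import io
from collections import defaultdict


def build_hierarchy(domain_rows, action_rows):
    """Join the two taxonomy sheets into {domain: {task: [(step_id, step_label), ...]}}.

    Assumption: rows are dicts keyed by the sheet headers shown above,
    e.g. as produced by csv.DictReader on a CSV export of each sheet.
    """
    # Collect the steps ("actions") belonging to each task ("target").
    steps_by_task = defaultdict(list)
    for row in action_rows:
        steps_by_task[row["Target Label"]].append(
            (int(row["Action Id"]), row["Action Label"])
        )

    # Attach every task to its domain, with its steps sorted by step ID.
    hierarchy = defaultdict(dict)
    for row in domain_rows:
        task = row["Targets"]
        hierarchy[row["Domains"]][task] = sorted(steps_by_task.get(task, []))
    return dict(hierarchy)


# Usage with a few rows copied from the tables above.
domain_csv = "Domains,Targets\nVehicle,ChangeCarTire\nVehicle,InstallLicensePlateFrame\n"
action_csv = (
    "Target Id,Target Label,Action Id,Action Label\n"
    "13,ChangeCarTire,259,unscrew the screw\n"
    "13,ChangeCarTire,260,jack up the car\n"
)
tree = build_hierarchy(csv.DictReader(io.StringIO(domain_csv)),
                       csv.DictReader(io.StringIO(action_csv)))
print(tree["Vehicle"]["ChangeCarTire"])
```

Tasks without any listed steps (here InstallLicensePlateFrame, because only a fragment of the sheet is used) map to an empty list rather than being dropped.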

We store the URL of each video and its annotation in JSON format, which can be accessed via the link [COIN](Project link page). The JSON file is similar in structure to that of ActivityNet. Below, we show an example entry under the key field "database":

"LtRSn-ntcLY": {
  "duration": 131.0309,
  "class": "ReplaceCDDriveWithSSD",
  "video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
  "start": 56.640895694775196,
  "annotation": [
    {
      "id": "212",
      "segment": [60.0, 69.0],
      "label": "take out the laptop CD drive"
    },
    {
      "id": "216",
      "segment": [71.0, 82.0],
      "label": "insert the hard disk tray into the position of the CD drive"
    }
  ],
  "subset": "training",
  "end": 85.714362947023,
  "recipe_type": 131
}

From an entry, we can retrieve the YouTube ID, the duration, the region of interest (ROI, given by "start" and "end") and the step annotations of the video. The field "annotation" is a list of all annotated steps within the video. The field "class" and the sub-field "id" correspond to the "task" and "step" levels of the taxonomy, respectively.
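As a minimal sketch of reading these fields, the hypothetical helper summarize_entry below turns one entry under "database" into a flat summary. It assumes the entry layout of the example above; note that the step "id" is stored as a string in the JSON, so it is converted to int here.

```python
import json


def summarize_entry(youtube_id, entry):
    """Extract task, duration, ROI and steps from one "database" entry."""
    return {
        "youtube_id": youtube_id,               # key of the entry
        "task": entry["class"],                 # task level of the taxonomy
        "duration": entry["duration"],          # length of the video in seconds
        "roi": (entry["start"], entry["end"]),  # region of interest
        "steps": [                              # step level of the taxonomy
            (int(a["id"]), a["label"], tuple(a["segment"]))
            for a in entry["annotation"]
        ],
    }


# Usage on an abbreviated copy of the example entry.
raw = """{"LtRSn-ntcLY": {"duration": 131.0309, "class": "ReplaceCDDriveWithSSD",
  "start": 56.640895694775196, "end": 85.714362947023, "subset": "training",
  "annotation": [{"id": "212", "segment": [60.0, 69.0],
                  "label": "take out the laptop CD drive"}]}}"""
vid, entry = next(iter(json.loads(raw).items()))
info = summarize_entry(vid, entry)
print(info["task"], info["roi"], info["steps"])
```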

File Structure

The annotation information is saved in COIN.json.

Field Name           Type                   Example                                     Description
database             object                 -                                           Top-level key of the annotation file.
<YouTube ID>         string                 LtRSn-ntcLY                                 YouTube ID of the video; the key of each entry under "database".
duration             float                  131.0309                                    Duration of the video in seconds.
class                string                 ReplaceCDDriveWithSSD                       Name of the task in the video.
video_url            string                 https://www.youtube.com/embed/LtRSn-ntcLY   URL of the video.
start                float                  56.640895694775196                          Start time of the ROI of the video, in seconds.
end                  float                  85.714362947023                             End time of the ROI of the video, in seconds.
subset               string                 training or validation                      Subset the video belongs to.
recipe_type          int                    131                                         ID number of the task.
annotation           list of objects        -                                           Step annotations of the video.
annotation:id        string                 "212"                                       ID number of the step (stored as a numeric string).
annotation:label     string                 take out the laptop CD drive                Name of the step.
annotation:segment   list of float (len=2)  [60.0, 69.0]                                Start and end time of the step, in seconds.
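Putting the file structure together, the sketch below loads COIN.json and groups its entries by the "subset" field, then counts step occurrences in one split. The helper names load_coin and step_counts are hypothetical; the subset names are read from the file rather than hard-coded, so the code does not depend on which split labels a given release uses.

```python
import json
import os
import tempfile
from collections import Counter


def load_coin(path):
    """Load COIN.json and split its "database" entries by the "subset" field."""
    with open(path, encoding="utf-8") as f:
        database = json.load(f)["database"]
    splits = {}
    for youtube_id, entry in database.items():
        splits.setdefault(entry["subset"], {})[youtube_id] = entry
    return splits


def step_counts(entries):
    """Count how often each step ID occurs across a dict of entries."""
    return Counter(int(a["id"]) for e in entries.values() for a in e["annotation"])


# Usage: write a tiny file in the documented layout, then load it back.
sample = {"database": {
    "LtRSn-ntcLY": {"subset": "training", "class": "ReplaceCDDriveWithSSD",
                    "annotation": [{"id": "212", "segment": [60.0, 69.0],
                                    "label": "take out the laptop CD drive"}]},
}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)
splits = load_coin(tmp.name)
os.remove(tmp.name)
print(sorted(splits), step_counts(splits["training"]))
```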