This repo holds the code for the ICCV 2021 paper: Visual Alignment Constraint for Continuous Sign Language Recognition.

Last update: Dec 19, 2022

Overview

VAC_CSLR

This repo holds the code for the paper Visual Alignment Constraint for Continuous Sign Language Recognition (ICCV 2021). [paper]

Prerequisites

This project is implemented in PyTorch (>1.8), so please install PyTorch first.
ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.
[Optional] sclite [kaldi-asr/kaldi]: install the Kaldi tools to get sclite for evaluation. After installation, create a soft link to sclite:
ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
We also provide a Python evaluation tool for convenience, but sclite provides more detailed statistics.
[Optional] SeanNaren/warp-ctc: at the beginning of this research we adopted warp-ctc for supervision, and we later found that the PyTorch CTC implementation reaches similar results.
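Before moving on, it can help to confirm which of the packages listed above are importable. The following is a minimal sketch, not part of this repo; the module names `torch`, `ctcdecode`, and `warpctc_pytorch` are assumptions about how the listed packages are exposed to Python, and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Assumed import names: "torch" for PyTorch, "ctcdecode" for parlance/ctcdecode,
# "warpctc_pytorch" for SeanNaren/warp-ctc. Adjust if your install differs.
REQUIRED = ["torch", "ctcdecode"]
OPTIONAL = ["warpctc_pytorch"]


def find_missing(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in find_missing(REQUIRED):
        print(f"missing required package: {name}")
    for name in find_missing(OPTIONAL):
        print(f"missing optional package: {name}")
```

The check only looks up whether a module can be found; it does not verify versions such as PyTorch >1.8 or ctcdecode 0.4.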

Data Preparation

Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
After the download finishes, extract the dataset to ./dataset/phoenix. We suggest making a soft link to the downloaded dataset instead of copying it:
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.
```
cd ./preprocess
python data_preprocess.py --process-image --multiprocessing
```
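To illustrate what a gloss dict is, here is a minimal sketch that maps each gloss in a set of annotations to an index and an occurrence count. It is not the logic of `data_preprocess.py`; the `[index, count]` layout, the reservation of index 0 for the CTC blank, the function name `build_gloss_dict`, and the toy annotations are all assumptions made for illustration.

```python
from collections import Counter


def build_gloss_dict(annotations):
    """Map each gloss to [index, count]; index 0 is left unused for the CTC blank."""
    counts = Counter(gloss for line in annotations for gloss in line.split())
    return {gloss: [idx, counts[gloss]]
            for idx, gloss in enumerate(sorted(counts), start=1)}


# Toy annotations written for this example, not taken from the dataset.
sample = ["MORGEN REGEN NORD", "REGEN SUED", "MORGEN SONNE"]
gloss_dict = build_gloss_dict(sample)
print(gloss_dict)
```

Sorting the glosses before indexing keeps the mapping reproducible across runs, which matters because model outputs are tied to these indices.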

Inference

We provide pretrained models for inference. You can download them from:

| Backbone | WER on Dev | WER on Test | Pretrained model |
| --- | --- | --- | --- |
| ResNet18 | 21.2% | 22.3% | [Baidu] (passwd: qi83) [Dropbox] |

To evaluate the pretrained model, run the command below:
python main.py --load-weights resnet18_slr_pretrained.pt --phase test
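Evaluation is reported as word error rate (WER). As a rough illustration of the metric, the sketch below computes a sentence-level WER with a standard Levenshtein edit distance over glosses. It is not the repo's evaluation tool and not sclite; the function name `wer` and the example strings are assumptions, and corpus-level tools usually sum errors over all sentences before dividing.

```python
def wer(reference, hypothesis):
    """Sentence-level WER: (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one gloss")
    # prev[j]: edit distance between the ref glosses processed so far
    # and the first j hyp glosses.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution (B -> X) and one deletion (D) over four reference glosses.
print(wer("A B C D", "A X C"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.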

Training

Configuration values are resolved in this order of priority: command line > config file > argparse default values. To train the SLR model on phoenix14, run the command below:

python main.py --work-dir PATH_TO_SAVE_RESULTS --config PATH_TO_CONFIG_FILE --device AVAILABLE_GPUS
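The priority order described above (command line > config file > argparse defaults) can be sketched with `argparse.ArgumentParser.set_defaults`. This is an illustration under stated assumptions, not the code in `main.py`: the config is passed as a JSON string to stay within the standard library, and `parse_args`, `--batch-size`, and the default values are hypothetical.

```python
import argparse
import json


def parse_args(argv, config_text=None):
    """Resolve options as: command line > config file > argparse defaults."""
    parser = argparse.ArgumentParser()
    # Hypothetical options and defaults, chosen only for illustration.
    parser.add_argument("--work-dir", default="./work_dir/")
    parser.add_argument("--batch-size", type=int, default=2)
    if config_text is not None:
        # Config values replace the argparse defaults; keys use the
        # argparse destination names (e.g. "batch_size", "work_dir").
        parser.set_defaults(**json.loads(config_text))
    # Flags given explicitly in argv override both config and defaults.
    return parser.parse_args(argv)


config = '{"batch_size": 4, "work_dir": "./exp"}'
args = parse_args(["--batch-size", "8"], config)
print(args.batch_size, args.work_dir)
```

Loading the config into the parser's defaults, rather than merging dictionaries after parsing, means the parser itself decides whether a flag was given on the command line.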

Feature Extraction

We also provide a feature extraction function that extracts frame-wise features for other research purposes. Run it with:

python main.py --load-weights PATH_TO_PRETRAINED_MODEL --phase features

To Do List

Evaluation tools implemented in pure Python.
WAR and WER calculation scripts.

Citation

If you find this repo useful in your research, please consider citing:

@InProceedings{Min_2021_ICCV,
    author    = {Min, Yuecong and Hao, Aiming and Chai, Xiujuan and Chen, Xilin},
    title     = {Visual Alignment Constraint for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11542-11551}
}

Relevant paper

Self-Mutual Distillation Learning for Continuous Sign Language Recognition [paper]

@InProceedings{Hao_2021_ICCV,
    author    = {Hao, Aiming and Min, Yuecong and Chen, Xilin},
    title     = {Self-Mutual Distillation Learning for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11303-11312}
}

Acknowledge

We appreciate the help from Runpeng Cui, Hao Zhou@Rhythmblue and Xinzhe Han@GeraldHan :)

Owner

Yuecong Min