LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

Last update: Dec 29, 2022

Overview

LightHuBERT

| Github | Huggingface | SUPERB Leaderboard |

The authors' PyTorch implementation and pretrained models of LightHuBERT.

March 2022: released the preprint on arXiv and the checkpoints on Hugging Face.

Pre-Trained Models

Model	Pre-Training Dataset	Download Link
LightHuBERT Base	960 hrs LibriSpeech	huggingface: lighthubert/lighthubert_base.pt
LightHuBERT Small	960 hrs LibriSpeech	huggingface: lighthubert/lighthubert_small.pt
LightHuBERT Stage 1	960 hrs LibriSpeech	huggingface: lighthubert/lighthubert_stage1.pt

The pre-trained models were trained with common.fp16: true, so model inference can be performed with fp16 weights.

Requirements and Installation

PyTorch version >= 1.8.1
Python version >= 3.6
numpy version >= 1.19.3
To install lighthubert:

git clone git@github.com:mechanicalsea/lighthubert.git
cd lighthubert
pip install --editable .
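
After installation, the version minimums listed above can be checked with a short script. The following is a minimal sketch using only the standard library; the helper names (parse_version, meets_minimum, check_environment) are hypothetical and not part of lighthubert, and pre-release suffixes such as "rc1" are simply ignored.

```python
import re
import sys

# Minimum versions as listed in the requirements above.
MINIMUMS = {"python": "3.6", "torch": "1.8.1", "numpy": "1.19.3"}


def parse_version(text):
    """Turn a string such as '1.8.1+cu111' into (1, 8, 1).

    Only the leading dotted-number part is used; local and
    pre-release suffixes are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("cannot parse version: {!r}".format(text))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '2.0' compares like '2.0.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Map each requirement name to True/False (False if not importable)."""
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    report = {"python": meets_minimum(python_version, MINIMUMS["python"])}
    for name in ("torch", "numpy"):
        try:
            module = __import__(name)
        except ImportError:
            report[name] = False
            continue
        report[name] = meets_minimum(module.__version__, MINIMUMS[name])
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("{:<7} >= {:<7} {}".format(name, MINIMUMS[name], "ok" if ok else "MISSING/TOO OLD"))
```

A package that cannot be imported is reported the same way as one that is too old, which keeps the check usable before PyTorch is installed.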

Load Pre-Trained Models for Inference

import torch
from lighthubert import LightHuBERT, LightHuBERTConfig

# dummy input: a batch of one 10,000-sample waveform (audio is expected at 16 kHz)
wav_input_16khz = torch.randn(1, 10000).cuda()

# load the pre-trained checkpoint on the CPU first; the model is moved to the GPU below
checkpoint = torch.load('/path/to/lighthubert.pt', map_location='cpu')
cfg = LightHuBERTConfig(checkpoint['cfg']['model'])
cfg.supernet_type = 'base'
model = LightHuBERT(cfg)
model = model.cuda()
model = model.eval()
# strict=False tolerates key mismatches; the printed result lists missing and unexpected keys
print(model.load_state_dict(checkpoint['model'], strict=False))

# (optional) sample a subnet from the supernet and activate it
subnet = model.supernet.sample_subnet()
model.set_sample_config(subnet)
params = model.calc_sampled_param_num()
print(f"subnet (Params {params / 1e6:.0f}M) | {subnet}")

# inference only, so gradients are not needed
with torch.no_grad():
    # extract the representation of the last layer
    rep = model.extract_features(wav_input_16khz)[0]

    # extract the representation of each layer
    hs = model.extract_features(wav_input_16khz, ret_hs=True)[0]

# the last-layer representation should equal the final entry of the hidden states
print(f"Representation matches last hidden state: {torch.allclose(rep, hs[-1])}")

More examples can be found in our tutorials.
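
To sanity-check the time dimension of the extracted representations, the number of output frames can be estimated from the input length. The sketch below is not taken from the LightHuBERT code: it assumes the model keeps the convolutional feature extractor of HuBERT Base (kernel widths 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2 and no padding), which this README does not state. The names CONV_LAYERS, num_frames, and total_stride are hypothetical.

```python
# (kernel_size, stride) of each 1-D convolution in the waveform front end.
# ASSUMPTION: same layout as HuBERT Base; not confirmed by this README.
CONV_LAYERS = [(10, 5)] + [(3, 2)] * 4 + [(2, 2)] * 2


def num_frames(num_samples, conv_layers=CONV_LAYERS):
    """Number of output frames for a waveform of `num_samples` samples."""
    length = num_samples
    for kernel, stride in conv_layers:
        if length < kernel:
            # Input is too short to produce any output at this layer.
            return 0
        # Standard output-length formula for an unpadded convolution.
        length = (length - kernel) // stride + 1
    return length


def total_stride(conv_layers=CONV_LAYERS):
    """Hop size in samples between consecutive output frames."""
    stride = 1
    for _, layer_stride in conv_layers:
        stride *= layer_stride
    return stride


if __name__ == "__main__":
    print(num_frames(10000))             # frames for the dummy input used above
    print(total_stride() / 16000 * 1e3)  # frame hop in milliseconds at 16 kHz
```

Under these assumptions the 10,000-sample input above yields 31 frames, with one frame every 320 samples (20 ms at 16 kHz). If the shape of rep differs, the front end differs from what is assumed here.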

Universal Representation Evaluation on SUPERB

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Reference

If you find our work useful in your research, please cite the following paper:

@article{wang2022lighthubert,
  title={{LightHuBERT}: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit {BERT}},
  author={Rui Wang and Qibing Bai and Junyi Ao and Long Zhou and Zhixiang Xiong and Zhihua Wei and Yu Zhang and Tom Ko and Haizhou Li},
  journal={arXiv preprint arXiv:2203.15610},
  year={2022}
}

Contact Information

For help or issues using LightHuBERT models, please submit a GitHub issue.

For other communications related to LightHuBERT, please contact Rui Wang ([email protected]).
