This repo is the code release for the EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

Last update: Nov 22, 2022

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories

1. Install the Python environment.

Follow the instructions in "env_install.txt" to create a Python virtual environment and install the necessary packages. The environment has been tested with Python >=3.6 and PyTorch >=1.8.
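Before running anything else, it can help to confirm that the active environment meets the tested minimums. The following is a minimal sketch, not part of the repo: the helper names version_tuple and meets_minimum are hypothetical, and only the major and minor version numbers are compared.

```python
import re
import sys


def version_tuple(version):
    """Turn a string such as '1.8.1+cu111' into (1, 8) for comparison."""
    # Drop any local build suffix (after '+'), then keep major and minor only.
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:2])


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (major.minor)."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    python_version = "{}.{}".format(*sys.version_info[:2])
    print("python", python_version, "ok:", meets_minimum(python_version, "3.6"))
    try:
        import torch  # only importable once the environment is set up
        print("torch", torch.__version__, "ok:",
              meets_minimum(torch.__version__, "1.8"))
    except ImportError:
        print("torch is not installed in this environment")
```

Running the script inside the virtual environment prints one line per component; a False value suggests re-checking "env_install.txt".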

2. Gloss alignment algorithm.

Convert your dictionary data into the format of "wordnet_def.txt" in "data/", then run the following commands to get the gloss alignment results.

cd run_align_definitions_main/
python ../model/align_definitions_main.py

3. Download the pretrained model and data.

Visit https://drive.google.com/drive/folders/1I5-iOfWr1E32ahYDCbHKCssMdm74_JXG?usp=sharing. Download the pretrained model (SemEq-General-Large, which is based on RoBERTa-Large) and put a copy under both run_robertaLarge_model_span_WSD_twoStageTune/ and run_robertaLarge_model_span_FEWS_twoStageTune/. Make sure the downloaded model file is named "pretrained_model_CrossEntropy.pt". The script loads this general model and fine-tunes it on the specific WSD dataset to produce the expert model.
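Since the fine-tuning scripts expect the checkpoint under both run directories, a quick placement check can catch a misnamed or missing file early. This is a minimal sketch that assumes it is run from the repo root; the function name missing_checkpoints is hypothetical, while the directory and file names are the ones given above.

```python
from pathlib import Path

# Run directories and checkpoint name as stated in this README.
RUN_DIRS = [
    "run_robertaLarge_model_span_WSD_twoStageTune",
    "run_robertaLarge_model_span_FEWS_twoStageTune",
]
CHECKPOINT_NAME = "pretrained_model_CrossEntropy.pt"


def missing_checkpoints(repo_root="."):
    """Return the run directories that do not yet contain the checkpoint."""
    root = Path(repo_root)
    return [d for d in RUN_DIRS if not (root / d / CHECKPOINT_NAME).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        for run_dir in missing:
            print("missing:", Path(run_dir) / CHECKPOINT_NAME)
    else:
        print("checkpoint found in both run directories")
```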

4. Fine-tune the general model to get an expert model (SemEq-Expert-Large).

All-words WSD:

cd run_robertaLarge_model_span_WSD_twoStageTune/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset WSD --exp_mode twoStageTune --optimizer AdamW --learning_rate 2e-6 --bert_model roberta_large --batch_size 16

Few-shot WSD (FEWS):

cd run_robertaLarge_model_span_FEWS_twoStageTune/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset FEWS --exp_mode twoStageTune --optimizer AdamW --learning_rate 5e-6 --bert_model roberta_large --batch_size 16
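The two fine-tuning commands above differ only in the run directory, the --eval_dataset value, and the learning rate. The sketch below assembles them from a small table so the differences are explicit. It is an illustration rather than part of the repo: SETTINGS and finetune_command are hypothetical names, and only flags and values that appear in this README are used.

```python
import shlex

# Per-dataset settings copied from the commands in this README.
SETTINGS = {
    "WSD": {
        "run_dir": "run_robertaLarge_model_span_WSD_twoStageTune",
        "learning_rate": "2e-6",
    },
    "FEWS": {
        "run_dir": "run_robertaLarge_model_span_FEWS_twoStageTune",
        "learning_rate": "5e-6",
    },
}


def finetune_command(dataset, gpu_id=0, batch_size=16):
    """Build the fine-tuning command (as an argument list) for one dataset."""
    return [
        "python", "../BERT_model_span/BERT_model_main.py",
        "--gpu_id", str(gpu_id),
        "--prepare_data", "True",
        "--eval_dataset", dataset,
        "--exp_mode", "twoStageTune",
        "--optimizer", "AdamW",
        "--learning_rate", SETTINGS[dataset]["learning_rate"],
        "--bert_model", "roberta_large",
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    for dataset, cfg in SETTINGS.items():
        print("cd", cfg["run_dir"] + "/")
        print(" ".join(shlex.quote(arg) for arg in finetune_command(dataset)))
```

The argument list can also be passed to subprocess.run with cwd set to the matching run directory, since the script path is relative to it.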

5. Evaluate results.

All-words WSD: (you can try different epochs)

cd run_robertaLarge_model_span_WSD_twoStageTune/
python ../evaluate/evaluate_WSD.py --loss CrossEntropy --epoch 1
python ../evaluate/evaluate_WSD_POS.py

Few-shot WSD (FEWS): (you can try different epochs)

cd run_robertaLarge_model_span_FEWS_twoStageTune/
python ../evaluate/evaluate_FEWS.py --loss CrossEntropy --epoch 1

Note that the best test-set results in the few-shot and zero-shot settings are each selected across epochs based on the corresponding dev set.
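The selection rule in the note can be sketched as follows: for each setting, pick the epoch with the highest dev score and report the test score from that same epoch. This is a minimal illustration, assuming scores have been collected by hand from the evaluation scripts; the function name select_by_dev is hypothetical and the numbers are made-up placeholders, not results from the paper.

```python
def select_by_dev(scores_by_epoch):
    """Pick the epoch with the best dev score and return its test score.

    scores_by_epoch maps epoch -> (dev_score, test_score).
    Ties on the dev score go to the earliest epoch.
    """
    best_epoch = max(sorted(scores_by_epoch),
                     key=lambda epoch: scores_by_epoch[epoch][0])
    return best_epoch, scores_by_epoch[best_epoch][1]


if __name__ == "__main__":
    # Placeholder numbers for illustration only (not real results).
    placeholder_scores = {
        "few-shot": {1: (0.70, 0.68), 2: (0.74, 0.71), 3: (0.73, 0.72)},
        "zero-shot": {1: (0.60, 0.61), 2: (0.59, 0.63), 3: (0.62, 0.60)},
    }
    for setting, scores in placeholder_scores.items():
        epoch, test_score = select_by_dev(scores)
        print(setting, "-> epoch", epoch, "test score", test_score)
```

Each setting is handled independently, so the few-shot and zero-shot results may come from different epochs.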

Extra. Apply the trained model to arbitrary sentences for WSD.

After training, you can apply the trained model (trained_model_CrossEntropy.pt) to any sentences. Examples are included in data_custom/; they are based on glosses from WordNet 3.0.

cd run_BERT_model_span_CustomData/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset custom_data --exp_mode eval --bert_model roberta_large --batch_size 16

If you find this repo useful, please cite our work. Thanks!

@inproceedings{yao-etal-2021-connect,
    title = "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories",
    author = "Yao, Wenlin  and
      Pan, Xiaoman  and
      Jin, Lifeng  and
      Chen, Jianshu  and
      Yu, Dian  and
      Yu, Dong",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.610",
    pages = "7741--7751",
}

Disclaimer: This repo is for research purposes only. It is not an officially supported Tencent product.
