This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

Overview

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories

This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

1. install python environment.

Follow the instruction of "env_install.txt" to create python virtual environment and install necessary packages. The environment is tested on python >=3.6 and pytorch >=1.8.

2. Gloss alignment algorithm.

Change your dictionary data format into the data format of "wordnet_def.txt" in "data/". Run the following commands to get gloss alignment results.

cd run_align_definitions_main/
python ../model/align_definitions_main.py

3. Download the pretrained model and data.

Visit https://drive.google.com/drive/folders/1I5-iOfWr1E32ahYDCbHKCssMdm74_JXG?usp=sharing. Download the pretrained model (SemEq-General-Large which is based on Roberta-Large) and put it under run_robertaLarge_model_span_WSD_twoStageTune/ and also run_robertaLarge_model_span_FEWS_twoStageTune/. Please make sure that the downloaded model file name is "pretrained_model_CrossEntropy.pt". The script will load the general model and fine-tune on specific WSD datasets to get the expert model.

4. Fine-tune the general model to get an expert model (SemEq-Expert-Large).

All-words WSD:

cd run_robertaLarge_model_span_WSD_twoStageTune/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset WSD --exp_mode twoStageTune --optimizer AdamW --learning_rate 2e-6 --bert_model roberta_large --batch_size 16

Few-shot WSD (FEWS):

cd run_robertaLarge_model_span_FEWS_twoStageTune/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset FEWS --exp_mode twoStageTune --optimizer AdamW --learning_rate 5e-6 --bert_model roberta_large --batch_size 16

5. Evaluate results.

All-words WSD: (you can try different epochs)

cd run_robertaLarge_model_span_WSD_twoStageTune/
python ../evaluate/evaluate_WSD.py --loss CrossEntropy --epoch 1
python ../evaluate/evaluate_WSD_POS.py

Few-shot WSD (FEWS): (you can try different epochs)

cd run_robertaLarge_model_span_FEWS_twoStageTune/
python ../evaluate/evaluate_FEWS.py --loss CrossEntropy --epoch 1

Note that the best results of test set on few-shot setting or zero-shot setting are selected based on dev set across epochs, respectively.

Extra. Apply the trained model to any given sentences to do WSD.

After training, you can apply the trained model (trained_model_CrossEntropy.pt) to any sentences. Examples are included in data_custom/. Examples are based on glosses in WordNet3.0.

cd run_BERT_model_span_CustomData/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset custom_data --exp_mode eval --bert_model roberta_large --batch_size 16

If you think this repo is useful, please cite our work. Thanks!

@inproceedings{yao-etal-2021-connect,
    title = "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories",
    author = "Yao, Wenlin  and
      Pan, Xiaoman  and
      Jin, Lifeng  and
      Chen, Jianshu  and
      Yu, Dian  and
      Yu, Dong",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.610",
    pages = "7741--7751",
}

Disclaimer: This repo is only for research purpose. It is not an officially supported Tencent product.

Owner
Research repositories.
Cross-Document Coreference Resolution

Cross-Document Coreference Resolution This repository contains code and models for end-to-end cross-document coreference resolution, as decribed in ou

Arie Cattan 29 Nov 28, 2022
The implemention of Video Depth Estimation by Fusing Flow-to-Depth Proposals

Flow-to-depth (FDNet) video-depth-estimation This is the implementation of paper Video Depth Estimation by Fusing Flow-to-Depth Proposals Jiaxin Xie,

32 Jun 14, 2022
Bytedance Inc. 2.5k Jan 06, 2023
PyTorch package for the discrete VAE used for DALL·E.

Overview [Blog] [Paper] [Model Card] [Usage] This is the official PyTorch package for the discrete VAE used for DALL·E. Installation Before running th

OpenAI 9.5k Jan 05, 2023
[ECCV'20] Convolutional Occupancy Networks

Convolutional Occupancy Networks Paper | Supplementary | Video | Teaser Video | Project Page | Blog Post This repository contains the implementation o

622 Dec 30, 2022
Supervised Contrastive Learning for Product Matching

Contrastive Product Matching This repository contains the code and data download links to reproduce the experiments of the paper "Supervised Contrasti

Web-based Systems Group @ University of Mannheim 18 Dec 10, 2022
Multi-task Multi-agent Soft Actor Critic for SMAC

Multi-task Multi-agent Soft Actor Critic for SMAC Overview The CARE formulti-task: Multi-Task Reinforcement Learning with Context-based Representation

RuanJingqing 8 Sep 30, 2022
Fully Convolutional Networks for Semantic Segmentation by Jonathan Long*, Evan Shelhamer*, and Trevor Darrell. CVPR 2015 and PAMI 2016.

Fully Convolutional Networks for Semantic Segmentation This is the reference implementation of the models and code for the fully convolutional network

Evan Shelhamer 3.2k Jan 08, 2023
an implementation of softmax splatting for differentiable forward warping using PyTorch

softmax-splatting This is a reference implementation of the softmax splatting operator, which has been proposed in Softmax Splatting for Video Frame I

Simon Niklaus 338 Dec 28, 2022
A Review of Deep Learning Techniques for Markerless Human Motion on Synthetic Datasets

HOW TO USE THIS PROJECT A Review of Deep Learning Techniques for Markerless Human Motion on Synthetic Datasets Based on DeepLabCut toolbox, we run wit

1 Jan 10, 2022
Code used for the results in the paper "ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning"

Code used for the results in the paper "ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning" Getting started Prerequisites CUD

70 Dec 02, 2022
Let's Git - Versionsverwaltung & Open Source Hausaufgabe

Let's Git - Versionsverwaltung & Open Source Hausaufgabe Herzlich Willkommen zu dieser Hausaufgabe für unseren MOOC: Let's Git! Wir hoffen, dass Du vi

1 Dec 13, 2021
A GUI to automatically create a TOPAS-readable MLC simulation file

Python script to create a TOPAS-readable simulation file descriring a Multi-Leaf-Collimator. Builds the MLC using the data from a 3D .stl file.

Sebastian Schäfer 0 Jun 19, 2022
HomoInterpGAN - Homomorphic Latent Space Interpolation for Unpaired Image-to-image Translation

HomoInterpGAN Homomorphic Latent Space Interpolation for Unpaired Image-to-image Translation (CVPR 2019, oral) Installation The implementation is base

Ying-Cong Chen 99 Nov 15, 2022
PyTorch implementation of our Adam-NSCL algorithm from our CVPR2021 (oral) paper "Training Networks in Null Space for Continual Learning"

Adam-NSCL This is a PyTorch implementation of Adam-NSCL algorithm for continual learning from our CVPR2021 (oral) paper: Title: Training Networks in N

Shipeng Wang 34 Dec 21, 2022
Code accompanying "Evolving spiking neuron cellular automata and networks to emulate in vitro neuronal activity," accepted to IEEE SSCI ICES 2021

Evolving-spiking-neuron-cellular-automata-and-networks-to-emulate-in-vitro-neuronal-activity Code accompanying "Evolving spiking neuron cellular autom

SOCRATES: Self-Organizing Computational substRATES 2 Dec 02, 2022
Multi-objective constrained optimization for energy applications via tree ensembles

Multi-objective constrained optimization for energy applications via tree ensembles

C⚙G - Imperial College London 1 Nov 19, 2021
PySLM Python Library for Selective Laser Melting and Additive Manufacturing

PySLM Python Library for Selective Laser Melting and Additive Manufacturing PySLM is a Python library for supporting development of input files used i

Dr Luke Parry 35 Dec 27, 2022
Official PyTorch implementation of "Adversarial Reciprocal Points Learning for Open Set Recognition"

Adversarial Reciprocal Points Learning for Open Set Recognition Official PyTorch implementation of "Adversarial Reciprocal Points Learning for Open Se

Guangyao Chen 78 Dec 28, 2022
The Environment I built to study Reinforcement Learning + Pokemon Showdown

pokemon-showdown-rl-environment The Environment I built to study Reinforcement Learning + Pokemon Showdown Been a while since I ran this. Think it is

3 Jan 16, 2022