The coda and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)

Overview

README

The coda and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)

Introduction

We propose a hierarchical core-fringe learning framework to measure fine-grained domain relevance of terms – the degree that a term is relevant to a broad (e.g., computer science) or narrow (e.g., deep learning) domain.

image-20210528201234901

Requirements

See requirements.txt

To install torch_geometric, please follow the instruction on pytorch_geometric

Reproduction

To reproduce the results in the paper (using word2vec embeddings)

Download data from Google Drive, unzip and put all the folders in the root directory of this repo (details about data are described below)

For broad domains (e.g., CS)

python run.py --domain cs --method cfl

For narrow domains (e.g., ML)

python run.py --domain cs --method hicfl --narrow

For narrow domains (PU setting) (e.g., ML)

python run.py --domain cs --method hicfl --narrow --pu

All experiments are run on an NVIDIA Quadro RTX 5000 with 16GB of memory under the PyTorch framework. The training of CFL for the CS domain can finish in 1 minute.

Query

To handle user query (using compositional GloVe embeddings as an example)

Download data from Google Drive, unzip and put all the folders in the root directory of this repo

Download GloVe embeddings from https://nlp.stanford.edu/projects/glove/, save the file to features/glove.6B.100d.txt

Example:

python query.py --domain cs --method cfl

The first run will train a model and save the model to model/. For the follow-up queries, the trained model can be loaded for prediction.

You can use the model either in a transductive or in an inductive setting (i.e., whether to include the query terms in training).

Options

You can check out the other options available using:

python run.py --help

Data

Data can be downloaded from Google Drive:

term-candidates/: list of seed terms. Format: term frequency

features/: features of terms (term embeddings trained by word2vec). To use compositional GloVe embeddings as features, you can download GloVe embeddings from https://nlp.stanford.edu/projects/glove/. To load the features, refer to utils.py for more details.

wikipedia/: Wikipedia search results for constructing the core-anchored semantic graph / automatic annotation

  • core-categories/: categories of core terms collected from Wikipedia. Format: term catogory ... category

  • gold-subcategories/: gold-subcategories for each domain collected from Wikipedia. Format: level#Category

  • ranking-results/: Wikipedia search results. 0 means using exact match, 1 means without exact match. Format: term result_1 ... result_k.

    The results are collected by the following script:

    # https://pypi.org/project/wikipedia/
    import wikipedia
    def get_wiki_search_result(term, mode=0):
        if mode==0:
            return wikipedia.search(f"\"{term}\"")
        else:
            return wikipedia.search(term)

train-valid-test/: train/valid/test split for evaluation with core terms

manual-data/:

  • ml2000-test.csv: manually created test set for ML
  • domain-relevance-comparison-pairs.csv: manually created test set for domain relevance comparison

Term lists

Several term lists with domain relevance scores produced by CFL/HiCFL are available on term-lists/

Format:

term  domain relevance score  core/fringe

Sample results for Machine Learning:

image-20210528201345177

Citation

The details of this repo are described in the following paper. If you find this repo useful, please kindly cite it:

@inproceedings{huang2021measuring,
  title={Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach},
  author={Huang, Jie and Chang, Kevin Chen-Chuan and Xiong, Jinjun and Hwu, Wen-mei},
  booktitle={Proceedings of ACL-IJCNLP},
  year={2021}
}
Owner
Jie Huang
Jie Huang
Unofficial implementation of the paper: PonderNet: Learning to Ponder in TensorFlow

PonderNet-TensorFlow This is an Unofficial Implementation of the paper: PonderNet: Learning to Ponder in TensorFlow. Official PyTorch Implementation:

1 Oct 23, 2022
Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun

ARAE Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun https://arxiv.org/abs/1706.04223 Disc

Junbo (Jake) Zhao 399 Jan 02, 2023
Computer Vision Paper Reviews with Key Summary of paper, End to End Code Practice and Jupyter Notebook converted papers

Computer-Vision-Paper-Reviews Computer Vision Paper Reviews with Key Summary along Papers & Codes. Jonathan Choi 2021 The repository provides 100+ Pap

Jonathan Choi 2 Mar 17, 2022
Deep learning for spiking neural networks

A deep learning library for spiking neural networks. Norse aims to exploit the advantages of bio-inspired neural components, which are sparse and even

Electronic Vision(s) Group — BrainScaleS Neuromorphic Hardware 59 Nov 28, 2022
Nicely is a real-time Feedback and Intervention Program Depression is a prevalent issue across all age groups, socioeconomic classes, and cultural identities.

Nicely is a real-time Feedback and Intervention Program Depression is a prevalent issue across all age groups, socioeconomic classes, and cultural identities.

1 Jan 16, 2022
AgML is a comprehensive library for agricultural machine learning

AgML is a comprehensive library for agricultural machine learning. Currently, AgML provides access to a wealth of public agricultural datasets for common agricultural deep learning tasks.

Plant AI and Biophysics Lab 1 Jul 07, 2022
Malware Env for OpenAI Gym

Malware Env for OpenAI Gym Citing If you use this code in a publication please cite the following paper: Hyrum S. Anderson, Anant Kharkar, Bobby Fila

ENDGAME 563 Dec 29, 2022
2021 National Underwater Robotics Vision Optics

2021-National-Underwater-Robotics-Vision-Optics 2021年全国水下机器人算法大赛-光学赛道-B榜精度第18名 (Kilian_Di的团队:A榜[email pro

Di Chang 9 Nov 04, 2022
571 Dec 25, 2022
Code for the CVPR2021 workshop paper "Noise Conditional Flow Model for Learning the Super-Resolution Space"

NCSR: Noise Conditional Flow Model for Learning the Super-Resolution Space Official NCSR training PyTorch Code for the CVPR2021 workshop paper "Noise

57 Oct 03, 2022
HugsVision is a easy to use huggingface wrapper for state-of-the-art computer vision

HugsVision is an open-source and easy to use all-in-one huggingface wrapper for computer vision. The goal is to create a fast, flexible and user-frien

Labrak Yanis 166 Nov 27, 2022
Rainbow is all you need! A step-by-step tutorial from DQN to Rainbow

Do you want a RL agent nicely moving on Atari? Rainbow is all you need! This is a step-by-step tutorial from DQN to Rainbow. Every chapter contains bo

Jinwoo Park (Curt) 1.4k Dec 29, 2022
Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

Voronoi Multi_Robot Collaborate Exploration Introduction In the unknown environment, the cooperative exploration of multiple robots is completed by Vo

PeaceWord 6 Nov 22, 2022
This is the code for our paper "Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text"

Iconary This is the code for our paper "Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text". It includes the

AI2 6 May 24, 2022
NeurIPS workshop paper 'Counter-Strike Deathmatch with Large-Scale Behavioural Cloning'

Counter-Strike Deathmatch with Large-Scale Behavioural Cloning Tim Pearce, Jun Zhu Offline RL workshop, NeurIPS 2021 Paper: https://arxiv.org/abs/2104

Tim Pearce 169 Dec 26, 2022
A modular domain adaptation library written in PyTorch.

A modular domain adaptation library written in PyTorch.

Kevin Musgrave 225 Dec 29, 2022
Torch-mutable-modules - Use in-place and assignment operations on PyTorch module parameters with support for autograd

Torch Mutable Modules Use in-place and assignment operations on PyTorch module p

Kento Nishi 7 Jun 06, 2022
SynNet - synthetic tree generation using neural networks

SynNet This repo contains the code and analysis scripts for our amortized approach to synthetic tree generation using neural networks. Our model can s

Wenhao Gao 60 Dec 29, 2022
Synthetic LiDAR sequential point cloud dataset with point-wise annotations

SynLiDAR dataset: Learning From Synthetic LiDAR Sequential Point Cloud This is official repository of the SynLiDAR dataset. For technical details, ple

78 Dec 27, 2022
An implementation of the proximal policy optimization algorithm

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022