Awesome-NLP-Research (ANLP)

Overview

(Update on 2020-01-10: We have also added the presentations from the Fall 2020 installment of the course; they are under "slides2020".)

As part of the Fall 2018 course CPSC 677 "Advanced Natural Language Processing" at Yale, we developed, with the help of the students, a corpus of useful resources for NLP research. Bibliographies and PowerPoint presentations for each topic are found below, along with several blog posts. We also asked the students to list relevant and prerequisite concepts for each topic; these keywords are found here.

If you have any questions, would like to contribute further to this project, or feel we are missing an important citation, please contact Alex Fabbri at alexander[dot]fabbri[at]yale.[first three letters of education]

Overview of papers presented in class

  • Capsule Networks for NLP by Will Merrill - BIB BLOG SLIDES
  • Commonsense Learning by Michihiro Yasunaga - BIB SLIDES
  • Dialogue Systems by Suyi Li - BIB SLIDES
  • Multilingual-Word-Embeddings by Davey Proctor - BIB SLIDES
  • Neural Embeddings by John Brandt - BIB SLIDES
  • Temporal and Dynamic Embeddings by Yavuz Nuzumlali - BIB SLIDES
  • NLP in Finance by Gaurav Pathak - BIB SLIDES
  • Natural Language Generation by Tianwei She - BIB SLIDES
  • Knowledge Graphs by Tomoe Mizutani - BIB SLIDES
  • Cross-Lingual Information Retrieval by Rui Zhang - BIB BLOG SLIDES
  • Neural Information Retrieval by Danny Keller - BIB SLIDES
  • Character-Level Language Modeling by Angus Fong - BIB SLIDES
  • Latent Variable Models in NLP by Brian Kitano - BIB SLIDES
  • Unsupervised Machine Translation by Yongjie Lin - BIB SLIDES
  • Neural Computational Morphology by Garrett Bingham - BIB SLIDES
  • Network Methods by Noah Amsel - BIB SLIDES
  • Neural Semi-Supervised Learning by Alex Fabbri - BIB SLIDES
  • Question Answering by Talley Amir - BIB SLIDES
  • Attribute-Level Sentiment Analysis by Ishita Chakraborty and Davey Proctor - BIB BLOG SLIDES
  • Semantic Parsing by Bo Pang - BIB SLIDES
  • Sequence2Sequence by Jack Koch - BIB SLIDES
  • Seq2SQL by Tao Yu - BIB SLIDES
  • Spectral Learning by Hannah Lawrence - BIB SLIDES
  • Single Document Summarization by Yi Chern Tan - BIB SLIDES
  • Transfer Learning by Irene Li - BIB SLIDES

Additionally, students from the class made blog posts on the following topics:

  • DARTS - BLOG
  • OpenAI Transformer - BLOG
Owner
Language, Information, and Learning at Yale