NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.

Last update: Dec 28, 2021

Related tags

Deep Learning nuanced

Overview

NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions

Overview

NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns. The dataset focuses on realistic settings where user preferences are extracted from real-world Yelp Open Dataset and paraphrased into natural user responses.

Existing conversational systems are mostly agent-centric, which assumes the user utterances would closely follow the system ontology (for NLU or dialogue state tracking). However, in real-world scenarios, it is highly desirable that the users can speak freely in their own way. It is extremely hard, if not impossible, for the users to adapt to the unknown system ontology.

In this work, we attempt to build a user-centric dialogue system. As there is no clean mapping for a user’s free form utterance to an ontology, we first model the user preferences as estimated distributions over the system ontology and map the users’ utterances to such distributions. Learning such a mapping poses new challenges on reasoning over existing knowledge, ranging from factoid knowledge, commonsense knowledge to the users’ own situations. To this end, we build a new dataset named NUANCED that focuses on such realistic settings for conversational recommendation. We believe NUANCED can serve as a valuable resource to push existing research from the agent-centric system to the user-centric system.

For more details, please refer to the following two papers:
NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions
User Memory Reasoning for Conversational Recommendation

Examples of traditional dataset and NUANCED: in real-world scenarios, the free form user utterances often mismatch with system ontology. In NUANCED, we model the user preferences (or dialogue state) as distributions over the ontology, therefore to allow mapping of entities unknown to the system to multiple values and slots for efficient conversation.

Data

In this data release, we have included both the nuanced version where user preferences are mapped to an estimated distribution and the coarse version where user preferences are mapped to discrete slot labels according to system ontology.

Folder data_dist: the nuanced version;
Folder data_discrete: the coarse version with 0-1 labels;
meta.json: ontology for this restaurant domain;

Format for the dataset: A list of dictionaries, with each dictionary as one dialogue of the following important fields:

"dialogue": a list of dialog turns. Each turn has the following fields:
"role": user or assistant
"text": user utterance or system response
"dialog_acts": acts of this turn
"slots": slots involved in this turn
"dist": for user turn, the preference distribution
"strategy": strategy 1 means the user utterance does not have grounded ontology terms (implicit reasoning), strategy 2 means the user utterance has grounded ontology terms

Citations

If you want to publish experimental results with our datasets or use the baseline models, please cite the following articles (pdf, pdf):

@article{chen2020nuanced,
  title={NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions},
  author={Chen, Zhiyu and Liu, Honglei and Xu, Hu and Moon, Seungwhan and Zhou, Hao and Liu, Bing},
  journal={arXiv preprint arXiv:2010.12758},
  year={2020}
}

@inproceedings{xu2020user,
  title={User Memory Reasoning for Conversational Recommendation},
  author={Xu, Hu and Moon, Seungwhan and Liu, Honglei and Liu, Bing and Shah, Pararth and Philip, S Yu},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
  pages={5288--5308},
  year={2020}
}

License

NUANCED is released under CC-BY-NC-4.0, see LICENSE for details.

NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.

Related tags

Overview

NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions

Overview

Data

Citations

License

Owner

Facebook Research

Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

Code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

Office source code of paper UniFuse: Unidirectional Fusion for 360$^\circ$ Panorama Depth Estimation

I-SECRET: Importance-guided fundus image enhancement via semi-supervised contrastive constraining

Contains code for Deep Kernelized Dense Geometric Matching

[NeurIPS2021] Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks

The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

PED: DETR for Crowd Pedestrian Detection

Codebase for the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge.

blind SQLIpy sebuah alat injeksi sql yang menggunakan waktu sql untuk mendapatkan sebuah server database.

[제 13회 투빅스 컨퍼런스] OK Mugle! - 장르부터 멜로디까지, Content-based Music Recommendation

hySLAM is a hybrid SLAM/SfM system designed for mapping

Milano is a tool for automating hyper-parameters search for your models on a backend of your choice.

Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph

My 1st place solution at Kaggle Hotel-ID 2021

The dataset of tweets pulling from Twitters with keyword: Hydroxychloroquine, location: US, Time: 2020

Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network

Official Implementation of Few-shot Visual Relationship Co-localization

PyTorch implementation of NIPS 2017 paper Dynamic Routing Between Capsules

Making Structure-from-Motion (COLMAP) more robust to symmetries and duplicated structures