System-oriented IR evaluations are limited to rather abstract understandings of real user behavior

Overview

Validating Simulations of User Query Variants

This repository contains the scripts of the experiments and evaluations, simulated queries, as well as the figures of:

Timo Breuer, Norbert Fuhr, and Philipp Schaer. 2022. Validating Simulations of User Query Variants. In Proceedings of the 44th European Conference on IR Research, ECIR 2022.

System-oriented IR evaluations are limited to rather abstract understandings of real user behavior. As a solution, simulating user interactions provides a cost-efficient way to support system-oriented experiments with more realistic directives when no interaction logs are available. While there are several user models for simulated clicks or result list interactions, very few attempts have been made towards query simulations, and it has not been investigated if these can reproduce properties of real queries. In this work, we validate simulated user query variants with the help of TREC test collections in reference to real user queries that were made for the corresponding topics. Besides, we introduce a simple yet effective method that gives better reproductions of real queries than the established methods. Our evaluation framework validates the simulations regarding the retrieval performance, reproducibility of topic score distributions, shared task utility, effort and effect, and query term similarity when compared with real user query variants. While the retrieval effectiveness and statistical properties of the topic score distributions as well as economic aspects are close to that of real queries, it is still challenging to simulate exact term matches and later query reformulations.

Directory overview

Directory Description
config/ Contains configuration files for the query simulations, experiments, and evaluations.
data/ Contains (intermediate) output data of the simulations and experiments as well as the figures of the paper.
eval/ Contains scripts of the experiments and evaluations.
sim/ Contains scripts of the query simulations.

Setup

  1. Install Anserini and index Core17 (The New York Times Annotated Corpus) according to the regression guide:
anserini/target/appassembler/bin/IndexCollection \
    -collection NewYorkTimesCollection \
    -input /path/to/core17/ \
    -index anserini/indexes/lucene-index.core17 \
    -generator DefaultLuceneDocumentGenerator \
    -threads 4 \
    -storePositions \
    -storeDocvectors \
    -storeRaw \
    -storeContents \
    > anserini/logs/log.core17 &
  1. Install the required Python packages:
pip install -r requirements.txt

Query simulation

In order to prepare the language models and simulate the queries, the scripts have to executed in the order shown in the following table. All of the outputs can be found in the data/ directory. For the sake of better code readability the names of the query reformulation strategies have been mapped: S1S1; S2S2; S2'S3; S3S4; S3'S5; S4S6; S4'S7; S4''S8. The names of the scripts and output files comply with this name mapping.

Script Description Output files
sim/make_background.py Make the background language model form all index terms of Core17. The background model is required for Controlled Query Generation (CQG) by Jordan et al. data/lm/background.csv
sim/make_cqg.py Make the CQG language models with different parameters of lambda from 0.0 to 1.0. data/lm/cqg.json
sim/simulate_queries_s12345.py Simulate TTS and KIS queries with strategies S1 to S3' data/queries/s12345.csv
sim/simulate_queries_s678.py Simulate TTS and KIS queries with strategies S4 to S4'' data/queries/s678.csv

Experimental evaluation and results

In order to reproduce the experiments of the study, the scripts have to executed in the order shown in the following table.

Script Description Output files Reproduction of ...
eval/arp.py, eval/arp_first.py, eval/arp_max.py Retrieval performance: Evaluate the Average Retrieval Performance (ARP). data/experimental_results/arp.csv, data/experimental_results/arp_first.csv, data/experimental_results/arp_max.csv Tab. A.1
eval/rmse_s12345.py, eval/rmse_s678.py Retrieval performance: Evaluate the Root-Mean-Square-Error (RMSE). data/experimental_results/rmse_map.csv, data/experimental_results/rmse_ndcg.csv, data/experimental_results/rmse_p1000.csv, data/experimental_results/rmse_uqv_vs_s12345_kis_ndcg.csv, data/experimental_results/rmse_uqv_vs_s12345_tts_ndcg.csv, data/figures/rmse_map.pdf, data/figures/rmse_ndcg.pdf, data/figures/rmse_p1000.pdf, data/figures/rmse_uqv_vs_s12345_kis_ndcg.pdf, data/figures/rmse_uqv_vs_s12345_tts_ndcg.pdf Fig. A.1, Fig. 1
eval/t-test.py Retrieval performance: Evaluate the p-values of paired t-tests. data/experimental_results/ttest.csv, data/figures/ttest.pdf Fig. A.2
eval/system_orderings.py Shared task utility: Evaluate Kendall's tau between relative system orderings. data/experimental_results/system_orderings.csv, data/figures/system_orderings.pdf Fig. 2 (left)
eval/sdcg.py Effort and effect: Evaluate the Session Discounted Cumulative Gain (sDCG). data/experimental_results/sdcg_3queries.csv, data/experimental_results/sdcg_5queries.csv, data/experimental_results/sdcg_10queries.csv, data/figures/sdcg_3queries.pdf, data/figures/sdcg_5queries.pdf, data/figures/sdcg_10queries.pdf Fig. 3 (top)
eval/economic.py Effort and effect: Evaluate tradeoffs between number of queries and browsing depth by isoquants. data/experimental_results/economic0.3.csv, data/experimental_results/economic0.4.csv, data/experimental_results/economic0.5.csv, data/figures/economic0.3.pdf, data/figures/economic0.4.pdf, data/figures/economic0.5.pdf Fig. 3 (bottom)
eval/jaccard_similarity.py Query term similarity: Evaluate query term similarities. data/experimental_results/jacc.csv, data/figures/jacc.pdf Fig. 2 (right)
Owner
IR Group at Technische Hochschule Köln
IR Group at Technische Hochschule Köln
Instance-based label smoothing for improving deep neural networks generalization and calibration

Instance-based Label Smoothing for Neural Networks Pytorch Implementation of the algorithm. This repository includes a new proposed method for instanc

Mohamed Maher 1 Aug 13, 2022
This is a repo of basic Machine Learning!

Basic Machine Learning This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resource

Ekram Asif 53 Dec 31, 2022
The repository contain code for building compiler using puthon.

Building Compiler This is a python implementation of JamieBuild's "Super Tiny Compiler" Overview JamieBuilds developed a wonderfully educative compile

Shyam Das Shrestha 1 Nov 21, 2021
🐦 Quickly annotate data from the comfort of your Jupyter notebook

🐦 pigeon - Quickly annotate data on Jupyter Pigeon is a simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort

Anastasis Germanidis 647 Jan 05, 2023
Meta-Learning Sparse Implicit Neural Representations (NeurIPS 2021)

Meta-SparseINR Official PyTorch implementation of "Meta-learning Sparse Implicit Neural Representations" (NeurIPS 2021) by Jaeho Lee*, Jihoon Tack*, N

Jaeho Lee 41 Nov 10, 2022
Tensorflow implementation of "Learning Deep Features for Discriminative Localization"

Weakly_detector Tensorflow implementation of "Learning Deep Features for Discriminative Localization" B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and

Taeksoo Kim 363 Jun 29, 2022
Pretrained models for Jax/Haiku; MobileNet, ResNet, VGG, Xception.

Pre-trained image classification models for Jax/Haiku Jax/Haiku Applications are deep learning models that are made available alongside pre-trained we

Alper Baris CELIK 14 Dec 20, 2022
A high-performance anchor-free YOLO. Exceeding yolov3~v5 with ONNX, TensorRT, NCNN, and Openvino supported.

YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities. For more details, please refer to our rep

7.7k Jan 06, 2023
Simulation environments for the CrazyFlie quadrotor: Used for Reinforcement Learning and Sim-to-Real Transfer

Phoenix-Drone-Simulation An OpenAI Gym environment based on PyBullet for learning to control the CrazyFlie quadrotor: Can be used for Reinforcement Le

Sven Gronauer 8 Dec 07, 2022
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 06, 2022
CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

UC2 UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu,

Mingyang Zhou 28 Dec 30, 2022
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. NeurIPS 2021.

Gengshan Yang 59 Nov 25, 2022
Real Time Object Detection and Classification using Yolo Algorithm.

Real time Object detection & Classification using YOLO algorithm. Real Time Object Detection and Classification using Yolo Algorithm. What is Object D

Ketan Chawla 1 Apr 17, 2022
Segmentation-Aware Convolutional Networks Using Local Attention Masks

Segmentation-Aware Convolutional Networks Using Local Attention Masks [Project Page] [Paper] Segmentation-aware convolution filters are invariant to b

144 Jun 29, 2022
A Python Package for Convex Regression and Frontier Estimation

pyStoNED pyStoNED is a Python package that provides functions for estimating multivariate convex regression, convex quantile regression, convex expect

Sheng Dai 17 Jan 08, 2023
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

[ICLR 2022] The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training The Unreasonable Effectiveness of

VITA 44 Dec 23, 2022
Neural Re-rendering for Full-frame Video Stabilization

NeRViS: Neural Re-rendering for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 9 Jun 17, 2022
Self-supervised learning optimally robust representations for domain generalization.

OptDom: Learning Optimal Representations for Domain Generalization This repository contains the official implementation for Optimal Representations fo

Yangjun Ruan 18 Aug 25, 2022
Weakly-supervised object detection.

Wetectron Wetectron is a software system that implements state-of-the-art weakly-supervised object detection algorithms. Project CVPR'20, ECCV'20 | Pa

NVIDIA Research Projects 342 Jan 05, 2023
Voice Gender Recognition

In this project it was used some different Machine Learning models to identify the gender of a voice (Female or Male) based on some specific speech and voice attributes.

Anne Livia 1 Jan 27, 2022