Data for "Driving the Herd: Search Engines as Content Influencers" paper

Overview

herding_data

Data for "Driving the Herd: Search Engines as Content Influencers" paper

Dataset description

The collection contains 2250 documents, 30 initial relevant documents (round 0) - located in initial_documents.trectext file. 2100 documents (rounds 1-5) created by competitors. 120 documents are the example documents that were manually promoted in the herding method experiments.

This dataset is divided w.r.t. the different experiments for content effect, described in the paper.

Format: trectext. DOCNO Format: ROUND- - -

Relevance Judgments (qrels):

All documents in the collection were judged for relevance. Annotators were presented with both the title and the description of each TREC topic and were asked to classify a document as relevant if it satisfies the information need stated in the description.

A document judged relevant by less than three annotators was labeled as non-relevant (0). Documents judged relevant by at least three, four or five annotators were labeled as marginally relevant (1), fairly relevant (2) and highly relevant (3), respectively. For each experiment the relevance judgment file has ".rel" suffix.

Quality judgements:

All documents in the collection where judged for quality by five annotators. Annotators were presented with the text of the document and were asked to classify the docuemnt as: (1) Valid, (2) Keyword-stuffed, (3) Spam.

A document is deemed as keyword-stuffed if it contained excessive repetition of words which seemed unnatural or artificially introduced.

A document is considered as spam if its content could not possibly satisfy any information need.

If a document is not spam or keywordstuffed, it is considered as valid. Documents judged valid by at least three, four or five annotators were labeled as marginally high-quality (1), fairly high-quality (2) and highly high-quality (3), respectively. For each experiment the quality judgment file has ".ks" suffix.

Queries

We used 30 of ClueWeb09 queries which can be downloded here: http://trec.nist.gov/data/webmain.html.

Example documents

In the herding method experiment for each query and effect an exapmle document, manifesting the desired content effect, was manually promoted to 1'st place. For each effect the example documents are located at "herding__example_documents.trectext" file. The format of document names is: DOCNO Format: ROUND-00- -EXAMPLEDOC

Subtopic effect experiment

This content effect was tested both in terms of herding and biasing approaches. For each query 2 different subtopics were tested. The subtopics were taken from ClueWeb09 subtopics list. The mapping between qid and the subtopic number which was promoted (and the actual information need manifested by the subtopic) is located at _subtopics_map.txt files (in each relevant directory separetly).

We include relevance judgemnts for each document (competing for a rankings w.r.t a query) w.r.t. to both subtopics promoted for the query. Please note that each document was tested w.r.t. a single subtopic (can be induced by the mapping file) during the experiment. The judgments are for both subtopics for analysis porpuses only. Relevance judgments w.r.t. subtopics name is " _relevance_to_subptopic.rel".

The qrels format is: " ".

Directories

Herding

Document_length_effect

The data contained in this directory is related to the documents created in the document length effect experiment (herding method).

Non_relevance_effect

The data contained in this directory is related to the documents created in the non-relevance effect experiment (herding method).

Query_terms_effect

The data contained in this directory is related to the documents created in the query terms effect experiment (herding method).

Subtopic_effect

The data contained in this directory is related to the documents created in the subtopic effect experiment (herding method).

Biasing

Subtopic_effect

The data contained in this directory is related to the documents created in the subtopic effect experiment (biasing method).

Control

The data contained in this directory is related to the documents created in the control group. That is, no expore of any kind of manipulation for this group.

Dummies

The data contained in this directory is related to the documents taken from Raifer et al '17 dataset. Dummies with docnos "DUMMY_{0,1}" where shared over all groups.

Control group and biasing groups where filled with DUMMY_2 dummies (in the docno) as well.

R interface to fast.ai

R interface to fastai The fastai package provides R wrappers to fastai. The fastai library simplifies training fast and accurate neural nets using mod

113 Dec 20, 2022
A super lightweight Lagrangian model for calculating millions of trajectories using ERA5 data

Easy-ERA5-Trck Easy-ERA5-Trck Galleries Install Usage Repository Structure Module Files Version iteration Easy-ERA5-Trck is a super lightweight Lagran

Zhenning Li 26 Nov 19, 2022
Code for paper "Learning to Reweight Examples for Robust Deep Learning"

learning-to-reweight-examples Code for paper Learning to Reweight Examples for Robust Deep Learning. [arxiv] Environment We tested the code on tensorf

Uber Research 261 Jan 01, 2023
All the essential resources and template code needed to understand and practice data structures and algorithms in python with few small projects to demonstrate their practical application.

Data Structures and Algorithms Python INDEX 1. Resources - Books Data Structures - Reema Thareja competitiveCoding Big-O Cheat Sheet DAA Syllabus Inte

Shushrut Kumar 129 Dec 15, 2022
A Python library for common tasks on 3D point clouds

Point Cloud Utils (pcu) - A Python library for common tasks on 3D point clouds Point Cloud Utils (pcu) is a utility library providing the following fu

Francis Williams 622 Dec 27, 2022
Fuzzy Overclustering (FOC)

Fuzzy Overclustering (FOC) In real-world datasets, we need consistent annotations between annotators to give a certain ground-truth label. However, in

2 Nov 08, 2022
Created as part of CS50 AI's coursework. This AI makes use of knowledge entailment to calculate the best probabilities to win Minesweeper.

Minesweeper-AI Created as part of CS50 AI's coursework. This AI makes use of knowledge entailment to calculate the best probabilities to win Minesweep

Beckham 0 Jul 20, 2022
[ACM MM 2019 Oral] Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

Contents Cycle-In-Cycle GANs Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Acknowledgments Relat

Hao Tang 67 Dec 14, 2022
Jremesh-tools - Blender addon for quad remeshing

JRemesh Tools Blender 2.8 - 3.x addon for quad remeshing. Currently it is a wrap

Jayanam 89 Dec 30, 2022
PiRapGenerator - Make anyone rap the digits of pi

PiRapGenerator Make anyone rap the digits of pi (sample files are of Ted Nivison

7 Oct 02, 2022
[NeurIPS'21] "AugMax: Adversarial Composition of Random Augmentations for Robust Training" by Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Animashree Anandkumar, and Zhangyang Wang.

[NeurIPS'21] "AugMax: Adversarial Composition of Random Augmentations for Robust Training" by Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Animashree Anandkumar, and Zhangyang Wang.

VITA 112 Nov 07, 2022
This is the repository of our article published on MDPI Entropy "Feature Selection for Recommender Systems with Quantum Computing".

Collaborative-driven Quantum Feature Selection This repository was developed by Riccardo Nembrini, PhD student at Politecnico di Milano. See the websi

Quantum Computing Lab @ Politecnico di Milano 10 Apr 21, 2022
Kalidokit is a blendshape and kinematics solver for Mediapipe/Tensorflow.js face, eyes, pose, and hand tracking models

Blendshape and kinematics solver for Mediapipe/Tensorflow.js face, eyes, pose, and hand tracking models.

Rich 4.5k Jan 07, 2023
GARCH and Multivariate LSTM forecasting models for Bitcoin realized volatility with potential applications in crypto options trading, hedging, portfolio management, and risk management

Bitcoin Realized Volatility Forecasting with GARCH and Multivariate LSTM Author: Chi Bui This Repository Repository Directory ├── README.md

Chi Bui 113 Dec 29, 2022
SIMULEVAL A General Evaluation Toolkit for Simultaneous Translation

SimulEval SimulEval is a general evaluation framework for simultaneous translation on text and speech. Requirement python = 3.7.0 Installation git cl

Facebook Research 48 Dec 28, 2022
Invertible conditional GANs for image editing

Invertible Conditional GANs This is the implementation of the IcGAN model proposed in our paper: Invertible Conditional GANs for image editing. Novemb

Guim 278 Dec 12, 2022
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers Results results on COCO val Backbone Method Lr Schd PQ Config Download

155 Dec 20, 2022
A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Note: This is an alpha (preview) version which is still under refining. nn-Meter is a novel and efficient system to accurately predict the inference l

Microsoft 244 Jan 06, 2023
VLGrammar: Grounded Grammar Induction of Vision and Language

VLGrammar: Grounded Grammar Induction of Vision and Language

Yining Hong 27 Dec 23, 2022
Spatial-Location-Constraint-Prototype-Loss-for-Open-Set-Recognition

Spatial Location Constraint Prototype Loss for Open Set Recognition Official PyTorch implementation of "Spatial Location Constraint Prototype Loss for

Xia Ziheng 12 Jun 24, 2022