LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate.

Overview

LibRerank

LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate. It also supports LambdaMART and DNN as initial ranker.

Get Started

Create virtual environment(optional)

pip install --user virtualenv
~/.local/bin/virtualenv -p python3 ./venv
source venv/bin/activate

Install LibRerank from source

git clone https://github.com/LibRerank-Community/LibRerank.git
cd LibRerank
make init 

Run example

Run initial ranker

bash example/run_ranker.sh

Run re-ranker

bash example/run_reranker.sh

We can choose to enter a config file like example/run_reranker.sh via the parameter setting_path. The config files for the different models can be found in example/config. We can also set various parameters directly from the command line. A list of supported parameters can be found in run_init_ranker.py and run_reranker.py.

Structure

librerank

Initial rankers

DNN: a naive algorithm that directly train a multi-layer perceptron network with input labels (e.g., clicks).

LambdaMART: the implementation of the LambdaMART model in From RankNet to LambdaRank to LambdaMART: An Overview

Re-ranking algorithms

DLCM: the implementation of the Deep Listwise Context Model in Learning a Deep Listwise Context Model for Ranking Refinement.

PRM: the implementation of the Personalized Re-ranking Model in Personalized Re-ranking for Recommendation

GSF: the implementation of the Groupwise Scoring Function in Learning Groupwise Multivariate Scoring Functions Using Deep Neural Networks.

miDNN: the implementation of the miDNN model in Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search

SetRank: the implementation of the SetRank model in SetRank: Learning a Permutation-Invariant Ranking Model for Information Retrieval.

Seq2Slate: the implementation of sequence-to-sequence model for re-ranking in Seq2Slate: Re-ranking and Slate Optimization with RNNs

EGRerank: the implementation of the Evaluator-Generator Reranking in AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online

Data

We process two datasets, Ad and PRM Public, containing user and item features with recommendation lists for the experimentation with personalized re-ranking. The details of processed datasets are summarized in the following table

Dataset #item #list # user feature # item feature
Ad 349,404 483,049 8 6
PRM Public 2,851,766 1,295,496 3 24

Depending on the length of the initial ranking, the maximum length of initial lists (re-ranking size n) is set to 10 and 30 for Ad and PRM Public, respectively.

Ad

The original Ad dataset records 1 million users and 26 million ad display/click logs, with 8 user profiles (e.g., id, age, and occupation), 6 item features (e.g., id, campaign, and brand). Following previous work, We transform records of each user into ranking lists according to the timestamp of the user browsing the advertisement. Items that have been interacted with within five minutes are sliced into a list and the processed data is avaliable here. The detailed process is here.

PRM public

The original PRM public dataset contains re-ranking lists from a real-world e-commerce RS. Each record is a recommendation list consisting of 3 user profile features, 5 categorical, and 19 dense item features. Due to the memory limitation, we randomly sample 10% of lists and remained data is avaliable here. The detailed process is here.

This is the code repository for LRM Stochastic watershed model.

LRM-Squannacook Input data for generating stochastic streamflows are observed and simulated timeseries of streamflow. their format needs to be CSV wit

1 Feb 14, 2022
LightGBM + Optuna: no brainer

AutoLGBM LightGBM + Optuna: no brainer auto train lightgbm directly from CSV files auto tune lightgbm using optuna auto serve best lightgbm model usin

Rishiraj Acharya 22 Dec 15, 2022
Stacked Generalization (Ensemble Learning)

Stacking (stacked generalization) Overview ikki407/stacking - Simple and useful stacking library, written in Python. User can use models of scikit-lea

Ikki Tanaka 192 Dec 23, 2022
AP1 Transcription Factor Binding Site Prediction

A machine learning project that predicted binding sites of AP1 transcription factor, using ChIP-Seq data and local DNA shape information.

1 Jan 21, 2022
Gaussian Process Optimization using GPy

End of maintenance for GPyOpt Dear GPyOpt community! We would like to acknowledge the obvious. The core team of GPyOpt has moved on, and over the past

Sheffield Machine Learning Software 847 Dec 19, 2022
whylogs: A Data and Machine Learning Logging Standard

whylogs: A Data and Machine Learning Logging Standard whylogs is an open source standard for data and ML logging whylogs logging agent is the easiest

WhyLabs 2k Jan 06, 2023
Implementation of linesearch Optimization Algorithms in Python

Nonlinear Optimization Algorithms During my time as Scientific Assistant at the Karlsruhe Institute of Technology (Germany) I implemented various Opti

Paul 3 Dec 06, 2022
using Machine Learning Algorithm to classification AppleStore application

AppleStore-classification-with-Machine-learning-Algo- using Machine Learning Algorithm to classification AppleStore application. the first step : 1: p

Mohammed Hussien 2 May 02, 2022
CD) in machine learning projectsImplementing continuous integration & delivery (CI/CD) in machine learning projects

CML with cloud compute This repository contains a sample project using CML with Terraform (via the cml-runner function) to launch an AWS EC2 instance

Iterative 19 Oct 03, 2022
Simulate & classify transient absorption spectroscopy (TAS) spectral features for bulk semiconducting materials (Post-DFT)

PyTASER PyTASER is a Python (3.9+) library and set of command-line tools for classifying spectral features in bulk materials, post-DFT. The goal of th

Materials Design Group 4 Dec 27, 2022
Primitives for machine learning and data science.

An Open Source Project from the Data to AI Lab, at MIT MLPrimitives Pipelines and primitives for machine learning and data science. Documentation: htt

MLBazaar 65 Dec 29, 2022
Examples and code for the Practical Machine Learning workshop series

Practical Machine Learning Workshop Series Practical Machine Learning for Quantitative Finance Post conference workshop at the WBS Spring Conference D

CompatibL 21 Jun 25, 2022
A Multipurpose Library for Synthetic Time Series Generation in Python

TimeSynth Multipurpose Library for Synthetic Time Series Please cite as: J. R. Maat, A. Malali, and P. Protopapas, “TimeSynth: A Multipurpose Library

278 Dec 26, 2022
Python Automated Machine Learning library for tabular data.

Simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Scie

Daniel Khromov 47 Dec 17, 2022
SynapseML - an open source library to simplify the creation of scalable machine learning pipelines

Synapse Machine Learning SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. Sy

Microsoft 3.9k Dec 30, 2022
My capstone project for Udacity's Machine Learning Nanodegree

MLND-Capstone My capstone project for Udacity's Machine Learning Nanodegree Lane Detection with Deep Learning In this project, I use a deep learning-b

Michael Virgo 407 Dec 12, 2022
It is a forest of random projection trees

rpforest rpforest is a Python library for approximate nearest neighbours search: finding points in a high-dimensional space that are close to a given

Lyst 211 Dec 29, 2022
Predicting India’s COVID-19 Third Wave with LSTM

Predicting India’s COVID-19 Third Wave with LSTM Complete project of predicting new COVID-19 cases in the next 90 days with LSTM India is seeing a ste

Samrat Dutta 4 Jan 27, 2022
DistML is a Ray extension library to support large-scale distributed ML training on heterogeneous multi-node multi-GPU clusters

DistML is a Ray extension library to support large-scale distributed ML training on heterogeneous multi-node multi-GPU clusters

27 Aug 19, 2022
ML Kaggle Titanic Problem using LogisticRegrission

-ML-Kaggle-Titanic-Problem-using-LogisticRegrission here you will find the solution for the titanic problem on kaggle with comments and step by step c

Mahmoud Nasser Abdulhamed 3 Oct 23, 2022