Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion

Overview

Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion

Preface

This directory provides an implementation of the algorithms used to compute the hypergeometric tail pseudo-inverse, as well as the code used to produce all figures of the paper "Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion" by Leboeuf, LeBlanc and Marchand.

Installation

To run the scripts, one must first install the package and its requirements. To do so, run the following command from the root directory:

pip install .

Doing so will also provide you with the package hypergeo, which implements an algorithm to compute the hypergeometric tail pseudo-inverses.

Requirements

The code was written to run on Python 3.8 or more recent version. The requirements are shown in the file requirements.txt and can be installed using the command:

pip install -r requirements.txt

The code

The code is split into 2 parts: the 'hypergeo' package and the 'scripts' directory.

The hypergeo package implements the utilities regarding the hypergeometric distribution (to compute the tail and its inverse), the binomial distribution (reimplementing the inverse as the scipy version suffered from numerical unstabilities) and some generalization bounds.

The scripts files produce the figures found in the paper using the hypergeo package. All figures are generated directly in LaTeX using the package python2latex. To run a script, navigate from the command line to the directory root directory of the project and run the command

/ .py" ">
python "./scripts/
     
      /
      
       .py"

      
     

The code does not provide command line control on the parameters of each script. However, each script is fairly simple, and parameters can be directly changed in the __main__ part of the script.

Scripts used in the body of the paper

  • Section 3.3: The ghost sample trade-off. In this section, we claim that optimizing m' gives relative gain between 8% and 10%. To obtain these number, you need to run the file mprime_tradeoff/generate_mprime_data.py to first generate the data, and then run mprime_tradeoff/stats.py.

  • Section 5: Numerical comparison. Figure 1a and 1b are obtain by executing the scripts bounds_comparison/bounds_comparison_risk.py and bounds_comparison/bounds_comparison_d.py respectively. Figure 2a and 2b are obtain by executing the scripts bounds_comparison/bounds_comparison_m.py, the first setting the variable risk to 0, the second by setting it equal to 0.1.

Scripts used in the appendices of the paper

  • Appendix B: Overview of the hypergeometric distribution. Figure 3 is generated from hypergeometric_tail/hyp_tail_plot.py. Figure 4 is generated from hypergeometric_tail/hyp_tail_inv_plot.py. Algorithm 1 is implemented in the hypergeo file hypergeo/hypergeometric_distribution.py as the function hypergeometric_tail_inverse. Algorithm 2 is implemented in the hypergeo file hypergeo/hypergeometric_distribution.py as the function berkopec_hypergeometric_tail_inverse.

  • Appendix D: In-depth analysis of the ghost sample trade-off. Figure 5 is generated from mprime_tradeoff/plot_epsilon_comp.py. Figure 6 is generated from mprime_tradeoff/plot_mprime_best.py.

  • Appendix E: The hypergeometric tail inversion relative deviation bound. To generate Figure 7 and 8, you must first run the file relative_deviation_mprime_tradeoff/mprime_tradeoff_relative_deviation.py to generate the data, then run the script relative_deviation_mprime_tradeoff/plot_epsilon_comp.py to produce Figure 7 and relative_deviation_comparison/plot_mprime_best.py to produce Figure 8.

  • Appendix G: The hypergeometric tail lower bound . Figure 9 is generated from lower_bound/lower_bound_comparison_risk.py.

  • Appendix F: Further numerical comparisons. Figure 10 and 12a are generated from bounds_comparison/bounds_comparison_risk.py by changing the parameters of the scripts. Figure 11 and 12b is generated from bounds_comparison/bounds_comparison_m.py by changing the parameters of the scripts. Figure 13a and 13b are generated from bounds_comparison/sample_compression_comparison_risk.py and bounds_comparison/sample_compression_comparison_m.py respectively.

Other

The script pseudo-inverse_benchmarking/pseudo-inverse_benchmarking.py benchmarks the various algorithms used to invert the hypergeometric tail. The 'tests' directory contains unit tests using the package pytest.

Owner
Jean-Samuel Leboeuf
PhD candidate in Computer Sciences (Machine Learning). MSc in Theoretical Physics.
Jean-Samuel Leboeuf
Hysterese plugin with two temperature offset areas

craftbeerpi4 plugin OffsetHysterese Temperatur-Steuerungs-Plugin mit zwei tempereaturbereich abhängigen Offsets. Installation sudo pip3 install https:

HappyHibo 1 Dec 21, 2021
Code for Fold2Seq paper from ICML 2021

[ICML2021] Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design Environment file: environment.yml Data and Feat

International Business Machines 43 Dec 04, 2022
This repository contains the implementation of the following paper: Cross-Descriptor Visual Localization and Mapping

Cross-Descriptor Visual Localization and Mapping This repository contains the implementation of the following paper: "Cross-Descriptor Visual Localiza

Mihai Dusmanu 81 Oct 06, 2022
PyTorch implementation of hand mesh reconstruction described in CMR and MobRecon.

Hand Mesh Reconstruction Introduction This repo is the PyTorch implementation of hand mesh reconstruction described in CMR and MobRecon. Update 2021-1

Xingyu Chen 236 Dec 29, 2022
Official implementation for NIPS'17 paper: PredRNN: Recurrent Neural Networks for Predictive Learning Using Spatiotemporal LSTMs.

PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning The predictive learning of spatiotemporal sequences aims to generate future

THUML: Machine Learning Group @ THSS 243 Dec 26, 2022
An introduction to bioimage analysis - http://bioimagebook.github.io

Introduction to Bioimage Analysis This book tries explain the main ideas of image analysis in a practical and engaging way. It's written primarily for

Bioimage Book 20 Nov 28, 2022
Deep learning models for change detection of remote sensing images

Change Detection Models (Remote Sensing) Python library with Neural Networks for Change Detection based on PyTorch. ⚡ ⚡ ⚡ I am trying to build this pr

Kaiyu Li 176 Dec 24, 2022
Official Implement of CVPR 2021 paper “Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting”

RGBT Crowd Counting Lingbo Liu, Jiaqi Chen, Hefeng Wu, Guanbin Li, Chenglong Li, Liang Lin. "Cross-Modal Collaborative Representation Learning and a L

37 Dec 08, 2022
Simple helper library to convert a collection of numpy data to tfrecord, and build a tensorflow dataset from the tfrecord.

numpy2tfrecord Simple helper library to convert a collection of numpy data to tfrecord, and build a tensorflow dataset from the tfrecord. Installation

Ryo Yonetani 2 Jan 16, 2022
Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination

Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination Pratul P. Srinivasan, Ben Mildenhall, Matthew Tancik, Jonathan T. Barron,

Pratul Srinivasan 65 Dec 14, 2022
Easy genetic ancestry predictions in Python

ezancestry Easily visualize your direct-to-consumer genetics next to 2500+ samples from the 1000 genomes project. Evaluate the performance of a custom

Kevin Arvai 38 Jan 02, 2023
PoolFormer: MetaFormer is Actually What You Need for Vision

PoolFormer: MetaFormer is Actually What You Need for Vision (arXiv) This is a PyTorch implementation of PoolFormer proposed by our paper "MetaFormer i

Sea AI Lab 1k Dec 30, 2022
Approaches to modeling terrain and maps in python

topography 🌎 Contains different approaches to modeling terrain and topographic-style maps in python Features Inverse Distance Weighting (IDW) A given

John Gutierrez 1 Aug 10, 2022
Official implementation for "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation" (CVPR 2022)

QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation (CVPR2022) https://arxiv.org/abs/2203.08483 Unpaired image-to-image (I2I

Xueqi Hu 50 Dec 16, 2022
Repo for flood prediction using LSTMs and HAND

Abstract Every year, floods cause billions of dollars’ worth of damages to life, crops, and property. With a proper early flood warning system in plac

1 Oct 27, 2021
Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implicit Bayesian Inference"

GINC small-scale in-context learning dataset GINC (Generative In-Context learning Dataset) is a small-scale synthetic dataset for studying in-context

P-Lambda 29 Dec 19, 2022
LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021 We propose a cross encoder model (LTR_CrossEncoder) for information retrieval, re-retrie

Xuan Hieu Duong 7 Jan 12, 2022
Hyperopt for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend

Hyperopt for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend This project acts as both a tuto

Guillaume Chevalier 103 Jul 22, 2022
Official Matlab Implementation for "Tiny Obstacle Discovery by Occlusion-aware Multilayer Regression", TIP 2020

Tiny Obstacle Discovery by Occlusion-aware Multilayer Regression Official Matlab Implementation for "Tiny Obstacle Discovery by Occlusion-aware Multil

Xuefeng 5 Jan 15, 2022
This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.

Paragraph Aggregation Retrieval Model (PARM) for Dense Document-to-Document Retrieval This repository contains the code for the paper PARM: A Paragrap

Sophia Althammer 33 Aug 26, 2022