An Exact Solver for Semi-supervised Minimum Sum-of-Squares Clustering

Overview

PC-SOS-SDP: an Exact Solver for Semi-supervised Minimum Sum-of-Squares Clustering

PC-SOS-SDP is an exact algorithm based on the branch-and-bound technique for solving the semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problem with pairwise constraints (i.e. must-link and cannot-link constraints) described in the paper "An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering". This repository contains the C++ source code, the MATLAB scripts, and the datasets used for the experiments.

Installation

PC-SOS-SDP calls the semidefinite programming solver SDPNAL+ by using the MATLAB Engine API for C++. It requires the MATLAB engine library libMatlabEngine and the Matlab Data Array library libMatlabDataArray. PC-SOS-SDP calls the integer programming solver Gurobi. PC-SOS-SDP uses the Armadillo library to handle matrices and linear algebra operations efficiently. Before installing Armadillo, first install OpenBLAS and LAPACK along with the corresponding development files. PC-SOS-SDP implements a configurable thread pool of POSIX threads to speed up the branch-and-bound search.

Ubuntu and Debian instructions:

  1. Install MATLAB (>= 2016b)
  2. Install Gurobi (>= 9.0)
  3. Install CMake, OpenBLAS, LAPACK and Armadillo:
sudo apt-get update
sudo apt-get install cmake libopenblas-dev liblapack-dev libarmadillo-dev
  1. Open the makefile clustering_c++/Makefile
    • Set the variable matlab_path with your MATLAB folder.
    • Set the variable gurobi_path with your Gurobi folder.
  2. Compile the code:
cd clustering_c++/
make
  1. Download SDPNAL+, move the folder clustering_matlab containing the MATLAB source code of PC-SOS-SDP in the SDPNAL+ main directory and set the parameter SDP_SOLVER_FOLDER of the configuration file accordingly. This folder and its subfolders will be automatically added to the MATLAB search path when PC-SOS-SDP starts.

The code has been tested on Ubuntu Server 20.04 with MATLAB R2020b, Gurobi 9.2 and Armadillo 10.2.

Configuration

Various parameters used in PC-SOS-SDP can be modified in the configuration file clustering_c++/config.txt:

  • BRANCH_AND_BOUND_TOL - optimality tolerance of the branch-and-bound
  • BRANCH_AND_BOUND_PARALLEL - thread pool size: single thread (1), multi-thread (> 1)
  • BRANCH_AND_BOUND_MAX_NODES - maximum number of nodes
  • BRANCH_AND_BOUND_VISITING_STRATEGY - best first (0), depth first (1), breadth first (2)
  • SDP_SOLVER_SESSION_THREADS_ROOT - number of threads for the MATLAB session at the root
  • SDP_SOLVER_SESSION_THREADS - number of threads for the MATLAB session for the ML and CL nodes
  • SDP_SOLVER_FOLDER - full path of the SDPNAL+ folder
  • SDP_SOLVER_TOL - accuracy of SDPNAL+
  • SDP_SOLVER_VERBOSE - do not display log (0), display log (1)
  • SDP_SOLVER_MAX_CP_ITER_ROOT - maximum number of cutting-plane iterations at the root
  • SDP_SOLVER_MAX_CP_ITER - maximum number of cutting-plane iterations for the ML and CL nodes
  • SDP_SOLVER_CP_TOL - cutting-plane tolerance between two consecutive cutting-plane iterations
  • SDP_SOLVER_MAX_INEQ - maximum number of valid inequalities to add
  • SDP_SOLVER_INHERIT_PERC - fraction of inequalities to inherit
  • SDP_SOLVER_EPS_INEQ - tolerance for checking the violation of the inequalities
  • SDP_SOLVER_EPS_ACTIVE - tolerance for detecting the active inequalities
  • SDP_SOLVER_MAX_PAIR_INEQ - maximum number of pair inequalities to separate
  • SDP_SOLVER_PAIR_PERC - fraction of the most violated pair inequalities to add
  • SDP_SOLVER_MAX_TRIANGLE_INEQ - maximum number of triangle inequalities to separate
  • SDP_SOLVER_TRIANGLE_PERC - fraction of the most violated triangle inequalities to add

Usage

cd clustering_c++/
./bb <DATASET> <K> <CONSTRAINTS> <LOG> <RESULT>
  • DATASET - path of the dataset
  • K - number of clusters
  • CONSTRAINTS - path of the constraints
  • LOG - path of the log file
  • RESULT - path of the optimal cluster assignment matrix

File DATASET contains the data points x_ij and the must include an header line with the problem size n and the dimension d:

n d
x_11 x_12 ... x_1d
x_21 x_22 ... x_2d
...
...
x_n1 x_n2 ... x_nd

File CONSTRAINTS should include indices (i, j) of the data points involved in must-link (ML) and/or cannot-link (CL) constraints:

CL i1 j1
CL i2 j2
...
...
ML i3 j3
ML i4 j4

If it does not contain any constraint (empty file), PC-SOS-SDP becomes SOS-SDP (the exact solver for unsupervised MSSC).

Log

The log file reports the progress of the algorithm:

  • N - size of the current node
  • NODE_PAR - id of the parent node
  • NODE - id of the current node
  • LB_PAR - lower bound of the parent node
  • LB - lower bound of the current node
  • FLAG - termination flag of SDPNAL+
    • 0 - SDP is solved to the required accuracy
    • 1 - SDP is not solved successfully
    • -1, -2, -3 - SDP is partially solved successfully
  • TIME (s) - running time in seconds of the current node
  • CP_ITER - number of cutting-plane iterations
  • CP_FLAG - termination flag of the cutting-plane procedure
    • -3 - current bound is worse than the previous one
    • -2 - SDP is not solved successfully
    • -1 - maximum number of iterations
    • 0 - no violated inequalities
    • 1 - maximum number of inequalities
    • 2 - node must be pruned
    • 3 - cutting-plane tolerance
  • CP_INEQ - number of inequalities added in the last cutting-plane iteration
  • PAIR TRIANGLE CLIQUE - average number of added cuts for each class of inequalities
  • UB - current upper bound
  • GUB - global upper bound
  • I J - current branching decision
  • NODE_GAP - gap at the current node
  • GAP - overall gap
  • OPEN - number of open nodes

Log file example:

DATA_PATH, n, d, k: /home/ubuntu/PC-SOS-SDP/instances/glass.txt 214 9 6
CONSTRAINTS_PATH: /home/ubuntu/PC-SOS-SDP/instances/constraints/glass/ml_50_cl_50_3.txt
LOG_PATH: /home/ubuntu/PC-SOS_SDP/logs/glass/log_ml_50_cl_50_3.txt

BRANCH_AND_BOUND_TOL: 1e-4
BRANCH_AND_BOUND_PARALLEL: 16
BRANCH_AND_BOUND_MAX_NODES: 200
BRANCH_AND_BOUND_VISITING_STRATEGY: 0

SDP_SOLVER_SESSION_THREADS_ROOT: 16
SDP_SOLVER_SESSION_THREADS: 1
SDP_SOLVER_FOLDER: /home/ubuntu/PC-SOS-SDP/SDPNAL+/
SDP_SOLVER_TOL: 1e-05
SDP_SOLVER_VERBOSE: 0
SDP_SOLVER_MAX_CP_ITER_ROOT: 80
SDP_SOLVER_MAX_CP_ITER: 40
SDP_SOLVER_CP_TOL: 1e-06
SDP_SOLVER_MAX_INEQ: 100000
SDP_SOLVER_INHERIT_PERC: 1
SDP_SOLVER_EPS_INEQ: 0.0001
SDP_SOLVER_EPS_ACTIVE: 1e-06
SDP_SOLVER_MAX_PAIR_INEQ: 100000
SDP_SOLVER_PAIR_PERC: 0.05
SDP_SOLVER_MAX_TRIANGLE_INEQ: 100000
SDP_SOLVER_TRIANGLE_PERC: 0.05


|    N| NODE_PAR|    NODE|      LB_PAR|          LB|  FLAG|  TIME (s)| CP_ITER| CP_FLAG|   CP_INEQ|     PAIR  TRIANGLE    CLIQUE|          UB|         GUB|     I      J|     NODE_GAP|          GAP|  OPEN|
|  164|       -1|       0|        -inf|     93.3876|     0|       110|       7|      -3|      6456|  242.571      4802   8.14286|     93.5225|    93.5225*|    -1     -1|   0.00144229|   0.00144229|     0|
|  163|        0|       1|     93.3876|     93.4388|     0|        35|       2|      -3|      5958|        1      3675         0|     93.4777|    93.4777*|    79    142|  0.000416211|  0.000416211|     0|
|  164|        0|       2|     93.3876|     93.4494|     0|        47|       2|      -3|      6888|        0      4635         0|     93.5225|     93.4777|    79    142|  0.000302427|  0.000302427|     0|
|  162|        1|       3|     93.4388|      93.506|     0|        27|       1|       2|      6258|        9      3759         0|         inf|     93.4777|   119    152| -0.000302724| -0.000302724|     0|
|  163|        1|       4|     93.4388|     93.4536|     0|        47|       4|      -3|      3336|        0      1789         0|     93.4777|     93.4777|   119    152|   0.00025747|   0.00025747|     0|
|  164|        2|       5|     93.4494|     93.4549|     0|        37|       1|      -3|      6888|        0      5000         0|     93.5225|     93.4777|    47     54|  0.000243844|  0.000243844|     0|
|  163|        2|       6|     93.4494|     93.4708|     0|        51|       2|       2|      7292|       11      4693         0|     93.5559|     93.4777|    47     54|  7.36443e-05|  7.36443e-05|     0|
|  164|        5|       7|     93.4549|      93.475|     0|        22|       0|       2|      6888|        0         0         0|     93.5225|     93.4777|   122    153|  2.82805e-05|  2.82805e-05|     0|
|  163|        4|       8|     93.4536|     93.4536|     0|        38|       2|      -3|      3257|        0     668.5         0|     93.4704|    93.4704*|    47     54|  0.000180057|  0.000180057|     0|
|  163|        5|       9|     93.4549|     93.5216|     0|        41|       1|       2|      6893|        8      5000         0|         inf|     93.4704|   122    153| -0.000547847| -0.000547847|     0|
|  163|        8|      10|     93.4536|     93.4536|     0|        27|       1|      -3|      3257|        0       879         0|     93.4704|     93.4704|    37     45|  0.000180057|  0.000180057|     0|
|  162|        8|      11|     93.4536|     93.4838|     0|        33|       1|       2|      6158|       24      4233         0|         inf|     93.4704|    37     45| -0.000143677| -0.000143677|     0|
|  162|        4|      12|     93.4536|     93.4658|     0|        75|       5|      -3|      2793|      4.6      2379         0|     93.5111|     93.4704|    47     54|  4.89954e-05|  4.89954e-05|     0|
|  162|       10|      13|     93.4536|     93.5053|     0|        19|       0|       2|      3122|        0         0         0|         inf|     93.4704|    37     99|  -0.00037365|  -0.00037365|     0|
|  163|       10|      14|     93.4536|     93.4701|     0|        31|       0|       2|      3257|        0         0         0|     93.4704|     93.4704|    37     99|  3.13989e-06|  3.13989e-06|     0|

WALL_TIME: 304 sec
N_NODES: 15
AVG_INEQ: 2788.05
AVG_CP_ITER: 1.93333
ROOT_GAP: 0.00144229
GAP: 0
BEST: 93.4704
Owner
Antonio M. Sudoso
Antonio M. Sudoso
Code for EMNLP 2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training"

SCAPT-ABSA Code for EMNLP2021 paper: "Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training" Overvie

Zhengyan Li 66 Dec 04, 2022
Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Re-implementation of the paper 'Grokking: Generalization beyond overfitting on small algorithmic datasets' Paper Original paper can be found here Data

Tom Lieberum 38 Aug 09, 2022
This repository contains the code and models necessary to replicate the results of paper: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

Black-Box-Defense This repository contains the code and models necessary to replicate the results of our recent paper: How to Robustify Black-Box ML M

OPTML Group 2 Oct 05, 2022
WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose Yijun Zhou and James Gregson - BMVC2020 Abstract: We present an end-to-end head-pos

368 Dec 26, 2022
Code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection"

CTDNet The PyTorch code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection" Requirements Python 3.6

CVTEAM 28 Oct 20, 2022
Proximal Backpropagation - a neural network training algorithm that takes implicit instead of explicit gradient steps

Proximal Backpropagation Proximal Backpropagation (ProxProp) is a neural network training algorithm that takes implicit instead of explicit gradient s

Thomas Frerix 40 Dec 17, 2022
🗣️ Microsoft Edge TTS for Home Assistant, no need for app_key

Microsoft Edge TTS for Home Assistant This component is based on the TTS service of Microsoft Edge browser, no need to apply for app_key. Install Down

152 Dec 31, 2022
CONditionals for Ordinal Regression and classification in PyTorch

CONDOR pytorch implementation for ordinal regression with deep neural networks. Documentation: https://GarrettJenkinson.github.io/condor_pytorch About

7 Jul 25, 2022
Using machine learning to predict undergrad college admissions.

College-Prediction Project- Overview: Many have tried, many have failed. Few trailblazers are ambitious enought to chase acceptance into the top 15 un

John H Klinges 1 Jan 05, 2022
Lab Materials for MIT 6.S191: Introduction to Deep Learning

This repository contains all of the code and software labs for MIT 6.S191: Introduction to Deep Learning! All lecture slides and videos are available

Alexander Amini 5.6k Dec 26, 2022
Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

Token Labeling: Training an 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (arxiv) This is a Pytorch implementation of our te

蒋子航 383 Dec 27, 2022
MVP Benchmark for Multi-View Partial Point Cloud Completion and Registration

MVP Benchmark: Multi-View Partial Point Clouds for Completion and Registration [NEWS] 2021-07-12 [NEW 🎉 ] The submission on Codalab starts! 2021-07-1

PL 93 Dec 21, 2022
Multispectral Object Detection with Yolov5

Multispectral-Object-Detection Intro Official Code for Cross-Modality Fusion Transformer for Multispectral Object Detection. Multispectral Object Dete

Richard Fang 121 Jan 01, 2023
A GUI for Face Recognition, based upon Docker, Tkinter, GPU and a camera device.

Face Recognition GUI This repository is a GUI version of Face Recognition by Adam Geitgey, where e.g. Docker and Tkinter are utilized. All the materia

Kasper Henriksen 6 Dec 05, 2022
An implementation of the "Attention is all you need" paper without extra bells and whistles, or difficult syntax

Simple Transformer An implementation of the "Attention is all you need" paper without extra bells and whistles, or difficult syntax. Note: The only ex

29 Jun 16, 2022
Spherical Confidence Learning for Face Recognition, accepted to CVPR2021.

Sphere Confidence Face (SCF) This repository contains the PyTorch implementation of Sphere Confidence Face (SCF) proposed in the CVPR2021 paper: Shen

Maths 70 Dec 09, 2022
Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning"

A Unified Framework for Parameter-Efficient Transfer Learning This is the official implementation of the paper: Towards a Unified View of Parameter-Ef

Junxian He 216 Dec 29, 2022
PyElecCL - Electron Monte Carlo Second Checks

PyElecCL Python program to perform second checks for electron Monte Carlo radiat

Reese Haywood 3 Feb 22, 2022
GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️

GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️ This repo contains a PyTorch implementation of the original GAT paper ( 🔗 Veličković et

Aleksa Gordić 1.9k Jan 09, 2023
MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

MixText This repo contains codes for the following paper: Jiaao Chen, Zichao Yang, Diyi Yang: MixText: Linguistically-Informed Interpolation of Hidden

GT-SALT 309 Dec 12, 2022