Code repository for EMNLP 2021 paper 'Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods'

Adversarial Attacks on Knowledge Graph Embeddings
via Instance Attribution Methods

This is the code repository to accompany the EMNLP 2021 paper on adversarial attacks on KGE models.
For questions or feedback, open an issue or email me at: [email protected]

Overview

The figure illustrates adversarial attacks against KGE models for fraud detection. The knowledge graph consists of two types of entities: Person and BankAccount. The missing target triple to predict is (Sam, allied_with, Joe). The original KGE model predicts this triple as True, i.e. it assigns the triple a higher score than synthetic negative triples. A malicious attacker, however, can use instance attribution methods to either (a) delete an adversarial triple or (b) add an adversarial triple, after which the KGE model predicts the missing target triple as False.

The attacker uses instance attribution methods to identify the training triples that are most influential for the model's prediction on the target triple. These influential triples are used as adversarial deletions. Starting from an influential triple, the attacker then selects an adversarial addition by replacing one of the triple's two entities with the most dissimilar entity in the embedding space. For example, if the attacker identifies that (Sam, deposits_to, Suspicious_Account) is the most influential triple for predicting (Sam, allied_with, Joe), then they can add (Sam, deposits_to, Non_Suspicious_Account) to reduce the influence of the influential triple.
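The two selection steps described above can be sketched on toy data. This is a minimal illustration, not the code in KGEAttack. The embeddings are invented, and cosine similarity between element-wise products of the triple embeddings stands in for an instance attribution score. The names `feature`, `most_influential` and `adversarial_addition` are hypothetical. The sketch also assumes that the entity shared with the target triple is kept and the other one is replaced.

```python
import numpy as np

# Toy embeddings, invented for illustration only (not learned by any model).
ENT = {
    "Sam": np.array([1.0, 0.5]),
    "Joe": np.array([0.9, 0.6]),
    "Suspicious_Account": np.array([1.0, 0.2]),
    "Non_Suspicious_Account": np.array([-1.0, 0.3]),
}
REL = {
    "allied_with": np.array([1.0, 1.0]),
    "deposits_to": np.array([1.0, 0.5]),
}
TRAIN = [
    ("Sam", "deposits_to", "Suspicious_Account"),
    ("Joe", "deposits_to", "Non_Suspicious_Account"),
]
TARGET = ("Sam", "allied_with", "Joe")


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def feature(triple):
    """Assumed triple representation: element-wise product of s, r and o."""
    s, r, o = triple
    return ENT[s] * REL[r] * ENT[o]


def most_influential(target, train):
    """Training triple in the target's neighbourhood with the highest score."""
    target_entities = {target[0], target[2]}
    neighbours = [t for t in train if target_entities & {t[0], t[2]}]
    return max(neighbours, key=lambda t: cosine(feature(target), feature(t)))


def adversarial_addition(influential, target):
    """Swap the entity not shared with the target for its most dissimilar entity."""
    s, r, o = influential
    replace_object = s in (target[0], target[2])
    old = o if replace_object else s
    candidates = [e for e in ENT if e not in (s, o)]
    new = min(candidates, key=lambda e: cosine(ENT[old], ENT[e]))
    return (s, r, new) if replace_object else (new, r, o)


deletion = most_influential(TARGET, TRAIN)
addition = adversarial_addition(deletion, TARGET)
print("delete:", deletion)
print("add:   ", addition)
```

With these toy values the sketch selects (Sam, deposits_to, Suspicious_Account) as the deletion and (Sam, deposits_to, Non_Suspicious_Account) as the addition, mirroring the example in the text.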

Reproducing the results

Setup

  • python = 3.8.5
  • pytorch = 1.4.0
  • numpy = 1.19.1
  • jupyter = 1.0.0
  • pandas = 1.1.0
  • matplotlib = 3.2.2
  • scikit-learn = 0.23.2
  • seaborn = 0.11.0

Experiments reported in the paper were run in the conda environment attribution_attack.yml.
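As a quick sanity check before running experiments, the pinned versions listed above can be compared with what is installed in the active environment. The following is a minimal sketch and not part of the repository. The helper `find_mismatches` is hypothetical, and it assumes that PyTorch is registered under the distribution name `torch` and that package metadata is visible to `importlib.metadata`.

```python
import sys
from importlib import metadata

# Versions copied from the Setup list; "torch" is the assumed distribution
# name for pytorch.
PINNED = {
    "torch": "1.4.0",
    "numpy": "1.19.1",
    "jupyter": "1.0.0",
    "pandas": "1.1.0",
    "matplotlib": "3.2.2",
    "scikit-learn": "0.23.2",
    "seaborn": "0.11.0",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (wanted, found)} for missing or differing packages."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "+cu101" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    print("python:", sys.version.split()[0], "(pinned: 3.8.5)")
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

An empty result from `find_mismatches(PINNED)` means that every listed package matches its pinned version.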

Steps

  • The codebase and the bash scripts used for experiments are in KGEAttack.
  • To preprocess the original dataset, use the bash script preprocess.sh.
  • For each model-dataset combination, there is a bash script that trains the original model, generates attacks from the baselines and the proposed methods, and trains the poisoned model. These scripts are named model-dataset.sh.
  • The instructions in these scripts are grouped under echo statements that indicate what each group does.
  • The command-line argument --reproduce-results applies the hyperparameters used for the experiments reported in the paper. These values can be inspected in the function set_hyperparams() in utils.py.
  • To reproduce the results, either run the full script or run specific instructions from it on the command line.
  • All experiments in the paper were run on a shared HPC cluster with NVIDIA RTX 2080 Ti, Tesla K40 and V100 GPUs.
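The way a flag such as --reproduce-results can override user-supplied hyperparameters can be sketched with argparse. This is an illustrative pattern only, not the contents of utils.py. The function `apply_reproduction_hyperparams`, the arguments other than --reproduce-results, and every value in the table are placeholders. The real values are the ones in set_hyperparams().

```python
import argparse

# Placeholder table: the keys and values are invented for illustration.
# The values used in the paper are defined in set_hyperparams() in utils.py.
PLACEHOLDER_HYPERPARAMS = {
    ("model_a", "dataset_x"): {"lr": 0.01, "embedding_dim": 200},
}


def build_parser():
    """Hypothetical command-line interface with a reproduction switch."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="model_a")
    parser.add_argument("--data", default="dataset_x")
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--embedding-dim", type=int, default=100)
    parser.add_argument("--reproduce-results", action="store_true")
    return parser


def apply_reproduction_hyperparams(args):
    """Overwrite parsed values with the stored ones when the flag is set."""
    if args.reproduce_results:
        stored = PLACEHOLDER_HYPERPARAMS.get((args.model, args.data), {})
        for name, value in stored.items():
            setattr(args, name, value)
    return args


if __name__ == "__main__":
    args = apply_reproduction_hyperparams(build_parser().parse_args())
    print(vars(args))
```

With this pattern, values passed on the command line are used as given unless the flag is present, in which case the stored values for the chosen model-dataset pair take precedence.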

References

Parts of this codebase are based on code from the following repositories

Citation

@inproceedings{bhardwaj-etal-2021-adversarial,
    title = "Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods",
    author = "Bhardwaj, Peru  and
      Kelleher, John  and
      Costabello, Luca  and
      O{'}Sullivan, Declan",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.648",
    pages = "8225--8239",
    }
Owner
Peru Bhardwaj
PhD Student, Trinity College Dublin, Ireland.