Stetman's paper reading notes: Backdoor Learning: A Survey (Yiming Li, Yong Jiang, Zhifeng Li, Shu-Tao Xia)
2022-08-09 14:57:00 【Stetman】
Contents:
1. Introduction and some preliminaries
2. Classical Scenarios and Corresponding Capacities
3. Backdoor Attacks
4. Backdoor Defence
5. Future Directions
Introduction and some preliminaries
Introduction to Backdoor Learning
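The paper gives a brief description of backdoor attacks: during training, the attacker implants a backdoor into the DNN, so that the attacked DNN behaves normally on benign samples, but once the backdoor trigger pattern is present, its predictions are maliciously changed. The most common and most direct way to implant a backdoor is to poison the training set (e.g., by adding a trigger to some training samples), as shown in the figure. Besides directly poisoning training samples, a hidden backdoor can also be embedded through transfer learning, by directly modifying the model parameters, or by adding an extra malicious module.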
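To make the data-poisoning route concrete, here is a minimal sketch in the style of a BadNets-like attack (a fixed patch trigger plus relabeling to the target label). It is not code from the survey: images are assumed to be grayscale 2D lists of floats in [0, 1], and the names `stamp_trigger`, `poison_dataset`, `poison_rate` and `target_label` are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # grayscale image, pixel values assumed in [0, 1]


def stamp_trigger(img: Image, size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of img with a square patch trigger in the bottom-right corner."""
    out = [row[:] for row in img]  # copy rows so the clean image is not modified
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def poison_dataset(
    data: List[Tuple[Image, int]],
    target_label: int,
    poison_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Tuple[Image, int]], List[int]]:
    """Stamp the trigger on a random subset of samples and relabel them to target_label.

    Returns the poisoned dataset and the sorted indices of the poisoned samples.
    """
    rng = random.Random(seed)
    n_poison = int(len(data) * poison_rate)
    idx = sorted(rng.sample(range(len(data)), n_poison))
    chosen = set(idx)
    poisoned = [
        (stamp_trigger(img), target_label) if i in chosen else (img, label)
        for i, (img, label) in enumerate(data)
    ]
    return poisoned, idx
```

A model trained on the returned dataset sees the same patch co-occur with `target_label` on every poisoned sample, which is the association the backdoor relies on; the clean samples keep it accurate on benign inputs.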
Some Preliminaries of Backdoor Learning
The paper gives the following concepts and explanations:
• Benign model refers to the model trained under benign settings.
• Infected model refers to the model with hidden backdoor(s).
• Poisoned sample is the modified training sample used in poisoning-based backdoor attacks for embedding backdoor(s) in the model during the training process.
• Trigger is the pattern used for generating poisoned samples and activating the hidden backdoor(s).
• Attacked sample indicates the malicious testing sample containing backdoor trigger(s).
• Attack scenario refers to the scenario in which the backdoor attack might happen. Usually, it happens when the training process is inaccessible to or out of the control of the user, such as training with third-party datasets, training through third-party platforms, or adopting third-party models.
• Source label indicates the ground-truth label of a poisoned or an attacked sample.
• Target label is the attacker-specified label. The attacker intends to make all attacked samples be predicted as the target label by the infected model.
• Attack success rate (ASR) denotes the proportion of attacked samples which are successfully predicted as the target label by the infected model.
• Benign accuracy (BA) indicates the accuracy of the infected model on benign test samples.
• Attacker's goal describes what the backdoor attacker intends to do. In general, the attacker intends to design an infected model that performs well on benign testing samples while achieving a high attack success rate.
• Capacity defines what the attacker/defender can and cannot do to achieve their goal.
• Attack/Defense approach illustrates the process of the designed backdoor attack/defense.
In short, the terms fall into a few groups:
Models: infected model, benign model
Samples: poisoned sample, attacked sample
Trigger
Labels: source label, target label
Attack scenario; evaluation metrics: attack success rate (ASR) and benign accuracy (BA, the infected model's accuracy on benign test samples)
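Given these definitions, ASR and BA are simple ratios. A minimal sketch, assuming the model is any callable mapping an input to a label and `add_trigger` is a hypothetical function that stamps the trigger on a benign input. One detail is my assumption rather than something stated above: attacked samples whose source label already equals the target label are excluded from ASR, since they would count as successes trivially.

```python
from typing import Any, Callable, Hashable, Sequence, Tuple

Sample = Tuple[Any, Hashable]  # (input, ground-truth label)


def benign_accuracy(model: Callable[[Any], Hashable], test_set: Sequence[Sample]) -> float:
    """BA: fraction of benign test samples the (infected) model classifies correctly."""
    correct = sum(1 for x, y in test_set if model(x) == y)
    return correct / len(test_set)


def attack_success_rate(
    model: Callable[[Any], Hashable],
    test_set: Sequence[Sample],
    add_trigger: Callable[[Any], Any],
    target_label: Hashable,
) -> float:
    """ASR: fraction of attacked samples predicted as target_label.

    Samples whose source label already equals target_label are skipped (assumption).
    """
    attacked = [add_trigger(x) for x, y in test_set if y != target_label]
    hits = sum(1 for x in attacked if model(x) == target_label)
    return hits / len(attacked)
```

An attacker wants both numbers high at once: a high BA keeps the infected model inconspicuous, while a high ASR means the trigger reliably hijacks predictions.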
Classical Scenarios and Corresponding Capacities
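Based on the attacker's capacity, the paper divides attack scenarios into three classes, as shown in the figure below.
Scenario 1: Adopt Third-Party Datasets: the attacker can only manipulate the dataset; it cannot modify the model, the training schedule, or the inference pipeline.
Scenario 2: Adopt Third-Party Platforms: the attacker controls the training set and the training schedule, but cannot change the model structure, since otherwise the user would notice the attack.
Scenario 3: Adopt Third-Party Models: the attacker can change everything except the inference pipeline.
From Scenario 1 to Scenario 3, the attacker's capacity increases while the defender's decreases. Therefore, an attack designed for an earlier scenario can also be applied in the later scenarios; for defenses it is the other way round, so a defense designed for a later scenario also applies to the earlier ones.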
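The capacity argument is essentially a subset check. A small sketch of it, where the component names are my own shorthand for the items listed above and only the attacker side is modeled (the defender side is the mirror image and is left out):

```python
from typing import Dict, FrozenSet, List

# What the attacker can modify in each scenario (names are my own shorthand).
# No scenario gives the attacker access to the inference pipeline.
ATTACKER_CAN_MODIFY: Dict[int, FrozenSet[str]] = {
    1: frozenset({"training_set"}),                                # third-party datasets
    2: frozenset({"training_set", "training_schedule"}),           # third-party platforms
    3: frozenset({"training_set", "training_schedule", "model"}),  # third-party models
}


def applicable_scenarios(required: FrozenSet[str]) -> List[int]:
    """Scenarios in which an attack needing the `required` capacities can be launched."""
    return [s for s, caps in sorted(ATTACKER_CAN_MODIFY.items()) if required <= caps]


# Capacities only grow from Scenario 1 to 3, so an attack feasible in an
# earlier scenario stays feasible in every later one.
assert all(ATTACKER_CAN_MODIFY[s] <= ATTACKER_CAN_MODIFY[s + 1] for s in (1, 2))
```

For example, a pure data-poisoning attack (`frozenset({"training_set"})`) is applicable in scenarios `[1, 2, 3]`, while an attack that must alter the model is applicable only in `[3]`.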
Backdoor Attacks
Based on how the backdoor is implanted, the paper divides backdoor attacks into POISONING-BASED BACKDOOR ATTACKS and NON-POISONING-BASED BACKDOOR ATTACKS.
POISONING-BASED BACKDOOR ATTACKS
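Poisoning-based backdoor attacks are currently the most widely used form. They can be categorized by several different criteria, as shown in the figure below:
The paper also introduces the three risks of backdoor attacks (the figure here is one I borrowed from the web).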
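For reference, here is my paraphrase of how the survey formalizes these three risks in its unified framework for poisoning-based attacks (notation abbreviated; check the paper's own definition before relying on it). $C$ is the model with parameters $\theta$, $G_t$ stamps trigger $t$ onto a benign input $x$ to produce the attacked sample $x'$, $y_t$ is the target label, $D(\cdot)$ is a detector that returns 1 when it flags a sample as poisoned, $\mathcal{D}$ is the benign dataset and $\mathcal{D}_s \subset \mathcal{D}$ is the subset selected for poisoning:

$$
\begin{aligned}
R_s(\mathcal{D}) &= \mathbb{E}_{(x,y)\sim P_{\mathcal{D}}}\, \mathbb{I}\{C(x) \neq y\} && \text{(standard risk)}\\
R_b(\mathcal{D}) &= \mathbb{E}_{(x,y)\sim P_{\mathcal{D}}}\, \mathbb{I}\{C(x') \neq y_t\}, \quad x' = G_t(x) && \text{(backdoored risk)}\\
R_p(\mathcal{D}) &= \mathbb{E}_{(x,y)\sim P_{\mathcal{D}}}\, D(x') && \text{(perceivable risk)}
\end{aligned}
$$

$$
\min_{t,\,\theta}\; R_s(\mathcal{D} \setminus \mathcal{D}_s) + \lambda_1 R_b(\mathcal{D}_s) + \lambda_2 R_p(\mathcal{D}_s), \qquad \lambda_1, \lambda_2 \ge 0
$$

Read this way, $1 - R_s$ evaluated on benign test data is the BA defined earlier, and $1 - R_b$ evaluated on attacked test samples is the ASR; the $R_p$ term rewards triggers that are hard to detect.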
NON-POISONING-BASED BACKDOOR ATTACKS
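Recent literature has also proposed some non-poisoning-based attacks, which embed hidden backdoors into DNNs without directly relying on data poisoning during training. There are currently two mainstream forms of non-poisoning-based backdoor attacks: one modifies the model weights, the other modifies the model structure. Unlike poisoning-based attacks, which take place during data collection and training, these two generally take place at the model deployment stage.
(1) Weights-oriented Backdoor Attacks: the attacker modifies the model parameters, applying various perturbations to the weights. For example, such an attack can flip critical weight bits stored in memory, and it can significantly reduce the number of bit flips needed to embed a hidden backdoor.
(2) Structure-modified Backdoor Attacks: the attacker leaves the original model untouched and instead adds a malicious module (unlike a backdoor, which is embedded inside the benign model, the malicious module sits alongside the original model). Normally the benign model is invoked; when the malicious trigger is detected, the input bypasses the benign model and is handled directly by the malicious module.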
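Both mechanisms can be illustrated in a few lines. This is a toy sketch, not either paper's attack: `flip_bit` only shows how much a single flipped bit in a float32 weight can change its value (how an attacker picks which bits to flip is not modeled), and `StructureModifiedModel`, with its `detector` and `malicious` callables, is a hypothetical wrapper showing the routing described in (2).

```python
import struct
from typing import Any, Callable


def flip_bit(weight: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of `weight` (bit 0 = least significant)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", weight))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


class StructureModifiedModel:
    """Benign model plus a parallel malicious module (illustrative only).

    The benign model itself is untouched; a hypothetical trigger detector
    decides whether an input is routed to the malicious branch instead.
    """

    def __init__(self, benign: Callable[[Any], Any],
                 detector: Callable[[Any], bool],
                 malicious: Callable[[Any], Any]):
        self.benign = benign
        self.detector = detector
        self.malicious = malicious

    def __call__(self, x: Any) -> Any:
        if self.detector(x):       # trigger present: bypass the benign model
            return self.malicious(x)
        return self.benign(x)      # normal inputs: behave exactly like the benign model
```

Flipping bit 30 (the top exponent bit) turns a weight of `0.5` into `2.0 ** 127`, while flipping bit 0 changes it by only about 6e-8; this asymmetry is why the choice of which bits to flip matters so much. For the wrapper, the benign model's accuracy on clean inputs is unchanged by construction, which is what makes the added module hard to notice from outputs alone.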
Backdoor Defence
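The paper divides current backdoor defenses into Empirical Backdoor Defenses and Certified Backdoor Defenses. The criterion is as follows: empirical defenses are proposed based on some understanding of existing attacks and perform well in practice, but their effectiveness has no theoretical guarantee; certified backdoor defenses have theoretically guaranteed effectiveness under certain assumptions, but in practice they are usually weaker than empirical defenses.
In current work, certified defenses are all based on randomized smoothing, whereas empirical backdoor defenses are comparatively diverse.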
Empirical Backdoor Defenses
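1) Preprocessing-based Defenses: a preprocessing module is introduced before samples are fed into the DNN in order to alter the trigger pattern contained in attacked samples. The modified trigger then no longer matches the hidden backdoor, which prevents the backdoor from being activated.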
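2) Model Reconstruction based Defenses: unlike the former, these remove the hidden backdoor from an infected model by directly modifying the suspicious model. As a result, even if a trigger is contained in an attacked sample, the reconstructed model will still predict it correctly, because the hidden backdoor has already been removed.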
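3) Trigger Synthesis based Defenses: instead of eliminating hidden backdoors directly, these defenses first synthesize the backdoor trigger and then eliminate the hidden backdoor by suppressing the trigger's effect. This approach has much in common with reconstruction-based defenses; however, compared with them, the trigger information obtained in the synthesis step makes the removal process more efficient.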
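4) Model Diagnosis based Defenses: these use a pre-trained meta-classifier to decide whether a suspicious model is infected, and refuse to deploy infected models. Since only benign models are deployed, hidden backdoors are eliminated naturally.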
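5) Poison Suppression based Defenses: these exploit the randomness of the training process to weaken the influence of poisoned samples during training, suppressing their effectiveness and thereby preventing the backdoor from being created during training.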
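6) Training Sample Filtering based Defenses: these aim to filter poisoned samples out of the training dataset. After filtering, only benign samples or purified poisoned samples are used for training, which blocks backdoor creation at the source.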
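Two of these families have well-known concrete instances that are short enough to sketch; neither is code from the survey. `anomaly_indices` is the outlier test used by Neural Cleanse, a trigger-synthesis defense: after a minimal trigger has been reverse-engineered for every class (that optimization step is omitted here), a class whose trigger norm is abnormally small under a median-absolute-deviation (MAD) test is flagged as a likely backdoor target. `spectral_scores` and `filter_suspicious` follow the spectral-signature idea for training-sample filtering: score each sample by its squared projection onto the top singular vector of the centered feature representations (assumed to be given, e.g. penultimate-layer activations of one class) and drop the highest-scoring samples. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def anomaly_indices(trigger_norms: Sequence[float], threshold: float = 2.0) -> List[int]:
    """Flag classes whose reverse-engineered trigger is abnormally small (MAD test)."""
    med = statistics.median(trigger_norms)
    mad = 1.4826 * statistics.median(abs(n - med) for n in trigger_norms)
    if mad == 0:  # all norms (nearly) identical: no outlier can be identified
        return []
    return [i for i, n in enumerate(trigger_norms)
            if n < med and (med - n) / mad > threshold]


def spectral_scores(features: Sequence[Sequence[float]], iters: int = 100) -> List[float]:
    """Squared projection of each centered feature vector onto the top singular vector."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in features]
    v = [1.0] * d  # power iteration on M^T M; a simple (not robust) start vector
    for _ in range(iters):
        mv = [sum(r[j] * v[j] for j in range(d)) for r in centered]            # M v
        w = [sum(centered[i][j] * mv[i] for i in range(n)) for j in range(d)]  # M^T (M v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) ** 2 for r in centered]


def filter_suspicious(features: Sequence[Sequence[float]], n_remove: int) -> List[int]:
    """Indices of the n_remove samples with the highest spectral scores."""
    scores = spectral_scores(features)
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:n_remove]
```

For instance, with per-class trigger norms `[10, 11, 9, 10.5, 2]`, only class 4 is flagged: its deviation from the median is about 5.4 MADs, well above the threshold of 2.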
Certified Backdoor Defenses
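Although current empirical defenses are simpler and effective, there are always specific backdoor attacks that can bypass a given defense. Certified defenses based on randomized smoothing were therefore proposed.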
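In its basic test-time form, randomized smoothing replaces a classifier by a majority vote over noisy copies of the input. A minimal sketch, assuming inputs are flat lists of floats and a Gaussian noise level `sigma` of my choosing. The certified backdoor defenses covered by the survey apply the noise in their own ways (for example to the training data as well) and derive a certified radius from the vote counts; neither is modeled here.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def smoothed_predict(
    base: Callable[[Sequence[float]], Hashable],
    x: Sequence[float],
    sigma: float = 0.25,
    n_samples: int = 100,
    seed: Optional[int] = None,
) -> Hashable:
    """Majority vote of `base` over Gaussian-perturbed copies of x.

    Monte Carlo estimate of argmax_c P(base(x + eps) = c), eps ~ N(0, sigma^2 I).
    No certified radius is computed here.
    """
    rng = random.Random(seed)
    votes = Counter(
        base([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]
```

The intuition behind the guarantee is that a small perturbation of the input (or of the data) can only shift the vote probabilities a little, so a clear majority cannot be overturned; the price is the extra noise, which is one reason certified defenses tend to be weaker in practice.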
Future Directions
The paper proposes five potential research directions for backdoor learning:
A. Trigger Design
B. Semantic and Physical Backdoor Attacks
C. Attacks Towards Other Tasks
D. Effective and Efficient Defenses
E. Mechanism Exploration
Finally, here is the paper's conclusion, quoted in full:
Backdoor learning, including backdoor attacks and backdoor defenses, is a critical and booming research area. In this survey, we summarized and categorized existing backdoor attacks and proposed a unified framework for analyzing poisoning-based backdoor attacks. We also discussed the relation between backdoor attacks and related research areas and analyzed existing defenses. Classical benchmark datasets and potential research directions were illustrated at the end. Note that almost all studies in this field were completed in the last four years and the cat-and-mouse game between attacks and defenses is likely to continue in the future. We hope that this survey could remind researchers of backdoor threats and provide a timely view. It would be an important step towards building more robust and safer deep learning methods.