Image Manipulation Detection by Multi-View Multi-Scale Supervision
2022-04-21 17:04:00 【Kun Li】
1.abstract
The key challenge in image manipulation detection is learning generalizable features that are sensitive to manipulations in novel data while avoiding false alarms on authentic images. Current research emphasizes sensitivity and neglects specificity. This paper addresses both through multi-view feature learning and multi-scale supervision. The idea is common in tamper detection: Minivision's earlier liveness detection work also added a Fourier branch to supervise feature refinement. Tamper detection is concerned with the artifacts and edges of tampered images, and its core problem is separating the tampered regions from the original image, so adding a strongly supervised branch is reasonable. Multi-view learning exploits the noise distribution and boundary artifacts surrounding tampered regions to learn semantic-agnostic, and therefore more generalizable, features. Multi-scale supervision makes it possible to learn from authentic images, which current methods based on semantic segmentation networks cannot easily take into account.
2.introduction
Copy-move copies an element from one region of an image to another region of the same image. Splicing copies an element from one image and pastes it onto another. Inpainting removes unwanted elements. These are the three common types of image manipulation.
The task is often treated as a simplified case of image semantic segmentation, but semantic segmentation models are suboptimal here: they are designed to capture semantic information, which makes the network dataset-dependent rather than generalizable. This is a very good point. Early tamper detection scenarios were mostly defined on artificial data, with the tamper types defined on that data, so the trained models depended heavily on the data and generalized very poorly.
To learn semantic-agnostic features, image content must be suppressed. This matters: when I previously treated tampered versus non-tampered as a binary classification task, the classifier learned feature factors other than tampering features, which is exactly what we do not want, so the interference of image content has to be suppressed. Current methods fall into two groups: noise-view methods and edge-supervised methods. The first group assumes that the new elements introduced by splicing and inpainting differ from the authentic part in their noise distribution, and aims to exploit that difference. A noise map of the input image, generated by a predefined high-pass filter or a trainable counterpart, is fed to a deep network either alone or together with the input image. This is ineffective for copy-move, which introduces no new elements. The second group looks for boundary artifacts around the tampered region as traces of manipulation, and adds an auxiliary branch that reconstructs the edge of the region.
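The noise-view idea above can be illustrated with a minimal NumPy sketch. The 3x3 kernel, the function name `noise_residual`, and the synthetic image are all assumptions made for illustration, not the filters used in the paper. The point is only that a zero-sum high-pass kernel suppresses flat image content and leaves a response where the noise statistics differ.

```python
import numpy as np

# Illustrative 3x3 high-pass kernel (an assumption, not the paper's filter).
# Its weights sum to zero, so flat image content produces no response.
HIGH_PASS = np.array([[-1.0,  2.0, -1.0],
                      [ 2.0, -4.0,  2.0],
                      [-1.0,  2.0, -1.0]]) / 4.0

def noise_residual(gray):
    """Filter a 2-D grayscale image with HIGH_PASS (edge padding, same size)."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += HIGH_PASS[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Synthetic example: a flat image with a hypothetical spliced patch that
# carries extra noise.
rng = np.random.default_rng(0)
image = np.full((32, 32), 100.0)
image[8:16, 8:16] += rng.normal(0.0, 5.0, size=(8, 8))
residual = noise_residual(image)

# The residual is zero far from the patch and clearly non-zero inside it.
flat_response = np.abs(residual[20:, 20:]).max()
patch_response = np.abs(residual[8:16, 8:16]).mean()
```

A copy-move patch taken from the same image would carry the same noise statistics as its surroundings, which is why this view alone does not help in that case.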
3.related work

The figure above summarizes recent work surveyed by the authors. The inputs listed at the top are similar to those in my own earlier work: RGB, noise maps, and so on. I have also seen Fourier spectra and ELA maps used to strengthen auxiliary-branch features.
This paper focuses on three manipulation types: copy-move, splicing, and inpainting. Gaussian blur and JPEG compression are what Constrained CNN is concerned with. Its constrained convolution layer, BayarConv, helps extract noise information, but using it alone risks losing other useful information in the original RGB input. The two-stream Faster R-CNN uses SRM filters. MVSS-Net also uses a noise map and fuses it with RGB at a later stage, but the fusion is done by trainable dual attention rather than by non-trainable bilinear pooling.
Tampering with a region of an image inevitably leaves traces between that region and its surroundings, so exploiting these edge artifacts is also important for tamper detection. MVSS-Net has an edge-supervised branch for this.
4.proposed model
Classification can only decide whether an image has been tampered with; it cannot localize the tampered region. Segmentation both decides whether the image is tampered and gives the specific tampered area. The segmentation approach really is good, and before settling on classification or detection, similar problems deserve to be thought through at a finer granularity. Each pixel gets a binary-classification probability, and together these form a global segmentation map. MVSS-Net takes the RGB image and its noise map as input and is supervised by labels at three scales: pixel, edge, and image. Tamper detection should be built on the segmentation idea. I had long been thinking about how to remove the interference of factors unrelated to tampering. For now, classification methods will inevitably run into this problem, because they are not fine-grained enough and do not classify pixel-level features. That said, finding commonalities after extracting features within a region also matters.
4.1 multi-view feature learning

ResNet-50 is the backbone. The edge-supervised branch (ESB) is designed specifically to exploit subtle boundary artifacts around the tampered region, while the noise-sensitive branch (NSB) aims to capture inconsistencies between tampered and authentic regions. Both branches are semantic-agnostic.
4.1.1 edge-supervised branch
Ideally, edge supervision should make the network's responses concentrate on the tampered regions. Designing such an edge-supervised network is not easy. It is worth asking whether, as DB does, the model should be made to pay more attention to these edge regions. As noted in Section 2, the main challenge is how to build a suitable input for the edge detection head. On one hand, using features from the last ResNet block is problematic, because it forces deep features to capture low-level edge patterns and thereby hurts the main task of manipulation segmentation. On the other hand, using features from the initial block is also problematic, because the subtle edge patterns in these shallow features can easily vanish after many deep convolutions. Both shallow and deep features are therefore needed. However, the authors argue that the simple feature concatenation used previously is suboptimal: the features are mixed together, and there is no guarantee that the deeper features receive adequate supervision from the edge head. To overcome this, they propose constructing the input of the edge head in a shallow-to-deep manner.
Features from different ResNet blocks are combined progressively for manipulation edge detection. To enhance edge-related patterns, a Sobel layer is introduced. Features from the i-th block first pass through the Sobel layer and then an edge residual block (ERB), and are then combined, by summation, with their counterparts from the next block. To prevent accumulation effects, the combined features pass through another ERB before the next round of combination. The authors argue that this mechanism helps avoid the extreme cases in which the edge head over-supervises the deeper features or ignores them entirely. Visualizing the feature maps of the last ResNet block (Figure 4) shows that the proposed ESB indeed produces responses more concentrated near the tampered regions.
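The shallow-to-deep combination described above can be sketched roughly as follows. This is only a NumPy illustration under strong assumptions: all block features are taken to share one (C, H, W) shape (real ResNet blocks differ in channels and resolution and would need resizing), `sobel_layer` is a plain gradient magnitude, and `erb` is a parameter-free stand-in for the learned edge residual block. None of these names come from the paper's code.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def filter3x3(channel, kernel):
    """Apply a 3x3 kernel to one 2-D channel (edge padding, same size)."""
    h, w = channel.shape
    padded = np.pad(channel, 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def sobel_layer(feat):
    """Per-channel Sobel gradient magnitude of a (C, H, W) feature map."""
    return np.stack([np.hypot(filter3x3(c, SOBEL_X), filter3x3(c, SOBEL_Y))
                     for c in feat])

def erb(feat):
    """Stand-in for the edge residual block: a skip connection around a ReLU.
    The real ERB contains learned convolutions, omitted in this sketch."""
    return feat + np.maximum(feat, 0.0)

def progressive_edge_features(block_feats):
    """Sobel -> ERB on each block, sum with the running result,
    then another ERB before the next round of combination."""
    combined = erb(sobel_layer(block_feats[0]))
    for feat in block_feats[1:]:
        combined = erb(combined + erb(sobel_layer(feat)))
    return combined

# Hypothetical features from four blocks, already brought to a common shape.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(4, 8, 8)) for _ in range(4)]
edge_feat = progressive_edge_features(feats)

# Constant features contain no edges, so the result is all zeros.
flat_out = progressive_edge_features([np.ones((2, 6, 6))] * 3)
```

The extra `erb` call after each summation mirrors the "another ERB before the next round" step in the text.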


In Figure 2, the ESB has two outputs. The first passes through a sigmoid function and is the predicted edge map used for edge supervision. The second is used for the main segmentation map.

4.1.2 noise-sensitive branch
To make full use of the noise view, a noise-sensitive branch (NSB) is built in parallel with the edge-supervised branch. NSB is a standard FCN with a ResNet-50 backbone. BayarConv is chosen for noise extraction because it works better than SRM filters.
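For readers unfamiliar with constrained convolution, the sketch below shows the weight constraint usually associated with BayarConv: the centre weight is fixed to -1 and the remaining weights are normalized to sum to 1, so the filter predicts each pixel from its neighbours and outputs the prediction error. The function name `apply_bayar_constraint` and the random kernel are assumptions for illustration; in training, such a projection would be re-applied to the learned kernel after each update.

```python
import numpy as np

def apply_bayar_constraint(kernel):
    """Project a square, odd-sized kernel onto the constraint:
    centre weight = -1, all other weights sum to 1.
    Assumes the off-centre weights do not sum to (nearly) zero."""
    k = kernel.astype(np.float64).copy()
    c = k.shape[0] // 2
    k[c, c] = 0.0          # exclude the centre from the normalization
    k /= k.sum()           # off-centre weights now sum to 1
    k[c, c] = -1.0         # fix the centre weight
    return k

# Hypothetical 5x5 kernel with positive weights, as it might look mid-training.
rng = np.random.default_rng(0)
raw = rng.uniform(0.1, 1.0, size=(5, 5))
constrained = apply_bayar_constraint(raw)
off_centre_sum = constrained.sum() + 1.0  # total minus the centre value of -1
```

Because the constrained weights sum to zero overall, the filter gives no response on locally constant content, which is what makes it a noise extractor rather than a content detector.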
4.1.3 branch fusion by dual attention
The output feature maps of ESB and NSB are fused by a trainable dual attention (DA) module rather than by bilinear pooling. The two-stream Faster R-CNN uses bilinear pooling, which has no trainable parameters.
DA has two parallel branches; in the figure, blue is channel attention (CA) and green is position attention (PA). CA associates channel features to selectively emphasize interdependent channel feature maps. PA selectively updates the feature at each position by a weighted sum of the features at all positions. The outputs of CA and PA are merged and converted by a 1x1 convolution into a single-channel map of unchanged spatial size, which is then upsampled by parameter-free bilinear interpolation and passed through a sigmoid to give the final segmentation map. The input to DA is the concatenation of two 2048-channel feature maps, giving 4096 channels; after DA and the 1x1 convolution this becomes 1 channel.
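A simplified NumPy sketch of the two attention branches and the fusion step is given below. It is an illustration under assumptions: the learned query/key/value projections of a real dual attention module are omitted, `gamma` is a fixed argument rather than a learned scalar, the CA and PA outputs are merged by summation, upsampling is skipped, and tiny 3-channel inputs replace the 2048-channel feature maps.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat, gamma=1.0):
    """Re-weight each channel by its affinity to every other channel."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)             # (C, N)
    affinity = softmax(x @ x.T, axis=1)    # (C, C)
    return (gamma * (affinity @ x) + x).reshape(c, h, w)

def position_attention(feat, gamma=1.0):
    """Update each position with a weighted sum over all positions."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)             # (C, N)
    affinity = softmax(x.T @ x, axis=1)    # (N, N), row i weights all positions
    return (gamma * (x @ affinity.T) + x).reshape(c, h, w)

def fuse(esb_feat, nsb_feat, conv_weights):
    """Concatenate both branches, apply CA + PA, then a 1x1 conv and sigmoid."""
    feat = np.concatenate([esb_feat, nsb_feat], axis=0)      # (2C, H, W)
    attended = channel_attention(feat) + position_attention(feat)
    logits = np.tensordot(conv_weights, attended, axes=(0, 0))  # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical branch outputs and 1x1 convolution weights.
rng = np.random.default_rng(0)
esb = rng.normal(size=(3, 4, 4)) * 0.1
nsb = rng.normal(size=(3, 4, 4)) * 0.1
weights = rng.normal(size=6)
seg_map = fuse(esb, nsb, weights)
```

The 1x1 convolution is written here as a weighted sum over channels, which is what a 1x1 convolution with one output channel and no bias computes.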

4.2 multi-scale supervision
Supervision is applied at three scales: a pixel-scale loss, an edge loss, and an image-scale loss, all aimed at learning semantic-agnostic features.
pixel-scale loss. Uses dice loss. Tampered pixels are usually a small minority in a given image, so the model has to learn from extremely imbalanced data. The loss is computed at the original image size.
edge loss. Uses dice loss. This is an auxiliary loss; it is computed at 1/4 of the original size rather than at full size, which reduces the computational cost of training and also improves performance.
image-scale loss. Uses BCE loss.
Dice loss is similar to IoU loss, while BCE is per-pixel binary classification. In image segmentation, softmax cross-entropy loss predicts the class of each pixel and then averages over all pixels. In essence it still weights every pixel of the image equally, so when the classes in an image are imbalanced, training is dominated by the majority class: the network leans toward learning the majority class and its ability to extract features for minority classes drops. With BCE, weights are typically added to the positive and negative samples to compensate. Dice loss divides the intersection of the prediction and the ground truth by their total pixels, treating all pixels of a class as a whole and measuring the proportion of the intersection in that whole. It is therefore not swamped by a large number of majority-class pixels and tends to give better results. In practice, dice loss is often combined with BCE loss to improve training stability.
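The difference between the two losses under class imbalance can be checked numerically. The sketch below uses one common form of dice loss (squared terms in the denominator) and plain unweighted BCE; both function names and the synthetic mask are assumptions for illustration. A prediction that misses a small tampered region entirely still gets a low BCE, while its dice loss stays close to 1.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-8):
    """1 - 2*|P.G| / (|P|^2 + |G|^2), computed over the whole map."""
    inter = (pred * target).sum()
    denom = (pred ** 2).sum() + (target ** 2).sum() + eps
    return float(1.0 - 2.0 * inter / denom)

def bce_loss(pred, target, eps=1e-7):
    """Unweighted binary cross-entropy averaged over all pixels."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean())

# A 64x64 image in which only a 4x4 region (16 of 4096 pixels) is tampered.
target = np.zeros((64, 64))
target[10:14, 10:14] = 1.0

# A prediction that says "authentic" almost everywhere and misses the region.
missed = np.full((64, 64), 0.01)

bce_missed = bce_loss(missed, target)    # small: majority pixels dominate
dice_missed = dice_loss(missed, target)  # close to 1: the miss is penalized

# A perfect prediction drives dice loss to (almost) zero.
dice_perfect = dice_loss(target, target)
```

This is the behaviour the paragraph above describes: averaging over pixels hides the error on a tiny tampered region, whereas the overlap ratio does not.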
The paper combines these three losses:

Here clf is the classification loss, seg is the segmentation loss, and the remaining term is the auxiliary edge loss. The ground-truth edge map is obtained with cv2.findContours.
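As a rough illustration of how an edge label can be derived from a tampering mask and how three losses can be weighted, consider the sketch below. It deliberately swaps in a NumPy erosion-style boundary test instead of cv2.findContours, so the edge pixels it marks may differ from what OpenCV would return. The weighting with `alpha` and `beta` summing with a third weight to 1 is an assumed form chosen for illustration, and the numeric values are hypothetical.

```python
import numpy as np

def boundary_from_mask(mask):
    """Mark mask pixels that have at least one 4-neighbour outside the mask.
    This is a stand-in for contour extraction, not cv2.findContours."""
    m = mask.astype(bool)
    p = np.pad(m, 1, mode="constant", constant_values=False)
    # A pixel is interior if it and its four neighbours are all inside the mask.
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    return (m & ~interior).astype(np.uint8)

def combined_loss(loss_seg, loss_clf, loss_edge, alpha, beta):
    """Weighted sum of the three losses (assumed convex weighting)."""
    return alpha * loss_seg + beta * loss_clf + (1.0 - alpha - beta) * loss_edge

# A 4x4 tampered square inside an 8x8 mask: 12 boundary pixels, 4 interior.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
edge = boundary_from_mask(mask)

# Hypothetical loss values and weights, for illustration only.
total = combined_loss(1.0, 2.0, 3.0, alpha=0.5, beta=0.25)
```

Since the edge loss is computed at a reduced scale in the text above, an edge label like this would be downsampled before being compared with the predicted edge map.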
5.experiments
The evaluation metric is F1 with a default threshold of 0.5. Input images are 512x512, and the backbone is initialized from ImageNet pre-training.
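A pixel-level F1 at a fixed threshold can be computed as in the following sketch. The function name `pixel_f1` and the toy arrays are assumptions; returning 0.0 when there are no positives in either the prediction or the ground truth is a convention chosen here, since F1 is undefined in that case.

```python
import numpy as np

def pixel_f1(prob_map, gt_mask, threshold=0.5):
    """F1 between a thresholded probability map and a binary ground truth."""
    pred = prob_map > threshold
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    if 2 * tp + fp + fn == 0:
        return 0.0  # undefined case; 0.0 is a convention assumed here
    return float(2 * tp / (2 * tp + fp + fn))

# Toy example: 4 tampered pixels, of which 3 are found, plus 1 false alarm.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[0:2, 0:2] = 1
prob = np.full((4, 4), 0.1)
prob[0:2, 0:2] = 0.9
prob[1, 1] = 0.2   # missed tampered pixel
prob[3, 3] = 0.8   # false positive
f1 = pixel_f1(prob, gt)  # tp=3, fp=1, fn=1 -> 6 / 8
```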
Copyright notice
This article was written by [Kun Li]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204211658112712.html