FEMRL: A Framework for Large-Scale Privacy-Preserving Linkage of Patients’ Electronic Health Records (Paper Summary)
2022-08-10 19:27:00 【Tongqing Ice Butterfly Kiyotaka】
Abstract
The goal is to consolidate large volumes of patient healthcare data from disparate data sources to facilitate data analysis and cleaning tasks.
The framework is built on LSHDB, a parallel and distributed data engine that performs privacy-preserving record linkage (PPRL) tasks and provides formal guarantees on the completeness of the results.
I. INTRODUCTION
Record linkage consists of two steps: blocking and matching.
In the blocking step, the record linkage algorithm aims to formulate, from the large number of records that participate in a typical linkage setting, candidate pairs that include as many matching record pairs as possible.
In the matching step, the algorithm classifies the pairs formulated in the previous step as matching or non-matching.
Privacy-preserving record linkage (PPRL) techniques can achieve high linkage quality while providing privacy guarantees.
First, healthcare providers mask the electronic patient records they collect in order to protect certain (common) direct identifiers, such as patient name and home address, which are useful for performing record linkage [30].
Other direct identifiers, such as the patient's medical record number, are suppressed from the data because they are both sensitive and useless for PPRL (they are not common across providers). Finally, selected indirect identifiers, such as symptoms or medication, are left unmasked to facilitate data analysis along these dimensions.
The processed data is securely transmitted to a trusted third party (TTP) and stored in a secure environment that complies with legal requirements.
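The next sketch shows what the three kinds of fields look like in practice. It is a minimal Python version of the preparation step a provider might run before transmission. The field names (`name`, `address`, `mrn`, `symptoms`, `medication`) and the `prepare_record` helper are hypothetical. The actual masking is omitted: the identifiers to be masked are only gathered into one normalized string that an encoder such as a Bloom filter would then consume.

```python
from typing import Any

# Hypothetical field policy, chosen only for illustration.
MASKED_FIELDS = ("name", "address")        # direct identifiers useful for linkage: masked
SUPPRESSED_FIELDS = ("mrn",)               # sensitive and provider-specific: never copied
CLEAR_FIELDS = ("symptoms", "medication")  # indirect identifiers: kept for analysis


def prepare_record(record: dict[str, Any]) -> dict[str, Any]:
    """Split a raw patient record according to the field policy above.

    Fields listed in SUPPRESSED_FIELDS (or not listed at all) are dropped.
    The masked part is returned as one normalized string that would still
    have to be encoded (e.g., into a Bloom filter) before transmission.
    """
    to_mask = " ".join(str(record[f]).strip().lower() for f in MASKED_FIELDS if f in record)
    clear = {f: record[f] for f in CLEAR_FIELDS if f in record}
    return {"to_mask": to_mask, "clear": clear}


raw = {"name": "Jane Doe", "address": "12 Oak St", "mrn": "MRN-0042",
       "symptoms": "fever", "medication": "paracetamol"}
print(prepare_record(raw))
```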
II. RELATED WORK
Limitations of existing approaches: they do not perform parallel computation, work with distributed data stores, or build efficient index structures for querying data online.
III. PRELIMINARIES
A. Data Masking Methods
FEMRL adopts the Bloom filter encoding method introduced by Schnell et al., in which each Bloom filter represents a complete data record.
Related threat: frequency attacks on Bloom filter encodings.
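Below is a minimal sketch of record-level Bloom filter encoding in the spirit of Schnell et al. Every field of a record is split into bigrams, and each bigram is hashed with k hash functions into a single shared m-bit array. Several choices here are illustrative assumptions: the parameters (m = 1000, k = 20), the shared secret, and the salted SHA-256 hashing. The original proposal uses keyed double hashing, so treat this as a simplified stand-in, not the paper's encoder.

```python
import hashlib


def bigrams(text: str) -> set[str]:
    """Padded character bigrams, e.g. 'doe' -> {'_d', 'do', 'oe', 'e_'}."""
    padded = f"_{text}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def record_bloom_filter(fields: list[str], m: int = 1000, k: int = 20,
                        secret: bytes = b"shared-secret") -> list[int]:
    """Encode all fields of one record into a single m-bit Bloom filter.

    Simplification: the k hash functions are salted SHA-256 digests, not the
    keyed double hashing used in the original proposal.
    """
    bits = [0] * m
    for field in fields:
        for gram in bigrams(field.strip().lower()):
            for i in range(k):
                digest = hashlib.sha256(secret + f"{i}|{gram}".encode()).digest()
                bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def hamming(a: list[int], b: list[int]) -> int:
    """Number of bit positions in which two filters differ."""
    return sum(x != y for x, y in zip(a, b))


jane = record_bloom_filter(["Jane", "Doe"])
jayne = record_bloom_filter(["Jayne", "Doe"])
robert = record_bloom_filter(["Robert", "Smith"])
print(hamming(jane, jayne), hamming(jane, robert))
```

Similar names share most of their bigrams. As a result, `jane` and `jayne` should land at a much smaller Hamming distance than `jane` and `robert`, and distance-threshold matching depends on that property.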
B. Locality-Sensitive Hashing
Randomized Locality-Sensitive Hashing (LSH) technique
By using a strictly defined number of hash tables [18], LSH guarantees that each similar pair of records is identified with high probability. Similarity between a pair of records is defined by specifying an appropriate distance threshold in the metric space used.
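When Bloom filters are compared under Hamming distance, a common LSH family is bit sampling. Each hash table picks K bit positions, and a record's key in that table is the tuple of bits at those positions. Two m-bit vectors at Hamming distance d agree on one randomly sampled position with probability 1 - d/m. If positions are sampled with replacement, they therefore share a table's key with probability (1 - d/m)^K. The sketch below assumes Hamming LSH, one of the methods LSHDB lets a developer choose. The function names are hypothetical.

```python
import random


def make_tables(m: int, k: int, num_tables: int, seed: int = 7) -> list[list[int]]:
    """Bit-sampling LSH: each table samples k of the m positions (with replacement)."""
    rng = random.Random(seed)
    return [rng.choices(range(m), k=k) for _ in range(num_tables)]


def lsh_keys(bits: list[int], tables: list[list[int]]) -> list[tuple[int, ...]]:
    """One key per table: the bit values at that table's sampled positions."""
    return [tuple(bits[p] for p in positions) for positions in tables]


def collision_probability(d: int, m: int, k: int) -> float:
    """Chance that two m-bit vectors at Hamming distance d share one table's key."""
    return (1 - d / m) ** k


tables = make_tables(m=1000, k=30, num_tables=3)
print(collision_probability(d=100, m=1000, k=30))  # about 0.042 for a single table
```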
C. Overview of LSHDB
LSHDB is a distributed engine that leverages LSH and parallelism to perform record linkage and similarity search tasks.
Records are hashed in advance and kept ready for linkage, which saves time.
When creating a data store, the developer only needs to specify two parameters:
(i) the LSH method to be used, e.g., Hamming, Min-Hash, or Euclidean LSH;
(ii) the underlying NoSQL data engine that will host the data.
LSHDB can also operate across distributed data stores.
IV. FEMRL: A FRAMEWORK FOR ELECTRONIC MEDICAL RECORD LINKAGE
A. Blocking and Matching of Records
A key configuration parameter is the distance threshold used during distance computation, because this threshold determines the number of hash tables that will be created.
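Under a bit-sampling (Hamming LSH) assumption, the link between the threshold and the number of hash tables can be written out explicitly. Let θ be the distance threshold, m the Bloom filter length, K the number of sampled bits per table, and δ the acceptable probability of missing a qualifying pair. Then L = ⌈ln δ / ln(1 - (1 - θ/m)^K)⌉ tables suffice. This is the standard analysis for Hamming LSH. The symbols and the helper below are assumptions for illustration, and the paper's exact formulation may differ.

```python
import math


def num_hash_tables(theta: int, m: int, k: int, delta: float) -> int:
    """Smallest number of tables L such that any pair within Hamming distance
    theta collides in at least one table with probability >= 1 - delta."""
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    if not 0 <= theta < m:
        raise ValueError("theta must be in [0, m)")
    p = (1 - theta / m) ** k      # per-table collision probability at the threshold distance
    if p >= 1:                    # theta == 0: identical vectors always collide
        return 1
    return math.ceil(math.log(delta) / math.log(1 - p))


for theta in (50, 100, 150):
    print(theta, num_hash_tables(theta, m=1000, k=30, delta=0.001))
```

A larger θ lowers the per-table collision probability, so more tables are needed to keep the same guarantee. This is why the threshold is such an important configuration choice.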
1) The Monolithic Mode:
In monolithic mode, data custodians mask their records and send them to the TTP (trusted third party).
In turn, the TTP feeds the masked records of the submitted data sets to LSHDB, which builds the necessary hash tables.
The formulated pairs of records belonging to different healthcare providers are then compared against the specified distance threshold to detect records that correspond to the same patient.
Advantage: simplicity.
Disadvantages: a single site can be overwhelmed, and scalability can only be achieved through expensive software and hardware upgrades.
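The monolithic workflow fits in a few lines of Python. At a single site, one provider's masked records are indexed into the L hash tables (blocking). Each record of the other provider is then compared only against the records it collides with (matching). The bit-sampling tables, the dictionary layout, and `link_monolithic` are hypothetical stand-ins for LSHDB, not its API.

```python
import random
from collections import defaultdict


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def key(bits, positions):
    return tuple(bits[p] for p in positions)


def link_monolithic(provider_a, provider_b, tables, theta):
    """Link two providers' masked records (id -> bit list) at a single site.

    Returns the cross-provider pairs (id_a, id_b) within Hamming distance theta.
    """
    # Blocking: index every record of provider A in each hash table.
    buckets = [defaultdict(list) for _ in tables]
    for rid_a, bits_a in provider_a.items():
        for t, positions in enumerate(tables):
            buckets[t][key(bits_a, positions)].append(rid_a)

    # Matching: compare each record of B only with the A records it collides with.
    matches = set()
    for rid_b, bits_b in provider_b.items():
        for t, positions in enumerate(tables):
            for rid_a in buckets[t].get(key(bits_b, positions), []):
                pair = (rid_a, rid_b)
                if pair not in matches and hamming(provider_a[rid_a], bits_b) <= theta:
                    matches.add(pair)
    return matches


rng = random.Random(1)
m, k, num_tables, theta = 64, 8, 20, 4
tables = [rng.choices(range(m), k=k) for _ in range(num_tables)]
a1 = [rng.randint(0, 1) for _ in range(m)]
b1 = a1.copy()
b1[0] ^= 1
b1[1] ^= 1                    # same patient, two bits differ
a2 = [1 - x for x in a1]      # a clearly different record
print(link_monolithic({"a1": a1, "a2": a2}, {"b1": b1}, tables, theta))
```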
2) The Distributed Mode:
The TTP maintains multiple sites in a secure environment, each holding a horizontal partition of the masked records previously submitted by the data custodians.
Masked records are submitted to a central site, which forwards them to the remaining sites.
Advantages: (a) large volumes of records are not concentrated and maintained at a single site; (b) FEMRL scales easily.
Disadvantage: LSHDB, along with a NoSQL system, must be installed at every site.
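The distributed mode can be sketched roughly as follows. The sketch assumes that each site runs the same hash tables over its own horizontal partition, and that the central site forwards a query to all sites in parallel and merges their answers. The `Site` class, round-robin partitioning, and threads standing in for networked sites are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Site:
    """Hypothetical site that stores one horizontal partition of masked records."""

    def __init__(self, tables):
        self.tables = tables                      # sampled bit positions, one list per hash table
        self.records = {}                         # record id -> bit list
        self.buckets = [defaultdict(list) for _ in tables]

    def _key(self, bits, positions):
        return tuple(bits[p] for p in positions)

    def insert(self, rid, bits):
        self.records[rid] = bits
        for t, positions in enumerate(self.tables):
            self.buckets[t][self._key(bits, positions)].append(rid)

    def query(self, bits, theta):
        """Return local record ids within Hamming distance theta of the query."""
        candidates = set()
        for t, positions in enumerate(self.tables):
            candidates.update(self.buckets[t].get(self._key(bits, positions), []))
        return {rid for rid in candidates
                if sum(x != y for x, y in zip(self.records[rid], bits)) <= theta}


def central_query(sites, bits, theta):
    """Central site: forward the query to every site in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        return set().union(*pool.map(lambda site: site.query(bits, theta), sites))


rng = random.Random(3)
m, k, num_tables = 64, 8, 20
tables = [rng.choices(range(m), k=k) for _ in range(num_tables)]
sites = [Site(tables) for _ in range(3)]
records = {f"r{i}": [rng.randint(0, 1) for _ in range(m)] for i in range(9)}
for i, (rid, bits) in enumerate(records.items()):
    sites[i % len(sites)].insert(rid, bits)     # round-robin horizontal partitioning
print(central_query(sites, records["r4"], theta=4))
```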
3) Algorithms Used by Both Modes: (I did not fully understand this part.)
Complexity
The running time of Algorithm 1 is linear in the number of records in A.
The total running time of Algorithm 2 is
B. Integration with MapReduce
FEMRL runs on top of a MapReduce infrastructure.
The Map phase corresponds to the blocking step, while the Reduce phase corresponds to the matching step.
- Map phase. Each map task builds a hash key for each masked record at hand and sends the key, together with the record's ID, to a partition task.
- Distribution of tuples. Each partition task, always bound to a map task, controls the distribution of the formulated tuples to the reduce tasks. Tuples with the same hash key are forwarded to the same reduce task.
- Reduce phase. Each reduce task handles the load of tuples forwarded to it by the partition tasks.
First, the Map tasks hash the masked records; the Reduce tasks then insert the aggregated hashing results into the appropriate LSHDB instances.
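The Map, partition, and Reduce steps can be simulated in plain Python to show how tuples with the same hash key meet at the same reduce task. This is a hypothetical single-process sketch. The key includes the table index so that keys from different tables stay separate. `hash(key) % num_reducers` stands in for the partitioner, and the dictionaries returned by the reducers stand in for LSHDB instances.

```python
from collections import defaultdict


def map_phase(records, tables):
    """Map: emit ((table index, hash key), record id) for every record and table."""
    for rid, bits in records.items():
        for t, positions in enumerate(tables):
            yield (t, tuple(bits[p] for p in positions)), rid


def partition(key, num_reducers):
    """Partition: the same key always goes to the same reduce task."""
    return hash(key) % num_reducers


def reduce_phase(tuples):
    """Reduce: group record ids by key (a stand-in for inserting into an LSHDB instance)."""
    buckets = defaultdict(list)
    for key, rid in tuples:
        buckets[key].append(rid)
    return buckets


def run_job(records, tables, num_reducers=3):
    inboxes = [[] for _ in range(num_reducers)]
    for key, rid in map_phase(records, tables):
        inboxes[partition(key, num_reducers)].append((key, rid))
    return [reduce_phase(inbox) for inbox in inboxes]


records = {"r1": [1, 0, 1, 1, 0, 0, 1, 0],
           "r2": [1, 0, 1, 1, 0, 0, 1, 0],   # identical to r1
           "r3": [0, 1, 0, 0, 1, 1, 0, 1]}   # complement of r1
tables = [[0, 2, 4], [1, 3, 5], [2, 6, 7]]
for i, out in enumerate(run_job(records, tables)):
    print(i, dict(out))
```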
V. EXPERIMENTAL EVALUATION
A. Data Sets and Metrics
Two metrics are used:
(a) Pairs Completeness (PC, or recall), i.e., the ratio of the number of true positives returned to the total number of true matching pairs;
(b) Pairs Quality (PQ, or precision), i.e., the ratio of the number of true positives returned to the total number of true and false positives processed.
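Both metrics are simple set ratios. The following minimal sketch assumes that the returned pairs and the ground-truth matches are both given as sets of (id_a, id_b) tuples. It does not guard against empty sets.

```python
def pairs_completeness(returned: set, true_matches: set) -> float:
    """PC (recall): true positives returned / all true matching pairs."""
    return len(returned & true_matches) / len(true_matches)


def pairs_quality(returned: set, true_matches: set) -> float:
    """PQ (precision): true positives returned / all pairs returned (TP + FP)."""
    return len(returned & true_matches) / len(returned)


truth = {(f"a{i}", f"b{i}") for i in range(1, 6)}           # 5 true matches
found = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a9", "b9")}
print(pairs_completeness(found, truth))  # 0.6
print(pairs_quality(found, truth))       # 0.75
```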
VI. CONCLUSIONS
FEMRL is a privacy-preserving framework for record linkage.
The core component of FEMRL is LSHDB, a parallel and distributed data engine.
Integrating LSHDB with MapReduce yields a distributed data store that can be used to perform on-demand PPRL tasks.