FEMRL: A Framework for Large-Scale Privacy-Preserving Linkage of Patients’ Electronic Health Records: Paper Summary
2022-08-10 19:27:00 【Tongqing Ice Butterfly Kiyotaka】
Abstract
The framework consolidates large volumes of patient healthcare data from disparate data sources to facilitate data analysis and cleaning tasks.
LSHDB, a parallel and distributed data engine, is used to perform privacy-preserving record linkage (PPRL) tasks. It also provides formal guarantees on the completeness of the results.
I. INTRODUCTION
Record linkage consists of two steps: blocking and matching.
In the blocking step, the record linkage algorithm aims to formulate candidate record pairs that contain as many of the matching pairs as possible, drawn from the large number of records participating in a typical record linkage setup.
In the matching step, the algorithm classifies the pairs formulated in the previous step as matching or non-matching.
Privacy-preserving record linkage (PPRL) techniques can be used to achieve high linkage quality while providing privacy guarantees.
First, healthcare providers mask the electronic patient records they collect in order to protect certain (common) direct identifiers, such as patient name and home address, which are useful for enabling record linkage [30].
Other direct identifiers, such as the patient's medical record number, are removed from the data because they are both sensitive and useless for PPRL (they are not universal across providers). Finally, selected indirect identifiers, such as symptoms or medication, are left unmasked to facilitate data analysis along these dimensions.
The processed data is securely transmitted to the TTP (trusted third party) and stored in a secure environment, in compliance with legal requirements.
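The three-way handling of fields described above can be sketched as follows. This is a minimal illustration, not FEMRL's code. The field names and the three category lists are hypothetical. The `encode` callable stands in for whatever masking method is applied to the linkage identifiers, which in FEMRL is a Bloom filter encoding.

```python
from typing import Callable, Dict, List

# Hypothetical field classification following the three categories above.
LINKAGE_IDENTIFIERS = ["name", "address"]          # masked, kept for PPRL
DROPPED_IDENTIFIERS = ["medical_record_number"]    # sensitive and not universal: removed
ANALYTIC_ATTRIBUTES = ["symptoms", "medication"]   # indirect identifiers: left unmasked


def mask_record(record: Dict[str, str],
                encode: Callable[[List[str]], object]) -> Dict[str, object]:
    """Return the version of `record` that a provider would send to the TTP."""
    masked: Dict[str, object] = {
        # All linkage identifiers are jointly encoded into one masked value.
        "masked_identifiers": encode([record[f] for f in LINKAGE_IDENTIFIERS]),
    }
    # Indirect identifiers are copied unchanged for later analysis.
    for field in ANALYTIC_ATTRIBUTES:
        masked[field] = record[field]
    # Fields listed in DROPPED_IDENTIFIERS are never copied, so they do not leave the provider.
    return masked
```

Keeping the encoder as a parameter separates the field policy from the choice of masking method.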
II. RELATED WORK
Limitations of existing work: existing approaches lack the ability to perform parallel computation, work with distributed data stores, or build efficient index structures for querying data online.
III. PRELIMINARIES
A. Data Masking Methods
The Bloom filter encoding method proposed by Schnell et al. is adopted, where each Bloom filter represents a complete data record.
frequency attack
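Below is a minimal sketch of a record-level (CLK-style) Bloom filter encoding. It assumes padded bigrams, a filter length of m = 1000 bits, k = 15 hash functions, and a secret key shared by the data custodians. These values and the function names are illustrative. The double-hashing construction with two keyed HMACs follows the scheme commonly attributed to Schnell et al. The exact parameters used by FEMRL are not stated in this summary.

```python
import hashlib
import hmac
from typing import List, Set


def bigrams(value: str) -> Set[str]:
    """Padded bigrams of a normalised string, e.g. 'ann' -> {'_a', 'an', 'nn', 'n_'}."""
    padded = f"_{value.lower().strip()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode_record(fields: List[str], key: bytes, m: int = 1000, k: int = 15) -> List[int]:
    """Hash the bigrams of all fields into a single m-bit Bloom filter (list of 0/1)."""
    bits = [0] * m
    for field in fields:
        for gram in bigrams(field):
            data = gram.encode("utf-8")
            # Two keyed hashes combined by double hashing: g_i = (h1 + i * h2) mod m.
            h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
            h2 = int.from_bytes(hmac.new(key, data, hashlib.md5).digest(), "big")
            for i in range(k):
                bits[(h1 + i * h2) % m] = 1
    return bits
```

Because every field of a record goes into one filter, similar records yield filters with a small Hamming distance. The LSH blocking step relies on this property.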
B. Locality-Sensitive Hashing
Randomized locality-sensitive hashing (LSH) technique.
By using a rigorously defined number of hash tables [18], LSH guarantees that each similar pair of records is identified with high probability. The similarity between a pair of records is defined by specifying an appropriate distance threshold in the metric space used.
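To make the blocking idea concrete, here is a minimal sketch of Hamming LSH over Bloom filters. It assumes the bit-sampling variant: each of the L hash tables keys a record by the values of K randomly chosen bit positions. Two records become a candidate pair if they share a bucket in at least one table. The function names, the seed and the record IDs are illustrative. In a PPRL setting, only candidate pairs from different providers would go on to matching.

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_tables(records: Dict[str, List[int]], m: int, K: int, L: int,
                 seed: int = 0) -> List[Dict[Tuple[int, ...], List[str]]]:
    """Hamming LSH: each of the L tables keys a record by K sampled bit positions."""
    rng = random.Random(seed)
    samples = [rng.sample(range(m), K) for _ in range(L)]
    tables = []
    for positions in samples:
        table: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
        for rid, bits in records.items():
            table[tuple(bits[p] for p in positions)].append(rid)
        tables.append(table)
    return tables


def candidate_pairs(tables: List[Dict[Tuple[int, ...], List[str]]]) -> Set[Tuple[str, str]]:
    """Pairs of record IDs that share a bucket in at least one table."""
    pairs: Set[Tuple[str, str]] = set()
    for table in tables:
        for bucket in table.values():
            for a, b in combinations(sorted(bucket), 2):
                pairs.add((a, b))
    return pairs
```

Identical filters collide in every table. Filters that differ in every bit never collide. Pairs in between collide with a probability that falls as their Hamming distance grows.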
C. Overview of LSHDB
LSHDB is a distributed engine that leverages LSH and parallelism to perform record linkage and similarity search tasks.
Records are hashed in advance and kept ready for linkage, which saves time when linkage is requested.
When creating a data store, the developer only needs to specify two parameters:
(i) the LSH method to be used, e.g., Hamming, Min-Hash, or Euclidean LSH;
(ii) the underlying NoSQL data engine that will host the data.
LSHDB can also operate across distributed data stores.
IV. FEMRL: A FRAMEWORK FOR ELECTRONIC MEDICAL RECORD LINKAGE
A. Blocking and Matching of Records
A very important configuration parameter is the threshold used during distance computation, because the threshold determines the number of hash tables that will be created.
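For Hamming LSH with bit sampling, a standard way to derive the number of tables from the threshold is as follows. A pair at Hamming distance θ over m-bit filters agrees on one sampled bit with probability 1 - θ/m. It shares the full K-bit key of one table with probability p = (1 - θ/m)^K, so all L tables miss it with probability (1 - p)^L. Requiring this miss probability to be at most a tolerance δ gives L = ceil(ln δ / ln(1 - p)). Whether FEMRL uses exactly this bound is an assumption here, and m, K and δ are illustrative parameters.

```python
import math


def num_hash_tables(theta: int, m: int, K: int, delta: float) -> int:
    """Smallest L such that a pair within Hamming distance theta is missed
    by all L bit-sampling tables with probability at most delta."""
    p = (1 - theta / m) ** K          # chance the pair shares a key in one table
    if p >= 1:                        # theta == 0: identical filters always collide
        return 1
    return math.ceil(math.log(delta) / math.log(1 - p))
```

A larger threshold lowers p, so more tables are needed to keep the same guarantee. This is why the threshold is such an important configuration parameter.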
1) The Monolithic Mode:
In monolithic mode, the data custodians mask their records and send them to the TTP (Trusted Third Party).
In turn, the TTP supplies the masked records of the submitted datasets to LSHDB, which builds the necessary hash tables.
The formulated record pairs whose records belong to different healthcare providers are compared against the specified distance threshold to detect the records that correspond to the same patient.
Advantages: simplicity.
Disadvantages: a single site can be overwhelmed, and scalability can only be achieved through expensive software and hardware upgrades.
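The monolithic flow can be sketched as a single function running at the TTP. It takes the masked records of all providers, blocks them with Hamming LSH, and matches only cross-provider pairs within the distance threshold. This is an in-memory sketch under assumptions: the record layout `(provider, record_id, bits)`, the bit-sampling LSH variant and all names are illustrative, not LSHDB's API.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, List[int]]  # (provider, record ID, Bloom filter bits)


def hamming(a: List[int], b: List[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def monolithic_link(records: List[Record], m: int, K: int, L: int,
                    theta: int, seed: int = 0) -> Set[Tuple[str, str]]:
    """Single-site linkage: block with L bit-sampling tables, then keep pairs
    from different providers whose Hamming distance is at most theta."""
    rng = random.Random(seed)
    matches: Set[Tuple[str, str]] = set()
    for _ in range(L):
        positions = rng.sample(range(m), K)
        buckets: Dict[Tuple[int, ...], List[Record]] = defaultdict(list)
        for rec in records:
            buckets[tuple(rec[2][p] for p in positions)].append(rec)
        for bucket in buckets.values():
            for i, (pa, ida, bits_a) in enumerate(bucket):
                for pb, idb, bits_b in bucket[i + 1:]:
                    # Same-provider pairs are skipped; only cross-provider links count.
                    if pa != pb and hamming(bits_a, bits_b) <= theta:
                        a, b = sorted((ida, idb))
                        matches.add((a, b))
    return matches
```

All tables and all comparisons live in one process, which illustrates both the simplicity and the single-site bottleneck listed above.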
2) The Distributed Mode:
The TTP maintains multiple sites in a secure environment; each site holds a horizontal partition of the masked records previously submitted by the data custodians.
Masked records are submitted to a central site and then forwarded to the remaining sites.
Advantages: (a) the records are not all stored and maintained at a single site; (b) FEMRL can scale easily.
Disadvantages: LSHDB, together with a NoSQL system, must be installed at every site.
3) Algorithms Used by Both Modes: (I did not fully understand this part.)
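A toy model of the distributed mode follows. A central site places each submitted masked record on one of several sites, so each site holds a horizontal partition, and it forwards a query to every site and merges the partial answers. The round-robin placement and the linear scan used inside each site are simplifying assumptions. In FEMRL each site would run its own LSHDB instance with LSH indexes. The class names are hypothetical.

```python
from typing import Dict, List


def hamming(a: List[int], b: List[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


class Site:
    """One TTP site holding a horizontal partition of the masked records."""

    def __init__(self) -> None:
        self.records: Dict[str, List[int]] = {}

    def search(self, query: List[int], theta: int) -> List[str]:
        # Simplification: a linear scan stands in for the site's local LSH index.
        return [rid for rid, bits in self.records.items() if hamming(query, bits) <= theta]


class CentralSite:
    """Receives submissions and queries, and fans them out to the sites."""

    def __init__(self, n_sites: int) -> None:
        self.sites = [Site() for _ in range(n_sites)]
        self._next = 0

    def submit(self, rid: str, bits: List[int]) -> None:
        # Hypothetical round-robin placement: each record lives on exactly one site.
        self.sites[self._next].records[rid] = bits
        self._next = (self._next + 1) % len(self.sites)

    def query(self, bits: List[int], theta: int) -> List[str]:
        # Forward the query to every site and merge the partial answers.
        return sorted(rid for site in self.sites for rid in site.search(bits, theta))
```

Adding capacity amounts to constructing the central site with more sites. In a real deployment each added site needs its own LSHDB and NoSQL installation, which matches the disadvantage listed above.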
Complexity
The running time of Algorithm 1 is linear in the number of records of A.
The total running time of Algorithm 2 is:
B. Integration with MapReduce
FEMRL runs on top of a MapReduce infrastructure.
The Map phase performs the blocking step, while the Reduce phase performs the matching step.
- Map phase. Each map task builds a hash key for each masked record at hand and sends it, together with the record's ID, to the partition tasks.
- Distribution of tuples. Each partition task, which is always bound to a map task, controls the distribution of the formulated tuples to the reduce tasks. Tuples with the same hash key are forwarded to the same reduce task.
- Reduce phase. Each reduce task processes the tuples forwarded to it by the partition tasks.
First, the Map tasks hash the masked records; then the Reduce tasks insert the aggregated hash results into the appropriate LSHDB instances.
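The three phases can be simulated in memory to show how equal hash keys meet at the same reducer. This is a sketch under assumptions, not FEMRL's Hadoop job. A map task emits one `(hash key, record ID)` tuple per record and per bit-sampling table. The partition function routes tuples by `hash(key) % n_reducers`. Each reducer groups its tuples by key, which is the bucket that would go into an LSHDB instance, and compares the records within each bucket. All names and parameters are illustrative.

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Set, Tuple

Key = Tuple[int, Tuple[int, ...]]  # (hash table index, sampled bit values)


def map_phase(records: Dict[str, List[int]],
              samples: List[List[int]]) -> Iterator[Tuple[Key, str]]:
    """Map: emit one (hash key, record ID) tuple per record and per hash table."""
    for rid, bits in records.items():
        for t, positions in enumerate(samples):
            yield (t, tuple(bits[p] for p in positions)), rid


def partition(key: Key, n_reducers: int) -> int:
    """Partition: equal hash keys always go to the same reduce task."""
    return hash(key) % n_reducers


def reduce_phase(tuples: List[Tuple[Key, str]], records: Dict[str, List[int]],
                 theta: int) -> Set[Tuple[str, str]]:
    """Reduce: group tuples by key, then keep pairs within Hamming distance theta."""
    buckets: Dict[Key, List[str]] = defaultdict(list)
    for key, rid in tuples:
        buckets[key].append(rid)
    matches: Set[Tuple[str, str]] = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            # Simplification: bit vectors are looked up by ID in a shared dict.
            if sum(x != y for x, y in zip(records[a], records[b])) <= theta:
                matches.add((a, b))
    return matches


def run_job(records: Dict[str, List[int]], m: int, K: int, L: int, theta: int,
            n_reducers: int = 2, seed: int = 0) -> Set[Tuple[str, str]]:
    rng = random.Random(seed)
    samples = [rng.sample(range(m), K) for _ in range(L)]
    # Shuffle: route every map output tuple to its reduce task's inbox.
    inboxes: List[List[Tuple[Key, str]]] = [[] for _ in range(n_reducers)]
    for key, rid in map_phase(records, samples):
        inboxes[partition(key, n_reducers)].append((key, rid))
    result: Set[Tuple[str, str]] = set()
    for inbox in inboxes:
        result |= reduce_phase(inbox, records, theta)
    return result
```

The keys are tuples of integers, whose `hash` is stable across runs, so the routing in this sketch is deterministic.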
V. EXPERIMENTAL EVALUATION
A. Data Sets and Metrics
Two metrics are used:
(a) Pairs completeness (PC, or recall), i.e., the ratio of the number of true positives returned to the total number of truly matching pairs;
(b) Pairs quality (PQ, or precision), i.e., the ratio of the number of true positives returned to the total number of true and false positives processed.
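Both metrics reduce to set arithmetic over record pairs. The sketch below assumes each pair is stored in a canonical order, for example `(id_from_A, id_from_B)`, so that the same link is never counted twice. The function names are illustrative.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def pairs_completeness(returned: Set[Pair], true_matches: Set[Pair]) -> float:
    """PC (recall): true positives returned / all truly matching pairs."""
    return len(returned & true_matches) / len(true_matches)


def pairs_quality(returned: Set[Pair], true_matches: Set[Pair]) -> float:
    """PQ (precision): true positives returned / all pairs returned (TP + FP)."""
    return len(returned & true_matches) / len(returned)
```

Consider a run that returns three pairs, two of them correct, against four true matches. It has PC = 2/4 = 0.5 and PQ = 2/3.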
VI. CONCLUSIONS
FEMRL is a privacy-preserving framework for record linkage.
The core component of FEMRL is LSHDB, a parallel and distributed data engine.
The integration of LSHDB with MapReduce enables the construction of a distributed data store used to perform on-demand PPRL tasks.