FEMRL: A Framework for Large-Scale Privacy-Preserving Linkage of Patients’ Electronic Health Records: Paper Summary
2022-08-10 19:27:00 【Tongqing Ice Butterfly Kiyotaka】
Abstract
Goal: consolidate large volumes of patient healthcare data from disparate data sources to facilitate data analysis and cleaning tasks.
LSHDB, a parallel and distributed data engine, is used to perform privacy-preserving record linkage (PPRL) tasks; it also provides formal guarantees on the completeness of the results.
I. INTRODUCTION
Record linkage consists of two steps: blocking and matching.
In the blocking step, the record linkage algorithm aims to formulate, from the large number of records participating in a typical record linkage setting, candidate record pairs that contain as many of the matching pairs as possible.
In the matching step, the algorithm classifies the pairs formulated in the previous step as matching or non-matching.
Privacy-preserving record linkage (PPRL) techniques can achieve high linkage quality while providing privacy guarantees.
First, healthcare providers mask the electronic patient records they collect in order to protect certain (common) direct identifiers, such as patient name and home address; these identifiers are useful for record linkage [30].
Other direct identifiers, such as the patient's medical record number, are removed from the data because they are both sensitive and useless for PPRL (they are not universal across providers). Finally, selected indirect identifiers, such as symptoms or medication, are left unmasked to facilitate data analysis along these dimensions.
The processed data are securely transmitted to a trusted third party (TTP) and stored in a secure environment, in compliance with legal requirements.
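A minimal sketch of the two steps on plain (unmasked) strings, for illustration only. The blocking key (first letter of the surname) and the Jaccard-on-bigrams matcher with threshold 0.5 are assumptions chosen for brevity; FEMRL itself blocks masked records with LSH.

```python
from collections import defaultdict
from itertools import combinations

def bigrams(s):
    s = f"_{s.lower()}_"  # pad so first/last characters form their own bigrams
    return {s[i:i + 2] for i in range(len(s) - 1)}

def jaccard(a, b):
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)

def block(records, key):
    """Blocking: group records by a (hypothetical) blocking key and emit candidate pairs."""
    blocks = defaultdict(list)
    for rid, name in records.items():
        blocks[key(name)].append(rid)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

def match(records, pairs, threshold=0.5):
    """Matching: classify each candidate pair as match (kept) or non-match (dropped)."""
    return {(a, b) for a, b in pairs if jaccard(records[a], records[b]) >= threshold}

records = {"r1": "john smith", "r2": "jon smith", "r3": "mary jones"}
candidates = block(records, key=lambda name: name.split()[-1][0])
matches = match(records, candidates)
print(matches)
```

Only pairs that share a block are ever compared, which is the whole point of blocking: the matcher never sees ("r1", "r3").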
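The three-way treatment of attributes described above can be sketched as follows. The field names (name, address, mrn, symptoms, medication) are hypothetical, and the keyed HMAC-SHA256 digest is only a stand-in masking function; the paper itself masks identifiers with Bloom filters (Section III.A).

```python
import hashlib
import hmac

# Hypothetical attribute classification following the three cases in the text.
DIRECT_LINKAGE_IDS = ("name", "address")   # masked, then used for linkage
DIRECT_OTHER_IDS = ("mrn",)                # sensitive and not universal: dropped
INDIRECT_IDS = ("symptoms", "medication")  # left in the clear for analysis

def mask(value: str, key: bytes) -> str:
    # Stand-in masking function (keyed hash), NOT the Bloom-filter encoding used by FEMRL.
    return hmac.new(key, value.lower().encode(), hashlib.sha256).hexdigest()

def prepare(record: dict, key: bytes) -> dict:
    out = {f: mask(record[f], key) for f in DIRECT_LINKAGE_IDS if f in record}
    out.update({f: record[f] for f in INDIRECT_IDS if f in record})
    # Fields in DIRECT_OTHER_IDS (and any unlisted field) are simply not copied.
    return out

raw = {"name": "John Smith", "address": "1 Main St", "mrn": "MRN-0042",
       "symptoms": "fever", "medication": "ibuprofen"}
print(prepare(raw, key=b"secret-shared-by-custodians"))
```

A keyed hash like this only supports exact matching (one typo changes the whole digest), which is one reason Bloom-filter encodings, which preserve approximate similarity, are preferred for PPRL.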
II. RELATED WORK
Shortcomings of existing approaches: they do not perform parallel computation, do not work with distributed data stores, and do not build efficient index structures for querying data online.
III. PRELIMINARIES
A. Data Masking Methods
The paper adopts the Bloom filter encoding method proposed by Schnell et al., in which each Bloom filter represents a complete data record.
Threat considered: frequency attacks.
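A minimal sketch of a record-level Bloom filter in the style of Schnell et al.: each field is split into bigrams, and each bigram sets k bit positions derived by double hashing with two keyed hashes (HMAC-SHA1 and HMAC-MD5). The parameters m = 1000, k = 15 and the secret key are illustrative assumptions, and a Python int serves as the bit array.

```python
import hashlib
import hmac

def qgrams(value: str, q: int = 2) -> set:
    padded = f"_{value.lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def bloom_encode(fields, m: int = 1000, k: int = 15, key: bytes = b"secret") -> int:
    """Encode all fields of one record into a single m-bit Bloom filter (an int)."""
    bits = 0
    for field in fields:
        for gram in qgrams(field):
            h1 = int(hmac.new(key, gram.encode(), hashlib.sha1).hexdigest(), 16)
            h2 = int(hmac.new(key, gram.encode(), hashlib.md5).hexdigest(), 16)
            for i in range(k):
                bits |= 1 << ((h1 + i * h2) % m)  # double hashing: g_i = (h1 + i*h2) mod m
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

r1 = bloom_encode(["john", "smith"])
r2 = bloom_encode(["jon", "smith"])
r3 = bloom_encode(["mary", "jones"])
print(hamming(r1, r2), hamming(r1, r3))  # similar records -> smaller distance
```

Because a given bigram always sets the same positions under a fixed key, frequently occurring values produce frequently occurring bit patterns, which is exactly what frequency attacks exploit.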
B. Locality-Sensitive Hashing
The framework relies on the randomized locality-sensitive hashing (LSH) technique.
Using a strictly defined number of hash tables [18], LSH guarantees that each similar pair of records is identified with high probability. The similarity between a pair of records is defined by specifying an appropriate distance threshold in the metric space used.
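For the Hamming case, assuming the standard bit-sampling construction (each hash key concatenates K randomly chosen bit positions of an m-bit filter; d is a pair's Hamming distance, L the number of hash tables, θ the distance threshold and δ the tolerated miss probability, all notation introduced here), the "high probability" guarantee can be written as:

```latex
\begin{gathered}
p(d) = \left(1 - \frac{d}{m}\right)^{K}, \qquad
P_L(d) = 1 - \bigl(1 - p(d)\bigr)^{L}, \\[4pt]
P_L(\theta) \ge 1 - \delta
\iff L \ge \frac{\ln \delta}{\ln\bigl(1 - (1 - \theta/m)^{K}\bigr)}
\quad\Longrightarrow\quad
L = \left\lceil \frac{\ln \delta}{\ln\bigl(1 - (1 - \theta/m)^{K}\bigr)} \right\rceil .
\end{gathered}
```

Since p(d) decreases as d grows, every pair with d ≤ θ is found with probability at least P_L(θ) ≥ 1 − δ.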
C. Overview of LSHDB
LSHDB is a distributed engine that exploits LSH and parallelism to perform record linkage and similarity search tasks.
Records are hashed in advance and kept ready for linkage, which saves time.
When creating a data store, the developer only needs to specify two parameters:
(i) the LSH method to be used, e.g., Hamming, Min-Hash or Euclidean LSH;
(ii) the underlying noSQL data engine that will host the data.
LSHDB can also operate across distributed data stores.
IV. FEMRL: A FRAMEWORK FOR ELECTRONIC MEDICAL RECORD LINKAGE
A. Blocking and Matching of Records
A very important configuration parameter is the distance threshold used during distance computation, because this threshold determines the number of hash tables that will be created.
1) The Monolithic Mode:
In the monolithic mode, the data custodians mask their records and send them to the TTP (trusted third party).
In turn, the TTP feeds the masked records of the submitted data sets to LSHDB to build the necessary hash tables.
The formulated record pairs belonging to different healthcare providers are compared against the specified distance threshold to detect the records that correspond to the same patient.
Advantage: simplicity.
Disadvantages: a single site can be overwhelmed, and scalability can only be achieved through expensive software and hardware upgrades.
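The threshold-to-tables relationship can be made concrete with the Hamming-LSH formula L = ⌈ln δ / ln(1 − (1 − θ/m)^K)⌉, assuming bit-sampling keys (m-bit filters, K sampled bits per key, tolerated miss probability δ). This is a sketch; the values m = 1000, K = 30 and δ = 0.1 are arbitrary.

```python
import math

def num_tables(theta: int, m: int, k: int, delta: float) -> int:
    """Number of Hamming-LSH tables L so that any pair within distance theta
    collides in at least one table with probability >= 1 - delta."""
    p = (1 - theta / m) ** k  # collision probability in a single table
    return math.ceil(math.log(delta) / math.log(1 - p))

def detection_prob(d: int, m: int, k: int, tables: int) -> float:
    """Probability that a pair at distance d shares a key in at least one table."""
    return 1 - (1 - (1 - d / m) ** k) ** tables

for theta in (50, 100, 150):
    L = num_tables(theta, m=1000, k=30, delta=0.1)
    print(theta, L, round(detection_prob(theta, 1000, 30, L), 3))
```

A looser threshold sharply increases the number of tables (and thus storage and lookup cost), which is why the threshold is such an important configuration parameter.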
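A single-process sketch of the monolithic mode, assuming masked records are Python ints used as m-bit Bloom filters and Hamming LSH by bit sampling. The function name link_monolithic, the (provider, record_id, bits) tuples and the toy parameters are hypothetical, not LSHDB's API.

```python
import random
from collections import defaultdict
from itertools import combinations

def link_monolithic(records, m, k, tables, theta, seed=0):
    """records: list of (provider, record_id, bits). Returns matched cross-provider pairs."""
    rng = random.Random(seed)
    samples = [rng.sample(range(m), k) for _ in range(tables)]  # one bit sample per table
    candidates = set()
    for positions in samples:
        buckets = defaultdict(list)
        for idx, (_, _, bits) in enumerate(records):
            key = tuple((bits >> p) & 1 for p in positions)  # hash key = sampled bits
            buckets[key].append(idx)
        for members in buckets.values():
            candidates.update(combinations(members, 2))  # indices are increasing, so i < j
    matches = set()
    for i, j in candidates:
        (pa, ia, ba), (pb, ib, bb) = records[i], records[j]
        # Compare only records of different providers, against the distance threshold.
        if pa != pb and bin(ba ^ bb).count("1") <= theta:
            matches.add((ia, ib))
    return matches

X = 0xF0F0_F0F0_F0F0_F0F0
records = [("A", "a1", X), ("B", "b1", X ^ 0b11), ("B", "b2", X ^ ((1 << 64) - 1))]
print(link_monolithic(records, m=64, k=8, tables=20, theta=4))
```

Everything (all tables, all records) lives in one process here, which mirrors both the simplicity and the single-site bottleneck noted above.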
2) The Distributed Mode:
The TTP maintains multiple sites in a secure environment; each site holds a horizontal partition of the masked records previously submitted by the data custodians.
Masked records are submitted to a central site, which then forwards them to the remaining sites.
Advantages: (a) the bulk of the records is not stored and maintained at a single site; (b) FEMRL scales out easily.
Disadvantage: LSHDB, together with a noSQL system, must be installed at every site.
3) Algorithms Used by Both Modes: (I did not fully understand this part.)
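A toy, in-process sketch of the distributed mode: a hypothetical CentralSite assigns each incoming masked record to one of several Site objects (a horizontal partition; round-robin assignment is an assumption) and forwards similarity queries to all sites, merging their answers. For brevity each site does a linear scan instead of an LSH lookup, and nothing here is LSHDB's API.

```python
from itertools import cycle

class Site:
    """One TTP site holding a horizontal partition of masked records."""
    def __init__(self, name):
        self.name = name
        self.records = {}  # record_id -> masked bits (int)

    def query(self, bits, theta):
        # Linear scan stands in for a local LSHDB lookup.
        return {rid for rid, b in self.records.items() if bin(b ^ bits).count("1") <= theta}

class CentralSite:
    def __init__(self, sites):
        self.sites = sites
        self._next = cycle(sites)  # round-robin assignment (assumption)

    def submit(self, record_id, bits):
        next(self._next).records[record_id] = bits

    def query(self, bits, theta):
        # Forward the query to every partition and merge the partial answers.
        result = set()
        for site in self.sites:
            result |= site.query(bits, theta)
        return result

central = CentralSite([Site("s1"), Site("s2"), Site("s3")])
for rid, bits in [("a1", 0b1111_0000), ("a2", 0b0000_1111), ("a3", 0b1111_0001)]:
    central.submit(rid, bits)
print(central.query(0b1111_0000, theta=1))
```

Adding capacity means appending another Site, which is the scale-out advantage; the cost is that every site needs its own engine and store.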
Complexity
The running time of Algorithm 1 is linear in the number of records of A.
The total running time of Algorithm 2 is:
B. Integration with MapReduce
FEMRL runs on top of a MapReduce infrastructure.
The Map phase performs the blocking step, while the Reduce phase performs the matching step.
- Map phase. Each map task builds the hash keys for each masked record at hand and sends them, together with the record's ID, to the partition tasks.
- Distribution of tuples. Each partition task, always bound to a map task, controls how the formulated tuples are distributed to the reduce tasks; tuples with the same hash key are forwarded to the same reduce task.
- Reduce phase. Each reduce task processes the load of tuples forwarded to it by the partition tasks.
First, the Map tasks hash the masked records; then the Reduce tasks insert the aggregated hash results into the appropriate LSHDB instances.
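The three phases can be simulated in a single process. This sketch assumes Hamming LSH by bit sampling over int-encoded Bloom filters, a hash-modulo partitioner, and reducers that compare co-bucketed records against a threshold (the matching step); the function names and constants are hypothetical, and a real job would run on a MapReduce framework and write into LSHDB instances.

```python
import random
from collections import defaultdict
from itertools import combinations

M, K, TABLES, NUM_REDUCERS, THETA = 64, 8, 20, 3, 4
rng = random.Random(1)
SAMPLES = [rng.sample(range(M), K) for _ in range(TABLES)]  # bit positions per table

def map_task(record_id, bits):
    """Map (blocking): emit one (hash key, (record id, bits)) tuple per hash table."""
    for t, positions in enumerate(SAMPLES):
        key = (t, tuple((bits >> p) & 1 for p in positions))
        yield key, (record_id, bits)

def partition(key):
    """Partitioner: tuples with the same hash key always reach the same reducer."""
    return hash(key) % NUM_REDUCERS

def reduce_task(grouped):
    """Reduce (matching): compare records that share a hash key."""
    out = set()
    for values in grouped.values():
        for (a, ba), (b, bb) in combinations(sorted(set(values)), 2):
            if bin(ba ^ bb).count("1") <= THETA:
                out.add((a, b))
    return out

def run_job(bits_of):
    shuffled = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for rid, bits in bits_of.items():
        for key, value in map_task(rid, bits):
            shuffled[partition(key)][key].append(value)
    matches = set()
    for grouped in shuffled:
        matches |= reduce_task(grouped)
    return matches

X = 0xF0F0_F0F0_F0F0_F0F0
data = {"a1": X, "b1": X ^ 0b11, "c1": X ^ ((1 << 64) - 1)}
print(run_job(data))
```

Carrying the masked bits in the emitted value lets each reducer match without a shared lookup table, at the cost of shuffling the record once per hash table.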
V. EXPERIMENTAL EVALUATION
A. Data Sets and Metrics
Two metrics are used:
(a) Pairs completeness (PC, i.e., recall): the ratio of the number of true positives returned to the total number of true matching pairs;
(b) Pairs quality (PQ, i.e., precision): the ratio of the number of true positives returned to the total number of true and false positives processed.
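Both metrics reduce to set arithmetic over record pairs. A minimal sketch with hypothetical pair IDs; it assumes every pair is stored in a canonical order so that ("a1", "b1") and ("b1", "a1") are not counted separately.

```python
def pairs_completeness(returned, true_matches):
    """PC (recall): true positives returned / all true matching pairs."""
    return len(returned & true_matches) / len(true_matches)

def pairs_quality(returned, true_matches):
    """PQ (precision): true positives returned / all pairs returned (TP + FP)."""
    return len(returned & true_matches) / len(returned) if returned else 0.0

truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
returned = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b9"), ("a6", "b7")}
print(pairs_completeness(returned, truth), pairs_quality(returned, truth))  # 0.75 0.6
```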
VI. CONCLUSIONS
FEMRL is a privacy-preserving framework for record linkage.
The core component of FEMRL is LSHDB, a parallel, distributed data engine.
Integrating LSHDB with MapReduce yields a distributed data store for performing PPRL tasks on demand.