
FEMRL: A Framework for Large-Scale Privacy-Preserving Linkage of Patients’ Electronic Health Rec Paper Summary

2022-08-10 19:27:00 Tongqing Ice Butterfly Kiyotaka


Abstract

The goal is to consolidate large volumes of patient healthcare data from disparate data sources, in order to facilitate data analysis and cleaning tasks.
LSHDB is a parallel and distributed data engine used to perform privacy-preserving record linkage (PPRL) tasks. It also provides a formal guarantee on the completeness of the results.


I. INTRODUCTION

Record linkage consists of two steps: blocking and matching.
In the blocking step, the record linkage algorithm aims to formulate as many matching record pairs as possible from the large number of records participating in a typical record linkage setup.
In the matching step, the algorithm classifies the pairs formulated in the previous step as matching or non-matching.

Privacy-preserving record linkage (PPRL) techniques can be used to achieve high linkage quality with privacy guarantees.
In the first step, healthcare providers mask the electronic patient records they collect, to protect certain (common) direct identifiers, such as patient name and home address, which are useful for enabling record linkage [30].
Other direct identifiers, such as the patient's medical record number, are suppressed from the data because they are both sensitive and useless for PPRL (they are not universal). Finally, selected indirect identifiers, such as symptoms or medication, are left unmasked to facilitate data analysis along these dimensions.
The processed data is securely transmitted to the trusted third party (TTP) and stored in a secure environment (in compliance with legal requirements).


II. RELATED WORK

Lacking in prior work: the ability to perform parallel computation, work with distributed data stores, or build efficient index structures to query data online.

III. PRELIMINARIES

A. Data Masking Methods

The framework adopts the Bloom filter-based encoding method proposed by Schnell et al., in which each Bloom filter represents a complete data record.
frequency attack
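
To make the record-level encoding concrete, here is a minimal sketch of hashing the character bigrams of all fields of a record into a single Bloom filter. The filter length `m`, the number of hash functions `k`, the underscore padding, and the SHA-1/MD5 double-hashing scheme are illustrative assumptions, not parameters taken from the paper.

```python
import hashlib

def bigrams(value: str) -> set:
    """Split a padded, lower-cased string into character bigrams."""
    padded = f"_{value.lower().strip()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def encode_record(fields, m=1000, k=15):
    """Hash the bigrams of every field into one m-bit Bloom filter."""
    bits = [0] * m
    for field in fields:
        for gram in bigrams(field):
            # Double hashing (assumed scheme): position_i = (h1 + i * h2) mod m.
            h1 = int(hashlib.sha1(gram.encode()).hexdigest(), 16)
            h2 = int(hashlib.md5(gram.encode()).hexdigest(), 16)
            for i in range(k):
                bits[(h1 + i * h2) % m] = 1
    return bits

def hamming(a, b):
    """Number of bit positions in which two filters differ."""
    return sum(x != y for x, y in zip(a, b))

# Made-up example records: a and b differ by a small typo, c is unrelated.
a = encode_record(["john", "smith", "12 main street"])
b = encode_record(["jon", "smith", "12 main street"])
c = encode_record(["maria", "garcia", "7 ocean avenue"])
```

Records that differ only by a small typo share most bigrams, so their filters end up close in Hamming distance, while unrelated records end up far apart. This is the property that the blocking step relies on.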

B. Locality-Sensitive Hashing

The randomized Locality-Sensitive Hashing (LSH) technique.
LSH guarantees that, using a rigorously defined number of hash tables [18], each similar pair of records is identified with high probability. The similarity between a pair of records is defined by specifying an appropriate distance threshold in the metric space used.
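
As an illustration, the following sketch shows Hamming LSH by bit sampling: each hash table uses a key built from `K` randomly chosen bit positions, records that share a key in any of the `L` tables become candidates, and candidates are then checked against the distance threshold. All function names and the values of `m`, `K`, and `L` are hypothetical choices for this example. It is not LSHDB's API.

```python
import random
from collections import defaultdict

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def make_samplers(m, K, L, seed=42):
    """Draw L composite hash functions, each sampling K bit positions."""
    rng = random.Random(seed)
    return [rng.sample(range(m), K) for _ in range(L)]

def hash_key(bits, positions):
    return tuple(bits[p] for p in positions)

def build_tables(records, samplers):
    """Blocking: index every record in each of the L hash tables."""
    tables = [defaultdict(list) for _ in samplers]
    for rec_id, bits in records.items():
        for table, positions in zip(tables, samplers):
            table[hash_key(bits, positions)].append(rec_id)
    return tables

def query(bits, records, tables, samplers, threshold):
    """Collect colliding records, then keep those within the threshold."""
    candidates = set()
    for table, positions in zip(tables, samplers):
        candidates.update(table.get(hash_key(bits, positions), []))
    return sorted(r for r in candidates if hamming(bits, records[r]) <= threshold)

m = 64
rng = random.Random(7)
base = [rng.randint(0, 1) for _ in range(m)]
near = base.copy()
near[3] ^= 1          # flip two bits: Hamming distance 2 from base
near[40] ^= 1
far = [1 - b for b in base]   # complement: Hamming distance m from base
records = {"same": base.copy(), "near": near, "far": far}

samplers = make_samplers(m, K=8, L=20)
tables = build_tables(records, samplers)
result = query(base, records, tables, samplers, threshold=4)
```

An identical record always collides with the query, and a record beyond the threshold is never returned because of the final distance check. A record within the threshold, such as `near`, is found with high probability, and that probability grows with the number of tables.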

C. Overview of LSHDB

LSHDB is a distributed engine that leverages LSH and the power of parallelism to perform record linkage and similarity search tasks.

Hashing records in advance and keeping them ready for linkage saves time.

When creating a data store, the developer only needs to specify two parameters:
(i) the LSH method to be used, e.g. Hamming, Min-Hash, or Euclidean LSH;
(ii) the underlying noSQL data engine that will host the data.

Across distributed data stores

IV. FEMRL: A FRAMEWORK FOR ELECTRONIC MEDICAL RECORD LINKAGE

A. Blocking and Matching of Records

A very important configuration parameter is the threshold used during distance computation, because the threshold determines the number of hash tables that will be created.
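
A rough sketch of how the threshold can drive the number of hash tables, assuming the standard Hamming LSH bound (this summary does not state the formula FEMRL actually uses): with filters of `m` bits, keys of `K` sampled bits, and an allowed miss probability `delta`, two records at distance `threshold` agree on one sampled bit with probability p = 1 - threshold/m, so L = ceil(ln(delta) / ln(1 - p^K)) tables are enough. The function and parameter names are hypothetical.

```python
import math

def num_hash_tables(threshold, m, K, delta=0.1):
    """Tables needed so a pair at `threshold` is missed with probability <= delta."""
    if not 0 < threshold < m:
        raise ValueError("threshold must satisfy 0 < threshold < m")
    p = 1.0 - threshold / m     # chance that one sampled bit agrees
    p_key = p ** K              # chance that a whole K-bit key agrees
    return math.ceil(math.log(delta) / math.log(1.0 - p_key))

# Illustrative settings: 1000-bit filters, 30-bit keys.
tight = num_hash_tables(threshold=50, m=1000, K=30)
loose = num_hash_tables(threshold=100, m=1000, K=30)
```

Under this bound a looser threshold needs more tables, because pairs that are further apart agree on a whole key less often.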

1) The Monolithic Mode:

In monolithic mode, data custodians mask their records and send them to the TTP (Trusted Third Party).
In turn, the TTP feeds the masked records of the submitted datasets to LSHDB to build the necessary hash tables.
The formulated pairs of records belonging to different healthcare providers are then compared against the specified distance threshold, to detect the records that correspond to the same patient.
Advantage: simplicity.
Disadvantages: a single site can be overwhelmed; scalability can only be achieved through expensive software and hardware upgrades.

2) The Distributed Mode:

The TTP maintains multiple sites in a secure environment. Each site holds a horizontal partition of the masked records previously submitted by the data custodians.
Masked records are submitted to a central site and then forwarded to the remaining sites.
Advantages: (a) no massive volume of records has to be stored and maintained at a single site; (b) FEMRL can be scaled out easily.
Disadvantage: LSHDB, together with a noSQL system, must be installed at every site.

3) Algorithms Used by Both Modes: (not understood)

Complexity
The running time of Algorithm 1 is linear in the number of records in A.

The total running time of Algorithm 2 is [formula not preserved].

B. Integration with MapReduce

FEMRL runs on top of a MapReduce infrastructure.
The Map phase performs the blocking step, while the Reduce phase performs the matching step.

  • Map phase.
    Each map task builds the hash keys for each masked record at hand and sends them, together with the corresponding record ID, to the partition task.
  • Distribution of tuples.
    Each partition task, which is always bound to a map task, controls the distribution of the formulated tuples to the reduce tasks. Tuples with the same hash key are forwarded to one specific reduce task.
  • Reduce phase.
    Each reduce task processes the load of tuples forwarded to it by the partition tasks.
    First, the Map tasks hash the masked records; subsequently, the Reduce tasks insert the aggregated hash results into the appropriate LSHDB instance.
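
The three phases above can be simulated in plain Python. This is only a single-process sketch under simplifying assumptions: hash keys come from bit sampling, record IDs are unique across sources, and the reduce step performs the matching directly instead of writing to an LSHDB instance. All names are hypothetical, and no real MapReduce or LSHDB API is used.

```python
from collections import defaultdict
from itertools import combinations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def map_task(records, samplers):
    """Blocking: emit ((table_no, hash_key), (source, rec_id)) tuples."""
    for source, rec_id, bits in records:
        for t, positions in enumerate(samplers):
            yield (t, tuple(bits[p] for p in positions)), (source, rec_id)

def partition(tuples, n_reducers):
    """Route all tuples that share a hash key to the same reduce task."""
    loads = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in tuples:
        loads[hash(key) % n_reducers][key].append(value)
    return loads

def reduce_task(load, store, threshold):
    """Matching: compare records from different sources in the same bucket."""
    matches = set()
    for bucket in load.values():
        for (s1, id1), (s2, id2) in combinations(bucket, 2):
            if s1 != s2 and hamming(store[id1], store[id2]) <= threshold:
                matches.add(tuple(sorted((id1, id2))))
    return matches

def run_job(records, samplers, threshold, n_reducers=2):
    store = {rec_id: bits for _, rec_id, bits in records}
    loads = partition(map_task(records, samplers), n_reducers)
    matches = set()
    for load in loads:
        matches |= reduce_task(load, store, threshold)
    return matches

# Toy 8-bit masked records from two providers, A and B.
a1 = [1, 0, 1, 1, 0, 0, 1, 0]
records = [("A", "a1", a1),
           ("B", "b1", list(a1)),
           ("B", "b2", [1 - x for x in a1])]
samplers = [[0, 1, 2], [3, 4, 5]]   # two tables, three sampled bits each
matches = run_job(records, samplers, threshold=1)
```

Because every tuple with a given key lands on exactly one reducer, each reducer can match its buckets independently of the others.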

V. EXPERIMENTAL EVALUATION


A. Data Sets and Metrics

Two metrics are used:
(a) Pairs Completeness (PC, or recall): the ratio of the number of true positives returned to the total number of true positives;
(b) Pairs Quality (PQ, or precision): the ratio of the number of true positives returned to the total number of true and false positives processed.
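
As a small worked example of the two metrics, the sketch below computes PC and PQ from a set of returned record pairs and a set of ground-truth matching pairs. The pair identifiers are made up for illustration.

```python
def pairs_completeness(returned, truth):
    """PC (recall): true pairs returned / all true pairs."""
    return len(returned & truth) / len(truth) if truth else 0.0

def pairs_quality(returned, truth):
    """PQ (precision): true pairs returned / all pairs returned."""
    return len(returned & truth) / len(returned) if returned else 0.0

# Hypothetical ground truth: four matching pairs across providers A and B.
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
# Hypothetical linkage output: three true pairs and two false positives.
returned = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a1", "b9"), ("a5", "b2")}

pc = pairs_completeness(returned, truth)   # 3 of 4 true pairs found
pq = pairs_quality(returned, truth)        # 3 of 5 returned pairs are true
```

Here PC is 0.75 and PQ is 0.6: one true pair was missed, and two of the five returned pairs were not real matches.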

VI. CONCLUSIONS

FEMRL is a privacy-preserving framework for record linkage.
The core component of FEMRL is LSHDB, a parallel and distributed data engine.
The integration of LSHDB with MapReduce led to the construction of a distributed data store used to perform on-demand PPRL tasks.

Copyright notice
This article was written by [Tongqing Ice Butterfly Kiyotaka]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/222/202208101840000029.html