Data Governance (3): Data Quality Management
2022-08-08 08:03:00 【InfoQ】
<h2>Data Quality Management</h2><br>
<img src="//img.inotgo.com/imagesLocal/202208/08/202208080745170253_0.jpg" alt="null" loading="lazy"><br><br>
<h3>1. Overview of Data Quality</h3>
<div>In the early days of big data, the primary purpose of data governance was to improve data quality so that reports, analysis, and applications would be more accurate. Today the scope of data governance has expanded considerably, and we now talk about data asset management, knowledge graphs, automated data governance, and so on, but improving data quality remains one of its most important goals. Whether data can deliver value depends on its quality, and high-quality data is the foundation of all data applications. When data quality is poor, data analysis is fraught with problems, and data quality issues have seriously affected the normal operation of organizations' businesses. Continuously improving data quality through systematic data quality management has become an urgent priority for organizations.</div><br>
<h3>2. Root Causes of Data Quality Problems</h3>
<div>Data quality management must begin with understanding why quality problems arise. There are many causes, such as errors in technology, management, processes, and business logic, but fundamentally most data quality problems originate in the business.</div>
<div>A data quality problem cannot be solved simply by adopting a tool. It requires recognizing the real causes of the problem and then addressing them from the business side. To solve data quality problems from a business perspective, the key is to establish a sound and workable set of data quality evaluation standards and management processes.</div><br>
<h3>3. Principles of Data Quality Assurance</h3>
<div>There is no unified industry standard for assessing data quality. Alibaba evaluates its data warehouse mainly along four dimensions: completeness, accuracy, consistency, and timeliness.</div>
<h4>1. Completeness</h4>
<div>Completeness refers to whether the data records and the information in them are complete, with nothing missing. Missing data includes both missing records and missing field values within a record. Either one makes the data inaccurate, so completeness is the foundation of data quality assurance.</div>
<div>For example, if payment orders normally number around 1 million per day and one day the count suddenly drops to 10,000, records are probably missing. As for missing field values within a record, fields such as an order's product ID and vendor ID must always be present: the number of null values in these fields must be zero, and any count greater than zero violates the completeness constraint.</div><br>
<h4>2. Accuracy</h4>
<div>Accuracy refers to whether the information and data in the records are correct, with no abnormal or erroneous values. For example, a negative score on a transcript, an order with the wrong buyer information, or a negative order amount all indicate problem data. Ensuring the accuracy of records is also an indispensable part of data quality assurance.</div><br>
<h4>3. Consistency</h4>
<div>Consistency usually matters in large data warehouses that span many systems. For example, a company may have many data warehouse branches for its lines of business, and the same data must be consistent across those branches. After the data in each warehouse layer passes through ETL, the row counts, data values, and types must remain consistent with the upstream layer. For example, from the online business database, through processing into the data warehouse, and on to each data application node, the user ID must keep the same type and the same length.</div><br>
<h4>4. Timeliness</h4>
<div>Timeliness means ensuring that data is produced on time so that it can deliver its value. For example, decision-making analysts usually expect to see the previous day's data on the current day. If they have to wait too long, the data loses its timeliness and the analysis loses its significance. Offline data warehouses generally run their tasks in the early morning, which is how timeliness is guaranteed.</div><br>
<h3>4. Mind Map</h3><br>
<img src="//img.inotgo.com/imagesLocal/202208/08/202208080745170253_1.png" alt="null" loading="lazy"><br>
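The four dimensions described in section 3 (completeness, accuracy, consistency, timeliness) can each be turned into a simple automated check. The following is a minimal sketch in Python, not the method used by Alibaba or any specific tool. The field names (`order_id`, `product_id`, `vendor_id`, `user_id`, `amount`), the function names, and the `min_ratio` threshold of 0.5 are all hypothetical choices made for illustration.

```python
from datetime import datetime

# Hypothetical required fields, taken from the article's order example.
REQUIRED_FIELDS = ("product_id", "vendor_id")


def check_completeness(rows, expected_rows, min_ratio=0.5):
    """Flag a sharp drop in row count and any missing required field.

    min_ratio is an assumed threshold: fewer than expected_rows * min_ratio
    rows is treated as a sign that records are missing.
    """
    issues = []
    if len(rows) < expected_rows * min_ratio:
        issues.append(f"row count {len(rows)} is far below the expected {expected_rows}")
    for field in REQUIRED_FIELDS:
        nulls = sum(1 for r in rows if r.get(field) in (None, ""))
        if nulls > 0:
            issues.append(f"{field} has {nulls} null value(s)")
    return issues


def check_accuracy(rows):
    """Flag values that cannot be right, such as a negative order amount."""
    return [
        f"order {r['order_id']} has negative amount {r['amount']}"
        for r in rows
        if r["amount"] < 0
    ]


def check_consistency(upstream, downstream, key="user_id"):
    """Compare row counts and the key's type and length between two layers."""
    issues = []
    if len(upstream) != len(downstream):
        issues.append(f"row count differs: {len(upstream)} vs {len(downstream)}")
    down_by_id = {r["order_id"]: r for r in downstream}
    for up in upstream:
        down = down_by_id.get(up["order_id"])
        if down is None:
            issues.append(f"order {up['order_id']} is missing downstream")
            continue
        a, b = up[key], down[key]
        if type(a) is not type(b) or len(str(a)) != len(str(b)):
            issues.append(f"{key} changed form: {a!r} vs {b!r}")
    return issues


def check_timeliness(finished_at, deadline):
    """Flag output that was produced after the agreed deadline."""
    if finished_at > deadline:
        return [f"output finished at {finished_at:%H:%M}, after the {deadline:%H:%M} deadline"]
    return []


# Made-up sample data: the second order has a null product ID and a
# negative amount, and the downstream copy stores user_id as an integer.
source_rows = [
    {"order_id": "o1", "product_id": "p1", "vendor_id": "v1", "user_id": "0001", "amount": 20.0},
    {"order_id": "o2", "product_id": None, "vendor_id": "v2", "user_id": "0002", "amount": -5.0},
]
warehouse_rows = [
    {"order_id": "o1", "product_id": "p1", "vendor_id": "v1", "user_id": 1, "amount": 20.0},
    {"order_id": "o2", "product_id": None, "vendor_id": "v2", "user_id": "0002", "amount": -5.0},
]

for issue in (
    check_completeness(source_rows, expected_rows=2)
    + check_accuracy(source_rows)
    + check_consistency(source_rows, warehouse_rows)
    + check_timeliness(datetime(2022, 8, 8, 9, 30), datetime(2022, 8, 8, 8, 0))
):
    print(issue)
```

Each function returns a list of issue descriptions rather than raising an exception, so a scheduled job could collect the results for all four dimensions and decide separately which ones should block downstream tasks.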