Data mining -- understanding data
2022-04-23 05:32:00 【Muxi dare】
"Data Mining", National University of Defense Technology
"Data Mining", Qingdao University
"Data Mining and Python Practice"
Understanding of data mining
1. Data and information
**Data** is the result of facts or observations. It is a logical summary of objective things: the unprocessed raw material used to represent them.
In a computer system, letters, combinations of digits and symbols, speech, graphics, images and the like are collectively called data. After processing, data becomes **information**.
2. Data objects and attribute types
A data set is made up of data objects. Each data object corresponds to an entity and may also be called a tuple.
The data fields that describe a characteristic or feature of a data object are called attributes.
Attribute types:
- Nominal attribute: the states can be enumerated, with no meaningful order
Special case: binary attribute, with two states (0, 1)
· Symmetric binary: the two states occur in roughly equal numbers
· Asymmetric binary: the two states occur in very different numbers
- Ordinal attribute: the values have a meaningful order, e.g. large, medium, small
- Interval-scaled attribute: measured on a scale of equal-size units; the values are ordered, but there is no true zero point, so ratios are meaningless
- Ratio-scaled attribute: a numeric attribute with a fixed zero point; the values are ordered and ratios can be computed
(Nominal and ordinal attributes are qualitative; interval-scaled and ratio-scaled attributes are quantitative.)
Discrete attributes & continuous attributes
2. Data statistics

Central tendency
- Mean
- Median
- Mode
Empirical formula (for moderately skewed, unimodal data): mean - mode ≈ 3 × (mean - median)
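As a minimal sketch of the three measures of central tendency, the snippet below uses Python's standard `statistics` module. The `salaries` list is hypothetical data chosen for illustration and does not come from the original notes.

```python
import statistics

# Hypothetical sample (illustrative values only).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

mean = statistics.mean(salaries)        # arithmetic average of all values
median = statistics.median(salaries)    # middle of the sorted data (average of the two middle values here)
modes = statistics.multimode(salaries)  # every value that occurs most often

print(mean, median, modes)
```

This sample has two modes (52 and 70), which is why `multimode` is used rather than `mode`.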
Dispersion [measures of the degree of variation]
- Range: the difference between the maximum and the minimum
- Variance: the average of the squared differences between each data value and the mean
- Standard deviation: the positive square root of the variance
- Coefficient of variation: the ratio of the standard deviation to the mean, a relative measure of spread
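The dispersion measures can be computed as in the sketch below. It assumes the population form of the variance, σ² = (1/N) Σ (xᵢ − μ)²; the sample form divides by N − 1 instead (`statistics.variance`). The `data` list is hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

value_range = max(data) - min(data)    # range: max minus min
variance = statistics.pvariance(data)  # population variance: mean squared deviation from the mean
std_dev = statistics.pstdev(data)      # positive square root of the variance
cv = std_dev / statistics.mean(data)   # coefficient of variation: std dev relative to the mean

print(value_range, variance, std_dev, cv)
```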

Quantiles
Quartiles: Q1 (25th percentile), Q3 (75th percentile)
Interquartile range: IQR = Q3 - Q1
Five-number summary: min, Q1, median, Q3, max
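A short sketch of the quartiles, IQR and five-number summary using `statistics.quantiles`. Quartile conventions differ between textbooks and libraries; the `method="inclusive"` choice here is an assumption, and other conventions can give slightly different Q1 and Q3. The `data` list is hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical values

# n=4 splits the data into quartiles and returns the three cut points Q1, Q2 (median), Q3.
# method="inclusive" treats min and max as the 0th and 100th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
five_number_summary = (min(data), q1, q2, q3, max(data))

print(iqr, five_number_summary)
```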
3. Data visualization
Data visualization, process visualization, result visualization
Basic statistical charts:
- Box plot: compares the distributions of several attributes
Outliers: usually values more than 1.5 × IQR above Q3 or below Q1
- Histogram: shows how the values of a single attribute are distributed across intervals (bins)
- Scatter plot: shows the relationship between two sets of data: positive correlation, negative correlation, or no correlation
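The 1.5 × IQR outlier rule used by box plots can be sketched without any plotting library. The helper name `iqr_outliers` and the sample data are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # the box plot "fences"
    return [v for v in values if v < lower or v > upper]

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical values; 30 lies far from the rest
outliers = iqr_outliers(data)
print(outliers)
```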
4. Data similarity
Data matrix: N × p, for N data objects with p dimensions (attributes)
Dissimilarity matrix: N × N, records the distance between every pair of the N data points; it is symmetric, so only the lower triangle needs to be stored
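As an illustration, the sketch below turns a small data matrix into a lower-triangular dissimilarity matrix. Euclidean distance is used only as one possible distance, and the function name `dissimilarity_matrix` and the `points` data are hypothetical.

```python
import math

def dissimilarity_matrix(points):
    """Lower-triangular matrix of pairwise distances: row i holds d(i, 0) .. d(i, i)."""
    return [
        [math.dist(points[i], points[j]) for j in range(i + 1)]
        for i in range(len(points))
    ]

# Hypothetical data matrix: N = 3 objects, p = 2 attributes.
points = [(0, 0), (3, 4), (6, 8)]
matrix = dissimilarity_matrix(points)

for row in matrix:
    print(row)
```

The diagonal is always 0 because every object is at distance 0 from itself.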

Similarity measures
- Similarity: a value in [0, 1]; the larger the value, the more similar the objects
- Dissimilarity (distance): the smaller the value, the more similar the objects
- Proximity: refers to either similarity or dissimilarity
(1) Proximity measurement of nominal attributes
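The formula for this case did not survive in these notes. A common textbook definition, assumed here, is the mismatch ratio d(i, j) = (p − m) / p, where p is the number of attributes and m the number of attributes on which the two objects match. The function name and the sample objects are hypothetical.

```python
def nominal_dissimilarity(a, b):
    """Fraction of attributes on which two objects disagree: (p - m) / p."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    matches = sum(x == y for x, y in zip(a, b))  # m: number of matching attributes
    return (len(a) - matches) / len(a)

# Hypothetical objects described by three nominal attributes.
obj_i = ("red", "circle", "small")
obj_j = ("red", "square", "small")
d = nominal_dissimilarity(obj_i, obj_j)  # one mismatch out of three attributes
print(d)
```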

For binary attributes :
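The formula for binary attributes is also missing from these notes. The sketch below assumes the usual contingency-count definitions: q = both 1, r = 1 in the first object only, s = 1 in the second object only, t = both 0. Symmetric attributes use d = (r + s) / (q + r + s + t); asymmetric attributes drop the 0/0 matches, d = (r + s) / (q + r + s). The function name and data are hypothetical.

```python
def binary_dissimilarity(a, b, symmetric=True):
    """Dissimilarity of two 0/1 vectors; asymmetric mode ignores 0/0 matches."""
    q = sum(x == 1 and y == 1 for x, y in zip(a, b))  # both 1
    r = sum(x == 1 and y == 0 for x, y in zip(a, b))  # 1 only in a
    s = sum(x == 0 and y == 1 for x, y in zip(a, b))  # 1 only in b
    t = sum(x == 0 and y == 0 for x, y in zip(a, b))  # both 0
    denom = q + r + s + (t if symmetric else 0)
    return (r + s) / denom if denom else 0.0

# Hypothetical objects with six binary attributes.
a = [1, 0, 1, 0, 0, 0]
b = [1, 0, 1, 0, 1, 0]
d_sym = binary_dissimilarity(a, b, symmetric=True)    # 1 mismatch / 6 attributes
d_asym = binary_dissimilarity(a, b, symmetric=False)  # 1 mismatch / 3 non-(0,0) attributes
print(d_sym, d_asym)
```

In the asymmetric case, 1 − d equals the Jaccard coefficient q / (q + r + s).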

(2) Proximity measurement of ordinal attributes
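The formula is missing here as well. One common approach, assumed in this sketch, replaces each ordinal value by its rank r in 1..M, rescales it to [0, 1] with z = (r − 1) / (M − 1), and then treats z as a numeric attribute. The function names are hypothetical; the levels reuse the large/medium/small example from the attribute list.

```python
def ordinal_to_unit_interval(value, ordered_levels):
    """Map an ordinal value to [0, 1] via z = (r - 1) / (M - 1); needs M >= 2 levels."""
    rank = ordered_levels.index(value) + 1        # ranks run from 1 to M
    return (rank - 1) / (len(ordered_levels) - 1)

def ordinal_dissimilarity(a, b, ordered_levels):
    """Absolute difference of the rescaled ranks."""
    return abs(ordinal_to_unit_interval(a, ordered_levels)
               - ordinal_to_unit_interval(b, ordered_levels))

levels = ["small", "medium", "large"]  # ordered states, M = 3
d_far = ordinal_dissimilarity("small", "large", levels)
d_near = ordinal_dissimilarity("small", "medium", levels)
print(d_far, d_near)
```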

(3) Proximity measurement of numerical attributes
① Minkowski distance
Manhattan distance: L1 norm
Euclidean distance: L2 norm
Supremum distance (Chebyshev distance): Lmax, L∞ norm (the largest difference over all attributes is taken as the distance between the two objects)
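The three distances above are special cases of the Minkowski distance of order h, d(i, j) = (Σ_f |x_if − x_jf|^h)^(1/h): h = 1 gives Manhattan, h = 2 gives Euclidean, and the limit h → ∞ gives the supremum distance. The sketch below uses hypothetical function names and sample points.

```python
def minkowski(x, y, h):
    """Minkowski distance of order h >= 1 between two equal-length vectors."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

def supremum(x, y):
    """Limit of the Minkowski distance as h grows: the largest per-attribute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1, 2), (3, 5)           # hypothetical objects with two numeric attributes
manhattan = minkowski(x, y, 1)  # |1 - 3| + |2 - 5| = 5
euclidean = minkowski(x, y, 2)  # sqrt(2**2 + 3**2) = sqrt(13)
chebyshev = supremum(x, y)      # max(2, 3) = 3
print(manhattan, euclidean, chebyshev)
```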
② Z-score (standardization)
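Z-score standardization rescales each value to z = (x − μ) / σ, so that attributes measured in different units contribute comparably to a distance. The sketch below assumes the population standard deviation; the function name and data are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]

scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical values: mean 5, std dev 2
print(scores)
```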
③ Cosine similarity
cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||)
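A direct translation of the cosine formula into Python, as a minimal sketch. The two term-frequency vectors are hypothetical document vectors chosen for illustration.

```python
import math

def cosine_similarity(d1, d2):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(d1, d2))   # d1 · d2
    norm1 = math.sqrt(sum(a * a for a in d1))  # ||d1||
    norm2 = math.sqrt(sum(b * b for b in d2))  # ||d2||
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)

# Hypothetical term-frequency vectors for two documents.
d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
similarity = cosine_similarity(d1, d2)  # 25 / sqrt(42 * 17), about 0.94
print(round(similarity, 2))
```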

(4) Proximity measurement of mixed attributes
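The formula for mixed attributes is missing from these notes. A common approach, assumed here, averages the per-attribute dissimilarities after scaling each to [0, 1]: d(i, j) = Σ_f δ_f d_f / Σ_f δ_f, where δ_f = 0 if attribute f is missing for either object and 1 otherwise. The sketch handles only nominal and numeric attributes; the function name, the `kinds` and `ranges` parameters and the data are all hypothetical.

```python
def mixed_dissimilarity(a, b, kinds, ranges):
    """Average of per-attribute dissimilarities, each scaled to [0, 1].

    kinds[f] is "nominal" or "numeric"; ranges[f] is max - min (> 0) of a
    numeric attribute over the data set and is ignored for nominal attributes.
    Attributes with a missing value (None) are skipped (delta_f = 0).
    """
    total, count = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue
        if kind == "nominal":
            total += 0.0 if x == y else 1.0  # match -> 0, mismatch -> 1
        else:
            total += abs(x - y) / rng        # numeric difference scaled by the range
        count += 1
    if count == 0:
        raise ValueError("no attribute is present in both objects")
    return total / count

kinds = ("nominal", "numeric", "nominal")
ranges = (None, 40, None)                    # assumed range of the numeric attribute
obj_a = ("red", 20, "yes")
obj_b = ("blue", 30, "yes")
d = mixed_dissimilarity(obj_a, obj_b, kinds, ranges)  # (1 + 0.25 + 0) / 3
print(d)
```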

Copyright notice
This article was written by [Muxi dare]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220535577264.html