Common visualization charts (V): scatter plot
2022-04-23 10:54:00 【The big pig of the little pig family】
1. Introduction to the scatter plot
A scatter plot, also called an X-Y chart, displays every data point as a dot in a rectangular (Cartesian) coordinate system to show how strongly two variables influence each other. The position of each dot is determined by the values of the two variables.
By observing how the points are distributed, we can infer the correlation between the variables. If the variables are uncorrelated, the plot shows randomly scattered points. If they are correlated, most points cluster densely and follow a visible trend. The main types of correlation are:
- Positive correlation (the values of both variables increase together).
- Negative correlation (the value of one variable increases while the other decreases).
- No correlation.
- Linear correlation .
- Exponential correlation .
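The first three cases in the list above can be checked numerically. The following is a minimal sketch that uses synthetic data (an assumption, not data from this article) and the Pearson coefficient from `np.corrcoef`; the helper name `pearson_r` is made up for illustration. Note that the Pearson coefficient only measures linear correlation, so an exponential relationship would need a transformation (for example a logarithm) before it shows up clearly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so repeated runs give the same numbers
x = rng.normal(size=500)
noise = rng.normal(scale=0.5, size=500)

# Synthetic y variables, one per correlation type (illustrative data only)
samples = {
    "positive": 2.0 * x + noise,
    "negative": -2.0 * x + noise,
    "unrelated": rng.normal(size=500),
}


def pearson_r(a, b):
    """Return the Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


coefficients = {name: pearson_r(x, y) for name, y in samples.items()}
for name, r in coefficients.items():
    print(f"{name}: r = {r:+.2f}")
```

A coefficient close to +1 or -1 corresponds to points lying tightly around a rising or falling line, while a value near 0 corresponds to the randomly scattered case.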
Scatter plots are often combined with a regression line, which summarizes the existing data and supports predictive analysis.
A scatter plot is a good graphical tool for variables that are closely related but whose relationship is not as exact as a mathematical or physical formula. During analysis, keep in mind that a correlation between two variables does not establish a causal relationship; other influencing factors may also need to be considered.
2. Composition of a scatter plot
A standard scatter plot includes at least the following parts:
- Vertical axis: represents the values of one variable
- Horizontal axis: represents the values of the other variable
- Points: each point is an (X, Y) pair
- Regression line: the straight line that best fits the points
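The parts listed above can be put together in a few lines. This is a minimal sketch on synthetic data (an assumption), with the regression line obtained by an ordinary least-squares fit through `np.polyfit`; the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line to use an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(scale=2.0, size=200)  # noisy linear relationship

# Degree-1 least-squares fit: coefficients come back highest power first
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
points = ax.scatter(x, y, alpha=0.5, label="points (x, y)")
line_x = np.array([x.min(), x.max()])
ax.plot(line_x, slope * line_x + intercept, color="k", label="regression line")
ax.set_xlabel("x (horizontal axis)")
ax.set_ylabel("y (vertical axis)")
ax.legend()
fig.savefig("scatter_with_fit.png")
```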
3. Application scenarios
Suitable data: two continuous data fields.
Main function: observe the distribution of the data.
Applicable number of data points: unlimited.
Remarks: to observe the distribution more clearly, set the transparency or color of the data points.
Suitable scenarios:
- Displaying and comparing values. A scatter plot shows not only trends but also the shape of data clusters and the relationships among points in the data cloud.
Unsuitable scenarios:
- Displaying the proportion of each category.
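The remark about transparency matters most when many points overlap. The sketch below, on synthetic data (an assumption), draws the same 20,000 points twice, once opaque and once with a low `alpha`, so that dense regions appear darker in the second panel. The chosen values of `s` and `alpha` are illustrative, not recommendations from the article.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line to use an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=20_000)
y = 0.5 * x + rng.normal(size=20_000)

fig, (ax_opaque, ax_faded) = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)

# Left: fully opaque markers merge into one blob and hide the density
opaque = ax_opaque.scatter(x, y, s=5)
ax_opaque.set_title("default (opaque)")

# Right: nearly transparent markers add up, so dense areas look darker
faded = ax_faded.scatter(x, y, s=5, alpha=0.05)
ax_faded.set_title("alpha=0.05")

fig.savefig("overplotting.png")
```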
4. Implementation
In matplotlib, a scatter plot is drawn with the scatter function, which is described below:
scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)
Parameter 1: x, y: float or array-like; the coordinates of the data points.
Parameter 2: s: float or array-like; the marker size, in points squared.
Parameter 3: c: a single color, a sequence of colors, or an array of numbers that is mapped to colors through cmap and norm.
Parameter 4: marker: a marker style string (or MarkerStyle instance); the marker shape (default: 'o').
Parameter 5: cmap: the colormap used when c is an array of numbers.
Parameter 6: norm: a Normalize instance that scales the numeric values in c to the range [0, 1] before the colormap is applied; by default a linear scaling is used.
Parameters 7, 8: vmin, vmax: the data range covered by the colormap when c is numeric and no explicit norm is given. They should not be combined with an explicit Normalize instance.
Parameter 9: alpha: float between 0 (transparent) and 1 (opaque); the transparency of the markers.
Parameter 10: linewidths: float or array-like; the line width of the marker edges.
Parameter 11: verts: not part of the signature shown above. It was accepted by older matplotlib releases to build custom marker shapes and has since been removed; pass a custom shape through marker instead.
Parameter 12: edgecolors: 'face', 'none', a color, or a sequence of colors; the marker edge color. A sequence is cycled over the points.
Parameter 13: plotnonfinite: bool; whether to draw points whose c value is non-finite (inf, -inf or nan). If True, those points are drawn in the colormap's "bad" color (see Colormap.set_bad).
Parameter 14: **kwargs: additional keyword arguments, passed on as Collection properties.
Return value: the associated PathCollection instance.
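To make the parameter list more concrete, here is a small sketch that sets most of these arguments in one call and inspects the returned object. The data is random (an assumption), and the particular colormap, sizes and file name are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line to use an interactive window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PathCollection

rng = np.random.default_rng(3)
x = rng.uniform(size=50)
y = rng.uniform(size=50)
sizes = rng.uniform(20, 200, size=50)  # s: one marker area (points squared) per point
values = rng.uniform(0, 1, size=50)    # c: numbers that are mapped to colors via cmap

fig, ax = plt.subplots()
sc = ax.scatter(
    x, y,
    s=sizes,
    c=values,
    marker="o",
    cmap="viridis",
    vmin=0.0, vmax=1.0,  # color range; no explicit norm is passed
    alpha=0.7,
    linewidths=0.5,
    edgecolors="black",
)
fig.colorbar(sc, ax=ax, label="value")  # the returned PathCollection drives the colorbar
fig.savefig("scatter_parameters.png")

print(type(sc).__name__)
```

Because scatter returns the PathCollection, the same object can be handed to `fig.colorbar` or queried later, for example with `sc.get_offsets()` for the plotted coordinates.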
Take the SOCR-HeightWeight.csv data set as an example. It records the height and weight of 25,000 subjects. With height on the horizontal axis and weight on the vertical axis, we examine the relationship between the two variables. The complete code is as follows:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Font settings; only needed when labels contain Chinese characters
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly with that font
# Newer matplotlib releases renamed the seaborn styles with a "seaborn-v0_8-" prefix
style = 'seaborn-v0_8-dark-palette'
plt.style.use(style if style in plt.style.available else 'seaborn-dark-palette')

df = pd.read_csv("SOCR-HeightWeight.csv", index_col=0)
height = df["Height(Inches)"].values.reshape(-1, 1)
weight = df["Weight(Pounds)"].values.reshape(-1, 1)

# Fit the regression line: weight = coef * height + intercept
model = LinearRegression()
model.fit(height, weight)
coef = model.coef_[0][0]
intercept = model.intercept_[0]

# The two averages split the plane into four quadrants
height_avg = height.mean()
weight_avg = weight.mean()

# Plot at most 3000 points per quadrant to limit overplotting
quadrant1 = df[(df["Height(Inches)"] >= height_avg) & (df["Weight(Pounds)"] >= weight_avg)]
quadrant1_height = quadrant1["Height(Inches)"][:3000]
quadrant1_weight = quadrant1["Weight(Pounds)"][:3000]
plt.scatter(quadrant1_height, quadrant1_weight, alpha=0.3, label="First quadrant")
quadrant2 = df[(df["Height(Inches)"] <= height_avg) & (df["Weight(Pounds)"] >= weight_avg)]
quadrant2_height = quadrant2["Height(Inches)"][:3000]
quadrant2_weight = quadrant2["Weight(Pounds)"][:3000]
plt.scatter(quadrant2_height, quadrant2_weight, alpha=0.3, label="Second quadrant")
quadrant3 = df[(df["Height(Inches)"] <= height_avg) & (df["Weight(Pounds)"] <= weight_avg)]
quadrant3_height = quadrant3["Height(Inches)"][:3000]
quadrant3_weight = quadrant3["Weight(Pounds)"][:3000]
plt.scatter(quadrant3_height, quadrant3_weight, alpha=0.3, label="Third quadrant")
quadrant4 = df[(df["Height(Inches)"] >= height_avg) & (df["Weight(Pounds)"] <= weight_avg)]
quadrant4_height = quadrant4["Height(Inches)"][:3000]
quadrant4_weight = quadrant4["Weight(Pounds)"][:3000]
plt.scatter(quadrant4_height, quadrant4_weight, alpha=0.3, label="Fourth quadrant")

# Draw the average lines (scalar limits instead of min()/max() on 2-D arrays)
plt.hlines(weight_avg, height.min(), height.max(), ls="--", color='r', lw=2, label='Average weight')
plt.vlines(height_avg, weight.min(), weight.max(), ls='--', color='k', lw=2, label='Average height')

# Draw the regression line
x = np.arange(height.min(), height.max(), 0.05)
y = coef * x + intercept
plt.plot(x, y, lw=2, color="darkgray", label="Regression line of height and weight")

plt.title("Scatter plot of height and weight", fontsize=25, fontweight="bold")
plt.xlabel("Height (Inches)", fontsize=20)
plt.ylabel("Weight (Pounds)", fontsize=20)
plt.legend(fontsize=15)
plt.show()
The result is as follows:
[Figure: scatter plot of height and weight, with the average lines and the regression line]
Copyright notice
This article was written by [The big pig of the little pig family]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204230617063254.html