Data Analysis Learning (I): Data Analysis and NumPy Basics
2022-04-23 10:53:00 【The big pig of the little pig family】
Chapter 1 Data Analysis and NumPy Basics
1.1 Data analysis
1.1.1 Data
Before talking about data analysis, we must first understand what data is. Data is the result of observation, experiment, or calculation: a description of a phenomenon, and the raw material used to represent objective things. The data discussed here is structured data, not scattered, disorganized data. In data processing we also need to care about the form the data takes. Common forms include:
- Tabular data, in which each column usually holds a different type of data.
- Multidimensional arrays.
- Uniform or non-uniform arrays.
- Multiple data tables related to one another by key columns.
In practice, data is usually transformed into a form better suited to analysis and modeling, and features can also be extracted from the data for analysis.
1.1.2 Data storage types
Accessing data is where everything begins: reading and writing data is the first step before any other tool can be used. Data input and output usually falls into the following categories: reading text files and other, more efficient formats from disk; exchanging data with databases; and interacting with network resources.
.csv, .xlsx, and .txt are three common storage formats for such data. Reading, processing, and storing these three formats is discussed further later.
1.2 NumPy Basics
1.2.1 Introduction to NumPy
NumPy, short for "Numeric Python", is a Python package. It is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. Numeric, the forerunner of NumPy, was developed by Jim Hugunin. Another package, Numarray, was also developed, with some additional features. In 2005, Travis Oliphant created the NumPy package by incorporating the features of Numarray into Numeric.
NumPy is often used together with SciPy (Scientific Python) and Matplotlib (a plotting library). This combination is widely used as a replacement for MATLAB, a popular platform for technical computing, and Python, as the MATLAB alternative, is now regarded as a more modern and complete programming language. In data analysis, our main concerns are:
- Fast, common array algorithms such as sort, mean, and sum.
- Efficient descriptive statistics and aggregation.
- Data cleaning, subsetting, filtering, transformation, and fast vectorized computation.
Although NumPy provides a foundation for data manipulation, it is best regarded as the basis of pandas rather than as a standalone data analysis library, mainly because it cannot handle common time series types effectively.
The importance of NumPy is also reflected in the following aspects:
- NumPy stores data internally in contiguous blocks of memory, unlike Python's built-in data structures.
- NumPy's algorithm library is written in C, so it can operate on that memory without per-element type checks, and its arrays use less memory than the built-in sequences.
- NumPy can perform complex computations on a whole array without writing a loop.
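As a rough illustration of the last point, the sketch below computes the same result once with an explicit Python loop and once with a single array expression. The variable names are made up for this example.

```python
import numpy as np

values = list(range(5))

# Pure-Python version: an explicit loop over every element.
looped = [v * 2 + 1 for v in values]

# NumPy version: one expression applied to the whole array at once.
arr = np.array(values)
vectorized = arr * 2 + 1

print(looped)      # [1, 3, 5, 7, 9]
print(vectorized)  # [1 3 5 7 9]
```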
1.2.2 Creating an ndarray
The N-dimensional array object, ndarray, is one of NumPy's core features. It is a fast, flexible container for large homogeneous data sets, and it lets you operate on a whole block of data with the same syntax you would use for scalars.
Internally, an ndarray consists of:
- A pointer to the data (a block of data in memory or in a memory-mapped file).
- The data type, or dtype, which describes the fixed-size cells of the array.
- The shape: a tuple giving the size of each dimension.
- The strides: a tuple of integers giving the number of bytes to step over to reach the next element along each dimension.
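These pieces can be inspected directly as attributes of an array. The following is a minimal sketch; the dtype is fixed to int32 here only so that the byte counts are the same on every platform.

```python
import numpy as np

# A 3x4 array of 4-byte integers, stored row by row (C order).
arr = np.arange(12, dtype=np.int32).reshape(3, 4)

print(arr.dtype)     # int32
print(arr.itemsize)  # 4  (bytes per element)
print(arr.shape)     # (3, 4)
# Moving to the next row skips 4 elements * 4 bytes = 16 bytes;
# moving to the next column skips one element = 4 bytes.
print(arr.strides)   # (16, 4)
```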
Method 1: the array() function, described as follows:
numpy.array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)
Parameter 1: object: an array or nested sequence
Parameter 2: dtype: the data type of the array elements, optional
Parameter 3: copy: whether the object should be copied, optional
Parameter 4: order: the memory layout of the created array: C for row-major, F for column-major, A for either; when omitted, NumPy picks a layout based on the input
Parameter 5: subok: if True, subclasses are passed through; by default the returned array is forced to be a base-class array
Parameter 6: ndmin: the minimum number of dimensions of the resulting array
Correct example 1: pass a flat (non-nested) sequence to generate a one-dimensional array.
# Initialize the data: pass a flat (non-nested) sequence to generate a one-dimensional array.
import numpy as np
data = range(10)
arr = np.array(data)
--------------------------------------------------------------------------
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Correct example 2: pass a nested list to generate a multidimensional array.
# Initialize the data: pass a nested list to generate a multidimensional array.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(data)
--------------------------------------------------------------------------
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Correct example 3: pass a flat (non-nested) sequence and also specify the minimum number of dimensions; this also generates a multidimensional array.
# Initialize flat data and specify the ndmin parameter to get a multidimensional array.
data = range(10)
arr = np.array(data, ndmin=2)
--------------------------------------------------------------------------
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
Correct example 4: pass a flat (non-nested) sequence to generate a one-dimensional array, and also pass dtype=complex so that the elements are complex numbers.
# Initialize the data and pass the dtype parameter to specify the element type
data = range(10)
arr = np.array(data, dtype=complex)
--------------------------------------------------------------------------
array([0.+0.j, 1.+0.j, 2.+0.j, 3.+0.j, 4.+0.j, 5.+0.j, 6.+0.j, 7.+0.j,
8.+0.j, 9.+0.j])
Correct example 5: pass a nested list to generate a multidimensional array, and also pass dtype=[('a', np.int32), ('b', complex)], so that field a holds 32-bit integers and field b holds complex numbers.
# Initialize the data: pass a nested list and a structured dtype with an integer field a and a complex field b.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(data, dtype=([('a',np.int32),('b', complex)]))
--------------------------------------------------------------------------
array([[(1, 1.+0.j), (2, 2.+0.j), (3, 3.+0.j)],
[(4, 4.+0.j), (5, 5.+0.j), (6, 6.+0.j)],
[(7, 7.+0.j), (8, 8.+0.j), (9, 9.+0.j)]],
dtype=[('a', '<i4'), ('b', '<c16')])
Use arr[name] to access the individual fields.
arr['a']
--------------------------------------------------------------------------
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
arr['b']
--------------------------------------------------------------------------
array([[1.+0.j, 2.+0.j, 3.+0.j],
[4.+0.j, 5.+0.j, 6.+0.j],
[7.+0.j, 8.+0.j, 9.+0.j]])
Incorrect example 1: pass an irregularly nested (ragged) list. In the NumPy version used here this emits a warning, and the result is not a multidimensional array but a one-dimensional array whose elements are Python lists. NumPy 1.24 and later raise a ValueError instead unless dtype=object is given.
# Incorrect example: passing an irregularly nested list
data = [[1, 2, 3], [4, 5, 6, 12], [7, 8, 9, 10, 11]]
arr = np.array(data)
--------------------------------------------------------------------------
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
array([list([1, 2, 3]), list([4, 5, 6, 12]), list([7, 8, 9, 10, 11])],
dtype=object)
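If a ragged container really is what you want, the warning text itself suggests passing dtype=object explicitly. The sketch below assumes that intent; the names ragged and lengths are chosen only for illustration.

```python
import numpy as np

data = [[1, 2, 3], [4, 5, 6, 12], [7, 8, 9, 10, 11]]

# Explicitly ask for an object array: each element is a Python list.
ragged = np.array(data, dtype=object)

print(ragged.shape)  # (3,)
print(ragged.dtype)  # object

# The rows keep their individual lengths.
lengths = [len(row) for row in ragged]
print(lengths)       # [3, 4, 5]
```

Such an array is one-dimensional, so the vectorized operations described earlier do not apply to the numbers inside the lists.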
Method 2: the np.zeros() function, described as follows.
numpy.zeros(shape, dtype=float, order='C')
Purpose: generate an array of zeros with the specified shape
Parameter 1: shape: an integer or a tuple giving the array shape
Parameter 2: dtype: the data type of the array, for example numpy.int8. The default is numpy.float64
Parameter 3: order: how multidimensional data is stored in memory: 'C' for row-major, 'F' for column-major
Correct example 1: pass an integer to generate a one-dimensional array, i.e. zero = np.zeros(m) or zero = np.zeros((m,))
# Pass an integer to generate a one-dimensional array, form 1: np.zeros(m)
zero = np.zeros(5)
--------------------------------------------------------------------------
array([0., 0., 0., 0., 0.])
# Pass an integer to generate a one-dimensional array, form 2: np.zeros((m,))
zero = np.zeros((5,))
--------------------------------------------------------------------------
array([0., 0., 0., 0., 0.])
Correct example 2: pass a tuple to generate a multidimensional array, i.e. zero = np.zeros((m, n))
# Pass a tuple to generate a multidimensional array
zero = np.zeros((5,5))
--------------------------------------------------------------------------
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
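The dtype parameter described above is not exercised by these examples, so here is a small sketch of its effect. The variable names are illustrative only.

```python
import numpy as np

# Default dtype: 8-byte floats.
zero_default = np.zeros((2, 3))
print(zero_default.dtype)   # float64
print(zero_default.nbytes)  # 48  (6 elements * 8 bytes)

# Explicit dtype: 1-byte integers.
zero_int8 = np.zeros((2, 3), dtype=np.int8)
print(zero_int8.dtype)      # int8
print(zero_int8.nbytes)     # 6   (6 elements * 1 byte)
```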
Method 3: the np.empty() function, described as follows.
Purpose: return a new array of the given shape and type without initializing its entries.
numpy.empty(shape, dtype=float, order='C')
Parameter 1: shape: a tuple or an integer giving the shape of the empty array
Parameter 2: dtype: the data type
Parameter 3: order: whether row-major or column-major storage is used
Return value: an ndarray object
Correct example 1: empty is used in the same way as zeros, so only one example is given here.
# empty: pass the shape as a list or tuple
np.empty([2, 2])
--------------------------------------------------------------------------
array([[ -9.74499359e+001, 6.69583040e-309],
[ 2.13182611e-314, 3.06959433e-309]])
ps1: empty does not initialize its values: the contents are whatever happened to be in memory, not 0. This makes it faster than zeros, but you must assign the values yourself before relying on them.
ps2: When a nested list is passed to array, each sublist is treated as a row. The order parameter changes only how the data is laid out in memory, not this interpretation.
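The two notes above can be checked with a short sketch. The dtype is pinned to int64 here purely so the stride values are the same on every platform; that choice is an assumption of this example, not something the original text requires.

```python
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]

# Same logical array, two different memory layouts.
c_arr = np.array(data, dtype=np.int64, order='C')
f_arr = np.array(data, dtype=np.int64, order='F')

print(np.array_equal(c_arr, f_arr))  # True: sublists are rows either way
print(c_arr.strides)                 # (24, 8): rows are contiguous
print(f_arr.strides)                 # (8, 16): columns are contiguous

# empty() leaves arbitrary values behind, so fill the array before use.
buf = np.empty((2, 2))
buf.fill(1.5)
print(buf.tolist())                  # [[1.5, 1.5], [1.5, 1.5]]
```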
1.2.3 Converting an ndarray
Built-in data types can be converted to an ndarray, and the conversion also works in reverse.
Method 1: arr.tolist() converts to a list.
# Data conversion: convert to a list
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(data)
lst = arr.tolist()  # avoid shadowing the built-in name list
--------------------------------------------------------------------------
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Method 2: arr.tobytes() converts to a bytes object.
# Data conversion: convert to a bytes object (the output below corresponds to 4-byte integers)
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(data)
raw = arr.tobytes()  # avoid shadowing the built-in name bytes
--------------------------------------------------------------------------
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00'
ps1: All of these conversion methods return a new object rather than a view of the original array.
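A small sketch of what "a new object" means in practice, together with one possible way back from bytes using np.frombuffer. The int32 dtype is an assumption made so that the byte length is predictable.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int32)

# tolist() returns an independent nested list.
lst = arr.tolist()
lst[0][0] = 100
print(arr[0, 0])   # 1: the array is unchanged

# tobytes() returns the raw buffer contents: 9 elements * 4 bytes.
raw = arr.tobytes()
print(len(raw))    # 36

# The bytes carry no shape or dtype, so both must be supplied to rebuild.
restored = np.frombuffer(raw, dtype=arr.dtype).reshape(arr.shape)
print(np.array_equal(restored, arr))  # True
```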
1.2.4 Slicing and indexing an ndarray
Whether indexing or slicing, the core goal is to find the subset of the data that meets the requirements; put simply, it is about expressing that subset correctly.
1.2.4.1 Indexing by row
Method 1: arr[row, :].
Correct example: pass a row number to index a single row.
# Initialize the data; row indexing method 1: pass a row number to index a single row
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr[0, :]
--------------------------------------------------------------------------
array([1, 2, 3])
Method 2: arr[row][:].
Correct example: pass a row number to index a single row.
# Initialize the data; row indexing method 2: pass a row number to index a single row
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr[0][:]
--------------------------------------------------------------------------
array([1, 2, 3])
ps1: The arr[row, column] form does not show the dimension information intuitively; the arr[row][:] form shows more directly how many dimensions there are.
1.2.4.2 Indexing by column
Method 1: arr[:, column].
Correct example: pass a column number to index a single column.
# Initialize the data; column indexing method 1: pass a column number to index a single column
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr[:, 0]
--------------------------------------------------------------------------
array([1, 4, 7])
Incorrect example: using arr[:][column] to index a column. This expression does not retrieve the first column; it retrieves the first element of the array along axis 0, i.e. the first row. It can be read as first taking arr_new = arr[:], a slice of the whole array, and then taking arr_new[0], its first element, so this form cannot be used to index columns.
# Incorrect example: this indexes the first row, not the first column
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr[:][0]
--------------------------------------------------------------------------
array([1, 2, 3])
ps1: Strictly speaking, "row" and "column" are imprecise terms; they should be called axis=0 and axis=1, or dimension 1 and dimension 2.
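The difference between the two forms can be seen side by side in the sketch below. It also checks, via np.shares_memory, that arr[:] refers to the same data as arr rather than to a separate copy.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# One indexing operation over both axes: every row, column 0.
first_column = arr[:, 0]
print(first_column)  # [1 4 7]

# Two chained operations: the whole array, then element 0 along axis 0.
chained = arr[:][0]
print(chained)       # [1 2 3]

# arr[:] is a view onto the same memory as arr.
print(np.shares_memory(arr[:], arr))  # True
```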
1.2.4.3 Single-value indexing
Method 1: arr[row, column] or arr[row][column].
Correct example: pass two integers to index a single value.
# Single-value indexing 1-1
arr[0, 0]
--------------------------------------------------------------------------
1
# Single-value indexing 1-2
arr[0][0]
--------------------------------------------------------------------------
1
1.2.4.4 Multiple row/column indexing
Method 1: arr[row1:row2, col1:col2].
Correct example 1: pass row1, row2, col1, and col2 to index multiple rows/columns.
# Pass two bounds each for rows and columns to index multiple rows and columns
arr[0:2, 0:1]
--------------------------------------------------------------------------
array([[1],
[4]])
Correct example 2: pass row1, col1, and col2 to index multiple rows/columns.
# Pass row1, col1, and col2 to index multiple rows/columns
arr[0:, 0:1]
--------------------------------------------------------------------------
array([[1],
[4],
[7]])
Correct example 3: pass row2 and col1 to index multiple rows/columns.
# Pass row2 and col1 to index multiple rows/columns
arr[:2, 1:]
--------------------------------------------------------------------------
array([[2, 3],
[5, 6]])
Incorrect example 1: using arr[row1:row2][col1:col2]. This is equivalent to arr_new = arr[row1:row2, :] followed by result = arr_new[col1:col2, :], so the second slice is applied to the rows again and the final result amounts to slicing rows twice.
arr[0:2][1:]
--------------------------------------------------------------------------
array([[4, 5, 6]])
ps1: When indexing multiple rows you do not have to pass all 4 bounds. Omitting the bound after the colon (a:) means from a to the end; omitting the bound before the colon (:a) means from the start to a-1; a bare : means the whole axis from start to end.
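A compact sketch of these rules on the same 3x3 data, comparing the fully written slice with its shortened form and with the chained form that slices rows twice.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# All four bounds written out versus the shortened form.
full = arr[0:2, 1:3]
short = arr[:2, 1:]
print(np.array_equal(full, short))  # True
print(short.tolist())               # [[2, 3], [5, 6]]

# Chained slices both act on axis 0: rows 0-1 first, then row 1 of those.
chained = arr[0:2][1:]
print(chained.tolist())             # [[4, 5, 6]]
```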
1.2.4.5 Conditional indexing
Method 1: arr[condition].
Correct example: pass a condition that is evaluated element by element to filter the array.
# Filter using a condition on the array
arr[arr < arr.mean()] = 0
--------------------------------------------------------------------------
array([[0, 0, 0],
[0, 5, 6],
[7, 8, 9]])
1.2.4.6 Boolean indexing
Boolean indexing can be understood as a form of conditional indexing: instead of writing the condition inline, a Boolean array of the same shape is used as a sieve that selects every position whose value is True. For example:
# Initialize a Boolean array
bool_arr = np.random.randn(9).reshape((3, 3))
bool_arr = bool_arr>0
--------------------------------------------------------------------------
array([[False, True, True],
[False, False, False],
[ True, True, False]])
# Use Boolean arrays for filtering
arr[bool_arr] = 0
--------------------------------------------------------------------------
array([[1, 0, 0],
[4, 5, 6],
[0, 0, 9]])
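Because the mask above comes from random numbers, the result changes from run to run. The sketch below uses a deterministic mask instead, and shows two things the example does not: selecting values (rather than assigning) and combining conditions with &. The names mask, selected, and masked are illustrative.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Combine element-wise conditions with & (and) or | (or); the
# parentheses are required because of operator precedence.
mask = (arr > 2) & (arr % 2 == 0)

# Selecting with a mask returns a one-dimensional array of the hits.
selected = arr[mask]
print(selected)         # [4 6 8]

# np.where builds a new array and leaves arr itself untouched.
masked = np.where(mask, 0, arr)
print(masked.tolist())  # [[1, 2, 3], [0, 5, 0], [7, 0, 9]]
```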
1.2.4.7 Fancy indexing
In NumPy, fancy indexing means indexing data with integer arrays.
Method 1: arr[[row1, row2, row3, ...], [col1, col2, col3, ...]]
The two lists are paired element by element to index the data. The column list may be omitted; if only one list is given, it is treated as a list of rows.
Correct example: pass two row indexes and two column indexes to select 2 values.
arr[[2, 0], [2, 1]]
--------------------------------------------------------------------------
array([9, 2])
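The pairing behaviour is easy to confuse with selecting a rectangular block. The sketch below contrasts the paired form, the rows-only form, and np.ix_, which is one way to obtain every combination of the given rows and columns.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Paired: picks arr[2, 2] and arr[0, 1].
pairs = arr[[2, 0], [2, 1]]
print(pairs.tolist())  # [9, 2]

# Only a row list: whole rows 2 and 0, in that order.
rows = arr[[2, 0]]
print(rows.tolist())   # [[7, 8, 9], [1, 2, 3]]

# np.ix_: rows 2 and 0 crossed with columns 2 and 1.
block = arr[np.ix_([2, 0], [2, 1])]
print(block.tolist())  # [[9, 8], [3, 2]]
```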
1.2.5 Assigning to an ndarray
Besides assignment at creation time, every kind of index can also be used for assignment, i.e. arr[index] = values. The indexing itself is not repeated here; simple examples are given for each kind of index assignment other than Boolean indexing. Each example below starts again from this initial array.
# Initialize the data
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(data)
--------------------------------------------------------------------------
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Correct example 1: assign by row.
# Assign by row
arr[0] = [10, 10, 10]
--------------------------------------------------------------------------
array([[10, 10, 10],
[ 4, 5, 6],
[ 7, 8, 9]])
Correct example 2: assign by column.
# Assign by column
arr[:, 0] = [10, 10, 10]
--------------------------------------------------------------------------
array([[10, 2, 3],
[10, 5, 6],
[10, 8, 9]])
Correct example 3: single-value assignment.
# Single-value assignment
arr[0, 0] = 10
--------------------------------------------------------------------------
array([[10, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]])
Correct example 4: multiple row/column assignment.
arr[0:2, :] = [[10, 10, 10], [10, 10, 10]]
--------------------------------------------------------------------------
array([[10, 10, 10],
[10, 10, 10],
[ 7, 8, 9]])
Counter-example to correct example 4: mind the shape when assigning. A value with the same number of elements but a different shape cannot be assigned directly.
arr[0:2, :] = np.arange(6)
--------------------------------------------------------------------------
ValueError: could not broadcast input array from shape (6) into shape (2,3)
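One way around this error, sketched below, is to reshape the value to the target shape first. A single row of length 3 also works, because it can be broadcast across both target rows.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Reshape the six values to (2, 3) so they match the target slice.
arr[0:2, :] = np.arange(6).reshape(2, 3)
print(arr.tolist())  # [[0, 1, 2], [3, 4, 5], [7, 8, 9]]

# A shape-(3,) value is broadcast to each of the two selected rows.
arr[0:2, :] = [10, 20, 30]
print(arr.tolist())  # [[10, 20, 30], [10, 20, 30], [7, 8, 9]]
```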
Correct example 5: conditional assignment.
arr[arr < arr.mean()] = 10
--------------------------------------------------------------------------
array([[10, 10, 10],
[10, 5, 6],
[ 7, 8, 9]])
Correct example 6: fancy-index assignment.
arr[[2, 0], [2, 1]] = [10, 10]
--------------------------------------------------------------------------
array([[ 1, 10, 3],
[ 4, 5, 6],
[ 7, 8, 10]])
Copyright notice
This article was written by [The big pig of the little pig family]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230617063654.html