R language advanced | generalized vector and attribute analysis
2022-04-23 02:07:00 【jeffery0207】
In this post, we start from generalized vectors and analyze, from the perspective of attributes, the common data structures of the R language and how they relate to each other.
Overview
There are two kinds of generalized vectors: atomic vectors and lists. What we usually call a "vector" in the narrow sense is the atomic vector. As its name suggests, it can be built up, like an atom, into more complex data types by adding attributes. In addition, NULL is not a vector, but it often plays the role of a generic zero-length vector. The following figure shows their basic relationship.
Atomic vectors come in four types: logical, integer, double, and character. Atomic vectors and lists differ in what kinds of elements they may hold: all elements of an atomic vector must be of the same type, whereas the elements of a list can be of different types.
Every vector can also carry attributes, of which names is the most basic. The dimension attribute (dim) turns an atomic vector into a matrix or an array; interestingly, a list can also be given a dim attribute, turning it into a list-matrix. Adding a class attribute creates an S3 object (S3 objects will be explained in detail in a later post). The most important S3 objects include factors, dates, date-times, data frames and tibbles. The following two diagrams show the relationship between vectors and S3 objects.
Atomic vectors
Depending on the type of their elements, atomic vectors fall into four main categories: logical, integer, double and character. Integer and double vectors are together known as numeric vectors. There are two rarer types, complex and raw, which we seldom use, so they are not discussed here.
Scalars
Each of the four primary types has a special syntax for writing an individual value, called a scalar: a single number, logical value or string. Note that R has no separate scalar type: a scalar is stored as a vector with one element, so it is still a vector.
- Logical scalars are written as TRUE and FALSE, which can be abbreviated to T and F.
- Doubles can be written in decimal (0.1234), scientific (1.23e4) or hexadecimal (0xcafe) form. There are three special values unique to doubles: Inf, -Inf and NaN (not a number).
- Integers are written like doubles but must be followed by the letter L, such as 1234L, 1e4L or 0xcafeL, and cannot contain fractional values.
- Strings are surrounded by double quotes ("hi") or single quotes ('bye').
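The literal forms listed above can be checked directly in the console. The sketch below uses variable names and values chosen purely for illustration; it shows that hexadecimal and scientific literals are doubles, that the L suffix produces an integer, and that a "scalar" is simply a length-one vector.

```r
lgl      <- T          # abbreviation of TRUE (T can be reassigned, so TRUE is safer)
dbl_dec  <- 0.1234     # decimal double
dbl_sci  <- 1.23e4     # scientific notation: 12300
dbl_hex  <- 0xcafe     # hexadecimal double: 51966
int_val  <- 1e4L       # integer literal: 10000L
chr_val  <- 'bye'      # single or double quotes both work
specials <- c(Inf, -Inf, NaN)

typeof(dbl_hex)        # "double"
typeof(int_val)        # "integer"
is.nan(specials)       # FALSE FALSE TRUE

# A scalar is just a vector of length one
length(dbl_sci)        # 1
is.vector(dbl_sci)     # TRUE
```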
Making longer vectors with c()
We can create longer vectors with c() (short for combine):
lgl_var <- c(TRUE, FALSE)
int_var <- c(1L, 6L, 10L)
dbl_var <- c(1, 2.5, 4.5)
chr_var <- c("these are", "some strings")
When the inputs to c() are themselves atomic vectors, they are flattened: the elements of the inner vectors become elements of the outer vector. In other words, unlike lists, atomic vectors are not recursive data structures.
c(c(1, 2), c(3, 4))
#> [1] 1 2 3 4
You can use typeof() to see the type of a vector and length() to get its length.
typeof(lgl_var)
#> [1] "logical"
typeof(int_var)
#> [1] "integer"
typeof(dbl_var)
#> [1] "double"
typeof(chr_var)
#> [1] "character"
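The examples above only call typeof(); here is a short sketch of length() on the same vectors. It uses lengths(), a base R helper that applies length() to every element of a list, and shows that a nested c() call still yields a flat vector.

```r
lgl_var <- c(TRUE, FALSE)
int_var <- c(1L, 6L, 10L)
dbl_var <- c(1, 2.5, 4.5)
chr_var <- c("these are", "some strings")

# length() of each vector, computed in one call with lengths()
lengths(list(lgl_var, int_var, dbl_var, chr_var))  # 2 3 3 2

# Nested c() calls are flattened, so the result is a plain integer vector
nested <- c(int_var, c(100L, 200L))
length(nested)   # 5
typeof(nested)   # "integer"
```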
Missing values
R represents missing, or unknown, values with NA (short for not applicable). Missing values tend to be "infectious": most computations involving an NA return NA.
NA > 5
#> [1] NA
10 * NA
#> [1] NA
!NA
#> [1] NA
There are a few exceptions, and the reason is simple: in these cases the result would be the same whatever value the NA actually stands for.
NA ^ 0
#> [1] 1
NA | TRUE
#> [1] TRUE
NA & FALSE
#> [1] FALSE
A common mistake is to use == to find which elements of a vector are NA. This does not work: every result is NA, because there is no way to know whether one missing value equals another. Use is.na() to test for missing values instead.
x <- c(NA, 5, NA, 10)
x == NA
#> [1] NA NA NA NA
is.na(x)
#> [1] TRUE FALSE TRUE FALSE
Technically speaking, there are four missing values, one for each atomic type: NA (logical), NA_integer_ (integer), NA_real_ (double) and NA_character_ (character). The distinction rarely matters: you can simply use NA everywhere, and R will coerce it to the appropriate type when needed.
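A quick way to see the four typed missing values, and the automatic coercion of a bare NA, is with typeof(). The sketch also shows common is.na()-based idioms (which(), sum(), anyNA()) for locating missing values; the vector x is the one used in the example above.

```r
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# A bare NA takes on the type of the vector it is combined with
typeof(c(1L, NA))      # "integer"
typeof(c("a", NA))     # "character"

# Locate missing values with is.na() rather than ==
x <- c(NA, 5, NA, 10)
which(is.na(x))        # 1 3
sum(is.na(x))          # 2
anyNA(x)               # TRUE
```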
Testing and coercion
We can test a vector's type with the is.*() family: is.logical(), is.integer(), is.double() and is.character() check whether a vector is logical, integer, double or character, respectively. There are also is.vector(), is.atomic() and is.numeric(), but they do not simply test whether an object is a vector, an atomic vector or a numeric vector, so readers should check their help pages before relying on them.
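A few cases illustrate why is.vector() and is.numeric() deserve a look at the documentation. This sketch relies on documented base R behavior: is.vector() returns TRUE only when the object has no attributes other than names, and is.numeric() returns FALSE for factors even though a factor is stored as integers.

```r
is.numeric(1L)                         # TRUE: integers count as numeric
is.double(1L)                          # FALSE

# is.vector() allows names, but no other attribute
is.vector(c(a = 1, b = 2))             # TRUE
is.vector(structure(1:3, note = "x"))  # FALSE

# A factor is stored as integers, yet is.numeric() says FALSE
f <- factor(c("a", "b"))
typeof(f)                              # "integer"
is.numeric(f)                          # FALSE
```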
As mentioned earlier, all elements of an atomic vector must be of the same type. So what happens when you combine elements of different types? They are coerced to a single type in a fixed order (the arrows indicate the direction of conversion): logical -> integer -> double -> character. For example, when a vector contains both a number and a string, the number is converted into a string:
c("a", 1)
#> [1] "a" "1"
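Each step of the coercion order can be checked with typeof(). The following sketch combines adjacent pairs of types and confirms that the result always takes the "higher" type in logical -> integer -> double -> character.

```r
typeof(c(TRUE, 1L))   # "integer":   logical -> integer
typeof(c(1L, 2.5))    # "double":    integer -> double
typeof(c(2.5, "a"))   # "character": double  -> character

c(TRUE, FALSE, 2L)    # 1 0 2
c(1.5, "a")           # "1.5" "a"
```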
Coercion also happens automatically: in R, most mathematical functions (+, log, abs, etc.) coerce their inputs to numeric. This is particularly useful for logical vectors, because TRUE becomes 1 and FALSE becomes 0:
x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
#> [1] 0 0 1
# Total number of TRUEs
sum(x)
#> [1] 1
# Proportion that are TRUE
mean(x)
#> [1] 0.333
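Coercion can also be done explicitly with an as.*() function such as as.logical(), as.integer(), as.double() or as.character(). When a value cannot be converted (for example, a string that is not a number), R does not throw an error: it issues a warning and returns NA for that element. Note also that "1.5" is truncated to 1 when converted to an integer: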
as.integer(c("1", "1.5", "a"))
#> Warning: NAs introduced by coercion
#> [1] 1 1 NA
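A few more explicit conversions, as a sketch of base R behavior: as.integer() truncates toward zero, as.logical() recognizes only a fixed set of strings such as "TRUE", "T" and "false", and suppressWarnings() can silence the coercion warning when NAs are expected.

```r
as.integer(c(3.9, -3.9))                      # 3 -3: truncates toward zero
as.logical(c("TRUE", "T", "false", "yes"))    # TRUE TRUE FALSE NA
as.character(c(1.5, 2))                       # "1.5" "2"

# Silence the warning when unparseable strings are expected
suppressWarnings(as.double(c("1e3", "abc")))  # 1000 NA
```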
Attributes
As mentioned above, vectors can carry attributes, and adding attributes turns a vector into a more complex data type. This section introduces several important attributes.
Getting and setting attributes
You can think of attributes as name-value pairs that attach metadata to an object. A single attribute can be retrieved or modified with attr(); all attributes can be retrieved at once with attributes() and set at once with structure().
a <- 1:3
attr(a, "x") <- "abcdef"
attr(a, "x")
#> [1] "abcdef"
attr(a, "y") <- 4:6
str(attributes(a))
#> List of 2
#> $ x: chr "abcdef"
#> $ y: int [1:3] 4 5 6
# Or equivalently
a <- structure(
1:3,
x = "abcdef",
y = 4:6
)
str(attributes(a))
#> List of 2
#> $ x: chr "abcdef"
#> $ y: int [1:3] 4 5 6
In fact, most attributes are ephemeral: they are lost after most operations. Only two attributes are routinely preserved: names and dimensions (dim).
attributes(a[1]) # a is the object created in the previous example
#> NULL
attributes(sum(a))
#> NULL
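To see which attributes survive, the sketch below attaches a made-up attribute called note (a hypothetical name used only for illustration) to a named vector and to a matrix, then subsets both: names and dim are kept, while note is dropped.

```r
# A named vector with an extra, hypothetical "note" attribute
b <- structure(c(x = 1, y = 2, z = 3), note = "temporary")
names(attributes(b))         # includes "names" and "note"
names(attributes(b[1:2]))    # "names": note is dropped by subsetting

# The same with a matrix: dim survives, note does not
m <- structure(matrix(1:6, nrow = 2), note = "temporary")
names(attributes(m[, 1:2]))  # "dim"
```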
Name attribute (names)
You can name vectors in four ways :
# When creating it:
x <- c(a = 1, b = 2, c = 3)
# By assigning a character vector to names()
x <- 1:3
names(x) <- c("a", "b", "c")
# Inline, with setNames():
x <- setNames(1:3, c("a", "b", "c"))
# With attr():
x <- c(1, 2, 3)
attr(x, "names") <- c("a", "b", "c")
Conversely, the names attribute can be removed in two ways: x <- unname(x) or names(x) <- NULL.
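Names are most useful for subsetting. This short sketch, using the same a/b/c names as above, selects elements by name and then removes the names with both methods just described.

```r
x <- c(a = 1, b = 2, c = 3)
x["b"]             # select by name; the result keeps its name
x[c("c", "a")]     # several names, in any order

unname(x)          # 1 2 3, returned without names
names(x) <- NULL   # remove names in place
names(x)           # NULL
```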
Dimensions (dim)
A dim attribute can be added to a vector or a list to turn it into a matrix or an array. So besides matrix() and array(), you can also create a matrix or array by assigning a dim attribute with dim():
# Two scalar arguments specify row and column sizes
x <- matrix(1:6, nrow = 2, ncol = 3)
x
#> [,1] [,2] [,3]
#> [1,] 1 3 5
#> [2,] 2 4 6
# One vector argument to describe all dimensions
y <- array(1:12, c(2, 3, 2))
y
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 1 3 5
#> [2,] 2 4 6
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 7 9 11
#> [2,] 8 10 12
# You can also modify an object in place by setting dim()
z <- 1:6
dim(z) <- c(3, 2)
z
#> [,1] [,2]
#> [1,] 1 4
#> [2,] 2 5
#> [3,] 3 6
Because matrices and arrays are built on vectors, many vector functions have counterparts for matrices and arrays:
| Vector | Matrix | Array |
|---|---|---|
| names() | rownames(), colnames() | dimnames() |
| length() | nrow(), ncol() | dim() |
| c() | rbind(), cbind() | abind::abind() |
| (none) | t() | aperm() |
| is.null(dim(x)) | is.matrix() | is.array() |
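Most of the base R functions in the table can be tried on a small matrix and array. The row and column names below are made up for the example, and abind::abind() is left out because it comes from the separate abind package.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
rownames(m)                   # "r1" "r2"
colnames(m)                   # "a" "b" "c"
c(nrow(m), ncol(m))           # 2 3
length(m)                     # 6: total number of elements
dim(t(m))                     # 3 2
dim(rbind(m, r3 = 7:9))       # 3 3: one row added

arr <- array(1:24, c(2, 3, 4))
dim(aperm(arr))               # 4 3 2: by default aperm() reverses the dimensions

c(is.matrix(m), is.array(m))  # TRUE TRUE: a matrix is a 2D array
is.null(dim(1:3))             # TRUE: a plain vector has no dim
```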
A vector without a dim attribute is often thought of as one-dimensional, but calling dim() on it actually returns NULL. A matrix can have a single row or a single column, and an array can have only one dimension. These cases print similarly, but they are different objects, and str() reveals the differences:
str(1:3) # 1d vector
#> int [1:3] 1 2 3
str(matrix(1:3, ncol = 1)) # column vector
#> int [1:3, 1] 1 2 3
str(matrix(1:3, nrow = 1)) # row vector
#> int [1, 1:3] 1 2 3
str(array(1:3, 3)) # "array" vector
#> int [1:3(1d)] 1 2 3
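dim() makes the differences visible as well. In this sketch, the only thing distinguishing the 1d array from the plain vector is its dim attribute, which as.vector() removes.

```r
dim(1:3)                                  # NULL
dim(matrix(1:3, ncol = 1))                # 3 1
dim(array(1:3, 3))                        # 3

identical(1:3, array(1:3, 3))             # FALSE: the array carries dim
identical(1:3, as.vector(array(1:3, 3)))  # TRUE once dim is dropped
```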
S3 atomic vectors
The class attribute is one of the most important attributes: adding it turns an object into an S3 object. Common S3 vectors in base R are factor, Date, POSIXct and difftime. The following figure shows how they relate to each other.
- factor: a data type built on an integer vector for storing categorical variables.
- Date: stores dates, with a precision of one day.
- POSIXct/POSIXlt: POSIX stands for Portable Operating System Interface; ct means calendar time and lt means local time. Both store date-times, with a precision of seconds.
- difftime: stores the length of time between two time points.
Because the last three types are rarely used in our data analysis, we do not cover them in detail here; interested readers can consult the original text.
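For a quick look under the hood, the sketch below shows that Date, difftime and POSIXct are all built on double vectors. The dates are arbitrary, and the POSIXct value is parsed in UTC so that the result does not depend on the local time zone.

```r
today <- as.Date("2022-04-23")
typeof(today)       # "double"
unclass(today)      # 19105: days since 1970-01-01

gap <- as.Date("2022-04-23") - as.Date("2022-04-01")
class(gap)          # "difftime"
units(gap)          # "days"
as.numeric(gap)     # 22

stamp <- as.POSIXct("2022-04-23 02:07:00", tz = "UTC")
class(stamp)        # "POSIXct" "POSIXt"
as.numeric(stamp)   # 1650679620: seconds since 1970-01-01 00:00:00 UTC
```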
Factors
A factor is built on an integer vector and has two attributes. The first is class, with the value "factor", which makes it behave differently from a regular integer vector; the second is levels, which defines the set of values the factor is allowed to contain.
x <- factor(c("a", "b", "b", "a"))
x
#> [1] a b b a
#> Levels: a b
typeof(x)
#> [1] "integer"
attributes(x)
#> $levels
#> [1] "a" "b"
#>
#> $class
#> [1] "factor"
Look closely at the following code: when you tabulate a factor with table(), values that are allowed (that is, present in the levels attribute) but do not actually occur in the factor are counted too, and their count is, of course, zero.
sex_char <- c("m", "m", "m")
sex_factor <- factor(sex_char, levels = c("m", "f"))
table(sex_char)
#> sex_char
#> m
#> 3
table(sex_factor)
#> sex_factor
#> m f
#> 3 0
An ordered factor is a special kind of factor. It behaves like a regular factor, but the order of its levels is meaningful, for example "low", "medium" and "high".
grade <- ordered(c("b", "b", "a", "c"), levels = c("c", "b", "a"))
grade
#> [1] b b a c
#> Levels: c < b < a
Some base R functions, such as read.csv() and data.frame(), can automatically convert character vectors into factors. If you don't want this, turn it off with the argument stringsAsFactors = FALSE. (Conversion was the default before R 4.0.0; since R 4.0.0 the default is stringsAsFactors = FALSE.)
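The following sketch illustrates factor behavior that follows from the integer codes and levels described above: assigning a value that is not a level produces NA (R also emits an "invalid factor level" warning), and a factor holding numbers must go through as.character() to get back the original values. It ends with an explicit stringsAsFactors setting.

```r
x <- factor(c("a", "b", "b", "a"))
as.integer(x)        # 1 2 2 1: the underlying integer codes
levels(x)            # "a" "b"

# A value outside the levels becomes NA (with a warning)
x2 <- x
x2[2] <- "c"
is.na(x2)            # FALSE TRUE FALSE FALSE

# Numbers stored as a factor: convert through the labels, not the codes
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1: codes, usually not what you want
as.numeric(as.character(f))   # 10 20 10

# Requesting factor conversion explicitly in data.frame()
df <- data.frame(g = c("m", "f"), stringsAsFactors = TRUE)
class(df$g)          # "factor"
```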
Lists
Lists are also vectors in the broad sense, but each of their elements can be of a different data type. Combined with what we covered in the previous post, though, each element of a list is technically of the same type: every element is really a reference to another object, so the elements themselves share one type, while the objects they point to can be of any type.
Creating lists
You can create a list with the list() function:
l1 <- list(
1:3,
"a",
c(TRUE, FALSE, TRUE),
c(2.3, 5.9)
)
typeof(l1)
#> [1] "list"
class(l1)
#> [1] "list"
str(l1)
#> List of 4
#> $ : int [1:3] 1 2 3
#> $ : chr "a"
#> $ : logi [1:3] TRUE FALSE TRUE
#> $ : num [1:2] 2.3 5.9
Because each element of a list is a reference to an object, creating a list does not copy those objects into it. As a result, a list often takes up less memory than you might expect (obj_size() below comes from the lobstr package):
lobstr::obj_size(mtcars)
#> 7,208 B
l2 <- list(mtcars, mtcars, mtcars, mtcars)
lobstr::obj_size(l2)
#> 7,288 B
Lists are sometimes called recursive vectors, because an element of a list can itself be a list, whereas each element of an atomic vector is a single value.
l3 <- list(list(list(1)))
str(l3)
#> List of 1
#> $ :List of 1
#> ..$ :List of 1
#> .. ..$ : num 1
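To see the recursion in practice, here is a small sketch using the same l3: each [[ step descends one level into the nested lists, while the equivalent nesting of c() calls on atomic vectors collapses into one flat vector.

```r
l3 <- list(list(list(1)))
l3[[1]][[1]][[1]]           # 1: each [[ ]] goes one level deeper
is.list(l3[[1]])            # TRUE: the first element is itself a list

# Atomic vectors cannot nest: c() flattens instead
length(c(1, c(2, c(3))))    # 3
```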
c() combines several lists into one list. If its arguments include both atomic vectors and lists, c() first coerces the atomic vectors to lists and then combines them. The following code compares c() and list():
l4 <- list(list(1, 2), c(3, 4))
l5 <- c(list(1, 2), c(3, 4))
str(l4)
#> List of 2
#> $ :List of 2
#> ..$ : num 1
#> ..$ : num 2
#> $ : num [1:2] 3 4
str(l5)
#> List of 4
#> $ : num 1
#> $ : num 2
#> $ : num 3
#> $ : num 4
Testing and coercion
Use is.list() to test whether an object is a list, and as.list() to convert other suitable objects into a list; note the difference between list(1:3) and as.list(1:3) below. To turn a list into an atomic vector, use unlist(), which follows the same coercion rules as c().
list(1:3)
#> [[1]]
#> [1] 1 2 3
as.list(1:3)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
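The unlist() direction is not shown above, so here is a short sketch. Elements of mixed types are coerced with the same order as c(), and names of nested elements are joined with a dot.

```r
l <- list(1:3, "a", TRUE)
c(is.list(l), is.list(1:3))                  # TRUE FALSE

unlist(l)                                    # "1" "2" "3" "a" "TRUE"
unlist(list(1L, 2.5))                        # 1.0 2.5, coerced to double
unlist(list(a = 1, b = list(c = 2, d = 3)))  # named a, b.c, b.d
```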
Matrices and arrays
Adding a dim attribute to an atomic vector turns it into a matrix or an array; likewise, a dim attribute turns a list into a list-matrix or a list-array:
l <- list(1:3, "a", TRUE, 1.0)
dim(l) <- c(2, 2)
l
#> [,1] [,2]
#> [1,] integer,3 TRUE
#> [2,] "a" 1
l[[1, 1]]
#> [1] 1 2 3
Data frames and tibbles
We have introduced four S3 objects built on atomic vectors. Two further S3 objects built on lists are also very useful in data processing: the data frame and the tibble.
Data frames are used very frequently in data processing. A data frame has a column-name attribute (names), a row-name attribute (row.names) and a class attribute ("data.frame").
df1 <- data.frame(x = 1:3, y = letters[1:3])
typeof(df1) # a data frame is built on a list
#> [1] "list"
attributes(df1)
#> $names
#> [1] "x" "y"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] 1 2 3
Compared with a plain list, a data frame has one extra constraint: all of its columns must have the same length, which gives it a rectangular structure. This is why a data frame shares characteristics of both matrices and lists:
- Data frames support rownames() and colnames(); calling names() on a data frame returns its column names.
- Data frames support nrow() and ncol(); calling length() on a data frame returns its number of columns.
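The matrix-like and list-like sides of a data frame can be checked on the df1 example from above. The last lines are an assumed illustration of the equal-length rule: columns of length 3 and 2 cannot be recycled to a common length, so data.frame() signals an error, which tryCatch() captures here.

```r
df1 <- data.frame(x = 1:3, y = letters[1:3])
names(df1)                 # "x" "y"
colnames(df1)              # "x" "y"
rownames(df1)              # "1" "2" "3"
c(nrow(df1), ncol(df1))    # 3 2
length(df1)                # 2: the number of columns

# Columns whose lengths cannot be reconciled are rejected
res <- tryCatch(data.frame(x = 1:3, y = 1:2),
                error = function(e) conditionMessage(e))
res                        # an error message about differing numbers of rows
```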
Although data frames are convenient, they also have drawbacks. For example, duplicate row names are not allowed, which can sometimes be a headache. Several related data types have therefore been developed, notably tibble and data.table. Each has its own characteristics; for reasons of space we do not compare them in this post, but we found an article that compares data.frame, data.table and tibble in detail. The link is at the end of the text for interested readers.
NULL
To close this post, we introduce a special data type: NULL. It can be thought of as a zero-length vector, but it has its own unique type and cannot have any attributes. Use is.null() to test for NULL.
typeof(NULL)
#> [1] "NULL"
length(NULL)
#> [1] 0
x <- NULL
attr(x, "y") <- 1
#> Error in attr(x, "y") <- 1: attempt to set an attribute on NULL
is.null(NULL)
#> [1] TRUE
NULL has two common uses:
- As an empty vector of any type. For example, calling c() without any arguments returns NULL, and combining a vector with NULL leaves the vector unchanged.
- To represent an absent vector. For example, when a function argument is optional, NULL is often used as its default value. Be careful to distinguish NA from NULL: NA indicates that an element of a vector is missing, whereas NULL indicates that the whole vector is absent.
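Both uses can be sketched in a few lines. The function describe() below is a hypothetical helper written only for this illustration: its optional label argument defaults to NULL, and is.null() decides whether to fall back to the expression passed in as x.

```r
c()                                 # NULL
identical(c(1, 2, NULL), c(1, 2))   # TRUE: NULL vanishes when combined
c(length(NA), length(NULL))         # 1 0: NA is a value, NULL is nothing

# Hypothetical helper: NULL as the default of an optional argument
describe <- function(x, label = NULL) {
  if (is.null(label)) label <- deparse(substitute(x))  # fall back to the expression text
  paste0(label, ": ", length(x), " element(s)")
}
describe(1:3)          # "1:3: 3 element(s)"
describe(1:3, "ids")   # "ids: 3 element(s)"
```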
Closing remarks
This is the second post in our series of translated study notes on "Advanced R". We will keep sharing lower-level knowledge of R, so stay tuned!
References
Copyright notice
This article was written by [jeffery0207]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220836536686.html