
NIO Related Basics

2022-04-23 19:16:00 Li Siwei

Source: NIO related basics

User space and kernel space concepts

Modern operating systems use virtual memory. On a 32-bit operating system, the addressable space (virtual address space) is therefore 4 GB (2^32).
At the heart of the system is the kernel. It is independent of ordinary applications, can access the protected memory space, and has full access to the underlying hardware devices. (The kernel is itself a program: the operating system's program. It starts one or more processes, and each kernel process has its own memory space. A user process obtains services from the kernel, and thus data in kernel space, by making system calls.)
To ensure that user processes cannot operate on the kernel directly, and to keep the kernel safe, the operating system divides the virtual address space into two parts: kernel space and user space. For the Linux operating system, the highest 1 GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is reserved for the kernel and is called kernel space, while the lower 3 GB (virtual addresses 0x00000000 to 0xBFFFFFFF) is used by each process and is called user space. Every process can enter the kernel through a system call, so the Linux kernel is shared by all processes in the system. From the perspective of an individual process, each process therefore has a 4 GB virtual address space.

User process I/O

For the sake of OS security and stability, user processes cannot operate I/O devices directly. They must request I/O actions from the kernel through system calls, and the kernel maintains a buffer for each I/O device.
Linux treats all peripherals as files. When a user process needs to perform I/O on a device, it issues a system call to the kernel, and the kernel returns an index into the kernel buffer that holds the I/O data. Because the kernel treats peripherals as files, this buffer index is called a file descriptor (fd).
[Figure: user process requesting I/O through the kernel buffer]
The whole request flow is: the user process initiates a request; the kernel receives it, reads data from the I/O device into its buffer, and then copies the data from that buffer into the user process's address space; the user process gets the data and responds to its client.
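As a concrete illustration, a plain file read in C follows exactly this path: read() traps into the kernel, the kernel fills its buffer from the device, then copies the data into the buffer supplied by the process. This is a minimal sketch; the file path is just an example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* open() asks the kernel for a file descriptor (fd) for the file/device. */
    int fd = open("/tmp/example.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];                      /* buffer in user space              */
    /* read() enters the kernel: the kernel fills its own buffer from the     */
    /* device, then copies the data into buf in our address space.            */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0) { perror("read"); return 1; }

    printf("read %zd bytes via the kernel buffer\n", n);
    close(fd);
    return 0;
}
```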

During this flow, two periods take time: loading the data into the kernel buffer, and copying the data from that buffer into the process. Depending on how the process waits during these two periods, I/O can be divided into the following five models:

  • Blocking I/O
  • Non-blocking I/O
  • I/O multiplexing
  • Signal-driven I/O
  • Asynchronous I/O
    Note: a deeper treatment requires Linux/Unix knowledge, and detailed principles can be found in network-programming texts. Most Java programmers do not need the low-level details; it is enough to understand the concepts and to know that the underlying system supports them.

The process of loading data into the kernel buffer is usually called waiting for the data to be ready. The other step that may take a long time and cause blocking is copying the data from the kernel buffer into the user process's address space.

Blocking I/O

[Figure: blocking I/O flow]
When the user process calls the recvfrom system call, the kernel starts the first phase of the I/O: waiting for the data to be ready. For network I/O, the data often has not arrived yet (for example, a complete UDP datagram has not been received), so the kernel must wait until enough data arrives. On the user-process side, the whole process is blocked. When the data is ready, the kernel copies it from kernel space into user memory, returns the result, and the user process leaves the blocked state and runs again.
So the defining characteristic of blocking I/O is that the process is blocked in both phases of the I/O.
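A minimal sketch of a blocking UDP receive in C (the port number is only illustrative): recvfrom() does not return until the kernel has both waited for a datagram and copied it into the user buffer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);           /* UDP socket */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);                        /* example port */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    char buf[2048];
    /* Blocks here: phase 1 (wait for a datagram) + phase 2 (copy into buf). */
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    if (n >= 0)
        printf("got %zd bytes\n", n);
    close(fd);
    return 0;
}
```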

Non-blocking I/O

Under Linux, a socket can be set to non-blocking. Reading from such a socket proceeds as follows:
[Figure: non-blocking I/O flow]
When the user process calls recvfrom, the kernel does not block it: if the data is not ready, it immediately returns an EWOULDBLOCK error. From the user process's perspective there is no waiting; a result comes back at once. Seeing EWOULDBLOCK, the process knows the data is not ready yet, so it can go and do something else, then issue recvfrom again. Once the data in the kernel is ready and the process makes another system call, the kernel immediately copies the data into user memory and returns.

When an application calls recvfrom in a non-blocking loop like this, it is called polling. The application keeps polling the kernel to see whether some operation is ready. This usually wastes CPU time, but the pattern is occasionally used.
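A minimal sketch of this polling pattern in C: the socket is switched to non-blocking mode with fcntl(), and recvfrom() returns EWOULDBLOCK/EAGAIN immediately when no data has arrived yet (the port number is again illustrative).

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* Switch the socket to non-blocking mode. */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    char buf[2048];
    for (;;) {
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
        if (n >= 0) {                        /* data was ready and copied    */
            printf("got %zd bytes\n", n);
            break;
        }
        if (errno == EWOULDBLOCK || errno == EAGAIN) {
            /* Not ready yet: do other work, then poll again (wastes CPU). */
            usleep(1000);
            continue;
        }
        perror("recvfrom");
        break;
    }
    close(fd);
    return 0;
}
```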

I/O multiplexing (I/O Multiplexing)

The term I/O multiplexing may sound unfamiliar, but select and epoll should ring a bell. Some places also call this kind of I/O event-driven I/O. The benefit of select/epoll is that a single process can handle the I/O of many network connections at the same time. The basic principle is that the select/epoll function keeps polling all the sockets it is responsible for and notifies the user process as soon as data arrives on any of them.
In this model the user makes two system calls. First it calls select, which blocks the whole process (in a multithreaded application, only the calling thread is blocked, not the whole process). When select returns, the user process calls recvfrom, which copies the data from the kernel buffer into the user process's address space.
Its flow chart is as follows:
[Figure: I/O multiplexing flow]
When the user process calls select, the whole process is blocked. Meanwhile the kernel "monitors" all the sockets select is responsible for; as soon as the data in any of them is ready, select returns. The user process then performs the read operation, copying the data from the kernel into user space.
This picture does not look very different from blocking I/O; in fact it is slightly worse, because two system calls (select and recvfrom) are needed where blocking I/O needs only one (recvfrom). The advantage of using select is that it can handle many connections at once. (So if the number of connections being handled is not very high, a web server using select/epoll does not necessarily perform better than one using multithreading plus blocking I/O, and its latency may even be higher. The advantage of select/epoll is not that a single connection is handled faster, but that more connections can be handled.)
In the I/O multiplexing model, each socket is in practice usually set to non-blocking. However, as the figure shows, the whole user process is in fact blocked the entire time; it is just that the process is blocked by the select function rather than by socket I/O.

In other words, single-threaded multiplexing does not reduce the response time of a single request; each individual request is still blocked. What it improves is the number of requests that can be handled concurrently, so that under high concurrency the response time of a single request does not grow too long. Under high concurrency, this model therefore has greater processing capacity than the others.

File descriptor fd

The Linux kernel treats all external devices as files, so operating on a device is just operating on a file. We read and write a file by making system calls provided by the kernel, and the kernel returns a file descriptor (fd). Likewise, reading from and writing to a socket has its own descriptor, called a socketfd (socket descriptor). A descriptor is simply a number that points to a structure inside the kernel (file path, data area, and other attributes), and applications read and write files through these descriptors.

select

Basic principle: the select function monitors three classes of file descriptors: writefds, readfds, and exceptfds. A call to select blocks until a descriptor becomes ready (readable, writable, or with an exception) or until the timeout expires (timeout specifies how long to wait; passing NULL makes select block indefinitely, while a zero timeout makes it return immediately), and then the function returns. When select returns, the ready descriptors are found by iterating over the fd_set.
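A minimal sketch of a select() wait on a single listening socket (the port number is illustrative): after select() returns, the program must check the fd_set to find which descriptors are ready.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);                 /* example port */
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 16);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);                 /* watch listen_fd for readability */

    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };   /* 5 s timeout */
    /* Blocks until a watched fd is ready or the timeout expires. */
    int ready = select(listen_fd + 1, &readfds, NULL, NULL, &tv);

    if (ready > 0 && FD_ISSET(listen_fd, &readfds)) {
        /* The descriptor is ready: accept() will not block now. */
        int conn_fd = accept(listen_fd, NULL, NULL);
        printf("accepted connection fd=%d\n", conn_fd);
        close(conn_fd);
    } else if (ready == 0) {
        printf("timed out, nothing ready\n");
    }
    close(listen_fd);
    return 0;
}
```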

Shortcomings:
1. The biggest drawback of select is that the number of FDs a single process can open is limited. It is determined by FD_SETSIZE, which defaults to 1024 on 32-bit machines and 2048 on 64-bit machines. In general this number is also related to system memory; the system-wide limit can be checked with cat /proc/sys/fs/file-max.
2. Scanning the sockets is a linear scan, i.e. polling, which is inefficient. When there are many sockets, each select() traverses FD_SETSIZE sockets to complete its scheduling, regardless of which sockets are actually active, which wastes a lot of CPU time. "If a callback could be registered on each socket so that the relevant operation runs automatically when it becomes active, the polling would be avoided" -- which is exactly what epoll and kqueue do.
3. A data structure holding a large number of FDs must be maintained, and copying that structure between user space and kernel space adds overhead.

poll

Basic principle: poll is essentially no different from select. It copies the array passed in by the user into kernel space and then queries the state of the device corresponding to each fd. If a device is ready it is recorded in the results, otherwise the process is added to the device's wait queue and traversal continues. If no ready device is found after traversing all fds, the current process is suspended until a device becomes ready or the call times out; when it is woken up, it traverses the fds all over again. This process involves many unnecessary traversals.
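A minimal sketch of the same kind of wait written with poll() (watching stdin to keep it self-contained): the descriptors are passed in an array of struct pollfd, so there is no FD_SETSIZE-style limit, but the kernel still scans the array linearly.

```c
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Watch a single descriptor (stdin here) for readability. */
    struct pollfd fds[1];
    fds[0].fd = STDIN_FILENO;
    fds[0].events = POLLIN;

    /* Blocks up to 5000 ms; returns the number of ready descriptors. */
    int ready = poll(fds, 1, 5000);
    if (ready > 0 && (fds[0].revents & POLLIN)) {
        char buf[256];
        ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
        printf("read %zd bytes\n", n);
    } else if (ready == 0) {
        printf("timed out\n");
    }
    /* Level-triggered: if data is left unread, the next poll() reports the fd again. */
    return 0;
}
```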

It has no limit on the maximum number of connections, because it is based on a linked list, but it still has drawbacks:
1. Large arrays of fds are copied wholesale between user space and the kernel address space, whether or not the copy is meaningful.
2. Another characteristic of poll is "level triggering": if a ready fd is reported but not handled, the next poll reports the same fd again.

Note: as seen above, after select and poll return, the ready sockets must be found by traversing the file descriptors. In practice, out of a large number of simultaneously connected clients, only a few may be ready at any moment, so efficiency decreases linearly as the number of monitored descriptors grows.

epoll

epoll was introduced in the 2.6 kernel as an enhanced version of the earlier select and poll. Compared with select and poll, epoll is more flexible and has no descriptor limit. epoll uses one file descriptor to manage many descriptors: it stores the events of the file descriptors the user cares about in an event table inside the kernel, so the copy between user space and kernel space happens only once.

Basic principle: epoll supports both level triggering and edge triggering. Its most distinctive feature is edge triggering, in which it tells the process only which fds have just become ready, and tells it only once. Another characteristic is that epoll delivers readiness notifications as "events": fds are registered with epoll_ctl, and once an fd becomes ready the kernel uses a callback-like mechanism to activate it, so that epoll_wait receives the notification.
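A minimal sketch of the epoll pattern in C: descriptors are registered once with epoll_ctl(), and epoll_wait() then returns only the descriptors that are actually ready (stdin is used as the watched fd to keep the example self-contained; adding EPOLLET would switch it to edge-triggered mode).

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void) {
    int epfd = epoll_create1(0);                 /* one fd that manages many fds */

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;                         /* level-triggered; add EPOLLET for edge-triggered */
    ev.data.fd = STDIN_FILENO;
    epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &ev);   /* register once */

    struct epoll_event events[16];
    /* Blocks until at least one registered fd is ready (or 5000 ms pass). */
    int n = epoll_wait(epfd, events, 16, 5000);

    for (int i = 0; i < n; i++) {
        /* Only ready descriptors are returned: no scan over all registered fds. */
        char buf[256];
        ssize_t r = read(events[i].data.fd, buf, sizeof(buf));
        printf("fd %d ready, read %zd bytes\n", events[i].data.fd, r);
    }
    close(epfd);
    return 0;
}
```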

Advantages of epoll:
1. No limit on the maximum number of concurrent connections; the upper limit on open FDs is far larger than 1024 (roughly 100,000 ports can be monitored per 1 GB of memory).
2. Better efficiency: it is not polling-based, so efficiency does not drop as the number of FDs grows. Only active FDs invoke the callback function; in other words, epoll's biggest advantage is that it only cares about the "active" connections and is unaffected by the total number of connections, so in a real network environment epoll is much more efficient than select and poll.
3. Memory copies: it uses mmap()-style file-mapped memory to accelerate message passing with kernel space, i.e. epoll uses mmap to reduce copy overhead.

The I/O models can be classified along two dimensions, synchronous/asynchronous and blocking/non-blocking:
[Figure: I/O models classified by synchronous/asynchronous and blocking/non-blocking]

JDK 1.5_update10 replaced the traditional select/poll with epoll, greatly improving the performance of NIO communication.

Cached I/O

Cached I/O, also known as standard I/O, is the default I/O mode of most file systems. In Linux's cached I/O mechanism, the operating system caches the I/O data in the file system's page cache; that is, the data is first copied into the operating system kernel's buffer and then copied from that buffer into the application's address space.

Drawback of cached I/O: during transmission the data is copied multiple times between the application address space and the kernel, and these copy operations impose a large CPU and memory overhead.

Classification of zero-copy techniques

Zero-copy techniques have developed in many directions: there are many kinds, and currently no single zero-copy technique covers all scenarios. For Linux, many zero-copy techniques exist, most of them tied to particular kernel versions; some older ones have evolved considerably or have gradually been replaced by newer ones. This article groups them by scenario. In summary, the zero-copy techniques in Linux mainly fall into the following categories:

  • Direct I/O: the application accesses the hardware storage directly, and the operating system kernel only assists with the transfer. This kind of zero copy targets cases where the kernel does not need to process the data at all: the data can be transferred directly between a buffer in the application's address space and the disk, without the page-cache support provided by the Linux kernel.

  • Avoiding the copy between the kernel address-space buffer and the user address-space buffer during transmission. Sometimes the application does not need to touch the data while it is being transferred, so copying it from the Linux page cache into a user-process buffer is unnecessary; the transferred data can be handled in the page cache itself. In some special cases this kind of zero copy achieves very good performance. Linux provides corresponding system calls such as mmap(), sendfile() and splice().

  • Optimizing the transfer between the Linux page cache and the user-process buffer. This kind of zero copy focuses on handling the copy between the user-process buffer and the operating system's page cache more flexibly. It keeps the traditional communication pattern but is more flexible; in Linux this approach mainly uses copy-on-write.

The purpose of the first two categories is to avoid buffer copies between the application address space and the kernel address space; they usually apply in special cases, for example when the data being transferred does not need to be processed by either the kernel or the application. The third category keeps the traditional idea of transferring data between the application address space and the kernel address space and then optimizes that transfer itself. We know that transfers between hardware and software can be done with DMA, which requires almost no CPU involvement and frees the CPU for other work; but when data has to move between a user-address-space buffer and the Linux kernel's page cache, there is no DMA-like helper, and the CPU must be fully involved in the copy. The goal of the third category is therefore to make the transfer between the user address space and the kernel address space more efficient.
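The second category above mentions sendfile(). A minimal sketch of copying a whole file to a socket without the data ever entering user space (sock_fd is assumed to be an already-connected socket descriptor):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Send a whole file to an already-connected socket `sock_fd` (assumed to
 * exist); the data moves kernel buffer -> socket buffer and never passes
 * through a user-space buffer. */
int send_file_zero_copy(int sock_fd, const char *path) {
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0) { perror("open"); return -1; }

    struct stat st;
    fstat(in_fd, &st);

    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t sent = sendfile(sock_fd, in_fd, &offset, st.st_size - offset);
        if (sent <= 0) { perror("sendfile"); break; }
    }
    close(in_fd);
    return (offset == st.st_size) ? 0 : -1;
}
```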

Note that whether a particular zero-copy mechanism can be used depends on whether the underlying operating system provides the corresponding support.

[Figure: traditional read()/write() path with four data copies]

When an application accesses a piece of data, the operating system first checks whether the file has been accessed recently and whether its contents are already cached in the kernel buffer. If so, the OS copies the contents of the kernel buffer directly into the user-space buffer specified by the buf argument of the read system call. If not, the OS first copies the data from disk into the kernel buffer, a step that nowadays relies mainly on DMA, and then copies the kernel buffer into the user buffer.
Next, the write system call copies the contents of the user buffer into a kernel buffer associated with the network stack, and finally the socket sends the contents of that kernel buffer to the network card.

As the figure shows, four copies of the data are made. Even with DMA handling the communication with the hardware, the CPU still has to perform two of the copies; at the same time, there are several context switches between user mode and kernel mode, which undoubtedly adds to the CPU's burden.
In this whole process we never modify the file's contents, so copying the data back and forth between kernel space and user space is pure waste; zero copy exists mainly to eliminate this inefficiency.
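A minimal sketch of the traditional copy path described above: read() pulls the data into a user-space buffer (one kernel-to-user copy) and write() pushes it back into a kernel socket buffer (one user-to-kernel copy); sock_fd is assumed to be an already-connected socket.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Classic file-to-socket transfer: four data copies and several mode switches.
 * `sock_fd` is assumed to be an already-connected socket descriptor. */
int send_file_traditional(int sock_fd, const char *path) {
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0) { perror("open"); return -1; }

    char buf[8192];                          /* user-space staging buffer      */
    ssize_t n;
    while ((n = read(in_fd, buf, sizeof(buf))) > 0) {   /* disk -> kernel -> user */
        if (write(sock_fd, buf, (size_t)n) != n) {      /* user -> kernel socket  */
            perror("write");
            break;
        }
    }
    close(in_fd);
    return 0;
}
```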

Keeping the data out of user space: mmap

One way to reduce the number of copies is to call mmap() instead of read():

The application calls mmap(); the data on disk is copied into a kernel buffer by DMA, and the operating system then shares that kernel buffer with the application, so the kernel buffer's contents no longer need to be copied into user space. The application then calls write(), and the operating system copies the contents of the kernel buffer directly into the socket buffer, all of which happens in kernel mode. Finally, the socket buffer sends the data to the network card.

Again, it is easier to see in the figure:
[Figure: mmap() + write() path with one copy removed]
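A minimal sketch of the mmap() variant: the file's page-cache pages are mapped into the process, and write() then copies straight from those shared pages into the socket buffer, removing the separate user-space copy (sock_fd is again assumed to be a connected socket).

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* mmap-based transfer: the kernel's page cache is mapped into our address
 * space, so no dedicated user-space copy of the file data is made.          */
int send_file_mmap(int sock_fd, const char *path) {
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0) { perror("open"); return -1; }

    struct stat st;
    fstat(in_fd, &st);

    /* Map the whole file read-only; the pages are shared with the page cache. */
    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, in_fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(in_fd); return -1; }

    /* write() copies from the mapped pages into the socket buffer, in kernel mode. */
    ssize_t sent = write(sock_fd, p, st.st_size);
    if (sent < 0) perror("write");

    munmap(p, st.st_size);
    close(in_fd);
    return 0;
}
```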

Using mmap instead of read clearly removes one copy, which improves efficiency when a large amount of data is copied. But mmap comes at a price, and there are hidden traps. For example, when your program has mapped a file and that file is then truncated by another process, the write system call is terminated by a SIGBUS signal. By default SIGBUS kills the process and produces a core dump; if a server dies this way, the loss can be significant.

Two solutions are usually used to avoid this problem:

Install a signal handler for SIGBUS. When SIGBUS is received, the handler simply returns; the write system call then returns the number of bytes written before it was interrupted, and errno is set to success. This is a poor way of handling it, however, because it does not address the core of the problem.

Use a file lease lock. This is the usual approach: we take a lease on the file descriptor, i.e. request a lease on the file from the kernel. When another process wants to truncate the file, the kernel sends us a real-time RT_SIGNAL_LEASE signal, telling us that it is about to break the lease we hold on the file. That way our write system call is interrupted before the program accesses invalid memory and is killed by SIGBUS; write returns the number of bytes already written and sets errno to success. We should lock the file before mmap() and release the lock after the operation.
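A hedged sketch of taking a read lease with fcntl() before mapping the file: F_SETLEASE is Linux-specific, the signal delivered on a lease break defaults to SIGIO (F_SETSIG can redirect it to a real-time signal, as the text describes), and the file path here is only an example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t lease_broken = 0;

/* Called when another process tries to truncate or open the leased file. */
static void on_lease_break(int sig) {
    (void)sig;
    lease_broken = 1;   /* stop touching the mapping, then release the lease */
}

int main(void) {
    int fd = open("/tmp/example.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    signal(SIGIO, on_lease_break);           /* SIGIO is the default lease-break signal */

    /* Take a read lease before mmap(); the kernel notifies us before the lease
     * is broken, so we can stop using the mapping in time.                      */
    if (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) { perror("F_SETLEASE"); return 1; }

    /* ... mmap(fd, ...), use the mapping while checking lease_broken ...        */

    fcntl(fd, F_SETLEASE, F_UNLCK);          /* release the lease when done      */
    close(fd);
    return 0;
}
```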

Copyright notice
This article was written by Li Siwei. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204210600380923.html