Chris Lattner, father of LLVM: the golden age of compilers
2022-04-23 01:24:00 [OneFlow deep learning framework]
Author | Chris Lattner
Translation | Hu Yanjun, Zhou Yakun
Talk of the end of Moore's law grows louder by the day. In 2018, Turing Award winners John Hennessy and David Patterson went further in a lecture, declaring that the decades-long debate over whether RISC (reduced instruction set computing) or CISC (complex instruction set computing) is better can be put to rest and that a new golden age of computer architecture has arrived. In 2019 they published an article on the subject in an ACM journal.
To break through the current limits on architectural progress, their answer is hardware/software co-design and innovation: build domain-specific architectures and domain-specific languages, and with them more specialized hardware that runs faster.
As a key driver of innovation in computer architecture, the compiler is entering its own golden age. In his keynote at the ASPLOS conference on April 19 last year, compiler expert Chris Lattner discussed the present and future of compilers, programming languages, and accelerators, along with the end of Moore's law, and explored how people across the industry can innovate together, move the field forward, and achieve large gains in processor speed. The OneFlow community has compiled the talk without changing its meaning, in the hope that it will inspire the AI and compiler communities.
Chris Lattner graduated from the computer science department of the University of Portland. He has created and led several well-known large projects, including the compiler infrastructure projects LLVM, Clang, MLIR, and CIRCT, and he led the creation of the Swift programming language.
From July 2005 to January 2017 he led Apple's Developer Tools department, and afterwards briefly led Tesla's Autopilot team. From August 2017 he led TensorFlow infrastructure work on the Google Brain team, covering hardware support (CPU, GPU, TPU), the underlying runtime, and programming language work.
From January 2020 to January 2022, Chris Lattner led the engineering and product teams (hardware, software, and platform engineering) at SiFive, which provides IP based on the open RISC-V instruction set to chip design companies. In June of last year, SiFive received an acquisition offer from Intel, which proposed to buy the company for more than US$2 billion. In January 2022, Chris Lattner and Tim Davis co-founded Modular AI, with Lattner as CEO; its goal is to rebuild the world's ML infrastructure.
What follows is the content of Chris Lattner's talk.
1
Why we need next-generation compilers and programming languages
Hardware is booming, with new accelerators and new technologies appearing all the time, yet the software industry struggles to put them to real use.
Why? In the world of accelerators, for example in AI and structured computation, acceleration happens at many levels, such as scalar and vector acceleration, just as CPUs are divided into scalar and vector processors, and of course there are multicore CPUs as well. The result is every kind of hardware combination: different hardware sits in the same data center, and all of it has to communicate.
Often, however, there is no coherent memory, so writing one C program that runs across everything is not feasible. Running on such a combination is a little like a supercomputer that uses many CPUs.
At the same time the world is becoming more heterogeneous, with a wide variety of applications. Machine learning is moving fast, but it involves many techniques: if you want to go beyond training and inference and work on reinforcement learning, you have to understand different accelerators, and you have to integrate host computation with accelerator computation so that they work together. Much of the IP and many of the hardware blocks being built today are configurable, and even a change as simple as the cache size or the depth of the memory hierarchy affects the kernels these devices depend on.
So although hardware is increasingly diverse and the hardware ecosystem is growing rapidly, software still finds it hard to exploit that hardware for performance. And when software and hardware are not well matched, performance suffers badly. This is not a matter of 10% either way: get the memory hierarchy wrong, for instance, and performance can fall off a cliff to a tenth of its normal level.
Today the accelerator field has exploded, and new companies seem to ship new accelerators almost every day. But how do you use one? More to the point, people want to build new applications, and they want to do so on top of an existing software code base that they keep extending and improving.
You cannot simply run the old software stack on a new device. One of its parts may come from a different supplier, or the flow may have been streamlined, so it needs a different technology stack. You end up writing a new software stack for every new little device. That fragments the software, the fragmentation carries enormous cost, and it comes back to bite the hardware industry, because hardware that is hard to program does not get used.
My argument is that we need next-generation compilers and programming languages to help solve this fragmentation. First, the computing industry needs better hardware abstractions. Hardware abstraction is what allows software to innovate without being specialized to each individual device.
Second, we need to support heterogeneous computing, because a single workload mixes matrix multiplication, JPEG decoding, unstructured computation, and more. We also need domain-specific languages and programming models that ordinary people can use.
Finally, we need architectures that are high quality, highly reliable, and highly scalable. I love compilers, and I also respect the many people who build applications on top of them. They are developing the next generation of neural networks, not compilers. Working with them means giving them usable environments and usable tools.
The exciting part is that compiler and programming language engineers are entering a new era. Countless technologies have been born and are being born, they are changing the world, and it is a privilege to take part in this wave of change.
Next I will talk about the early development of the compiler industry, what it taught us, and what it suggests for the future.
2
Design and challenges of traditional compilers
When I was a student, compilers were sold in boxes on floppy disks, and you inserted the disk into the computer every time you used one.
Back then, different vendors built different processors and operating systems, and each wanted to stand out through innovation and capture the value of the compiler. These compilers were proprietary, incompatible with one another, and shared no code. So you saw the Borland C compiler and the Microsoft C compiler competing with each other, and the result was a fragmented ecosystem. That held the industry back, though people had not yet realized it.
A compiler consists of a front end, an optimizer, and a back end, a fixed structure that has been in use for many years. A company developing a compiler on its own would usually build just one front end and one back end rather than spend heavily on several of each. Other companies did the same, so the optimizers, front ends, and back ends of different companies could not be shared.
The GCC team was the first to break this pattern. Through free software and open licensing, GCC made collaboration possible, so that the front end, the optimizer, and the back end could be designed separately, a “separation of concerns”. That is why GCC has many front ends and many back ends.
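As a rough illustration of why a shared intermediate representation lets front ends and back ends multiply independently, here is a minimal Python sketch. Everything in it is hypothetical and invented for this example: the toy stack-based IR, the two front ends (`frontend_rpn`, `frontend_prefix`), the shared `optimize` pass, and the two back ends. It is not how GCC is implemented; it only shows that two front ends plus two back ends give four source-to-target paths with one optimizer.

```python
OPS = {"+": ("add",), "*": ("mul",)}

def leaf(tok):
    """Turn a non-operator token into a toy IR instruction."""
    return ("push", int(tok)) if tok.isdigit() else ("load", tok)

def frontend_rpn(src):
    """Front end 1: reverse Polish notation, e.g. '2 3 + x *'."""
    return [OPS[tok] if tok in OPS else leaf(tok) for tok in src.split()]

def frontend_prefix(src):
    """Front end 2: prefix notation, e.g. '* + 2 3 x'."""
    tokens = src.split()

    def parse(pos):
        tok = tokens[pos]
        if tok in OPS:
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return left + right + [OPS[tok]], pos
        return [leaf(tok)], pos + 1

    return parse(0)[0]

def optimize(ir):
    """Shared optimizer: fold 'push a, push b, op' into a single push."""
    out = []
    for ins in ir:
        foldable = (ins[0] in ("add", "mul") and len(out) >= 2
                    and out[-1][0] == "push" and out[-2][0] == "push")
        if foldable:
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("push", a + b if ins[0] == "add" else a * b))
        else:
            out.append(ins)
    return out

def backend_stack(ir):
    """Back end 1: text for an imaginary stack machine."""
    return [" ".join(str(x) for x in ins) for ins in ir]

def backend_python(ir):
    """Back end 2: a Python expression string."""
    stack = []
    for ins in ir:
        if ins[0] in ("push", "load"):
            stack.append(str(ins[1]))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {'+' if ins[0] == 'add' else '*'} {b})")
    return stack[0]

ir = optimize(frontend_prefix("* + 2 3 x"))
print(backend_stack(ir))
print(backend_python(ir))
```

Adding a third language means writing one more front end; adding a new chip means writing one more back end. Neither requires touching the optimizer.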
This separation of concerns not only helped compiler research and improvement, it changed the shape of the compiler industry. Because GCC had the best C++ front end, large numbers of compiler engineers improved that front-end code base, driving innovation and the development of C++. Meanwhile, many CPU companies could use GCC's front end directly and compete in the market just by adding their own back end. As a result, fragmentation across the industry fell from the 1990s into the early 2000s. From then on GCC paved the way for C compilers, and more new compilers appeared. It was a great success for the industry, because it lit the torch of innovation.
New technologies followed GCC's innovation, including my personal favorite, LLVM, and what I want to highlight is its modularity. It overturned the compiler's long-standing three-stage “front end + optimizer + back end” structure: an LLVM compiler is a composition of libraries. Look at the LLVM code base and you will find that all the code lives under the lib directory.
These libraries can be used on their own, combined with one another, and reused. They can be embedded in movie visual-effects engines, databases, and query engines. LLVM is an excellent C++ compiler, but it also serves much broader and more inventive purposes.
LLVM's modularity highlights the importance of interfaces and components. In the 20 years since LLVM was born, it has moved the compiler industry forward in new ways, free of a rigid split into front end, optimizer, and back end. For example, you can take the x86 compiler or the x86 back end and use it somewhere else, or you can have the world's top experts work on the register allocator alone without touching any other compiler component. This improved scalability and also gave rise to new forms of collaboration.
One of LLVM's great strengths is that it brings more people together to achieve more innovation. Many new front ends have been built on LLVM, advancing Julia and numerical programming, Rust and Swift, systems programming, safe programming models, and more, which is exciting. LLVM's modularity, the independence of its IR, and its low fragmentation have also fostered a multilingual ecosystem.
LLVM also let JIT (just-in-time) compilation do more. JIT compilation was already a well-known technique, but it had originally been used elsewhere. With LLVM, chip design, HLS tools, and graphics processing all became easier, and LLVM helped bring about CUDA and GPGPU. These are great achievements.
More important still, LLVM consolidated a fragmented landscape. There were many JIT compiler frameworks before LLVM, but LLVM raised the baseline for JIT compilers, opened up more possibilities, and let the industry innovate at a higher level.
That said, for all its strengths LLVM has plenty of problems. LLVM originally aimed to be a “universal” solution, but it does not do everything well. People keep bolting miscellaneous things onto it. That works for CPUs and GPUs but not for accelerators, and these ad hoc additions make it hard to use LLVM for parallel optimization. What I do like about LLVM is that it is a good CPU back end, even though it does not serve other needs well.
Now that we have reached the end of Moore's law, we can keep building on LLVM's strength as a CPU back end, but to explore what lies beyond the CPU we should look beyond LLVM IR.
3
Building domain-specific architectures
Patterson and Hennessy reached this conclusion: we have arrived at a renaissance in computer architecture, and we need vertical integration across the computing industry, in which everyone understands both hardware and software. Because the world is changing so fast, they say, we should go back to first principles and look at old things with fresh eyes.
Next I will walk through how accelerators get built and, drawing on experience, discuss how they may evolve.
Look at the hardware world and you will see that specialized architectures have become the trend, spanning a whole range of categories. On this topic I recommend watching Mike Urbach's talk. If you take the CPU as the general-purpose end of the range, then the closer you get to controlling every gate, the deeper the specialization and the more is hard-coded.
At one end, the CPU is general purpose and not as specialized as a matrix accelerator. Then came GPGPU, which is flexible and powerful, though GPUs are not that easy to program. Then there are TPUs for accelerating machine learning, which handle large matrix multiplications, direct convolutions, and similar operations. All of these are programmable “xPUs”. Beyond them are FPGAs (field-programmable gate arrays), in which you can reconfigure the wiring between blocks, and other fixed-function hardware. Further still are ASICs, integrated circuits designed for one specific need.
Broadly there are two categories: general-purpose, programmable hardware and fixed-function hardware. Whenever I think about domain-specific architectures, these two categories come to mind.
The figure above lists some of the companies building this hardware (the list is incomplete), and many of them are industry giants. Each one asks the same question: how do we program it? And each gives a different answer. Google is building XLA and TensorFlow, NVIDIA is building CUDA, Intel is building oneAPI, and many hardware companies are building their own hardware design kits.
The problem is that each tool solves a different problem, and the tools neither work together nor are compatible. Each is developed independently by a small team inside one company with little shared code, every company keeps adding features, and every tool changes quickly, so the overall picture is chaotic. These tools are basic building blocks for the industry, yet they differ in so many ways. What should we do as an industry?
In fact, the problems accelerators face today are the ones C compilers faced in the 1990s. History, as people say, repeats itself. We have witnessed an explosion of diversity in hardware and software, but if we want to keep making progress, that diversity becomes an obstacle.
So we need unification, something like GCC and LLVM. Otherwise everyone is busy building a dedicated back end for each device, with no time left for innovation in front ends, programming languages, and models.
The industry has many talented people, but not enough. If we reduce fragmentation and consolidate, we can foster innovation, speed up the industry, keep building out the technology stack, make the most of the hardware, and exploit heterogeneous computing in new ways.
Next I will share my view of how accelerators develop and the challenges that arise along the way.
4
The nature and evolution of accelerators
What is an accelerator? Simplified drastically, it has two parts. The first is the parallel compute unit. Silicon is inherently parallel: an accelerator uses a great many transistors, and it takes a lot of silicon to deliver that parallel processing capability.
The second part provides control. It has no single agreed name: some call it a “control processor” and others a “sequencer”. Some people want it as small as possible, so they build a state machine driven by registers. Its job is essentially to orchestrate the parallel compute unit. If the compute unit is a large matrix multiplier, the control processor issues macro-operations to it: load from this memory region, perform one operation, perform another, update the SRAM, and so on.
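The split between a sequencer and a compute unit can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real accelerator: the macro-operation names (`load`, `matmul`, `store`), the dictionaries standing in for DRAM and SRAM, and the function `run_sequencer` are all made up for illustration.

```python
def matmul(a, b):
    """The 'parallel compute unit': a plain-Python matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def run_sequencer(program, dram):
    """The 'control processor': step through macro-operations in order."""
    sram = {}
    for op, *args in program:
        if op == "load":        # copy a buffer from DRAM into SRAM
            dst, src = args
            sram[dst] = dram[src]
        elif op == "matmul":    # hand the work to the compute unit
            dst, a, b = args
            sram[dst] = matmul(sram[a], sram[b])
        elif op == "store":     # copy a result from SRAM back to DRAM
            dst, src = args
            dram[dst] = sram[src]
        else:
            raise ValueError(f"unknown macro-op: {op}")
    return dram

dram = {"A": [[1, 2], [3, 4]], "B": [[5, 6], [7, 8]]}
program = [
    ("load", "a", "A"),
    ("load", "b", "B"),
    ("matmul", "c", "a", "b"),
    ("store", "C", "c"),
]
print(run_sequencer(program, dram)["C"])
```

Note how little logic the sequencer itself contains. Nearly all the arithmetic sits in `matmul`, which mirrors the point that the control part can be very small relative to the compute part.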
Accelerators differ greatly, and so does their ratio of control logic to compute. As Patterson and Hennessy said, you can choose different points in the design space, but every point needs some degree of orchestration.
People often forget the other work involved. Orchestration is not enough: you also have to handle bring-up, power management, and continual debugging. If you want to be thorough, these components can be made programmable; if you want to keep things simple, you can make them very small.
Once the control processor and the parallel compute unit are in place, how do you get data in and out of them? You need a memory interface. Depending on the level of abstraction, the accelerator may be a small block inside a larger chip, in which case it talks to the chip's on-chip network and you need AMBA or something similar.
At a larger granularity, you can build an entire ASIC that is nothing but the accelerator. In that case you are probably communicating over PCI and directly accessing off-chip memory. Either way, the “control processor, compute unit, memory interface” model is a very standard way to build these things.
Once these structural questions are settled, the architects get to show what they can do, but they often forget that software people need to be involved too.
Ideally the software would be designed from first principles, but in practice it ends up being built at several separate levels. The highest level concerns the user experience: how is this used, and how do you build an application around it? The lowest level concerns driving the control processor, so you need at least an assembler to write the control programs.
Then you write a driver that runs on some host processor to control the device: power it on and off, load programs, and upload them to the control processor. Then there is the work that runs on the control processors themselves, usually referred to as kernels. This model is very general, but the hardware ends up growing more complex. A first-generation co-processor may be very simple, and then someone has a wonderful idea: we want to do more.
At that point we want to widen what gets accelerated: more AI, physics, 5G, or anything else worth accelerating, such as Bitcoin. The result is a need for more control processors. Because of the speed of light and wire delay, a single control processor cannot coordinate every transistor on a huge chip, so you need several control processors working in parallel.
Fortunately this fits the model easily: you parallelize the device kernels, add multithreading or tiling, and make a few other simple changes. The kernels running on these units can then cooperate on a problem by decomposing it spatially and processing the pieces in parallel.
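Spatial decomposition can be illustrated with a small tiled matrix multiply. The sketch below is only an analogy: Python threads stand in for tile units, and the function names `matmul` and `tiled_matmul` and the `tile_rows` parameter are assumptions made for this example. Each tile computes an independent slice of the output rows, and the slices are stitched back together in order.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(a, b):
    """Kernel run by one tile: multiply a slice of rows of `a` by `b`."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def tiled_matmul(a, b, tile_rows, workers=4):
    """Split the rows of `a` into tiles and process the tiles in parallel."""
    tiles = [a[i:i + tile_rows] for i in range(0, len(a), tile_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves tile order, so results can be concatenated directly
        parts = list(pool.map(lambda tile: matmul(tile, b), tiles))
    return [row for part in parts for row in part]

a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 0], [0, 1]]
print(tiled_matmul(a, b, tile_rows=2))
```

The kernel itself is unchanged; only the loop that assigns work to tiles is new, which is why this step feels like a simple change at first.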
Now things get more complicated. Suppose you are building a specialized accelerator like a GPU, assembling billions of transistors into a full-reticle chip. A chip that size raises many problems.
First, you end up wanting multiple levels of tiling, so you do not just have 492 cores: on a GPU you have an array of them, grouped into SMs or something similar. Or heterogeneity comes in, with different kinds of accelerators on the same physical chip, because you may be doing AI but also need to decode JPEGs. If I am running inference on a camera, I have to decode the camera's sensor data, which brings in a new accelerator block. These blocks are hard-coded for different operations, and they are all mixed together.
They then have to communicate with each other through memory interfaces, which requires programming and adds complexity. Suddenly you need a middle layer of technology: when an accelerator processes multiple data streams, parallelism exists not only across its tiles. Many different operations are running at once, so you have to solve workload balancing.
You also have to optimize communication. The speed of light is a pain point, because moving data from one end of the chip to the other takes time. You do not want to sit idle in the meantime; you want to start another transfer while one is in flight, or compute while you communicate.
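One common way to compute while communicating is to prefetch the next chunk of data while the current chunk is being processed. The Python sketch below shows only the control flow of that idea. It rests on several assumptions: a background thread stands in for a DMA engine, and `fetch` and `process_overlapped` are hypothetical names, not any real accelerator runtime API.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(chunk_id, data):
    """Stand-in for a slow transfer from remote memory."""
    return data[chunk_id]

def process_overlapped(data, compute):
    """Compute on chunk i while chunk i+1 is already being fetched."""
    if not data:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0, data)
        for i in range(len(data)):
            chunk = pending.result()              # wait for the current transfer
            if i + 1 < len(data):
                pending = pool.submit(fetch, i + 1, data)  # start the next one
            results.append(compute(chunk))        # compute while it is in flight
    return results

print(process_overlapped([[1, 2], [3, 4], [5]], sum))
```

In a real system the scheduling of such overlaps across many tiles and streams is exactly the kind of work that pushes the software layer toward a compiler.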
You also want to run something like TensorFlow. You probably have an x86 host processor, so you want to be able to fall back from the accelerator to the host, because you are doing file system operations or other unusual things, and all of this has to be coordinated. Suddenly this layer of software becomes very large and quite complicated.
This has played out many times, and many accelerators have been through it. They start with hand-written kernels, and there are many evolutionary steps from there. You can see it in the hardware vendors' products: over time their hardware evolves and becomes more general, and the software stack and the supported features evolve with it.
The advantage of hand-written kernels is that they are the easiest way to start. Pair a hardware person with a firmware person and you know exactly what the hardware does. Software and hardware people work closely through co-design to make your matrix multiplication run very fast on AI workloads. The problem is well bounded and easy to handle.
The trouble is that this does not really scale. The workloads people want to run on accelerators today are not just matrix multiplication. They want hundreds of different kernel operations, ranging from convolution and matrix multiplication to reshapes (mostly memory operations), element-wise operations, oddities such as top-K and sort, and on to sparse algorithms and other emerging applications.
So on one side you have all these kernels to run, and on the other you have unlimited variety in hardware. Within a single vendor's hardware you might manage, by fixing on one generation at a time.
Then you only need to write a hundred or a thousand kernels. That may be fine, but by the time you ship a second-generation device you may have changed the memory hierarchy, added new instructions to the control processor, or added optional features. Or you decide to do kernel fusion, say applying an element-wise operation to the output of a convolution, and now you face an exponential number of kernel combinations that all have to work well. Even with thousands of software engineers you cannot write every kernel by hand, because you also want your hardware team to move fast.
I have seen this several times. People start by writing kernels by hand, and then they write Python programs that generate kernels. In a sense those Python programs are miniature compilers.
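A minimal sketch of such a kernel generator makes the combinatorics concrete. With n optional element-wise steps that can each be fused or not, every base operation has 2^n variants, so two base operations and three steps already give 16 kernels. Everything here is hypothetical: the C-like template, the `EPILOGUES` table, the `{base}_element` helper that the emitted code calls, and the function names are invented for illustration and do not come from any vendor toolchain.

```python
from itertools import combinations

# Element-wise steps that may be fused after the base operation (illustrative)
EPILOGUES = {
    "bias": "x = x + bias[i];",
    "relu": "x = x > 0.0f ? x : 0.0f;",
    "scale": "x = x * alpha;",
}

def generate_kernel(base, epilogues):
    """Emit C-like source text for one fused kernel variant."""
    name = "_".join([base, *epilogues])
    lines = [
        f"void {name}(float *out, const float *in, "
        f"const float *bias, float alpha, int n) {{",
        "    for (int i = 0; i < n; ++i) {",
        f"        float x = {base}_element(in, i);  /* hypothetical helper */",
    ]
    lines += [f"        {EPILOGUES[e]}" for e in epilogues]
    lines += ["        out[i] = x;", "    }", "}"]
    return name, "\n".join(lines)

def generate_all(bases, epilogues):
    """Enumerate every subset of epilogues for every base operation."""
    kernels = {}
    for base in bases:
        for r in range(len(epilogues) + 1):
            for combo in combinations(epilogues, r):
                name, src = generate_kernel(base, combo)
                kernels[name] = src
    return kernels

kernels = generate_all(["matmul", "conv"], ["bias", "relu", "scale"])
print(len(kernels))
print(kernels["conv_bias_relu"])
```

A script like this has a parser-free "front end" (the tables), a trivial "code generator" (the template), and no optimizer, which is why it tends to grow into a real compiler as requirements accumulate.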
Keep going this way and the complexity accumulates until a compiler layer takes shape. In theory it could be built through strong compiler engineering. In practice, what I have actually seen is that it forms gradually through natural evolution, the way people go from crawling to walking. Everyone has a chance to shape this process, and there is still a lot of room for improvement.
Building something real is actually very hard. It feels easy at first, because you can build a small control processor and a small accelerator, make some software run fast, and everything is simple. But as you continue down this road the difficulties slowly emerge. It does not feel especially hard until you hit scaling problems, and by then you do not want to change direction.
And, as we said earlier, product quality is not always good. People create amazing products, and the continual innovation in this industry never stops surprising me. But we have also seen compilers crash and other bad news in the technology stack.
That is understandable, since people do not always invest in this layer. I can see why, but it holds the industry back and makes the tools harder to use, which shrinks the community of people willing to put up with them.
Another real problem, I think, is that most of the complexity has nothing to do with the problem the accelerator solves. If I want to build a 5G network accelerator, I should be thinking about 5G, FFTs, the parallelism inherent in the problem, and how to exploit it. If the workload is AI or machine learning, I should be thinking about the arithmetic and the right ratio of compute to memory. Instead we usually pour time into things unrelated to these questions and end up buried in complexity.
Take away what is essential to the accelerator and what remains is kernel drivers, assemblers for the control processor, complicated multi-stream management, and the question of how to use all the tiles on the accelerator.
That is not where we want to spend our time. We want to spend it on programming models, hardware, and other kinds of innovation, and this fragmentation is the real obstacle to the industry's progress.
So my proposal is to innovate on programming models, build new applications, and push the industry forward through innovation. We should standardize everything else the process requires. Standardization gets that work done quickly, and then we can spend our time on what really matters.
So how do we do this well? Fortunately the industry has already begun to standardize the interface buses we need. To connect to an SoC fabric you typically use AMBA or CHAI or something similar. To connect to memory you use something like DDR or HBM. To build a plug-in card you use something like PCI Express. There are newer standards too, such as CXL, which defines a new way to generalize PCI and lets new accelerators be used in larger systems. But we need to go further.
What about the control processor? Note that developing the software that runs on an accelerator ultimately costs more than building the hardware. By this point, too, the hardware side is the better understood one. Different hardware has different configurations, but building the software remains an unsolved problem.
The control processor also sits at the bottom of the stack. When I talk about subtle traps in system design, I mean things that look easy at first and turn out to be hard as you go further, and the control processor is one of them. At the start you are thinking about controlling things with a small state machine, and you end up writing a compiler in a spreadsheet.
Later you realize you need power management and security. These hard parts have to be built and coordinated, and they really change how the whole thing ends up working. If the people who build the control processor do not build the compiler at the same time, they never feel the pain of building the software, and software is ultimately the harder part.
Patterson and Hennessy discussed this in their lecture. Looking back to the 1960s, they observed enormous fragmentation in the industry. IBM eventually solved the problem by standardizing the instruction set. The instruction set it chose, that of the IBM 360, is still in use today, which is an amazing feat.
So we have a choice to make about whether to standardize these control processors. Should we use the IBM 360, or do we need something new?
In my view we should use something new. There is an instruction set called RISC-V, an open industry standard for CPUs. I like RISC-V because it is a modular instruction set, just as LLVM is modular and library based. It lets you take subsets of the instruction set: if you do not want floating point, you can leave it out.
If you do not want integer multiplication, you can remove that too. The great thing about RISC-V is that it provides not just an instruction set standard but the whole world of software that runs on it. You get a C compiler, you get Linux, you get everything that has grown up around RISC-V.
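RISC-V's modularity shows up directly in how an implementation is named: a base such as RV32I or RV64I followed by letters for optional standard extensions, for example M for integer multiplication and division or F for single-precision floating point. The Python sketch below decodes such a string. It is deliberately simplified and the function name `parse_isa` is made up for this example: it handles only the bases I and E and the single-letter extensions M, A, F, D, and C, and it does not check extension ordering or multi-letter extensions.

```python
import re

# A small subset of the standard single-letter extensions
EXTENSIONS = {
    "m": "integer multiplication and division",
    "a": "atomic instructions",
    "f": "single-precision floating point",
    "d": "double-precision floating point",
    "c": "compressed (16-bit) instructions",
}

def parse_isa(isa):
    """Decode a simplified RISC-V ISA string such as 'rv32imc'."""
    m = re.fullmatch(r"rv(32|64)([ie])([mafdc]*)", isa.lower())
    if m is None:
        raise ValueError(f"unsupported ISA string: {isa}")
    xlen, base, exts = m.groups()
    return {
        "xlen": int(xlen),
        "base": base,
        "extensions": [EXTENSIONS[e] for e in exts],
    }

# A tiny control processor: integer base only, no multiply, no floating point
print(parse_isa("rv32i"))
# A larger core with multiply, atomics, floating point, and compressed code
print(parse_isa("rv64imafdc"))
```

A control processor for a simple accelerator might stop at `rv32i` or `rv32ic`, while an application-class core would carry the longer string, and both are served by the same family of compilers and tools.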
A company like SiFive builds many different RISC-V processors. Across the design space you can get many different implementations of the specification, each with different trade-offs. So if you are building a very simple accelerator, for matrix multiplication or convolution say, a very small RISC-V core can control one large hard-coded accelerator block.
If you want more programmability, you can change the share of silicon spent on control versus parallel compute and add control logic, gaining programmability and flexibility. The ratio is yours to tune.
Or you can turn it around and make the parallel units part of the processor, so that the processor has a heterogeneous compute accelerator built in.
Or, going further, you can treat every tile of the accelerator as a big CPU. That gives you something like Graviton, a cloud accelerator with a whole pile of CPU tiles. General-purpose work and acceleration are then handled in a single instruction set, which strengthens the software ecosystem.
You may be wondering how RISC-V copes if all you want is a tiny control processor, since a general-purpose solution is obviously too big. There are very small RISC-V implementations: you can get open-source, standards-conforming RISC-V cores of around 15,000 gates. This is the beauty of today's silicon. With so many gates available, you need not worry about spending too many on the control processor, and you can pick the solution that best fits your needs.
Once you do that, it changes the way accelerators are built. In the past you started by choosing a control processor and then wrote an assembler for it. RISC-V gives you an assembler, but not only an assembler: you also get a C compiler and a programmable IR.
So you can generate kernels for the control processor. You get not only a C compiler but also simulators and debuggers. I have never seen any other accelerator that lets you attach GDB or LLDB to both the simulator and the chip. People don't usually invest in this kind of technology because it is throwaway work. By building on the RISC-V ecosystem, however, you can reinvest that effort in the next level of technology and benefit from it.
Once this is done, you reach the next level of complexity. After building something like an accelerator kernel compiler, you run into the next question: how do you handle hierarchical parallel computing? A data center has many machines, each board has multiple chips, and each chip is an ASIC with dozens or hundreds of accelerator units. How do you program that?
Interestingly, although all these compilers are different, they have a lot in common. For example, they all deal with memory hierarchies and with tiling at many levels of granularity, and they need to work with both. So although a compiler for a 5G base station differs from one for an AI accelerator, the technical problems they must solve, such as tiling and memory hierarchy, are the same.
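To make "tiling" concrete, here is a minimal sketch in plain Python, written for illustration rather than taken from the talk. It multiplies two matrices one tile at a time, so that each inner block touches only a small sub-block of the inputs. The function names and the default tile size are assumptions. A real accelerator compiler would choose tile sizes to match each level of the memory hierarchy.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices (lists of lists) one tile at a time.

    Each (i0, j0, k0) block touches only tile-by-tile sub-blocks, which is
    the piece a compiler would try to keep in a small, fast memory.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # tile the rows of C
        for j0 in range(0, m, tile):        # tile the columns of C
            for k0 in range(0, k, tile):    # tile the reduction dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


def matmul_naive(a, b):
    """Reference result without tiling, used to check the tiled version."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
```

The loop nest is the same whether the target is a 5G base station or an AI accelerator. Only the tile sizes and the number of tiling levels change, which is why this kind of code can be shared.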
A relatively new compiler technology, MLIR, can help here. You can think of MLIR as a meta-compiler that lets you build accelerator compilers very quickly. MLIR stands for "Multi-Level Intermediate Representation". It supports building layered compilers in a domain-specific way while preserving the complexity of the domain. You can then use the large set of libraries and routines MLIR provides, for example polyhedral compilation techniques for loop unrolling and loop fusion.
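As a small illustration of what loop fusion means, the sketch below shows the same computation before and after fusion. It is hand-written Python, not MLIR and not the output of any tool, and the function names are made up for this example.

```python
def scale_then_offset_unfused(xs, scale, offset):
    """Two separate loops: the intermediate list is written, then read back."""
    scaled = [0] * len(xs)
    for i in range(len(xs)):
        scaled[i] = xs[i] * scale
    out = [0] * len(xs)
    for i in range(len(xs)):
        out[i] = scaled[i] + offset
    return out


def scale_then_offset_fused(xs, scale, offset):
    """After fusion: one loop and no intermediate buffer."""
    out = [0] * len(xs)
    for i in range(len(xs)):
        out[i] = xs[i] * scale + offset
    return out
```

Both functions return the same values. The fused version avoids one full pass over memory, which matters most on hardware where memory traffic is the bottleneck.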
MLIR is part of the LLVM family of technologies. It inherits LLVM's design approach and the ideas that made LLVM great, so it is modular and extensible and has a great community of friendly people. One thing I appreciate about the LLVM community is that LLVM is modular, reasonably well documented, easy to learn, and well suited to research.
I'm glad to see MLIR emerge. Although it is only a few years old, it is already widely used, from CPU code generation to GPUs, machine learning, FPGAs, and hardware. It is also used in quantum computing, and even within the compiler itself, for applying MLIR optimization patterns. Many interesting things are happening in the MLIR space.
Another advantage of MLIR is that it layers directly on top of LLVM. It uses LLVM's libraries, so you can do just-in-time compilation, and writing a kernel and compiling it down to LLVM IR is easy. LLVM also has good RISC-V code generation support, of course. You can build a RISC-V-based accelerator in a very simple, clean, and composable way.
What we are starting to see now is MLIR beginning to unify the world of heterogeneous computing, which is what I want to see. All the big companies now use MLIR to varying degrees. In my view, MLIR on top of RISC-V is an essential combination, because once you start unifying the industry from the bottom up, you can pull more and more layers together and reuse more technology. That lets us focus on the more interesting parts of the stack instead of reinventing the wheel over and over.
What do we gain from this? If we can pool the scarce energy of compiler and programming language people and get them working together, the industry can achieve more. If we keep reinventing the wheel again and again, we hold each other back.
As an industry, what we need is more innovation: more programming models, more technology, and more infrastructure. We really need to reduce fragmentation in the industry, make the remaining unsolved pieces more modular, and then focus on what really matters.
So far I have been talking about accelerators, the "xPUs" that range from CPUs to TPUs and GPUs.
What about the hardware itself? A large gray area is left on the right of the figure above. The people working in that area are "hardware people". The people working on the left are both hardware and software people, but the right is a very different hardware world.
It is also what goes inside the parallel compute units. This is the domain-specific architecture that Patterson and Hennessy talk about, and the question of how to build those hardware blocks. We need algorithmic innovation and innovation across many different technologies, all grounded in the specific domain.
5
Compiler innovation opportunities
It may not surprise you, but I think the answer is compilers. That is the real way forward.
As a compiler and programming language practitioner, I think the field of hardware design is due for a reassessment. The whole field rests on two technologies, but mainly on one called Verilog, and you probably don't like Verilog.
It has a very complex standard. When I look at it, I can't tell whether it was designed as an IR, that is, an intermediate representation passed between tools, or as something for people to write directly. In my view it fails at both: it is really hard to use, and it is hard for tools to generate.
Besides, EDA tools, that is, hardware design tools, are very mature and highly standardized, and many large companies promote and develop them. But they innovate slowly, and their design does not focus on usability. They are far worse than accelerator compilers, are certainly not built on software architecture best practices, and are very expensive. So there are great opportunities for innovation in this field.
I'm not the first person to realize this. The open source community has built a number of tools to move the industry forward. These tools are great: Verilator is widely used, and Yosys is another great tool, which has a very good theorem prover.
My concern is that the stated goal of these tools is to be as good as the proprietary tools, and I don't think the proprietary tools are that good. In addition, the designers of these tools have not worked together. Each tool follows its own rigid, monolithic approach, without much modularity or reuse. You can get a netlist out of some of these tools, or use one of them to parse Verilog, but they are not built with a library-based design the way LLVM is.
The good news is that I see an explosion of different developments here, which is closely tied to the end of Moore's law we have been discussing. Research groups are pushing new hardware design models, such as Bluespec and Chisel. Many new research groups are exploring different approaches to hardware design, and they usually end up generating Verilog. That is really good, because you can now bring type system approaches, programming language thinking, and compiler technology from the software world into hardware. In fact, software and hardware share many ideas.
Software and hardware people simply speak different languages in different ways. More overlap between the two sides would be good for both industries. This collaboration is exciting, but it has also hit difficulties, which brings us back to the problem: Verilog is really not a good IR.
It is very hard to generate Verilog that is syntactically correct and expresses what you intend. Moreover, many Verilog tools are somewhat quirky, and it is hard to predict what they will handle well. Generating Verilog that the tools accept is a black art that every front-end tool has to reinvent. So a component is really missing from the stack: one that lets people innovate at the programming model level and gives them a way to have all the tools accept the result.
A new open source project called CIRCT is trying to solve this problem. CIRCT stands for "Circuit IR for Compilers and Tools", and it is built on top of MLIR and LLVM. The CIRCT community aims to improve the whole hardware design world, foster innovation in programming models, and enable a new set of modular hardware design tools. It uses many of the library-based techniques we have discussed so far.
It also provides a composable, library-based toolchain. You can build interesting new things such as elastic interface connections, support the new programming models the Chisel community is exploring, and use it to speed up the Chisel flow. It brings many benefits, lets many people work together, and promotes innovation in different ways. We are building a really great little world where people who care about hardware compilers can work together, which is very exciting. The work is still at an early stage. The goal is to build accelerators faster and to make the accelerators themselves faster.
Our big goal is to make the hardware design and verification process 10 times faster. When building new hardware, most of the cost often ends up in verifying that it is correct. Verification includes formal methods and the equivalent of unit testing, and there are many different ways to show that what you are building is correct in all cases.
Correctness verification is harder in hardware than in software because the hardware world has no real type system and no real multi-level IR, so you cannot represent a state machine as a state machine and write proofs about it. What happens today is that the whole design gets "de-sugared" into essentially untyped bits, and all the analyses and tools work at that level. In my view, we can improve the whole field quickly by building on and bringing in well-known techniques from the compiler and language communities.
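The difference between a state machine kept as a state machine and one de-sugared into bits can be sketched in a few lines of Python. This is an illustrative toy, not CIRCT code. The three-state machine, its events, and all names below are assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    BUSY = 1
    DONE = 2


# Typed view: the legal transitions are explicit data that a tool can inspect.
TRANSITIONS = {
    (State.IDLE, "start"): State.BUSY,
    (State.BUSY, "finish"): State.DONE,
    (State.DONE, "reset"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Advance the machine; events with no listed transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def reachable(start: State) -> set:
    """Every state reachable from `start`, computed from the transition table."""
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for (src, _event), dst in TRANSITIONS.items():
            if src == current and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen


def step_bits(bits: int, start: int, finish: int, reset: int) -> int:
    """De-sugared view: the same machine as a 2-bit register with 1-bit inputs.

    Nothing here records that the encoding 0b11 is unused, so a tool working
    at this level has to rediscover that fact before it can prove anything.
    """
    if bits == 0b00 and start:
        return 0b01
    if bits == 0b01 and finish:
        return 0b10
    if bits == 0b10 and reset:
        return 0b00
    return bits
```

In the typed form, a property such as "every state is reachable from IDLE" is a short query over the transition table. In the bit-level form the same question requires first reconstructing which encodings are meaningful.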
So I hope we can help span the whole field of software and hardware with open tools that combine these standards: RISC-V as the instruction set, MLIR as the compiler stack, and CIRCT as the hardware-focused application of that stack. We are trying to help the whole industry move faster.
6
Summary
Finally, I want to say that this really is "the golden age of compilers". In my view, as hardware and software evolve, co-design becomes more and more important, and we need to drive innovation faster than ever.
Compilers, programming languages, and all the related technologies, including formal methods, type systems such as linear types, and other fairly well understood systems, will benefit the whole field. I think formalization, engineering, and collaboration across the different parts of this field will push everything forward faster and further. I'm glad to see many well-known methods and techniques from academia being put into practice.
People are trying to figure this out. They are learning new things, but also stumbling over some silly problems. What we see now is that the pace of development has accelerated, with new innovations and fresh research on old topics, because people are going back to first principles. I am very happy and excited about what is happening.
(This article was compiled with authorization. Video link: https://www.youtube.com/watch?v=4HgShra-KnY)
This article is from the WeChat official account OneFlow (OneFlowTechnology).
Copyright notice
This article was created by [ONEFLOW deep learning framework]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204221018263180.html