Go compiler source code: syntax analysis
2022-04-21 08:06:00 【BioIT】
Reading this article quickly requires a little knowledge of compiler principles and some experience using Go.
My own level is limited; if you find errors, please contact me so I can fix them.
The previous article covered reading the lexical analysis source code.
1. Syntax analysis
Syntax analysis involves a lot of background knowledge; this part reads more like a checklist that readers can use to look things up and study further.
- The input of parsing is the Token sequence: several Tokens are combined into a Node, and Nodes are combined in tree form into an abstract syntax tree (AST). After lexical analysis and syntax analysis, each file yields one AST.
- Rule representation:
  - Grammar: see Wikipedia; if the concept is unfamiliar, read up on it carefully first. What follows assumes basic knowledge of grammars.
  - We know that lexical rules (the production rules for Token strings) can be expressed with regular grammars, but parsing rules are clearly more complex, so context-free grammars are used to express them. Context-free grammars subsume regular grammars; their expressive power is strictly broader.
  - Context-free means that an identical string of Tokens is analyzed with the same derivation rules no matter where it appears; anything that depends on context has to be handled later, in semantic analysis.
  - Notations for writing context-free grammars: Backus-Naur Form (BNF) and Extended BNF (EBNF); EBNF additionally allows regular-expression operators. See the related Wikipedia articles.
- Algorithms: with the rules expressed, we still have to write code that follows them to analyze the Token sequence into an AST. Parsing algorithms fall into two families: top-down and bottom-up.
  - Top-down: representative algorithms are LL(1) and LL(k). LL stands for Left-to-right, Leftmost derivation, meaning the parser always expands the leftmost nonterminal, over and over. k is how many Tokens of lookahead are used to avoid traversing and backtracking through useless branches (a nonterminal may have many derivation branches), speeding up parsing.
    - If every branch begins with a terminal Token, things are easy: peeking one Token ahead is enough to decide which branch to take. But if a branch begins with a nonterminal, we must compute the First sets of these branches, and we enter a branch only when the lookahead Token is in its First set. A temporary convention here: uppercase letters are nonterminals, lowercase letters are terminals.
    - First sets, for A -> Branch1 | Branch2 | Branch3:
      - If a branch starts with a Token, the branch's First set is that Token.
      - If a branch starts with a nonterminal, the branch's First set is that nonterminal's First set.
      - Handling ε: if the branch's first nonterminal can derive ε, look further right and include the next element's First set as well. If every element of a branch can derive ε, the First set contains ε, and the Follow set has to be introduced.
      - First(A) is the union of the First sets of its branches.
    - Follow sets: for a nonterminal A, Follow(A) is the set of Tokens that may appear right after A. The computation rules:
      - For the start symbol S, Follow(S) contains $ (end of input).
      - For B -> aAC, Follow(A) includes First(C) with ε removed. Naturally, if A is directly followed by a terminal, that terminal joins Follow(A) directly.
      - For B -> aA, or for B -> aAC where C can derive ε, Follow(A) must include Follow(B).
- Example (a runnable First-set computation follows the sets below):
// For the following productions, compute both kinds of sets
// A is the start symbol
A -> BC | aB
B -> -C | ε
C -> 12 | c | ε
// First sets ("- ε" means with ε removed)
First(A) = (First(B) - ε) ∪ (First(C) - ε) ∪ {ε} ∪ {a} = {-, 12, c, ε, a}
First(B) = {-} ∪ {ε} = {-, ε}
First(C) = {12, c, ε}
// Follow sets
Follow(A) = {$}
Follow(B) = (First(C) - ε) ∪ Follow(A) = {12, c, $}
Follow(C) = Follow(A) ∪ Follow(B) = {12, c, $}
// Our example is simple and has no cycles. If there are cycles, use the
// fixed-point method: keep traversing and updating until the sets stabilize.
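To make the fixed-point remark concrete, here is a minimal runnable Go sketch that computes the First sets for this very grammar. The representation (the grammar map, the EPS marker, the firstSets function) is entirely made up for illustration; real parser generators are far more careful, and Follow sets fall out of a similar loop.
package main

import "fmt"

const EPS = "ε"

// grammar maps each nonterminal to its branches; every branch is a sequence
// of symbols. Anything that is not a key of this map is a terminal.
var grammar = map[string][][]string{
    "A": {{"B", "C"}, {"a", "B"}},
    "B": {{"-", "C"}, {EPS}},
    "C": {{"12"}, {"c"}, {EPS}},
}

func isNonterminal(sym string) bool { _, ok := grammar[sym]; return ok }

// firstSets applies the First-set rules repeatedly (fixed-point iteration)
// until no set changes anymore.
func firstSets() map[string]map[string]bool {
    first := make(map[string]map[string]bool)
    for nt := range grammar {
        first[nt] = make(map[string]bool)
    }
    for changed := true; changed; {
        changed = false
        add := func(nt, sym string) {
            if !first[nt][sym] {
                first[nt][sym] = true
                changed = true
            }
        }
        for nt, branches := range grammar {
            for _, branch := range branches {
                allEps := true // can every symbol scanned so far derive ε?
                for _, sym := range branch {
                    if !isNonterminal(sym) {
                        if sym != EPS {
                            add(nt, sym) // branch reaches a terminal: it joins First
                            allEps = false
                        }
                        break // a terminal or an explicit ε ends the scan
                    }
                    for t := range first[sym] { // copy First(sym) minus ε
                        if t != EPS {
                            add(nt, t)
                        }
                    }
                    if !first[sym][EPS] {
                        allEps = false
                        break // sym cannot vanish, stop looking further right
                    }
                }
                if allEps {
                    add(nt, EPS) // the whole branch can derive ε
                }
            }
        }
    }
    return first
}

func main() {
    first := firstSets()
    for _, nt := range []string{"A", "B", "C"} {
        fmt.Println("First("+nt+") =", first[nt])
    }
}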
Takeaway: if we want to use lookahead together with First and Follow sets to accelerate top-down parsing, then when designing grammar rules we should avoid common prefixes across branches. If branches do share a common prefix, it is best to factor it out into a new production, as sketched below; this avoids needing a very large k to decide which branch to take.
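For instance, an illustrative grammar fragment (not taken from the Go spec): the two branches below share the arbitrarily long prefix "if" Expr Block, so no fixed lookahead k can pick between them; factoring the shared prefix into a new production fixes this.
// Before: both branches share the prefix "if" Expr Block
IfStmt   -> "if" Expr Block | "if" Expr Block "else" Block
// After factoring out the common prefix:
IfStmt   -> "if" Expr Block ElsePart
ElsePart -> "else" Block | ε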
Bottom-up: representative algorithms are LR(k), LALR(k), etc. They are harder to implement than top-down algorithms but have more advantages; in particular, they naturally avoid left recursion (see below). Because Go uses a top-down algorithm, bottom-up parsing is not elaborated here; interested readers can study it separately.
- Implementing top-down parsing, in general terms: the implementation is simple, which is one of its great advantages.
  - The code defines one function per left-hand-side node of the grammar (per nonterminal). It reads like a chain of recursive function calls: "and" logic becomes successive calls, "or" logic becomes if-else or switch-case blocks.
  - Of course, we can use First/Follow sets with lookahead to reduce backtracking and speed up parsing.
  - For example, implement the following grammar rule, where the branches B C and C are alternatives (an "or" relationship), and inside the first branch B and C are sequenced (an "and" relationship):
// A -> B C | C
// Rough code logic (pseudocode: ANode, BOrNil, C are stand-ins)
func A() (aNode ANode) {
    aNode.B = BOrNil() // try the optional B; returns nil if the next Tokens cannot start a B
    aNode.C = C()      // both branches end with C, so C is always parsed
    return
}
- Problems the algorithm needs to solve (a combined runnable sketch follows these items):
  - Precedence: use a hierarchy of productions. A higher-level production has lower precedence than the one below it; the lower-level production is parsed into a child node and therefore evaluated first, i.e. it binds tighter:
add -> mul | add + mul
mul -> pri | mul * pri
- Associativity: this problem is not unique to top-down algorithms. The solution: for left associativity write the recursive term on the left, for right associativity write it on the right. Above, add has its recursive term on the left. If you instead write add -> mul | mul + add, then 1 + 2 + 3 is parsed as 1 + (2 + 3).
- Left recursion: because most operators are left-associative, writing the recursive term on the left causes infinite recursion in a top-down parser. The solution is to rewrite the grammar and then optimize the code implementation:
// If add -> mul | add + mul is implemented naively, it left-recurses:
func add() (addNode AddNode) {
    addNode.X = mul()
    // not a pure mul branch, so try the add + mul branch
    if curToken == "+" {
        // backtrack and call add(); at this point we discover the infinite recursion
        rewind()
        addNode.X = add()
        // remaining code ...
    }
    return
}
// Rewrite the grammar: add -> mul (+ mul)*
// Now we can keep consuming "+ mul". Note that the previous addNode must become
// the left child of the new addNode, otherwise associativity breaks.
// The approximate code:
for curToken == "+" {
    nextToken() // consume +
    newAddNode := AddNode{}
    newAddNode.X = addNode
    newAddNode.Y = mul()
    addNode = newAddNode
}
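Putting the three fixes together, here is a minimal runnable sketch: a recursive-descent parser for add -> mul ("+" mul)* and mul -> pri ("*" pri)*. The names (parser, Num, BinExpr) are invented for illustration, and the "lexer" is just a pre-split token slice; it is a sketch of the technique, not production code. Note how precedence falls out of the call structure (add calls mul, so * groups before +) and how the loop keeps the old node as the left child, giving left associativity.
package main

import "fmt"

// Minimal AST: a number literal or a binary operation.
type Expr interface{ String() string }

type Num struct{ Val string }
type BinExpr struct {
    Op   string
    X, Y Expr
}

func (n Num) String() string { return n.Val }
func (b BinExpr) String() string {
    return "(" + b.X.String() + " " + b.Op + " " + b.Y.String() + ")"
}

type parser struct {
    toks []string
    pos  int
}

func (p *parser) cur() string {
    if p.pos < len(p.toks) {
        return p.toks[p.pos]
    }
    return "" // EOF
}
func (p *parser) next() { p.pos++ }

// add -> mul ("+" mul)*   (left recursion rewritten as a loop)
func (p *parser) add() Expr {
    x := p.mul()
    for p.cur() == "+" {
        p.next() // consume "+"
        // the old node becomes the LEFT child: this keeps left associativity
        x = BinExpr{Op: "+", X: x, Y: p.mul()}
    }
    return x
}

// mul -> pri ("*" pri)*   (deeper in the grammar means higher precedence)
func (p *parser) mul() Expr {
    x := p.pri()
    for p.cur() == "*" {
        p.next() // consume "*"
        x = BinExpr{Op: "*", X: x, Y: p.pri()}
    }
    return x
}

// pri -> number
func (p *parser) pri() Expr {
    n := Num{Val: p.cur()}
    p.next()
    return n
}

func main() {
    p := &parser{toks: []string{"1", "+", "2", "*", "3", "+", "4"}}
    fmt.Println(p.add()) // prints ((1 + (2 * 3)) + 4)
}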
- Most of the problems above appear when parsing a language's expression nodes. Writing this by hand is quite inconvenient, but tools can help us.
  - Tool support: as with lexing, parsing algorithms are fairly fixed, so many tools let us write only the grammar rules with no code implementation. antlr was already mentioned in the lexical analysis article.
    - Go's context-free grammar written as an antlr-format file is linked at the end.
    - For precedence, antlr only requires putting the higher-precedence branches first, instead of splitting them into separate productions, e.g. addition and multiplication:
expr : primaryExpr
     | expr '*' expr
     | expr '+' expr
     ;
- For associativity, antlr needs nothing for left-associative operators; right-associative operators are marked explicitly, as sketched below.
- antlr obviously also takes care of the left recursion problem for us, with no special treatment needed on our side.
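For example, a sketch in ANTLR4 notation (the '^' power operator here is invented for illustration): alternatives listed earlier bind tighter, and the <assoc=right> option flips the default left associativity for one alternative.
expr : <assoc=right> expr '^' expr
     | expr '*' expr
     | expr '+' expr
     | primaryExpr
     ;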
2. Go syntax analysis: source code walkthrough
The Go parser lives in the src/cmd/compile/internal/syntax package.
When reading the source, we can skip error handling, Pos handling (the position structure carrying line/column information), debug and trace logic, and the processing of special comments (//go:, //line), to improve reading speed.
When reading the parser source, be sure to combine it with your own experience of writing Go. It even accepts some grammar you may never have written yourself.
- Part of Go's grammar rules and the AST node structures:
  - Only some grammar rules are shown here. In the parser.go file, the concrete rule appears above each function, so readers can consult all production rules there. You can also refer to the antlr-format Go grammar file mentioned above; they are essentially the same. A minimal example file follows the rules:
// The AST root node
SourceFile = PackageClause ";" { ImportDecl ";" } { TopLevelDecl ";" } .
// Package clause
PackageClause = "package" identifier .
// Import rules
ImportSpec = [ "." | PackageName ] ImportPath .
ImportPath = string_lit .
// Top-level declaration rules: constant, variable, type, method and function
// definitions. You can verify these against the code you normally write.
TopLevelDecl = Declaration | FunctionDecl | MethodDecl .
Declaration = ConstDecl | TypeDecl | VarDecl .
ConstSpec = IdentifierList [ [ Type ] "=" ExpressionList ] .
FunctionDecl = "func" FunctionName [ TypeParams ] ( Function | Signature ) .
MethodDecl = "func" Receiver MethodName ( Function | Signature ) .
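As a quick cross-check of these productions, a minimal source file containing one PackageClause, one ImportDecl and two TopLevelDecls looks like this (the ";" terminators are inserted automatically by the scanner at line ends):
package main // PackageClause

import "fmt" // ImportDecl with a single ImportSpec

const answer = 42 // TopLevelDecl -> Declaration -> ConstDecl

func main() { // TopLevelDecl -> FunctionDecl
    fmt.Println(answer)
}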
- When syntax parsing completes, each source file has been parsed into the following structure. This is an AST, or more precisely the root node of the tree (a small walking sketch follows the struct):
type File struct {
    Pragma   Pragma
    PkgName  *Name  // package name
    DeclList []Decl // list of child nodes: the various declarations
    EOF      Pos
    node
}
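A minimal sketch of consuming this structure (FuncDecl and Name are real node types from nodes.go, but the helper itself is invented for illustration, and it compiles only where the internal syntax package is importable):
// listFuncs prints the name of every top-level function or method declaration.
func listFuncs(f *syntax.File) {
    for _, decl := range f.DeclList {
        if fd, ok := decl.(*syntax.FuncDecl); ok {
            fmt.Println("func:", fd.Name.Value)
        }
    }
}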
- statement in Go: here I directly copy someone else's antlr-format statement rules, summarized from Go's grammar. The bodies of constructs such as functions and for loops are stmtLists composed of these (one example of each simpleStmt kind follows the block):
statement:
    declaration        // statements also include constant and variable declarations
    | labeledStmt      // labeled stmt; goto and similar statements jump to the matching label
    | simpleStmt
    | goStmt
    | returnStmt
    | breakStmt
    | continueStmt
    | gotoStmt
    | fallthroughStmt
    | block
    | ifStmt
    | switchStmt
    | selectStmt
    | forStmt
    | deferStmt;
simpleStmt:
    sendStmt
    | incDecStmt       // ++/-- statements; these actually share one kind of AST node
                       // with assignment and shortVarDecl, distinguished by attributes
    | assignment
    | expressionStmt   // an expression statement not subdivided further
    | shortVarDecl;
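For reference, one ordinary-Go example of each simpleStmt alternative (an illustrative snippet; the surrounding function exists only so the statements have something in scope):
func demo(ch chan int, v int, f func()) {
    var i, x int
    ch <- v // sendStmt
    i++     // incDecStmt
    x = 1   // assignment
    f()     // expressionStmt
    y := 2  // shortVarDecl
    _, _, _ = i, x, y // silence "declared and not used"
}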
- The parser struct: note that it embeds the lexical analyzer (scanner) and calls the scanner's next function to obtain the current token:
type parser struct {
    file *PosBase
    // Error handling function, called frequently during parsing.
    // If errh == nil: the first error is returned and the syntax tree is nil.
    // If errh != nil: the first error is recorded, and on every error the parser
    // tries to keep going, i.e. it parses on a best-effort basis.
    errh    ErrorHandler
    mode    Mode          // switches such as generics support
    pragh   PragmaHandler // directive handler, i.e. for //go: comments
    scanner               // embedded lexical analyzer

    base   *PosBase // current position base
    first  error    // first error encountered
    errcnt int      // number of errors encountered
    pragma Pragma   // pragmas: the result after pragh processing

    fnest int // function nesting level (for error handling)
    // Expression nesting level; incremented each time an expression-parsing
    // function is called. Careful readers will find places where xnest is set
    // to -1 (when parsing the header expression of if/switch: in `if a > 0 {}`,
    // a > 0 is the header expression). When it is below 0, some complex pexprs,
    // such as composite literals (struct initialization), are not allowed.
    xnest  int    // expression nesting level (for complit ambiguity resolution)
    indent []byte // tracing support
}
- Important files for Go parsing (a usage sketch follows this list):
  - syntax.go: the parsing entry point. It provides two functions that take an opened file handle or a file name and turn the file's source into an AST, plus the parser initialization logic. From it we can see that the starting function of parsing is fileOrNil(). syntax.go also defines the Mode type, an alias for uint, whose main job is to control switches such as generics.
  - pos.go: the position structure, used for error handling and AST node location information.
  - nodes.go: the structured representation of every kind of AST node, i.e. the data structures of the node types.
  - parser.go: the core parsing file. It implements the main logic of the Go parser. Go's parsing is a top-down algorithm that uses Token lookahead to speed up analysis; the starting function is fileOrNil().
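A hedged usage sketch of the two entry points in syntax.go (the signatures below match the package at the time of writing; since cmd/compile/internal/syntax is an internal package, ordinary modules cannot import it, so this only compiles inside the Go source tree or against a vendored copy):
package main

import (
    "fmt"

    "cmd/compile/internal/syntax" // only importable inside the Go source tree
)

func main() {
    // ParseFile opens and parses the named file; the handler receives every
    // error encountered, and parsing continues on a best-effort basis.
    errh := func(err error) { fmt.Println("parse error:", err) }
    f, err := syntax.ParseFile("main.go", errh, nil, syntax.CheckBranches)
    if err != nil {
        return
    }
    fmt.Println("package:", f.PkgName.Value, "decls:", len(f.DeclList))
}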
- A few frequently used functions and concepts worth understanding first:
  - got(tok token) bool: checks whether the current token is the expected token. This is exactly the lookahead we talked about:
func (p *parser) got(tok token) bool {
    if p.tok == tok {
        p.next()
        return true
    }
    return false
}
- want(tok token): reports an error if the current token is not the expected one (a combined usage sketch follows the code):
func (p *parser) want(tok token) {
    if !p.got(tok) {
        p.syntaxError("expecting " + tokstring(tok))
        p.advance()
    }
}
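A sketch of how got and want combine in a typical parsing routine (modeled on the parser's style; parenExpr itself is invented for illustration):
// Parse "(" expr ")": want reports a syntax error and re-synchronizes if the
// expected token is missing, while got consumes a token only if it is present.
func (p *parser) parenExpr() Expr {
    p.want(_Lparen)
    x := p.expr()
    p.want(_Rparen)
    return x
}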
- list(sep, close token, f func() bool) Pos: parses a list whose elements are separated by sep and ended by close, calling the element-parsing function f for each element. This method had already changed before this article was written, and the source updates often; minor changes don't matter much, so there is no need to chase the latest code while reading.
- Functions whose names end in OrNil return nil when parsing fails.
- group: multiple elements declared with one keyword belong to one group; for example, the imports below form one group:
import (
    "fmt"
    "io"
    "strconv"
    "strings"
)
- A brief look at the important parsing functions: functions written in the top-down style really are easy to read; we just read them against the grammar rules so that each verifies the other. The parser is over 2000 lines, so listing everything is impossible. Reading can focus on: fileOrNil, type parsing (typeOrNil), expression parsing (expr), and statement parsing (stmtOrNil).
  - fileOrNil: parses one source file into an AST (an error-triggering example follows the function):
func (p *parser) fileOrNil() *File {
    if trace {
        defer p.trace("file")()
    }

    f := new(File) // create the AST root
    f.pos = p.pos()

    // 1. PackageClause
    if !p.got(_Package) {
        p.syntaxError("package statement must be first")
        return nil
    }
    f.Pragma = p.takePragma()
    f.PkgName = p.name() // package packageName
    p.want(_Semi) // "package packageName" must be followed by a newline or a line-spanning comment
    // Summary of when a token is followed by _Semi:
    // (1) the source itself contains a ";"
    // (2) the previous token requires a ";" and is followed by a newline, or by a
    //     multi-line comment that spans lines or has nothing after it

    // don't bother continuing if package clause has errors
    // (if the package part failed to parse, there is no point going on)
    if p.first != nil {
        return nil
    }

    // 2. { ImportDecl ";" }. If something goes wrong, d.Path.Bad records it
    for p.got(_Import) {
        f.DeclList = p.appendGroup(f.DeclList, p.importDecl)
        p.want(_Semi) // note the trailing ";": a single import ends after its path; a group ends with ")"
    }

    // 3. { TopLevelDecl ";" }: the remaining four kinds of declarations
    for p.tok != _EOF {
        switch p.tok {
        case _Const: // constants
            p.next()
            f.DeclList = p.appendGroup(f.DeclList, p.constDecl)
        case _Type: // types
            p.next()
            f.DeclList = p.appendGroup(f.DeclList, p.typeDecl)
        case _Var: // variables
            p.next()
            f.DeclList = p.appendGroup(f.DeclList, p.varDecl)
        case _Func: // functions or methods
            p.next()
            if d := p.funcDeclOrNil(); d != nil {
                f.DeclList = append(f.DeclList, d)
            }
        default:
            if p.tok == _Lbrace && len(f.DeclList) > 0 && isEmptyFuncDecl(f.DeclList[len(f.DeclList)-1]) {
                // opening { of function declaration on next line
                p.syntaxError("unexpected semicolon or newline before {")
            } else {
                p.syntaxError("non-declaration statement outside function body")
            }
            p.advance(_Const, _Type, _Var, _Func)
            continue
        }
        p.clearPragma()
        if p.tok != _EOF && !p.got(_Semi) {
            p.syntaxError("after top level declaration")
            p.advance(_Const, _Type, _Var, _Func)
        }
    }
    // p.tok == _EOF
    p.clearPragma()
    f.EOF = p.pos()
    return f
}
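As a concrete illustration of the default branch above: the automatic semicolon after ")" turns the following (invalid) source into an empty function declaration followed by a stray "{" at the top level, which triggers the "unexpected semicolon or newline before {" error:
func f()   // the scanner inserts ";" at the end of this line
{          // this "{" now sits at the top level and hits the default branch
}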
- Now look at the expr-related functions. Pay attention to how Go handles the associativity problem, the left recursion problem, and the precedence problem (a worked trace follows the code):
func (p *parser) expr() Expr {
    if trace {
        defer p.trace("expr")()
    }
    // Note the 0 passed here: it lets us consume one complete expression
    return p.binaryExpr(nil, 0)
}
// Expression = UnaryExpr | Expression binary_op Expression .
// An expression is either a unary expression, or two expressions joined by a binary operator
func (p *parser) binaryExpr(x Expr, prec int) Expr {
    // x != nil means x is the Expr to the left of a binary operator;
    // when x is nil, take the unary-expression branch first
    if x == nil {
        x = p.unaryExpr()
    }
    // If an operator follows, take the "Expression binary_op Expression" branch.
    // Note that the UnaryExpr just parsed simply becomes the Expression to the
    // left of the binary operator; there is no backtracking and re-matching,
    // which is how left recursion is avoided.
    // A binary expression is formed only while the next token is an operator
    // whose precedence is greater than that of the current expression; this
    // takes care of the precedence problem.
    for (p.tok == _Operator || p.tok == _Star) && p.prec > prec {
        t := new(Operation)
        t.pos = p.pos()
        t.Op = p.op
        tprec := p.prec
        p.next() // consume the operator
        t.X = x  // associativity is handled here: for same-precedence operators,
                 // the previous node becomes the left child of the new node
        t.Y = p.binaryExpr(nil, tprec) // recursive call; note the precedence logic
        x = t
    }
    return x
}
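To see precedence and associativity at work, here is a hedged trace of 1 + 2 * 3 through binaryExpr (precAdd and precMul are the package's precedence constants; only their order matters here):
// expr() calls binaryExpr(nil, 0)
//   parses 1; sees "+" with precAdd > 0, so it starts t = (1 + ?)
//   recursion: binaryExpr(nil, precAdd)
//     parses 2; sees "*" with precMul > precAdd, so it starts (2 * ?)
//       recursion: binaryExpr(nil, precMul) parses 3; nothing binds tighter, returns 3
//     returns (2 * 3), which becomes t.Y
//   result: 1 + (2 * 3)
// For 1 + 2 + 3 the inner call sees "+", but precAdd > precAdd is false, so it
// returns immediately and the outer loop builds ((1 + 2) + 3): left associativity.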
- Readers can explore the rest of the code on their own; a link to my annotated copy of the source is given below.
Links:
- Annotated syntax analysis source code
- The Go Programming Language Specification, where grammar-related rules can be looked up
- The antlr-format Go grammar rule file mentioned in the text
Copyright notice: this article was written by [BioIT]. Please include the original link when reposting: https://yzsam.com/2022/04/202204210713024382.html