Introduction to protobuf
2022-04-23 08:23:00 [weixin_46272000]
What Is Protobuf?
Protobuf is short for Protocol Buffers, a data description language developed by Google and open-sourced in 2008. It describes a lightweight, efficient format for storing structured data and is used to serialize that data. Its design suits it well as a data carrier in network communication, as a data storage format, and as an RPC data exchange format. The serialized output is compact, and because data is stored as key-value pairs, messages stay highly compatible across versions. The result is a language-independent, platform-independent, extensible format for serializing structured data in areas such as communication protocols and data storage. Developers use the tools that ship with Protobuf to generate code that serializes and parses structured data.
The basic unit of data in Protobuf is the message, which is similar to a struct in Go. A message can contain nested messages or fields of other basic data types.
This tutorial describes how to use the protocol buffer language to structure your protocol buffer data, including the syntax of .proto files and how to generate data access classes from .proto files. The tutorial uses the proto3 version of the protocol buffer language.
Defining a Message
Let's start with a simple example. Suppose you define a search request message: each search request contains a query string, the page of results you are interested in, and the number of results per page. In the .proto file it is defined as follows:
    syntax = "proto3";

    message SearchRequest {
      string query = 1;
      int32 page_number = 2;
      int32 result_per_page = 3;
    }
The first line of the .proto file specifies that the file uses proto3 syntax. If it is omitted, the protocol buffer compiler assumes proto2. The syntax declaration must be the first non-empty, non-comment line of the file. The SearchRequest definition specifies three fields (name/value pairs), each with a name and a type.
Specifying Field Types
In the example above, all fields are scalar types: two integers (page_number and result_per_page) and a string (query). You can also specify composite types for fields, including enumerations and other message types.
Assigning Field Numbers
Each field in a message definition has a unique number. These numbers identify your fields in the binary message format and should not be changed once your message type is in use. Note that when a message is encoded, field numbers 1 through 15 take one byte, while field numbers 16 through 2047 take two bytes. You should therefore give the numbers 1 through 15 to the most frequently used fields.
The smallest field number you can specify is 1, and the largest is 2^29 - 1 (536,870,911). The numbers 19000 through 19999 are reserved for the Protocol Buffers implementation and cannot be used in your message definitions. Likewise, you cannot reuse a field number that is already used or reserved in the current message definition.
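The one-byte and two-byte figures follow from how a field's key is written on the wire: the field number is shifted left by three bits, combined with a three-bit wire type, and stored as a base-128 varint. The Python sketch below only illustrates that arithmetic. The helper names (encode_varint, tag_size, is_usable_field_number) are invented for this example and are not part of any protobuf library.

```python
MAX_FIELD_NUMBER = 2**29 - 1          # 536,870,911
IMPLEMENTATION_RESERVED = range(19000, 20000)


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("expects a non-negative integer")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def is_usable_field_number(number: int) -> bool:
    """True if the number is in range and outside the implementation-reserved block."""
    return 1 <= number <= MAX_FIELD_NUMBER and number not in IMPLEMENTATION_RESERVED


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes used by the key (field number plus wire type) of one field."""
    return len(encode_varint((field_number << 3) | wire_type))


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```

Field 15 still fits in one byte because 15 << 3 is 120, which is below 128. Field 16 gives 128 and therefore needs a second byte.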
Field Rules
Message fields must follow one of these rules:
- singular: a well-formed (encoded) message can contain zero or one of this field, but not more than one. This is the default field rule in proto3 syntax. For example, the three fields in the example above are all singular: an encoded message body can contain zero or one query field, but never more.
- repeated: the field can appear any number of times in a message, and the order of the repeated values is preserved. In effect it is an array-typed field.
Adding More Message Types
Multiple message types can be defined in a single .proto file. This is useful when you are defining several related messages. For example, to define the response message SearchResponse that corresponds to SearchRequest, add it to the same .proto file:
    message SearchRequest {
      string query = 1;
      int32 page_number = 2;
      int32 result_per_page = 3;
    }

    message SearchResponse {
      ...
    }
Adding Comments
Comments in .proto files use the same style as C and C++: // and /* ... */.
    /* SearchRequest represents a search query, with pagination options to
     * indicate which results to include in the response. */

    message SearchRequest {
      string query = 1;
      int32 page_number = 2;  // Which page number do we want?
      int32 result_per_page = 3;  // Number of results to return per page.
    }
Reserved Fields
If you delete a field from a message or comment it out, other developers who later update the message definition can reuse the old field number. If they then load an old version of the same .proto file, this can cause serious problems such as data corruption and privacy leaks. One way to avoid this is to mark the field numbers and field names of deleted fields as reserved. The protocol buffer compiler reports an error if anyone tries to use these identifiers in the future.
    message Foo {
      reserved 2, 15, 9 to 11;
      reserved "foo", "bar";
    }
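To make the effect of the reserved statements concrete, the Python sketch below imitates the kind of check described above for the Foo message. It is only an illustration under simplifying assumptions: check_field is a hypothetical helper, and the real compiler performs this validation itself when it processes the .proto file.

```python
# Mirrors: reserved 2, 15, 9 to 11;   ("9 to 11" is inclusive)
RESERVED_NUMBERS = {2, 15, *range(9, 12)}
# Mirrors: reserved "foo", "bar";
RESERVED_NAMES = {"foo", "bar"}


def check_field(name: str, number: int) -> None:
    """Raise ValueError if a new field reuses a reserved name or number."""
    if number in RESERVED_NUMBERS:
        raise ValueError(f"field number {number} is reserved")
    if name in RESERVED_NAMES:
        raise ValueError(f'field name "{name}" is reserved')


check_field("baz", 3)  # accepted: neither the name nor the number is reserved

try:
    check_field("baz", 10)  # 10 falls inside "9 to 11"
except ValueError as err:
    print("rejected:", err)
```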
What Is Generated From Your .proto?
When you run the protocol buffer compiler on a .proto file, it generates code in your chosen programming language for the message types defined in the file. The generated code includes getting and setting field values, serializing messages to an output stream, and parsing messages from an input stream.
- For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.
- For Java, the compiler generates a .java file with a class for each message type, as well as a special Builder class for creating message class instances.
- Python is a little different: the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.
- For Go, the compiler generates a .pb.go file with a type for each message type in your file.
- For Ruby, the compiler generates a .rb file with a Ruby module containing your message types.
- For Objective-C, the compiler generates a pbobjc.h and pbobjc.m file from each .proto, with a class for each message type described in your file.
- For C#, the compiler generates a .cs file from each .proto, with a class for each message type described in your file.
- For Dart, the compiler generates a .pb.dart file with a class for each message type in your file.
Scalar Value Types
| .proto Type | Notes | C++ Type | Java Type | Python Type[2] | Go Type | Ruby Type | C# Type | PHP Type | Dart Type |
| :---------- | :---- | :------- | :-------- | :------------- | :------ | :-------- | :------ | :------- | :-------- |
| double | | double | double | float | float64 | Float | double | float | double |
| float | | float | float | float | float32 | Float | float | float | double |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long[3] | int64 | Bignum | long | integer/string[5] | Int64 |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | Bignum | ulong | integer/string[5] | Int64 |
| sint32 | Uses variable-length encoding. Signed int value. Encodes negative numbers more efficiently than a regular int32. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sint64 | Uses variable-length encoding. Signed int value. Encodes negative numbers more efficiently than a regular int64. | int64 | long | int/long | int64 | Bignum | long | integer/string[5] | Int64 |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int/long | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long[3] | uint64 | Bignum | ulong | integer/string[5] | Int64 |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | Bignum | long | integer/string[5] | Int64 |
| bool | | bool | boolean | bool | bool | TrueClass/FalseClass | bool | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32. | string | String | str/unicode | string | String (UTF-8) | string | string | String |
| bytes | May contain any arbitrary sequence of bytes no longer than 2^32. | string | ByteString | str | []byte | String (ASCII-8BIT) | ByteString | string | List |
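The notes on negative numbers are easier to see with the encoding written out. The Python sketch below is an illustration under stated assumptions, not library code: it assumes that a negative int32 or int64 is written as a 64-bit two's complement varint, and that sint32 and sint64 first apply the ZigZag mapping (0, -1, 1, -2 become 0, 1, 2, 3). The function names are invented for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("expects a non-negative integer")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_int(value: int) -> bytes:
    """int32/int64 style: a negative value becomes its 64-bit two's complement."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag(value: int) -> int:
    """sint32/sint64 style mapping: small magnitudes map to small unsigned values."""
    return (value << 1) ^ (value >> 63)


def encode_sint(value: int) -> bytes:
    """ZigZag-map the value, then write it as a varint."""
    return encode_varint(zigzag(value))


for n in (1, -1, 300, -300):
    print(n, len(encode_int(n)), len(encode_sint(n)))
```

With these definitions, -1 occupies ten bytes in the int style but a single byte in the sint style. The fixed32 note follows from the same routine: 2^28 is the first value whose varint needs five bytes, one more than the four bytes fixed32 always uses.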
Default Values
When a message is parsed, if the encoded message does not contain a particular singular field, the corresponding field in the parsed object is set to the default value for that field. The default depends on the type:
- For strings, the default value is the empty string.
- For bytes, the default value is empty bytes.
- For bools, the default value is false.
- For numeric types, the default value is zero.
- For enums, the default value is the first defined enum value, which must be 0.
- For message fields, the field is not set. Its exact value depends on the language. See the generated code guide for details.
Enumerations
When you define a message type, you might want one of its fields to take only one of a predefined list of values. For example, suppose you want to add a corpus field to each SearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS or VIDEO. You can do this simply by adding an enum to the message definition, with a constant for each possible value.
In the following example we add an enum called Corpus and a field of type Corpus:
    message SearchRequest {
      string query = 1;
      int32 page_number = 2;
      int32 result_per_page = 3;
      enum Corpus {
        UNIVERSAL = 0;
        WEB = 1;
        IMAGES = 2;
        LOCAL = 3;
        NEWS = 4;
        PRODUCTS = 5;
        VIDEO = 6;
      }
      Corpus corpus = 4;
    }
As you can see, the first constant of the Corpus enum maps to 0. Every enum definition must contain a constant that maps to 0 as its first element, because:
- There must be a zero value, so that 0 can be used as the default value of the enum.
- In proto2 syntax the first enum value is always the default, so for compatibility the zero value must be the first element.
Using Other Message Types
You can use other message types as field types. For example, suppose you want each SearchResponse message to carry Result messages. You can define a Result message type in the same .proto file and then declare a field of type Result in SearchResponse:
    message SearchResponse {
      repeated Result results = 1;
    }

    message Result {
      string url = 1;
      string title = 2;
      repeated string snippets = 3;
    }
Importing Definitions
In the example above, the Result message type is defined in the same file as SearchResponse. What if the message type you want to use as a field type is already defined in another .proto file?
You can use definitions from other .proto files by importing them. To import another .proto's definitions, add an import statement at the top of your file:
    import "myproject/other_protos.proto";
By default, you can only use definitions from directly imported .proto files. However, sometimes you may need to move a .proto file to a new location. Instead of moving the file and updating every call site in a single change, you can leave a placeholder .proto file in the old location that forwards all imports to the new location with the import public syntax. Anyone who imports a proto containing an import public statement can transitively rely on those import public dependencies. For example:
    // new.proto
    // All definitions are moved here

    // old.proto
    // This is the proto that all clients are importing.
    import public "new.proto";
    import "other.proto";

    // client.proto
    import "old.proto";
    // You use definitions from old.proto and new.proto, but not other.proto
The compiler searches for imported .proto files in the directories specified with the -I or --proto_path command-line flag. If no flag is given, it searches the directory from which the compiler was invoked. In general you should set --proto_path to the root directory of your project and use fully qualified names for all imports.
Using proto2 Message Types
You can import proto2 message types and use them in your proto3 messages, and you can likewise import proto3 message types into proto2 messages. However, proto2 enums cannot be used directly in proto3 syntax.
Nested Types
Message types can be defined and used inside other message types. In the following example, the Result message is defined inside the SearchResponse message:
    message SearchResponse {
      message Result {
        string url = 1;
        string title = 2;
        repeated string snippets = 3;
      }
      repeated Result results = 1;
    }
If you want to use a nested message type outside its parent message, refer to it as Parent.Type:
    message SomeOtherMessage {
      SearchResponse.Result result = 1;
    }
You can nest messages as deeply as you like:
    message Outer {       // Level 0
      message MiddleAA {  // Level 1
        message Inner {   // Level 2
          int64 ival = 1;
          bool booly = 2;
        }
      }
      message MiddleBB {  // Level 1
        message Inner {   // Level 2
          int32 ival = 1;
          bool booly = 2;
        }
      }
    }
Updating a Message Type
If an existing message type no longer meets your needs (for example, you want to add an extra field to the message) but you still want to use code generated from the old format, don't worry. It is simple to update message types without breaking existing code, as long as you remember the following rules.
- Do not change the field numbers of any existing fields.
- If you add a new field, messages serialized by code generated from the old format can still be parsed by code generated from the new format. Keep the default values of the new fields in mind so that new code interacts correctly with messages produced by old code. Similarly, messages created by new code can be parsed by old code: old binaries simply ignore the new fields when parsing. See the Unknown Fields section for more.
- A field can be deleted as long as its field number is not reused in the updated message type. You can also rename the field instead, for example by adding the prefix OBSOLETE_, or mark the field number as reserved, so that future users cannot accidentally reuse the number.
Unknown Fields
Unknown fields are well-formed protocol buffer serialized data representing fields that the parser does not recognize. For example, when an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary.
Originally, proto3 messages always discarded unknown fields during parsing, but version 3.5 reintroduced the preservation of unknown fields to match proto2 behavior. In version 3.5 and later, unknown fields are retained during parsing and included in the serialized output.
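The Python sketch below shows, in simplified form, how a parser can keep fields it does not recognize. It is an illustration only: split_fields and decode_varint are hypothetical helpers, the set of known field numbers stands in for an old message definition, and only wire types 0, 1, 2 and 5 are handled. Real protobuf runtimes do this work internally.

```python
from typing import Dict, Set, Tuple, Union


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_fields(buf: bytes, known: Set[int]) -> Tuple[Dict[int, Union[int, bytes]], bytes]:
    """Return (values of known fields, raw bytes of every unknown field)."""
    known_fields: Dict[int, Union[int, bytes]] = {}
    unknown = bytearray()
    pos = 0
    while pos < len(buf):
        start = pos
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:                      # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 1:                    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:                    # length-delimited
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:                    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known:
            known_fields[field_number] = value
        else:
            unknown += buf[start:pos]           # keep the raw bytes untouched
    return known_fields, bytes(unknown)


# Field 1 = "hi" (string), field 2 = 3 (varint), field 4 = 7 (varint, added by newer code).
data = b"\x0a\x02hi\x10\x03\x20\x07"
fields, leftover = split_fields(data, known={1, 2, 3})
print(fields, leftover)
```

Because the unrecognized bytes are kept verbatim, an old binary can append them when it reserializes the message, which is how the data for field 4 survives a round trip through code that has never heard of it.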
Maps
If you want to create a map as part of your message definition, protocol buffers provides a simple, convenient syntax:
    map<key_type, value_type> map_field = N;
key_type can be any integer or string type (that is, any scalar type except floating-point types and bytes). Note that an enum is not a valid key_type. value_type can be any type except another map (in other words, protocol buffers does not allow a map to be nested directly as a map value).
For example, if you want to create a map of projects in which each Project message is associated with a string key, you can define it like this:
    map<string, Project> projects = 3;
- Map fields cannot be repeated (the map field itself cannot be an array).
- The values in a map are unordered, so you cannot rely on map items being in any particular order.
- When text format is generated for a .proto, maps are sorted by key. Numeric keys are sorted numerically.
- When parsing from the wire or when merging, if there are duplicate map keys, the last key seen is used. When parsing a map from text format, parsing may fail if there are duplicate keys.
- If no value is specified for a map entry, the behavior when the field is serialized depends on the language. In C++, Java, and Python the default value of the field type is serialized; in other languages nothing is serialized.
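Two of the rules above, "the last duplicate key wins" and "text format sorts by key", can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions: map entries are modelled as plain (key, value) tuples in the order they would be read, and both function names are invented for this example.

```python
from typing import Dict, Iterable, List, Tuple


def parse_map_entries(entries: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a map from entries in read order; a later duplicate key replaces an earlier one."""
    result: Dict[str, str] = {}
    for key, value in entries:
        result[key] = value
    return result


def text_format_order(mapping: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the entries sorted by key, as the text format does."""
    return sorted(mapping.items())


# "beta" appears twice; only the value read last is kept.
wire_entries = [("beta", "v1"), ("alpha", "v2"), ("beta", "v3")]
projects = parse_map_entries(wire_entries)
print(projects["beta"])
print(text_format_order(projects))
```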
Packages
You can add an optional package specifier to a .proto file to prevent name clashes between message types.
    package foo.bar;
    message Open { ... }
You can then use the package name when defining fields of your message type:
    message Foo {
      ...
      foo.bar.Open open = 1;
      ...
    }
The way a package specifier affects the generated code depends on the programming language.
Defining Services
If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file, and the protocol buffer compiler will generate service interface code and stubs in your chosen language. For example, to define a service with a method that takes a SearchRequest message and returns a SearchResponse message, you can define it in your .proto file as follows:
    service SearchService {
      rpc Search (SearchRequest) returns (SearchResponse);
    }
The most straightforward RPC system to use with protocol buffers is gRPC, a language- and platform-neutral open source RPC system developed at Google. gRPC works particularly well with protocol buffers and lets you generate the relevant RPC code directly from your .proto files using a special protocol buffer compiler plugin.
If you don't want to use gRPC, you can use protocol buffers with your own RPC system. More details about implementing an RPC system can be found in the Proto2 Language Guide.
JSON Mapping
Proto3 supports a canonical encoding in JSON, making it easier to share data between systems. The encoding rules are listed type by type in the table below.
If a value is missing from the JSON-encoded data, or if its value is null, it is interpreted as the corresponding default value when parsed into a protocol buffer. If a field has its default value in the protocol buffer, it is omitted from the JSON-encoded data by default to save space. An implementation may provide an option to keep fields with default values in the JSON-encoded output.
| proto3 | JSON | JSON example | Notes |
| :----- | :--- | :----------- | :---- |
| message | object | {"fooBar": v, "g": null, …} | Generates JSON objects. Message field names are converted to lowerCamelCase and become JSON object keys. If the json_name field option is specified, the specified value is used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name. null is an accepted value for all field types and is treated as the default value of the corresponding field type. |
| enum | string | "FOO_BAR" | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values. |
| map | object | {"k": v, …} | All keys are converted to strings. |
| repeated V | array | [v, …] | null is converted to the empty list []. |
| bool | true, false | true, false | |
| string | string | "Hello World!" | |
| bytes | base64 string | "YWJjMTIzIT8kKiYoKSctPUB+" | JSON value will be the data encoded as a string using standard base64 encoding with padding. Either standard or URL-safe base64 encoding, with or without padding, is accepted. |
| int32, fixed32, uint32 | number | 1, -10, 0 | JSON value will be a decimal number. Either numbers or strings are accepted. |
| int64, fixed64, uint64 | string | "1", "-10" | JSON value will be a decimal string. Either numbers or strings are accepted. |
| float, double | number | 1.1, -10.0, 0, "NaN", "Infinity" | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted. |
| Any | object | {"@type": "url", "f": v, … } | If the Any contains a value that has a special JSON mapping, it will be converted as follows: {"@type": xxx, "value": yyy}. Otherwise, the value will be converted into a JSON object, and the "@type" field will be inserted to indicate the actual data type. |
| Timestamp | string | "1972-01-01T10:00:20.021Z" | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
| Duration | string | "1.000340012s", "1s" | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required. |
| Struct | object | { … } | Any JSON object. See struct.proto. |
| Wrapper types | various types | 2, "2", "foo", true, "true", null, 0, … | Wrappers use the same representation in JSON as the wrapped primitive type, except that null is allowed and preserved during data conversion and transfer. |
| FieldMask | string | "f.fooBar,h" | See field_mask.proto. |
| ListValue | array | [foo, bar, …] | |
| Value | value | | Any JSON value |
| NullValue | null | | JSON null |
| Empty | object | {} | An empty JSON object |
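As a rough illustration of the message row and of the default-omission rule, the Python sketch below converts proto field names to lowerCamelCase and drops fields that hold a default value. It is a simplified approximation: to_json_name and to_json are hypothetical helpers, a plain dict stands in for a SearchRequest, and only string, integer and bool defaults are considered. Real protobuf libraries ship their own JSON converters.

```python
import json
from typing import Any, Dict


def to_json_name(proto_name: str) -> str:
    """Convert a snake_case proto field name to lowerCamelCase."""
    out = []
    upper_next = False
    for ch in proto_name:
        if ch == "_":
            upper_next = True          # drop the underscore, capitalize what follows
        elif upper_next:
            out.append(ch.upper())
            upper_next = False
        else:
            out.append(ch)
    return "".join(out)


def to_json(fields: Dict[str, Any]) -> str:
    """Serialize a dict of proto-style fields, omitting default-valued ones."""
    obj = {
        to_json_name(name): value
        for name, value in fields.items()
        if value not in ("", 0, False)  # empty string, zero and false are defaults
    }
    return json.dumps(obj)


request = {"query": "protobuf", "page_number": 2, "result_per_page": 0}
print(to_json(request))
```

Here result_per_page is left out of the output because it holds the default value 0, and page_number appears under the key pageNumber.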
Generating Your Classes
To generate the Java, Python, C++, Go, Ruby, Objective-C, or C# code you need to work with the message types defined in a .proto file, run the protocol buffer compiler protoc on the .proto. If the compiler is not already installed, download the package and follow the instructions in the README. For Go, you also need to install a special code generator plugin for the compiler; you can find the plugin and its installation instructions in the golang/protobuf project on GitHub.
The compiler is invoked as follows:
    protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR --go_out=DST_DIR --ruby_out=DST_DIR --objc_out=DST_DIR --csharp_out=DST_DIR path/to/file.proto
- IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If omitted, the current working directory is used. Multiple import directories can be specified by passing the --proto_path option multiple times; the compiler searches them in order. -I=IMPORT_PATH is a short form of --proto_path.
- You can provide one or more output directives:
  - --cpp_out generates C++ code in DST_DIR. See the C++ generated code reference for more.
  - --java_out generates Java code in DST_DIR. See the Java generated code reference for more.
  - --python_out generates Python code in DST_DIR. See the Python generated code reference for more.
  - --go_out generates Go code in DST_DIR. See the Go generated code reference for more.
  - --ruby_out generates Ruby code in DST_DIR. Ruby generated code reference is coming soon!
  - --objc_out generates Objective-C code in DST_DIR. See the Objective-C generated code reference for more.
  - --csharp_out generates C# code in DST_DIR. See the C# generated code reference for more.
  - --php_out generates PHP code in DST_DIR. See the PHP generated code reference for more.
- You must provide one or more .proto files as input. Multiple .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.
Copyright notice
This article was written by [weixin_46272000]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230748008907.html