Introduction to protobuf
2022-04-23 08:23:00 【weixin_ forty-six million two hundred and seventy-two thousand 】
What is Protobuf?
Protobuf is short for Protocol Buffers, a data description language developed by Google and open-sourced in 2008. It describes a lightweight, efficient format for storing structured data and is used to serialize that data. It is well suited as a data carrier in network communication, as a data storage format, and as an RPC data exchange format: the serialized output is compact, fields are stored as key-value pairs, and messages stay compatible across versions. The result is a language-independent, platform-independent, extensible serialization format for communication protocols, data storage, and similar areas. Developers use the tools that ship with Protobuf to generate code that serializes and parses structured data.
The basic unit of data in Protobuf is the message, which is similar to a struct in Go. A message can contain other messages or fields of basic data types.
This tutorial describes how to use the protocol buffer language to structure your protocol buffer data, including the syntax of .proto files and how to generate data access classes from them. It uses the proto3 version of the protocol buffer language.
Defining a Message
Let's start with a simple example. Suppose you want to define a search request message, where each request contains a query string, the page of results you are interested in, and the number of results per page. The .proto file looks like this:
    syntax = "proto3";

    message SearchRequest {
      string query = 1;
      int32 page_number = 2;
      int32 result_per_page = 3;
    }
The first line of the .proto file specifies that it uses proto3 syntax. If it is omitted, the protocol buffer compiler assumes proto2. The syntax statement must be the first non-empty, non-comment line of the file. The SearchRequest definition specifies three fields (name/value pairs), each with a name and a type.
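To make the role of the field numbers concrete, here is a minimal Python sketch that hand-encodes a SearchRequest into wire bytes. It is not the code that protoc generates; the helper names (encode_varint, encode_tag, encode_search_request) are made up for illustration, and it assumes non-negative integers only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field key is (field_number << 3) | wire_type, written as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_search_request(query: str, page_number: int, result_per_page: int) -> bytes:
    """Hand-encode the three SearchRequest fields (illustrative helper)."""
    out = bytearray()
    if query:  # fields holding their default value are left off the wire
        data = query.encode("utf-8")
        out += encode_tag(1, 2) + encode_varint(len(data)) + data  # wire type 2: length-delimited
    if page_number:
        out += encode_tag(2, 0) + encode_varint(page_number)  # wire type 0: varint
    if result_per_page:
        out += encode_tag(3, 0) + encode_varint(result_per_page)
    return bytes(out)


encoded = encode_search_request("hi", 1, 10)
print(encoded.hex())  # 0a0268691001180a
```

Note that the field names never appear in the output; only the numbers 1, 2, and 3 do, which is why the numbers must stay stable once a message type is in use.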
Specifying field types
In the example above, all fields are scalar types: two integers (page_number and result_per_page) and a string (query). You can also specify composite types for fields, including enumerations and other message types.
Assigning field numbers
Each field in a message definition has a unique number. These numbers identify your fields in the binary message format, and you should not change them once the message type is in use. Field numbers 1 through 15 take one byte to encode, while 16 through 2047 take two bytes. You should therefore keep field numbers 1 through 15 for the most frequently used fields.
The smallest field number you can specify is 1 and the largest is 2^29 - 1 (536,870,911). The numbers 19000 through 19999 are reserved for the Protocol Buffers implementation and cannot be used in your definitions. You also cannot use any field number that has been marked as reserved in the message.
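The byte counts above follow from how the field key is built: the field number is shifted left by three bits to make room for the wire type, and the result is written as a varint. The sketch below (tag_size is a hypothetical helper, not part of any protobuf library) computes the key size and rejects the ranges the text rules out.

```python
def tag_size(field_number: int) -> int:
    """Return how many bytes the field key occupies on the wire."""
    if not 1 <= field_number <= 2**29 - 1:
        raise ValueError("field number must be between 1 and 2^29 - 1")
    if 19000 <= field_number <= 19999:
        raise ValueError("19000-19999 is reserved for the implementation")
    key = field_number << 3  # the low three bits hold the wire type
    size = 1
    while key >= 0x80:  # each varint byte carries seven bits of payload
        key >>= 7
        size += 1
    return size


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```

With seven payload bits per byte and three of them taken by the wire type, only four bits remain for the field number in a one-byte key, which is where the limit of 15 comes from.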
Field rules
Message fields must follow one of these rules:
- singular: a well-formed encoded message can contain zero or one of this field, but not more than one. This is the default field rule in proto3 syntax. For example, all three fields in the example above are singular, so an encoded message contains the query field either zero times or once, never more.
- repeated: the field can appear any number of times (including zero) in a well-formed message, and the order of the values is preserved. This is effectively an array-typed field.
Adding more message types
A single .proto file can define multiple message types, which is useful when you are defining several related messages. For example, to define the response message SearchResponse that corresponds to SearchRequest, add it to the same .proto file.
    message SearchRequest {
      string query = 1;
      int32 page_number = 2;
      int32 result_per_page = 3;
    }

    message SearchResponse {
      ...
    }
Adding comments
Comments in .proto files use the same style as C and C++: // and /* ... */.
    /* SearchRequest represents a search query, with pagination options to
     * indicate which results to include in the response. */

    message SearchRequest {
      string query = 1;
      int32 page_number = 2;      // Which page number do we want?
      int32 result_per_page = 3;  // Number of results to return per page.
    }
Reserved fields
If you delete a field from a message or comment it out, other developers who update the message later may reuse its field number. If an old version of the .proto file is then loaded, this can cause serious problems such as data corruption or privacy leaks. One way to avoid this is to mark the field numbers and names of deleted fields as reserved. The protocol buffer compiler reports an error if anyone later tries to use these identifiers.
    message Foo {
      reserved 2, 15, 9 to 11;
      reserved "foo", "bar";
    }
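The check the compiler performs for Foo can be pictured with a small Python sketch. This only models the rule and is not how protoc is implemented; check_field and the two constants are hypothetical names. The range 9 to 11 is inclusive.

```python
# Mirrors: reserved 2, 15, 9 to 11;  and  reserved "foo", "bar";
RESERVED_NUMBERS = {2, 15, *range(9, 12)}
RESERVED_NAMES = {"foo", "bar"}


def check_field(name: str, number: int) -> list:
    """Return a list of problems with a proposed new field of Foo."""
    problems = []
    if number in RESERVED_NUMBERS:
        problems.append(f"field number {number} is reserved")
    if name in RESERVED_NAMES:
        problems.append(f"field name {name!r} is reserved")
    return problems


print(check_field("baz", 10))  # rejected: 10 falls inside 9 to 11
print(check_field("baz", 3))   # accepted: no problems
```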
What is generated from your .proto
When you run the protocol buffer compiler on a .proto file, it generates code in the programming language you choose for the message types defined in the file. The generated code includes getters and setters for field values, methods that serialize messages to an output stream, and methods that parse messages from an input stream.
- For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.
- For Java, the compiler generates a .java file with a class for each message type, as well as special Builder classes for creating message class instances.
- Python is a little different: the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.
- For Go, the compiler generates a .pb.go file with a type for each message type in your file.
- For Ruby, the compiler generates a .rb file with a Ruby module containing your message types.
- For Objective-C, the compiler generates a pbobjc.h and pbobjc.m file from each .proto, with a class for each message type described in your file.
- For C#, the compiler generates a .cs file from each .proto, with a class for each message type described in your file.
- For Dart, the compiler generates a .pb.dart file with a class for each message type in your file.
Scalar types
| .proto Type | Notes | C++ Type | Java Type | Python Type | Go Type | Ruby Type | C# Type | PHP Type | Dart Type |
| :---------- | :---- | :------- | :-------- | :---------- | :------ | :-------- | :------ | :------- | :-------- |
| double | | double | double | float | float64 | Float | double | float | double |
| float | | float | float | float | float32 | Float | float | float | double |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | Bignum | long | integer/string | Int64 |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | Bignum | ulong | integer/string | Int64 |
| sint32 | Uses variable-length encoding. Signed int value. Encodes negative numbers more efficiently than a regular int32. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sint64 | Uses variable-length encoding. Signed int value. Encodes negative numbers more efficiently than a regular int64. | int64 | long | int/long | int64 | Bignum | long | integer/string | Int64 |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int/long | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | Bignum | ulong | integer/string | Int64 |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | Bignum | long | integer/string | Int64 |
| bool | | bool | boolean | bool | bool | TrueClass/FalseClass | bool | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32. | string | String | str/unicode | string | String (UTF-8) | string | string | String |
| bytes | May contain any arbitrary sequence of bytes no longer than 2^32. | string | ByteString | str | []byte | String (ASCII-8BIT) | ByteString | string | List |
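The notes on int32 versus sint32 can be checked with a short Python sketch of the two encodings. It assumes the standard wire format, where a negative int32 is sign-extended to 64 bits before varint encoding and sint32 applies ZigZag mapping first; the function names are illustrative only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32: a negative value is sign-extended to 64 bits, then varint-encoded."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value: int) -> bytes:
    """sint32: ZigZag mapping first, then varint encoding."""
    return encode_varint(zigzag32(value))


print(len(encode_int32(-1)))   # 10
print(len(encode_sint32(-1)))  # 1
```

A field that often holds small negative numbers therefore costs ten bytes per value as int32 and one byte as sint32.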
Default values
When an encoded message does not contain a particular singular field, the corresponding field in the parsed message object is set to that field's default value. The default depends on the type:
- For strings, the default value is the empty string.
- For bytes, the default value is empty bytes.
- For bools, the default value is false.
- For numeric types, the default value is zero.
- For enums, the default value is the first defined enum value, which must be 0.
- For message fields, the field is not set. Its exact value is language-dependent. See the code generation guide for details.
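The effect of these rules on a reader can be modelled in a few lines of Python. This is a simplified sketch using plain dictionaries rather than generated classes; PROTO3_DEFAULTS, SEARCH_REQUEST_FIELDS, and with_defaults are names invented for the example.

```python
# Default value per scalar kind, following the list above.
PROTO3_DEFAULTS = {"string": "", "bytes": b"", "bool": False, "int32": 0}

# Field name -> kind for the SearchRequest message defined earlier.
SEARCH_REQUEST_FIELDS = {
    "query": "string",
    "page_number": "int32",
    "result_per_page": "int32",
}


def with_defaults(decoded: dict) -> dict:
    """Fill every field that was absent from the encoded message with its default."""
    return {
        name: decoded.get(name, PROTO3_DEFAULTS[kind])
        for name, kind in SEARCH_REQUEST_FIELDS.items()
    }


print(with_defaults({"query": "protobuf"}))
# {'query': 'protobuf', 'page_number': 0, 'result_per_page': 0}
```

One consequence is that, for these fields, a reader cannot tell a page_number that was never set from one that was explicitly set to 0.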
Enumerations
When defining a message type, you may want one of its fields to take only one of a predefined list of values. For example, suppose you want to add a corpus field to each SearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS, or VIDEO. You can do this by adding an enum to the message definition with a constant for each possible value.
In the following example, we add an enum called Corpus and a field of type Corpus:
    message SearchRequest {
      string query = 1;
      int32 page_number = 2;
      int32 result_per_page = 3;
      enum Corpus {
        UNIVERSAL = 0;
        WEB = 1;
        IMAGES = 2;
        LOCAL = 3;
        NEWS = 4;
        PRODUCTS = 5;
        VIDEO = 6;
      }
      Corpus corpus = 4;
    }
As you can see, the first constant of the Corpus enum maps to 0. Every enum definition must contain a constant that maps to zero as its first element, because:
- There must be a zero value so that 0 can be used as the default value of the enum.
- The zero value must be the first element for compatibility with proto2 semantics, where the first enum value is always the default.
Using other message types
You can use other message types as field types. For example, suppose you want each SearchResponse message to carry Result messages. You can define a Result message type in the same .proto file and then declare a field of type Result in SearchResponse.
    message SearchResponse {
      repeated Result results = 1;
    }

    message Result {
      string url = 1;
      string title = 2;
      repeated string snippets = 3;
    }
Importing definitions
In the example above, the Result message type is defined in the same file as SearchResponse. What if the message type you want to use as a field type is already defined in another .proto file?
You can use definitions from other .proto files by importing them. To import another .proto file's definitions, add an import statement at the top of your file:
    import "myproject/other_protos.proto";
By default, you can use definitions only from directly imported .proto files. However, sometimes you need to move a .proto file to a new location. Instead of moving the file and updating every call site in a single change, you can leave a placeholder .proto file in the old location that forwards all imports to the new location using import public. Any file that imports a proto containing an import public statement can transitively rely on the import public dependencies. For example:
    // new.proto
    // All definitions are moved here

    // old.proto
    // This is the proto that all clients are importing.
    import public "new.proto";
    import "other.proto";

    // client.proto
    import "old.proto";
    // You use definitions from old.proto and new.proto, but not other.proto
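The visibility rule in this example can be expressed as a small graph walk. The Python sketch below models it with a hand-written import table; it is not taken from the compiler, and IMPORTS and visible_files are hypothetical names. Each entry records the imported file and whether the import is public.

```python
# importer -> list of (imported file, is_public)
IMPORTS = {
    "client.proto": [("old.proto", False)],
    "old.proto": [("new.proto", True), ("other.proto", False)],
    "new.proto": [],
    "other.proto": [],
}


def visible_files(importer: str) -> set:
    """Return the files whose definitions `importer` may reference."""
    visible = set()

    def follow_public(path: str) -> None:
        # Only public imports are re-exported, and they chain transitively.
        for target, is_public in IMPORTS[path]:
            if is_public and target not in visible:
                visible.add(target)
                follow_public(target)

    for target, _ in IMPORTS[importer]:
        visible.add(target)  # direct imports are always visible
        follow_public(target)
    return visible


print(sorted(visible_files("client.proto")))  # ['new.proto', 'old.proto']
```

other.proto stays out of the result because old.proto imports it without the public keyword.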
The protocol compiler searches for imported files in the directories given by the -I or --proto_path command-line flag. If no flag is given, it looks in the directory in which the compiler was invoked. In general you should set --proto_path to the root of your project and use fully qualified names for all imports.
Using proto2 message types
You can import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used directly in proto3 syntax.
Nested message types
You can define and use message types inside other message types. In the following example, the Result message is defined inside the SearchResponse message:
    message SearchResponse {
      message Result {
        string url = 1;
        string title = 2;
        repeated string snippets = 3;
      }
      repeated Result results = 1;
    }
To use a nested message type outside its parent message type, refer to it as Parent.Type:
    message SomeOtherMessage {
      SearchResponse.Result result = 1;
    }
You can nest messages as deeply as you like:
    message Outer {                  // Level 0
      message MiddleAA {             // Level 1
        message Inner {              // Level 2
          int64 ival = 1;
          bool booly = 2;
        }
      }
      message MiddleBB {             // Level 1
        message Inner {              // Level 2
          int32 ival = 1;
          bool booly = 2;
        }
      }
    }
Updating a message type
If an existing message type no longer meets your needs (for example, you want to add a field) but you still want to use code generated from the old format, don't worry. You can update message definitions without breaking existing code as long as you follow these rules:
- Do not change the field numbers of any existing fields.
- If you add new fields, messages serialized by code using the old format can still be parsed by code generated from the new format. Keep the default values of the new fields in mind so that new code can interact properly with messages produced by old code. Similarly, messages created by new code can be parsed by old code: old binaries simply ignore the new fields when parsing. See the Unknown fields section below for details.
- Fields can be removed as long as the field number is not used again in the updated message type. You may want to rename the field instead, for example by adding the prefix OBSOLETE_, or mark the field number as reserved, so that future users of your .proto cannot accidentally reuse the number.
Unknown fields
Unknown fields are well-formed protocol buffer serialized data representing fields that the parser does not recognize. For example, when an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary.
Originally, proto3 messages always discarded unknown fields during parsing, but version 3.5 reintroduced the preservation of unknown fields to match proto2 behavior. In version 3.5 and later, unknown fields are retained during parsing and included in the serialized output.
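The preservation behavior can be illustrated with a minimal wire-format splitter in Python. This is a sketch of the idea, not the real parser: it handles only wire types 0, 1, 2, and 5, and split_fields and parse_keeping_unknown are names invented here. The old reader knows fields 1 to 3, while the writer has added a field 4.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_fields(buf: bytes):
    """Yield (field_number, raw_bytes) per field; raw_bytes includes the key."""
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, buf[start:pos]


def parse_keeping_unknown(buf: bytes, known_numbers: set):
    """Separate fields the reader knows from those it must carry along untouched."""
    known, unknown = [], []
    for number, raw in split_fields(buf):
        (known if number in known_numbers else unknown).append(raw)
    return known, unknown


# Written by a newer binary: query = "hi" (field 1) plus a new varint field 4 = 1.
new_message = bytes.fromhex("0a0268692001")
known, unknown = parse_keeping_unknown(new_message, known_numbers={1, 2, 3})
reserialized = b"".join(known) + b"".join(unknown)
print(reserialized == new_message)  # True
```

Because the unknown bytes are carried through unchanged, an old service that reads and re-writes a message does not silently drop data added by newer clients.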
Maps
If you want to create a map as part of a message definition, protocol buffers provides a convenient shortcut syntax:
map<key_type, value_type> map_field = N;
key_type can be any integral or string type (that is, any scalar type except floating-point types and bytes). Note that an enum is not a valid key_type. value_type can be any type except another map, so maps cannot be nested directly inside one another.
For example, to create a map called projects in which each Project message is associated with a string key, you could define it like this:
map<string, Project> projects = 3;
- Map fields cannot be repeated.
- The ordering of map values is undefined, so you cannot rely on map items being in a particular order.
- When text format is generated for a .proto, maps are sorted by key. Numeric keys are sorted numerically.
- When parsing from the wire or when merging, if there are duplicate map keys, the last key seen is used. When parsing a map from text format, parsing may fail if there are duplicate keys.
- If no value is specified for a map entry, the behavior when the field is serialized is language-dependent. In C++, Java, and Python the default value for the type is serialized, while in other languages nothing is serialized.
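Two of the rules above, last key wins and sorted text output, are easy to picture in Python. The sketch below works on already-decoded (key, value) pairs rather than real wire bytes, and both function names are invented for the example.

```python
def merge_map_entries(entries):
    """Collapse (key, value) pairs in arrival order; a later duplicate key overwrites an earlier one."""
    merged = {}
    for key, value in entries:
        merged[key] = value
    return merged


def text_format_order(mapping):
    """Return the entries sorted by key, as text-format output orders them."""
    return sorted(mapping.items())


projects = merge_map_entries([("alpha", 1), ("beta", 2), ("alpha", 3)])
print(projects)                                    # {'alpha': 3, 'beta': 2}
print(text_format_order({10: "ten", 9: "nine"}))   # [(9, 'nine'), (10, 'ten')]
```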
Packages
You can add an optional package specifier to a .proto file to prevent name clashes between message types.
    package foo.bar;
    message Open { ... }
You can then use the package name when defining fields of your message type:
    message Foo {
      ...
      foo.bar.Open open = 1;
      ...
    }
How a package specifier affects the generated code depends on the programming language.
Defining services
If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file, and the protocol buffer compiler will generate service interface code and stubs in your chosen language. For example, to define a service with a method that takes a SearchRequest and returns a SearchResponse, you can write the following in your .proto file:
    service SearchService {
      rpc Search (SearchRequest) returns (SearchResponse);
    }
The most straightforward RPC system to use with protocol buffers is gRPC, a language- and platform-neutral open source RPC system developed at Google. gRPC works particularly well with protocol buffers and lets you generate the relevant RPC code directly from your .proto files using a special protocol buffer compiler plugin.
If you don't want to use gRPC, you can use protocol buffers with your own RPC implementation. You can find out more about this in the Proto2 Language Guide.
JSON mapping
Proto3 supports a canonical encoding in JSON, which makes it easier to share data between systems. The encoding is described type by type in the table below.
If a value is missing from the JSON-encoded data, or if its value is null, it is interpreted as the appropriate default value when parsed into a protocol buffer. If a field has the default value in the protocol buffer, it is omitted from the JSON-encoded data by default to save space. An implementation may provide an option to emit fields with default values in the JSON-encoded output.
| proto3 | JSON | JSON example | Notes |
| :----- | :--- | :----------- | :---- |
| message | object | {"fooBar": v, "g": null, …} | Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. If the json_name field option is specified, the specified value is used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name. null is an accepted value for all field types and is treated as the default value of the corresponding field type. |
| enum | string | "FOO_BAR" | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values. |
| map | object | {"k": v, …} | All keys are converted to strings. |
| repeated V | array | [v, …] | null is accepted as the empty list []. |
| bool | true, false | true, false | |
| string | string | "Hello World!" | |
| bytes | base64 string | "YWJjMTIzIT8kKiYoKSctPUB+" | The JSON value is the data encoded as a string using standard base64 encoding with padding. Either standard or URL-safe base64 encoding, with or without padding, is accepted. |
| int32, fixed32, uint32 | number | 1, -10, 0 | JSON value will be a decimal number. Either numbers or strings are accepted. |
| int64, fixed64, uint64 | string | "1", "-10" | JSON value will be a decimal string. Either numbers or strings are accepted. |
| float, double | number | 1.1, -10.0, 0, "NaN", "Infinity" | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted. |
| Any | object | {"@type": "url", "f": v, … } | If the Any contains a value that has a special JSON mapping, it will be converted as follows: {"@type": xxx, "value": yyy}. Otherwise, the value will be converted into a JSON object, and the "@type" field will be inserted to indicate the actual data type. |
| Timestamp | string | "1972-01-01T10:00:20.021Z" | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
| Duration | string | "1.000340012s", "1s" | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nanosecond precision and the suffix "s" is required. |
| Struct | object | { … } | Any JSON object. See struct.proto. |
| Wrapper types | various types | 2, "2", "foo", true, "true", null, 0, … | Wrappers use the same representation in JSON as the wrapped primitive type, except that null is allowed and preserved during data conversion and transfer. |
| FieldMask | string | "f.fooBar,h" | See field_mask.proto. |
| ListValue | array | [foo, bar, …] | |
| Value | value | | Any JSON value |
| NullValue | null | | JSON null |
| Empty | object | {} | An empty JSON object |
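Two rows of the table, the lowerCamelCase key mapping and the string encoding of 64-bit integers, can be sketched in Python as follows. This is a simplified model working on plain dictionaries; it does not use the protobuf library's own JSON support, and the total_hits field and all function names are hypothetical.

```python
import json

# Hypothetical set of fields declared as int64 in the .proto file.
INT64_FIELDS = {"total_hits"}


def to_lower_camel(name: str) -> str:
    """Convert a snake_case proto field name to its lowerCamelCase JSON key."""
    head, *rest = name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def to_json_dict(message: dict) -> dict:
    """Apply the key mapping and emit 64-bit integers as decimal strings."""
    out = {}
    for name, value in message.items():
        if name in INT64_FIELDS:
            value = str(value)
        out[to_lower_camel(name)] = value
    return out


print(json.dumps(to_json_dict({"query": "protobuf", "result_per_page": 10, "total_hits": 2**40})))
# {"query": "protobuf", "resultPerPage": 10, "totalHits": "1099511627776"}
```

64-bit values are written as strings because many JSON consumers store numbers as double-precision floats, which cannot represent every 64-bit integer exactly.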
Generating your classes
To generate the Java, Python, C++, Go, Ruby, Objective-C, or C# code you need to work with the message types defined in a .proto file, run the protocol buffer compiler protoc on the .proto file. If you haven't installed the compiler, download the package and follow the instructions in the README. For Go, you also need to install a special code generator plugin for the compiler; you can find it and its installation instructions in the golang/protobuf repository on GitHub.
The protocol compiler is invoked as follows:
protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR --go_out=DST_DIR --ruby_out=DST_DIR --objc_out=DST_DIR --csharp_out=DST_DIR path/to/file.proto
- IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If omitted, the current working directory is used. Multiple import directories can be specified by passing the --proto_path option multiple times; they are searched in order. -I=IMPORT_PATH can be used as a short form of --proto_path.
- You can provide one or more output directives:
  - --cpp_out generates C++ code in DST_DIR. See the C++ generated code reference for more.
  - --java_out generates Java code in DST_DIR. See the Java generated code reference for more.
  - --python_out generates Python code in DST_DIR. See the Python generated code reference for more.
  - --go_out generates Go code in DST_DIR. See the Go generated code reference for more.
  - --ruby_out generates Ruby code in DST_DIR. Ruby generated code reference is coming soon!
  - --objc_out generates Objective-C code in DST_DIR. See the Objective-C generated code reference for more.
  - --csharp_out generates C# code in DST_DIR. See the C# generated code reference for more.
  - --php_out generates PHP code in DST_DIR. See the PHP generated code reference for more.
- You must provide one or more .proto files as input. Multiple .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.
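If you drive protoc from a build script, the argument rules above translate into a short helper. The Python sketch below only builds the argument list and does not run the compiler; protoc_command is a made-up name, and the protos and gen directories and the search.proto file are placeholders for your own layout.

```python
import shlex


def protoc_command(proto_files, import_paths=(".",), python_out="."):
    """Build a protoc argument list that generates Python code."""
    args = ["protoc"]
    # --proto_path may be repeated; directories are searched in the order given.
    args += [f"--proto_path={path}" for path in import_paths]
    args.append(f"--python_out={python_out}")
    # Each input file must live under one of the import paths.
    args += list(proto_files)
    return args


cmd = protoc_command(["protos/search.proto"], import_paths=["protos"], python_out="gen")
print(shlex.join(cmd))
# protoc --proto_path=protos --python_out=gen protos/search.proto
```

With protoc installed, the list can be passed to subprocess.run(cmd, check=True); keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.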
Copyright notice
This article was written by [weixin_ forty-six million two hundred and seventy-two thousand ]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230748008907.html