
Introduction to protobuf

2022-04-23 08:23:00 weixin_46272000

What is Protobuf?

Protobuf is short for Protocol Buffers, a data description language developed by Google for describing a lightweight, efficient storage format for structured data. It was open-sourced in 2008. Protobuf is used to serialize structured data, and its design makes it well suited as the data carrier in network communication, as a storage format, or as the data-exchange format for RPC. Serialized data is compact, fields are stored as key-value pairs, and different versions of a message remain highly compatible with each other. The result is a language-neutral, platform-neutral, extensible format for serializing structured data in communication protocols, data storage, and similar areas. Developers use the tools that ship with Protobuf to generate code that serializes and parses their structured data.

The most basic unit of data in Protobuf is the message, which plays a role similar to a struct in Go. A message can contain nested messages as well as members of the basic scalar data types.

This tutorial describes how to use the protocol buffer language to structure your protocol buffer data. It covers .proto file syntax and how to generate data access classes from your .proto files. The tutorial uses the proto3 version of the protocol buffer language.

Defining a message type

Let's start with a simple example. Suppose you want to define a search request message format, where each search request contains a query string, the page of results you are interested in, and the number of results per page. Here is the .proto file that defines the message type:


     
      
```protobuf
syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}
```
  • The first line of the file specifies that you are using proto3 syntax. If you omit it, the protocol buffer compiler assumes you are using proto2. This must be the first non-empty, non-comment line of the file.
  • The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data you want to include in this type of message. Each field has a name and a type.
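The sketch below illustrates what "field number plus value" means in the binary format by hand-encoding a SearchRequest in Python. You would not do this in practice, because protoc-generated classes handle encoding for you. All helper names are hypothetical. The sketch assumes the three fields above and non-negative integer values.

```python
# Illustrative sketch only: every name here is hypothetical, not a protobuf API.
VARINT, LEN = 0, 2  # wire types used below: int32 fields -> 0, string fields -> 2


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """Every field starts with a key: (field_number << 3) | wire_type, as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_search_request(query: str, page_number: int, result_per_page: int) -> bytes:
    """Hand-encode SearchRequest; fields equal to their proto3 default are skipped."""
    out = bytearray()
    if query:
        data = query.encode("utf-8")
        out += tag(1, LEN) + encode_varint(len(data)) + data
    if page_number:
        out += tag(2, VARINT) + encode_varint(page_number)
    if result_per_page:
        out += tag(3, VARINT) + encode_varint(result_per_page)
    return bytes(out)


print(encode_search_request("hi", 2, 10).hex())  # 0a0268691002180a
```

Each field is written as a key followed by its value. The key carries the field number, not the field name, which is why the numbers are what identify fields in the encoded message.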

Specifying field types

In the example above, all the fields are scalar types: two integers (page_number and result_per_page) and a string (query). You can also specify composite types for your fields, including enumerations and other message types.

Assigning field numbers

Each field in a message definition has a unique number. These numbers identify your fields in the binary message format and should not be changed once your message type is in use. Field numbers 1 through 15 take one byte to encode (this byte holds both the field number and the field's type), while field numbers 16 through 2047 take two bytes. You should therefore reserve the numbers 1 through 15 for very frequently occurring message elements.

The smallest field number you can specify is 1, and the largest is 2^29 - 1 (536,870,911). You also cannot use the numbers 19000 through 19999, which are reserved for the protocol buffers implementation. Likewise, you cannot use any field number that is already used or reserved in the same message definition.
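The one-byte/two-byte rule and the 2^29 - 1 limit both follow from how a field's key is written. The key is (field_number << 3) | wire_type, stored as a varint that carries 7 bits per byte. The 3-bit wire type occupies the low bits of a 32-bit key, which leaves 29 bits for the field number. Here is a small Python sketch that computes the key size; the function and constant names are hypothetical.

```python
# Illustrative sketch: key_size/varint_size are hypothetical helpers, not protobuf APIs.
MAX_FIELD_NUMBER = 2**29 - 1                   # 536,870,911
IMPLEMENTATION_RESERVED = range(19000, 20000)  # 19000 through 19999


def varint_size(value: int) -> int:
    """Bytes needed for a non-negative integer as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def key_size(field_number: int, wire_type: int = 0) -> int:
    """Bytes taken by a field's key, i.e. (field_number << 3) | wire_type."""
    if not 1 <= field_number <= MAX_FIELD_NUMBER:
        raise ValueError(f"field number {field_number} is out of range")
    if field_number in IMPLEMENTATION_RESERVED:
        raise ValueError(f"field number {field_number} is reserved for the implementation")
    return varint_size((field_number << 3) | wire_type)


for n in (1, 15, 16, 2047, 2048):
    print(n, key_size(n))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```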

Specifying field rules

Message fields must follow one of these rules:

  • singular: a well-formed message (that is, an encoded message body) can contain zero or one of this field, but not more than one. This is the default field rule for proto3 syntax. For example, each of the three fields in the example above is singular: an encoded message may contain zero or one query field, but never more.
  • repeated: this field can occur any number of times (including zero) in a well-formed message, and the order of the repeated values is preserved. In other words, it is an array-typed field.
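In proto3, repeated fields of scalar numeric types use packed encoding by default. Instead of repeating the key before every element, the encoder writes one key with wire type 2 (length-delimited), then the byte length of the payload, then the values back to back. Parsers are expected to accept both layouts. A minimal Python sketch of both forms for a hypothetical `repeated int32` field with number 4, assuming non-negative values:

```python
# Illustrative sketch: encode_packed/encode_unpacked are hypothetical helpers.
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_packed(field_number: int, values: list) -> bytes:
    """Packed layout: one key (wire type 2), payload length, then all varints."""
    if not values:
        return b""  # an empty repeated field is not written at all
    payload = b"".join(encode_varint(v) for v in values)
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_unpacked(field_number: int, values: list) -> bytes:
    """Unpacked layout: a separate key (wire type 0) before every element."""
    key = encode_varint((field_number << 3) | 0)
    return b"".join(key + encode_varint(v) for v in values)


values = [3, 270, 86942]
print(encode_packed(4, values).hex())    # 2206038e029ea705   (8 bytes)
print(encode_unpacked(4, values).hex())  # 2003208e02209ea705 (9 bytes)
```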

Adding more message types

Multiple message types can be defined in a single .proto file, which is useful when you are defining several related messages. For example, to define the reply message that corresponds to SearchRequest, you can add a SearchResponse message to the same .proto file:


     
      
```protobuf
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

message SearchResponse {
 ...
}
```

Adding comments

To add comments to your .proto files, use C/C++-style // and /* ... */ syntax.


     
      
```protobuf
/* SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response. */

message SearchRequest {
  string query = 1;
  int32 page_number = 2;  // Which page number do we want?
  int32 result_per_page = 3;  // Number of results to return per page.
}
```

Reserved fields

If you update a message type by removing a field entirely, or by commenting it out, future developers can reuse that field number when they make their own updates to the type. This can cause severe problems if they later load old versions of the same .proto, including data corruption and privacy bugs. One way to prevent this is to mark the field numbers and/or names of deleted fields as reserved. The protocol buffer compiler will then report an error if anyone tries to use these field identifiers in the future.


     
      
```protobuf
message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}
```
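The following hypothetical Python checker makes the effect of the `reserved` statements above concrete. It mimics the kind of validation protoc performs but is not protoc's code. It assumes, as the example implies, that `9 to 11` is an inclusive range. It reports any field that reuses a reserved number or name.

```python
# Hypothetical checker for illustration; protoc performs its own validation.
from dataclasses import dataclass, field


@dataclass
class Reserved:
    """Reserved numbers and names of one message (hypothetical helper)."""
    numbers: set = field(default_factory=set)
    names: set = field(default_factory=set)

    def add_range(self, start: int, end: int) -> None:
        # "9 to 11" covers both ends: 9, 10 and 11.
        self.numbers.update(range(start, end + 1))


def check_fields(fields: dict, reserved: Reserved) -> list:
    """Return one error message per collision between a field and a reservation."""
    errors = []
    for name, number in fields.items():
        if number in reserved.numbers:
            errors.append(f"field {name!r} uses reserved number {number}")
        if name in reserved.names:
            errors.append(f"field name {name!r} is reserved")
    return errors


# Mirrors: reserved 2, 15, 9 to 11;  reserved "foo", "bar";
foo_reserved = Reserved(numbers={2, 15}, names={"foo", "bar"})
foo_reserved.add_range(9, 11)

print(check_fields({"foo": 1, "baz": 10, "ok": 3}, foo_reserved))
# ["field name 'foo' is reserved", "field 'baz' uses reserved number 10"]
```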

What is generated from your .proto?

When you run the protocol buffer compiler on a .proto file, it generates code in your chosen language for the message types described in the file. The generated code covers getting and setting field values, serializing your messages to an output stream, and parsing your messages from an input stream.

  • For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.
  • For Java, the compiler generates a .java file with a class for each message type, as well as special Builder classes for creating message class instances.
  • Python is a little different – the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.
  • For Go, the compiler generates a .pb.go file with a type for each message type in your file.
  • For Ruby, the compiler generates a .rb file with a Ruby module containing your message types.
  • For Objective-C, the compiler generates a pbobjc.h and pbobjc.m file from each .proto, with a class for each message type described in your file.
  • For C#, the compiler generates a .cs file from each .proto, with a class for each message type described in your file.
  • For Dart, the compiler generates a .pb.dart file with a class for each message type in your file.

Scalar value types

| .proto Type | Notes | C++ Type | Java Type | Python Type[2] | Go Type | Ruby Type | C# Type | PHP Type | Dart Type |
| :---------- | :---- | :------- | :-------- | :------------- | :------ | :-------- | :------ | :------- | :-------- |
| double | | double | double | float | float64 | Float | double | float | double |
| float | | float | float | float | float32 | Float | float | float | double |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long[3] | int64 | Bignum | long | integer/string[5] | Int64 |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | Bignum | ulong | integer/string[5] | Int64 |
| sint32 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sint64 | Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s. | int64 | long | int/long | int64 | Bignum | long | integer/string[5] | Int64 |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int/long | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long[3] | uint64 | Bignum | ulong | integer/string[5] | Int64 |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | Bignum | long | integer/string[5] | Int64 |
| bool | | bool | boolean | bool | bool | TrueClass/FalseClass | bool | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32. | string | String | str/unicode | string | String (UTF-8) | string | string | String |
| bytes | May contain any arbitrary sequence of bytes no longer than 2^32. | string | ByteString | str | []byte | String (ASCII-8BIT) | ByteString | string | List<int> |
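The int32 versus sint32 advice in the table comes from how negative values are written. An int32 or int64 value is sign-extended to 64 bits before varint encoding, so every negative number costs 10 bytes. sint32 and sint64 first apply ZigZag encoding, which maps values of small magnitude (positive or negative) to small unsigned numbers. The Python sketch below compares the two sizes and also shows where the 2^28 threshold for fixed32 comes from. The helper names are hypothetical.

```python
# Illustrative sketch: these helpers are hypothetical, not protobuf library APIs.
def varint_size(value: int) -> int:
    """Bytes needed for a non-negative integer as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int32_size(v: int) -> int:
    """int32: negatives are sign-extended to 64 bits, so they always take 10 bytes."""
    return varint_size(v & 0xFFFFFFFFFFFFFFFF)


def zigzag32(v: int) -> int:
    """sint32 ZigZag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return ((v << 1) ^ (v >> 31)) & 0xFFFFFFFF


def sint32_size(v: int) -> int:
    return varint_size(zigzag32(v))


for v in (1, -1, -64, -65):
    print(v, int32_size(v), sint32_size(v))
# 1 1 1
# -1 10 1
# -64 10 1
# -65 10 2

# uint32 needs 5 bytes from 2**28 upward, while fixed32 is always 4 bytes.
print(varint_size(2**28 - 1), varint_size(2**28))  # 4 5
```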

Default values

When a message is parsed and the encoded message body does not contain a particular singular field, that field in the parsed object is set to its default value. The default value depends on the type:

  • For strings, the default value is the empty string.
  • For bytes, the default value is empty bytes.
  • For bools, the default value is false.
  • For numeric types, the default value is zero.
  • For enums, the default value is the first defined enum value, which must be 0.
  • For message fields, the field is not set. Its exact value is language-dependent; see the generated code guide for details.
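Because proto3 does not write fields that hold their default value, the parser fills those defaults in itself. Here is a minimal Python sketch of that behavior for SearchRequest. The schema dictionary and function names are hypothetical, and the sketch assumes every field on the wire is one of the three known fields.

```python
# Illustrative sketch: SEARCH_REQUEST, decode_varint and decode are hypothetical.
SEARCH_REQUEST = {
    # field number: (field name, proto3 default)
    1: ("query", ""),
    2: ("page_number", 0),
    3: ("result_per_page", 0),
}


def decode_varint(buf: bytes, pos: int):
    """Read one base-128 varint at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, schema: dict) -> dict:
    # Start with every field at its default; fields present on the wire overwrite it.
    message = {name: default for name, default in schema.values()}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        name, _ = schema[key >> 3]  # assumes the field number is in the schema
        wire_type = key & 0x7
        if wire_type == 0:  # int32
            value, pos = decode_varint(buf, pos)
            if value >= 1 << 63:  # negative int32 values arrive sign-extended to 64 bits
                value -= 1 << 64
        elif wire_type == 2:  # string
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length].decode("utf-8"), pos + length
        else:
            raise ValueError(f"wire type {wire_type} is not used by SearchRequest")
        message[name] = value
    return message


print(decode(bytes.fromhex("1002"), SEARCH_REQUEST))
# {'query': '', 'page_number': 2, 'result_per_page': 0}
```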

Enumerations

When you define a message type, you might want one of its fields to take only one of a predefined list of values. For example, suppose you want to add a corpus field to each SearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS or VIDEO. You can do this very simply by adding an enum to your message definition, with a constant for each possible value.

In the following example, we have added an enum called Corpus with all the possible values, and a field of type Corpus:


     
      
```protobuf
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  Corpus corpus = 4;
}
```

As you can see, the first constant of the Corpus enum maps to zero. Every enum definition must contain a constant that maps to zero as its first element, because:

  • There must be a zero value, so that 0 can be used as the numeric default value.
  • The zero value must be the first element for compatibility with proto2 semantics, where the first enum value is always the default.
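On the wire, an enum field is simply a varint holding the constant's number. This is why a zero constant is needed to act as the default. Here is a small Python sketch mirroring the Corpus enum. IntEnum and the helper names are stand-ins for illustration (hypothetical, not generated code), and this sketch rejects numbers it does not know.

```python
# Illustrative sketch: Corpus mirrors the .proto enum; the helpers are hypothetical.
from enum import IntEnum
from typing import Optional


class Corpus(IntEnum):
    # The zero value comes first and is the default.
    UNIVERSAL = 0
    WEB = 1
    IMAGES = 2
    LOCAL = 3
    NEWS = 4
    PRODUCTS = 5
    VIDEO = 6


CORPUS_FIELD = 4  # `Corpus corpus = 4;`


def encode_corpus(corpus: Corpus) -> bytes:
    """Key (field 4, wire type 0) plus the value; the zero default is not written."""
    if corpus == Corpus.UNIVERSAL:
        return b""
    # Both the key (4 << 3 = 32) and the values 1..6 fit in a single varint byte.
    return bytes([(CORPUS_FIELD << 3) | 0, int(corpus)])


def corpus_from_wire(value: Optional[int]) -> Corpus:
    """None means the field was absent, so the zero value is used."""
    return Corpus.UNIVERSAL if value is None else Corpus(value)


print(encode_corpus(Corpus.IMAGES).hex())  # 2002
print(corpus_from_wire(None).name)         # UNIVERSAL
```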

Using other message types

You can use other message types as field types. For example, suppose you want to include Result messages in each SearchResponse message. To do this, you can define a Result message type in the same .proto file and then declare a field of type Result in SearchResponse:


     
      
```protobuf
message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}
```
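A field whose type is another message is encoded the same way as a string or bytes field: wire type 2, the length of the embedded message, then the embedded message's own encoding. A `repeated Result` field writes one such entry per element. A minimal Python sketch for the SearchResponse/Result pair above (helper names are hypothetical):

```python
# Illustrative sketch: all helper names are hypothetical, not protobuf APIs.
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Key with wire type 2, then the payload length, then the payload itself."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_result(url: str, title: str, snippets: list) -> bytes:
    out = b""
    if url:
        out += length_delimited(1, url.encode("utf-8"))
    if title:
        out += length_delimited(2, title.encode("utf-8"))
    for snippet in snippets:  # repeated string: one entry per element
        out += length_delimited(3, snippet.encode("utf-8"))
    return out


def encode_search_response(results: list) -> bytes:
    """`repeated Result results = 1`: each encoded Result becomes one field-1 entry."""
    return b"".join(length_delimited(1, r) for r in results)


result = encode_result("a", "", ["x"])
print(result.hex())                            # 0a01611a0178
print(encode_search_response([result]).hex())  # 0a060a01611a0178
```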

Importing definitions

In the example above, the Result message type is defined in the same file as SearchResponse. What if the message type you want to use as a field type is already defined in another .proto file?

You can use definitions from other .proto files by importing them. To import another .proto file's definitions, add an import statement at the top of your file:

```protobuf
import "myproject/other_protos.proto";
```

By default, you can use definitions only from directly imported .proto files. Sometimes, however, you may need to move a .proto file to a new location. Instead of moving the file and updating all the call sites in a single change, you can leave a placeholder .proto file in the old location that forwards all imports to the new location using the import public notion. Dependencies declared with import public can be relied on transitively by any file that imports the proto containing the import public statement. For example:


     
      
```protobuf
// new.proto
// All definitions are moved here

// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto
```

The compiler searches for imported files in the directories specified on the command line with the -I/--proto_path flag. If no flag is given, it looks in the directory in which the compiler was invoked. In general, you should set --proto_path to the root of your project and use fully qualified names for all imports.
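Here is a simplified Python model of the lookup just described. It assumes each import name is tried against the --proto_path directories in the order they were given, and that the first existing file wins. `resolve_import` is a hypothetical helper and does not reproduce protoc's actual code.

```python
# Hypothetical model of import lookup; protoc's real resolution logic is more involved.
from pathlib import Path
from typing import Optional, Sequence


def resolve_import(import_name: str, proto_paths: Sequence[str]) -> Optional[Path]:
    """Return the first <dir>/<import_name> that exists, or None if none does."""
    # With no --proto_path flags, fall back to the directory protoc was run from.
    for directory in proto_paths or ["."]:
        candidate = Path(directory) / import_name
        if candidate.is_file():
            return candidate
    return None


# Example (hypothetical directories):
# resolve_import("myproject/other_protos.proto", ["/src/project", "/src/third_party"])
```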

Using proto2 message types

You can import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used directly in proto3 syntax.

Nested message types

You can define and use message types inside other message types. In the following example, the Result message is defined inside the SearchResponse message:


     
      
```protobuf
message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}
```

If you want to use a nested message type outside its parent message type, refer to it as Parent.Type:


     
      
```protobuf
message SomeOtherMessage {
  SearchResponse.Result result = 1;
}
```

You can nest messages as deeply as you like:


     
      
```protobuf
message Outer {                  // Level 0
  message MiddleAA {             // Level 1
    message Inner {              // Level 2
      int64 ival = 1;
      bool  booly = 2;
    }
  }
  message MiddleBB {             // Level 1
    message Inner {              // Level 2
      int32 ival = 1;
      bool  booly = 2;
    }
  }
}
```

Updating a message type

Suppose an existing message type no longer meets all your needs (for example, you want the message format to have an extra field) but you still want to use code generated from the old format. Don't worry! As long as you remember the following rules, it is very simple to update message definitions without breaking any existing code.

  • Don't change the field numbers of any existing fields.
  • If you add new fields, any messages serialized by code using the old message format can still be parsed by code generated from the new format. Keep the default values of these new fields in mind so that new code can interact correctly with messages created by old code. Similarly, messages created by new code can be parsed by old code: old binaries simply ignore the new fields when parsing. See the Unknown fields section below for details.
  • Fields can be removed, as long as the field number is not used again in the updated message type. You may want to rename the field instead, perhaps adding the prefix OBSOLETE_, or make the field number reserved, so that future users of your .proto cannot accidentally reuse the number.

Unknown fields

Unknown fields are well-formed protocol buffer serialized data that represent fields the parser does not recognize. For example, when an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary.

Originally, proto3 messages always discarded unknown fields during parsing, but version 3.5 reintroduced the preservation of unknown fields to match proto2 behavior. In versions 3.5 and later, unknown fields are retained during parsing and included in the serialized output.
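Here is a minimal Python sketch of the 3.5+ behavior. An "old" parser that knows only fields 1 and 2 of SearchRequest keeps the raw bytes of any other field and writes them back out when it serializes. The function names and the set of known fields are hypothetical. Real libraries do this inside the generated message classes.

```python
# Illustrative sketch: parse/serialize are hypothetical helpers, not protobuf APIs.
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Read one base-128 varint at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse(buf: bytes, known: set):
    """Split a message into known fields {number: (wire_type, value)} and unknown bytes."""
    fields, unknown, pos = {}, b"", 0
    while pos < len(buf):
        start = pos
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type in (1, 5):  # fixed 64-bit / 32-bit values
            size = 8 if wire_type == 1 else 4
            value, pos = buf[pos:pos + size], pos + size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            fields[number] = (wire_type, value)
        else:
            unknown += buf[start:pos]  # keep the exact bytes of the unrecognized field
    return fields, unknown


def serialize(fields: dict, unknown: bytes) -> bytes:
    """Re-encode the known fields, then append the preserved unknown bytes."""
    out = b""
    for number in sorted(fields):
        wire_type, value = fields[number]
        out += encode_varint((number << 3) | wire_type)
        if wire_type == 0:
            out += encode_varint(value)
        elif wire_type == 2:
            out += encode_varint(len(value)) + value
        else:
            out += value  # fixed-width values are stored as raw bytes
    return out + unknown


# Written by a newer binary that also sets field 3 (result_per_page = 10).
new_data = bytes.fromhex("0a026869" "1002" "180a")
fields, unknown = parse(new_data, known={1, 2})
print(unknown.hex())                           # 180a
print(serialize(fields, unknown) == new_data)  # True
```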

Maps

If you want to create an associative map as part of a message definition, protocol buffers provides a handy shortcut syntax:

```protobuf
map<key_type, value_type> map_field = N;
```

key_type can be any integral or string type (that is, any scalar type except floating-point types and bytes). Note that enum is not a valid key_type. value_type can be any type except another map; protocol buffers do not allow a map to be nested directly as a map's value.

For example, to create a map of projects in which each Project message is associated with a string key, you could define it like this:

```protobuf
map<string, Project> projects = 3;
```
  • Map fields cannot be repeated (a map field's value cannot itself be an array).
  • The ordering of map values on the wire and during iteration is undefined, so you cannot rely on the items in a map being in a particular order.
  • When generating text format for a .proto, maps are sorted by key. Numeric keys are sorted numerically.
  • When parsing from the wire or when merging, if there are duplicate map keys, the last key seen is used. When parsing a map from text format, parsing may fail if there are duplicate keys.
  • If you provide a key but no value for a map field, the behavior when the field is serialized is language-dependent. In C++, Java, and Python the default value for the type is serialized, while in other languages nothing is serialized.
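The protobuf documentation describes a map field as wire-equivalent to a repeated nested "entry" message whose key is field 1 and whose value is field 2. This also explains the last-key-wins rule: each entry is decoded independently and inserted in order. Here is a Python sketch for a hypothetical map<string, int32> field, assuming non-negative values. The helper names are hypothetical.

```python
# Illustrative sketch: encode_map/decode_map are hypothetical helpers.
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Read one base-128 varint at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_map(field_number: int, entries: dict) -> bytes:
    """Encode map<string, int32> as one length-delimited entry message per pair."""
    out = b""
    for key, value in entries.items():
        k = key.encode("utf-8")
        entry = b"\x0a" + encode_varint(len(k)) + k  # entry field 1 (key), wire type 2
        entry += b"\x10" + encode_varint(value)      # entry field 2 (value), wire type 0
        out += encode_varint((field_number << 3) | 2) + encode_varint(len(entry)) + entry
    return out


def decode_map(buf: bytes) -> dict:
    """Decode a buffer holding only this map field's entries."""
    result, pos = {}, 0
    while pos < len(buf):
        _, pos = decode_varint(buf, pos)  # outer key: assumed to be the map field
        length, pos = decode_varint(buf, pos)
        entry, pos = buf[pos:pos + length], pos + length
        key, value, i = "", 0, 0          # defaults if the entry omits key or value
        while i < len(entry):
            tag, i = decode_varint(entry, i)
            if tag == 0x0A:
                n, i = decode_varint(entry, i)
                key, i = entry[i:i + n].decode("utf-8"), i + n
            elif tag == 0x10:
                value, i = decode_varint(entry, i)
            else:
                raise ValueError(f"unexpected key {tag} in map entry")
        result[key] = value               # a later duplicate key overwrites the earlier one
    return result


data = encode_map(3, {"a": 1}) + encode_map(3, {"a": 2, "b": 7})
print(encode_map(3, {"a": 1}).hex())  # 1a050a01611001
print(decode_map(data))               # {'a': 2, 'b': 7}
```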

Packages

You can add an optional package specifier to a .proto file to prevent name clashes between message types.


     
      
```protobuf
package foo.bar;
message Open { ... }
```

You can then use the package specifier when defining fields of your message type:


     
      
```protobuf
message Foo {
  ...
  foo.bar.Open open = 1;
  ...
}
```

How a package specifier affects the generated code depends on the programming language.

Defining services

If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file. The protocol buffer compiler will then generate service interface code and stubs in your chosen language. For example, to define an RPC service with a method that takes a SearchRequest and returns a SearchResponse, write the following in your .proto file:


     
      
```protobuf
service SearchService {
  rpc Search(SearchRequest) returns (SearchResponse);
}
```

The most straightforward RPC system to use with protocol buffers is gRPC, a language- and platform-neutral open source RPC system developed at Google. gRPC works particularly well with protocol buffers and lets you generate the relevant RPC code directly from your .proto files using a special protocol buffer compiler plugin.

If you don't want to use gRPC, you can use protocol buffers with your own RPC implementation. You can find more details about this in the Proto2 Language Guide.

JSON mapping

Proto3 supports a canonical encoding in JSON, which makes it easier to share data between systems. The encoding is described type by type in the table below.

If a value is missing from the JSON-encoded data, or if its value is null, it is interpreted as the appropriate default value when parsed into a protocol buffer. If a field has its default value in the protocol buffer, it is omitted from the JSON-encoded data by default to save space. An implementation may provide options to emit fields with default values in the JSON-encoded output.

| proto3 | JSON | JSON example | Notes |
| :--------------------- | :------------ | :--------------------------------------- | :----------------------------------------------------------- |
| message | object | {"fooBar": v, "g": null, …} | Generates JSON objects. Message field names are converted to lowerCamelCase and become JSON object keys. If the json_name field option is specified, the specified value is used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name. null is an accepted value for all field types and is treated as the default value of the corresponding field type. |
| enum | string | "FOO_BAR" | The name of the enum value as specified in the proto is used. Parsers accept both enum names and integer values. |
| map | object | {"k": v, …} | All keys are converted to strings. |
| repeated V | array | [v, …] | null is accepted as the empty list []. |
| bool | true, false | true, false | |
| string | string | "Hello World!" | |
| bytes | base64 string | "YWJjMTIzIT8kKiYoKSctPUB+" | The JSON value is the data encoded as a string using standard base64 encoding with padding. Either standard or URL-safe base64 encoding, with or without padding, is accepted. |
| int32, fixed32, uint32 | number | 1, -10, 0 | JSON value will be a decimal number. Either numbers or strings are accepted. |
| int64, fixed64, uint64 | string | "1", "-10" | JSON value will be a decimal string. Either numbers or strings are accepted. |
| float, double | number | 1.1, -10.0, 0, "NaN", "Infinity" | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted. |
| Any | object | {"@type": "url", "f": v, … } | If the Any contains a value that has a special JSON mapping, it will be converted as follows: {"@type": xxx, "value": yyy}. Otherwise, the value will be converted into a JSON object, and the "@type" field will be inserted to indicate the actual data type. |
| Timestamp | string | "1972-01-01T10:00:20.021Z" | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
| Duration | string | "1.000340012s", "1s" | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required. |
| Struct | object | { … } | Any JSON object. See struct.proto. |
| Wrapper types | various types | 2, "2", "foo", true, "true", null, 0, … | Wrappers use the same representation in JSON as the wrapped primitive type, except that null is allowed and preserved during data conversion and transfer. |
| FieldMask | string | "f.fooBar,h" | See field_mask.proto. |
| ListValue | array | [foo, bar, …] | |
| Value | value | | Any JSON value |
| NullValue | null | | JSON null |
| Empty | object | {} | An empty JSON object |
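A few rows of the table can be approximated with Python's standard json and base64 modules. Field names become lowerCamelCase keys, 64-bit integers become decimal strings, bytes become padded standard base64, and fields at their default value are left out. This hand-written approximation is for illustration only. The type strings, the helper names, and the total_bytes and blob fields are hypothetical. Real code should use the JSON support built into the protobuf library.

```python
# Hand-written approximation for illustration; helper and field names are hypothetical.
import base64
import json

SIXTY_FOUR_BIT = {"int64", "uint64", "sint64", "fixed64", "sfixed64"}


def json_name(proto_name: str) -> str:
    """Default JSON key: drop underscores and capitalize the letter that follows."""
    parts = proto_name.split("_")
    return parts[0] + "".join(p[:1].upper() + p[1:] for p in parts[1:])


def to_json(message: dict, types: dict) -> str:
    out = {}
    for name, value in message.items():
        if not value:  # "", 0, False, b"", [] and {} are proto3 defaults: omitted
            continue
        kind = types[name]
        if kind in SIXTY_FOUR_BIT:
            value = str(value)  # 64-bit integers are written as decimal strings
        elif kind == "bytes":
            value = base64.b64encode(value).decode("ascii")  # standard base64, padded
        out[json_name(name)] = value
    return json.dumps(out)


types = {"query": "string", "page_number": "int32", "total_bytes": "int64", "blob": "bytes"}
message = {"query": "", "page_number": 2, "total_bytes": 10**12, "blob": b"abc123!?$*&()'-=@~"}
print(to_json(message, types))
# {"pageNumber": 2, "totalBytes": "1000000000000", "blob": "YWJjMTIzIT8kKiYoKSctPUB+"}
```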

Generating your classes

To generate the Java, Python, C++, Go, Ruby, Objective-C, or C# code you need to work with the message types defined in a .proto file, run the protocol buffer compiler protoc on the .proto file. If you haven't installed the compiler, download the package and follow the instructions in its README. For Go, you also need to install a special code generator plugin for the compiler; you can find it, along with installation instructions, in the golang/protobuf project on GitHub.

Invoke the compiler as follows:

```shell
protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR --go_out=DST_DIR --ruby_out=DST_DIR --objc_out=DST_DIR --csharp_out=DST_DIR path/to/file.proto
```
  • IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If omitted, the current directory is used. Multiple import directories can be specified by passing the --proto_path option multiple times; they are searched in order. -I=IMPORT_PATH can be used as a short form of --proto_path.
  • You can provide one or more output directives:
    • --cpp_out generates C++ code in DST_DIR. See the C++ generated code reference for more.
    • --java_out generates Java code in DST_DIR. See the Java generated code reference for more.
    • --python_out generates Python code in DST_DIR. See the Python generated code reference for more.
    • --go_out generates Go code in DST_DIR. See the Go generated code reference for more.
    • --ruby_out generates Ruby code in DST_DIR. Ruby generated code reference is coming soon!
    • --objc_out generates Objective-C code in DST_DIR. See the Objective-C generated code reference for more.
    • --csharp_out generates C# code in DST_DIR. See the C# generated code reference for more.
    • --php_out generates PHP code in DST_DIR. See the PHP generated code reference for more.
  • You must provide one or more .proto files as input, and several .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.

Copyright notice
This article was written by [weixin_46272000]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230748008907.html