
OneFlow learning notes: from Functor to OpExprInterpreter

2022-04-23 08:41:00 | OneFlow deep learning framework


Written by | Moon Step

Updated by | Zhao Luyang

The previous article, 《OneFlow Learning Notes: Analysis of the Python-to-C++ Call Process》, traced the Python code down to the Functor layer. This article starts from the Functor layer and continues downward, ending at OpExprInterpreter.

1

Functor review

The Functor layer is part of OneFlow's infrastructure: it provides a unified entry point for op operations for both the Python side and the C++ side. This was analyzed in detail in 《Analysis of the Python-to-C++ Call Process》 using Relu as the example, chosen to keep the cost of understanding low. This article continues to follow the code with Relu as the example. The ReluFunctor code was already listed in that article; for continuity, here it is again in brief:

class ReluFunctor {
 public:
  ReluFunctor() { op_ = CHECK_JUST(one::OpBuilder("relu").Input("x", 1).Output("y", 1).Build()); }
  Maybe<Tensor> operator()(const std::shared_ptr<Tensor>& x, bool inplace) const {
    ...
    return OpInterpUtil::Dispatch<Tensor>(*op_, {x});
  }
 private:
  std::shared_ptr<OpExpr> op_;
};

The code is simple and can be divided into three parts:

  • The data structure: the class member variable op_, which is of type OpExpr. This is the subject of Section 2 below.

  • The constructor: it uses the helper class OpBuilder to initialize op_. Note that when Build() is finally called, it internally calls the static New function of UserOpExpr (described in Section 2) to create the object.

  • The function-call operator overload: it dispatches the actual computation through a Dispatch function, and the computation eventually runs on a concrete device. There are many details here; Section 3 covers the first part of this path, and the complete chain will be summarized later.

2

OpExpr

In the OneFlow framework an operator is represented by an OpExpr. Besides representing operators, OpExpr can also represent some other operations. Let's first look at the OpExpr inheritance hierarchy:

Figure 1: the OpExpr inheritance hierarchy

The OpExpr corresponding to an operator is usually UserOpExpr, at the bottom of the orange inheritance chain in Figure 1; its definition is in oneflow/core/framework/op_expr.h. I am not yet familiar with the other OpExpr types and will summarize them after studying them. Along the orange inheritance chain, the main data structures of each class are as follows:

1. OpExpr is an abstract base class with no data members.

2. BuiltinOpExpr is a relatively high-level and important base class. It mainly maintains the op_name and the input/output arg information:

class BuiltinOpExpr : public OpExpr {
  std::string op_name_;
  std::shared_ptr<const ArgTuple> input_arg_tuple_;
  std::shared_ptr<const ArgTuple> output_arg_tuple_;
};

3. BuiltinOpExprImpl mainly maintains the op proto and the grad func information. Subclasses use this class through CRTP, introduced in the earlier article 《C/C++ Miscellany: CRTP》, mainly to reuse the interface. The template parameter here is normally a type generated from a proto file, which is why it is named ProtoType. Taking the orange inheritance chain in Figure 1 as an example, the class is instantiated with UserOpConf, a data structure generated automatically from oneflow/core/framework/user_op_conf.proto. The main contents of BuiltinOpExprImpl and user_op_conf.proto are shown below:

template<typename ProtoType>
class BuiltinOpExprImpl : public BuiltinOpExpr {
  ProtoType op_proto_;
  mutable std::shared_ptr<OpExprGradFunctionIf> op_grad_func_;
};


// oneflow/core/framework/user_op_conf.proto
message UserOpConf {
  message ListString { repeated string s = 1; }
  required string op_type_name = 1;
  map<string, ListString> input = 2;
  map<string, ListString> output = 3;
  map<string, AttrValue> attr = 4;
  repeated string input_order = 5;
  repeated string output_order = 6;
}

4. Finally, UserOpExpr. It maintains the op's attrs, the shape infer function, the dtype infer function, and so on:

class UserOpExpr final : public BuiltinOpExprImpl<UserOpConf> {
  AttrMap base_attrs_;
  user_op::TensorDescInferFn shape_infer_fn_;
  user_op::DataTypeInferFn dtype_infer_fn_;
  user_op::DeviceInferFn device_infer_fn_;
  mutable HashMap<Symbol<Device>, std::shared_ptr<StatefulLocalOpKernel>> device2kernel_;
  std::shared_ptr<ConsistentTensorInferCache> consistent_tensor_infer_cache_;


public:
  static Maybe<UserOpExpr> New(const std::string& op_name, ...);
};

The interfaces of these classes largely mirror their data members, so they are easy to infer. Only one interface is listed above: the static New function of UserOpExpr, which creates a UserOpExpr object. The one::OpBuilder("relu") call seen earlier eventually calls this function to create the OpExpr object.
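
To make the connection with Section 1 concrete, here is a rough sketch of what OpBuilder("relu").Input("x", 1).Output("y", 1).Build() has to assemble before calling UserOpExpr::New. This is illustrative only: the field accessors follow the standard protobuf C++ API generated from user_op_conf.proto, but the logical blob names and the exact arguments passed to New are assumptions, and the real OpBuilder implementation differs in detail.

// Conceptual sketch, not the actual OneFlow implementation.
UserOpConf conf;
conf.set_op_type_name("relu");                    // op_type_name = 1
(*conf.mutable_input())["x"].add_s("relu/x_0");   // one input arg named "x"
(*conf.mutable_output())["y"].add_s("relu/y_0");  // one output arg named "y"

// Build() then creates the expression through the static factory listed above,
// roughly along the lines of (argument list assumed):
//   std::shared_ptr<UserOpExpr> op =
//       JUST(UserOpExpr::New(/*op_name=*/"relu", std::move(conf),
//                            /*input arg names=*/{"x_0"},
//                            /*output arg names=*/{"y_0"}));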

3

OpExprInterpreter

Simply put, OpExprInterpreter dispatches according to the concrete type of the OpExpr, routing the call into different downstream processing flows. In OneFlow these are called execution modes. OneFlow currently supports the eager and lazy execution modes, and eager is further subdivided into mirrored and consistent (note: since OneFlow v0.7.0 the latter is collectively referred to as "global"), as shown in the figure below:

Figure 2: the OpExprInterpreter inheritance hierarchy and AutogradInterpreter

Obviously, OpExprInterpreter derives the three interpreters mentioned above: mirrored, consistent, and lazy. In addition, Figure 2 shows a class marked in orange, AutogradInterpreter. It has a has-a relationship with OpExprInterpreter and provides an Apply interface through which one of the three execution modes is chosen. The simplified code is:

class AutogradInterpreter {
  std::shared_ptr<OpExprInterpreter> internal_;
public:
  Maybe<void> Apply(const OpExpr& op_expr, ...) const { ... }
};

Now let's start tracing from OpInterpUtil::Dispatch in the ReluFunctor code at the beginning of this article. The Dispatch called there is defined in oneflow/core/framework/op_interpreter/op_interpreter_util.h. It is a family of overloaded functions that can simply be regarded as a pile of helper functions: no matter which overload is called, the call is eventually funneled into the following Dispatch, located at oneflow/core/framework/op_interpreter/op_interpreter_util.cpp+142:

Maybe<void> OpInterpUtil::Dispatch(
      const OpExpr& op_expr, 
      const TensorTuple& inputs,
      TensorTuple* outputs,
      const OpExprInterpContext& ctx) {
  return JUST(GetInterpreter(inputs, ctx, op_expr))->Apply(op_expr, inputs, outputs, ctx);
}
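
For reference, the templated helper that ReluFunctor actually calls, Dispatch<Tensor>(*op_, {x}), funnels into the overload above roughly as sketched below. This is a sketch only: the exact declarations in op_interpreter_util.h differ, output_size() is an assumed accessor, and the two-argument form used in ReluFunctor presumably fills in a default OpExprInterpContext before reaching this point.

// Sketch of the Tensor-returning helper: allocate the output TensorTuple,
// forward to the four-argument Dispatch shown above, and return the single
// output tensor.
template<>
Maybe<Tensor> OpInterpUtil::Dispatch<Tensor>(const OpExpr& op_expr,
                                             const TensorTuple& inputs,
                                             const OpExprInterpContext& ctx) {
  TensorTuple outputs(op_expr.output_size());
  JUST(Dispatch(op_expr, inputs, &outputs, ctx));
  return outputs.at(0);
}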

Looking first at the parameters of the four-argument Dispatch: op_expr is the UserOpExpr object created earlier; TensorTuple can simply be thought of as a vector<Tensor>, so inputs/outputs are the corresponding input and output tensors (for details on Tensor in OneFlow, see Section 3 of the earlier article 《Concepts and Implementation of Global View》); the last parameter is of type OpExprInterpContext, which mainly carries the op's attribute information and is defined at oneflow/core/framework/op_interpreter.h+36. Its main data members are:

struct OpExprInterpContext {
  ...
  AttrMap attrs;
  Optional<Symbol<Device>> device;
  Optional<Symbol<ParallelDesc>> parallel_desc;
  Optional<Symbol<cfg::NdSbp>> nd_sbp;
  std::shared_ptr<user_op::OpKernelState> state;
};
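
As a side note on how this context gets filled: ReluFunctor has no attributes, but a functor for an op that does have them would, roughly speaking, collect them into an attribute map and hand them to Dispatch, which packs them into OpExprInterpContext::attrs. The following is a hypothetical sketch; the attribute name "alpha" and the exact Dispatch overload are assumptions.

// Hypothetical functor body for an op with one float attribute "alpha":
// the attributes travel to the interpreter inside OpExprInterpContext::attrs.
MutableAttrMap attrs;
JUST(attrs.SetAttr<float>("alpha", alpha));
return OpInterpUtil::Dispatch<Tensor>(*op_, {x}, attrs);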

Continuing with the GetInterpreter() call inside OpInterpUtil::Dispatch: based on the context information provided, it selects and returns the appropriate AutogradInterpreter object shown in Figure 2 above:

Maybe<AutogradInterpreter> GetInterpreter(const TensorTuple& inputs, const OpExprInterpContext& ctx,
                                          const OpExpr& op_expr) {
  static const auto& g_lazy_interpreter = BuildLazyInterpreter();
  static const auto& g_eager_consistent_interpreter = BuildEagerInterpreter(/*is_mirrored=*/false);
  static const auto& g_eager_mirrored_interpreter = BuildEagerInterpreter(/*is_mirrored=*/true);
  if (!LazyMode::is_enabled()) {
    if (inputs.empty()) {
      if (ctx.parallel_desc.has_value()) {
        JUST(ctx.nd_sbp);
        CHECK_OR_RETURN(!ctx.device.has_value());
        return g_eager_consistent_interpreter;
      } else {
        CHECK_OR_RETURN(!ctx.nd_sbp.has_value());
        return g_eager_mirrored_interpreter;
      }
    }
...
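
The BuildEagerInterpreter and BuildLazyInterpreter calls above are where the has-a relationship from Figure 2 is established: they wrap a concrete OpExprInterpreter inside an AutogradInterpreter. A simplified sketch of what BuildEagerInterpreter presumably does (the constructor argument is an assumption):

// Sketch: pick the concrete eager interpreter and wrap it in AutogradInterpreter.
std::shared_ptr<AutogradInterpreter> BuildEagerInterpreter(const bool& is_mirrored) {
  std::shared_ptr<OpExprInterpreter> internal;
  if (is_mirrored) {
    internal = std::make_shared<EagerMirroredInterpreter>();
  } else {
    internal = std::make_shared<EagerConsistentInterpreter>();
  }
  return std::make_shared<AutogradInterpreter>(internal);
}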

The returned AutogradInterpreter object's Apply interface is then called, which chooses among the three execution modes. Its implementation is at oneflow/core/framework/op_interpreter/op_interpreter.cpp+86:

Maybe<void> AutogradInterpreter::Apply(const OpExpr& op_expr, const TensorTuple& inputs,
                                       TensorTuple* outputs, const OpExprInterpContext& ctx) const {
  bool requires_grad = false;
  if (autograd::GradMode::is_enabled() && !JUST(op_expr.IsGradDisabled())) {
    requires_grad =
        std::any_of(inputs.begin(), inputs.end(),
                    [](const std::shared_ptr<Tensor>& tensor) { return tensor->requires_grad(); });
  }
  {
    autograd::AutoGradMode mode(false);
    JUST(internal_->Apply(op_expr, inputs, outputs, ctx));
  }
  // Lazy mode will construct backward compute graph in passes, so disable autograd if lazy mode.
  std::shared_ptr<OpExprGradClosure> grad_closure(nullptr);
  if (requires_grad && !LazyMode::is_enabled()) {
    grad_closure = JUST(op_expr.GetOrCreateOpGradClosure());
    auto backward_fn =
        std::make_shared<std::function<Maybe<void>(const TensorTuple&, TensorTuple*, bool)>>(
            [=](const TensorTuple& out_grads, TensorTuple* in_grads,
                bool create_graph) -> Maybe<void> {
              autograd::AutoGradMode mode(create_graph);
              JUST(grad_closure->Apply(out_grads, in_grads));
              return Maybe<void>::Ok();
            });
    JUST(GetThreadLocalAutogradEngine()->AddBackwardFuncPtr(op_expr.op_type_name() + "_backward",
                                                            backward_fn, inputs, outputs));
  }
  // Update outputs autograd meta
  // Note: if requires_grad is True, we will create a new autograd meta for each output
  // in `AddBackwardFuncPtr` to support inplace operation, so the update should after
  // `AddBackwardFuncPtr`
  for (auto& output : *outputs) {
    output->set_is_leaf(inputs.size() == 0 || !requires_grad);
    if (!output->requires_grad()) {
      JUST(output->set_requires_grad(
          requires_grad && IsSupportRequireGradDataType(output->dtype()->data_type())));
    }
  }
  if (requires_grad && !LazyMode::is_enabled()) {
    // Capture inputs and outputs after `AddBackwardFuncPtr` because of that grad function
    // node has been attached to them.
    JUST(grad_closure->Capture(inputs, *outputs, ctx));
  }
  return Maybe<void>::Ok();
}

The key line here is JUST(internal_->Apply(op_expr, inputs, outputs, ctx)); (most of the remaining code relates to backward, while this article only follows the forward main line). It actually calls the Apply function of the OpExprInterpreter it holds. Next, following Figure 2, let's look at the subsequent flow for the three interpreters.

3.1 Mirrored mode

If the mirrored execution mode is chosen, internal_->Apply actually calls Apply in EagerInterpreter, the base class of EagerMirroredInterpreter, located at oneflow/core/framework/op_interpreter/op_interpreter.cpp+51:

Maybe<void> EagerInterpreter::Apply(const OpExpr& op_expr, ...) const {
#define APPLY_IF(op_type)                                              \
  if (const auto* op = dynamic_cast<const op_type##Expr*>(&op_expr)) { \
    return ApplyImpl(*op, inputs, outputs, ctx);                       \
  }


  APPLY_IF(UserOp);
  APPLY_IF(VariableOp);
  APPLY_IF(CastToMirroredOp);
  ...
}

Here dynamic_cast is used to dispatch dynamically according to the actual type of the OpExpr; the following figure may help in understanding this:

Figure 3: dispatching by the concrete OpExpr type
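
For clarity, this is roughly what the macro invocation APPLY_IF(UserOp) expands to after token pasting (written out by hand for illustration):

// Expansion of APPLY_IF(UserOp): op_type##Expr becomes UserOpExpr, so the cast
// succeeds only when op_expr really is a UserOpExpr, and the matching ApplyImpl
// overload is then chosen on the casted type.
if (const auto* op = dynamic_cast<const UserOpExpr*>(&op_expr)) {
  return ApplyImpl(*op, inputs, outputs, ctx);
}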

Coming from ReluFunctor, what was created is a UserOpExpr, so the call lands in the following ApplyImpl of EagerMirroredInterpreter, located at oneflow/core/framework/op_interpreter/eager_mirrored_op_interpreter.cpp+191:

Maybe<void> EagerMirroredInterpreter::ApplyImpl(const UserOpExpr& op_expr,
                                                const TensorTuple& inputs, TensorTuple* outputs,
                                                const OpExprInterpContext& ctx) const {
  return NaiveInterpret(op_expr, inputs, outputs, ctx);
}

This in turn calls the NaiveInterpret function in the same file. The function is quite long and mainly prepares for entering the OneFlow virtual machine. One of the most important preparations is creating the vm::EagerBlobObject objects used by the virtual machine from the input and output Tensor objects. EagerBlobObject is defined at oneflow/core/eager/eager_blob_object.h+83, and its main data members are:

class EagerBlobObject final : public BlobObject {
  std::unique_ptr<Blob> blob_;
  std::unique_ptr<char[]> header_buffer_;
  std::shared_ptr<TensorStorage> tensor_storage_;
  std::atomic<bool> is_shape_synced_;
  int64_t storage_offset_;
  intrusive::shared_ptr<LocalDepObject> compute_local_dep_object_;
};

Among the data members of EagerBlobObject, Blob and TensorStorage maintain the actual data storage. In addition, as the code above shows, EagerBlobObject itself is part of an inheritance hierarchy, summarized below:

Figure 4: the EagerBlobObject inheritance hierarchy

NaiveInterpret contains quite a lot of code, mostly preparation for entering the virtual machine. Its last block, shown below, is the entry point into the OneFlow virtual machine:

Maybe<void> NaiveInterpret(const UserOpExpr& user_op_expr, ...) {
  ...
  JUST(PhysicalRun([&](InstructionsBuilder* builder) -> Maybe<void> {
    return builder->LocalCallOpKernel(
        kernel, 
        input_eager_blob_objects, 
        output_eager_blob_objects,
        ctx, 
        op_device);
  }));
  return Maybe<void>::Ok();
}

The virtual machine is outside the scope of this article; I will take time to study it further later.

3.2 Global mode

The concept of Global was analyzed in detail in the earlier article 《Concepts and Implementation of Global View》, so it is used directly here. If the Global execution mode is chosen, internal_->Apply, just as in the mirrored mode, also calls Apply in EagerInterpreter, located at oneflow/core/framework/op_interpreter/op_interpreter.cpp+51:

Maybe<void> EagerInterpreter::Apply(const OpExpr& op_expr, ...) const {
#define APPLY_IF(op_type)                                              \
  if (const auto* op = dynamic_cast<const op_type##Expr*>(&op_expr)) { \
    return ApplyImpl(*op, inputs, outputs, ctx);                       \
  }


  APPLY_IF(UserOp);
  APPLY_IF(VariableOp);
  APPLY_IF(CastToMirroredOp);
  ...
}

Here dynamic_cast dispatches according to the actual type of the OpExpr (UserOpExpr in this article's example) to the ApplyImpl function of EagerConsistentInterpreter, defined at oneflow/core/framework/op_interpreter/eager_consistent_op_interpreter.cpp+194:

Maybe<void> EagerConsistentInterpreter::ApplyImpl(const UserOpExpr& op_expr,
                                                  const TensorTuple& inputs, TensorTuple* outputs,
                                                  const OpExprInterpContext& ctx) const {
  return InterpretThenInitConsistentId(op_expr, inputs, outputs, ctx);
}

Here InterpretThenInitConsistentId is a function pointer that points to the Interpret function wrapped by the NonRecursiveInitConsistentId decorator. Let's take a brief look at the decorator code, starting with the DECORATE macro, located at oneflow/core/common/decorator.h+39:

template<template<typename...> class Decorator>
struct WithDecorator final {
  template<typename T, typename = void>
  struct Decorate;
  template<typename T, typename... Args>
  struct Decorate<T (*)(Args...)> final {
    template<T (*func)(Args...)>
    static T Call(Args... args) {
      return Decorator<T, Args...>::template Call<func>(args...);
    }
  };
};


#define DECORATE(fn_ptr, decorator) \
  (&WithDecorator<decorator>::Decorate<decltype(fn_ptr)>::Call<fn_ptr>)
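
The template machinery is easier to digest with a small, self-contained toy example. This is not OneFlow code; it only reuses the WithDecorator/DECORATE definitions above together with a made-up decorator:

#include <iostream>

// A made-up decorator that counts how many times the wrapped function is called.
// It has the shape WithDecorator expects: Decorator<T, Args...>::Call<func>(args...).
template<typename T, typename... Args>
struct CountCalls {
  template<T (*func)(Args...)>
  static T Call(Args... args) {
    static int n = 0;
    std::cout << "call #" << ++n << std::endl;
    return func(args...);
  }
};

int Add(int a, int b) { return a + b; }

int main() {
  // Same pattern as InterpretThenInitConsistentId below: a plain function pointer
  // whose body is the decorator wrapped around the original function.
  auto* decorated_add = DECORATE(&Add, CountCalls);
  std::cout << decorated_add(1, 2) << std::endl;  // prints "call #1", then "3"
}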

Here WithDecorator is the decorator wrapper; Decorator is its template template parameter, representing the actual decorator whose Call function is eventually invoked. In this case WithDecorator is instantiated with NonRecursiveInitConsistentId as the Decorator. NonRecursiveInitConsistentId is defined at oneflow/core/framework/tensor_consistent_id.h+35:

template<typename Arg0, typename Arg1, typename... Args>
struct NonRecursiveInitConsistentId<Maybe<void>, Arg0, Arg1, TensorTuple*, Args...> {
  template<Maybe<void> (*func)(Arg0, Arg1, TensorTuple*, Args...)>
  static Maybe<void> Call(Arg0 arg0, Arg1 arg1, TensorTuple* outputs, Args... args) {
    auto* recursive_depth = MutThreadLocalConsistentIdDepth();
    ++*recursive_depth;
    Maybe<void> ret = func(arg0, arg1, outputs, args...);
    --*recursive_depth;
    if (*recursive_depth == 0 && ret.IsOk()) { JUST(InitConsistentId(outputs)); }
    return ret;
  }
};
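
Putting the two pieces together, the InterpretThenInitConsistentId pointer mentioned above is presumably produced by applying the macro to Interpret, along these lines (a sketch; the actual declaration in OneFlow may differ):

// Assumed wiring: decorate Interpret so that InitConsistentId runs exactly once
// per top-level (non-recursive) interpretation.
static constexpr auto* InterpretThenInitConsistentId =
    DECORATE(&Interpret, NonRecursiveInitConsistentId);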

As can be seen above, the role of the NonRecursiveInitConsistentId decorator is to ensure that InitConsistentId is executed only once. Continuing along the main line of the eager mode, the function wrapped by this decorator is Interpret, located at oneflow/core/framework/op_interpreter/eager_consistent_op_interpreter.cpp+112. The function is also fairly long; in summary it mainly does the following things:

  • Creates the ConsistentTensorMeta information described in Section 3 of the earlier article 《Concepts and Implementation of Global View》, and stores it in the ConsistentTensorInferResult data structure

  • Creates the corresponding EagerConsistentTensorImpl and ConsistentTensor for each output

  • Based on the input and output Tensors, creates the vm::EagerBlobObject objects shown in Figure 4 above; these objects are used in the OneFlow virtual machine. A boxing operation may happen in between; I am not yet familiar with that part and will summarize it separately later

  • Enters the virtual machine, which schedules and executes the current op

The simplified code is shown below:

Maybe<void> Interpret(const UserOpExpr& user_op_expr, const TensorTuple& inputs,
                      TensorTuple* outputs, const OpExprInterpContext& ctx) {
  // step 1
  const auto& infer_args = JUST(ConsistentTensorMetaInferArgs::New(ctx.attrs, inputs));
  std::shared_ptr<const ConsistentTensorInferResult> result =
      JUST(user_op_expr.mut_consistent_tensor_infer_cache()->GetOrInfer(*infer_args));
  const auto& output_tensor_metas = result->output_tensor_metas();
  // step 2
  for (int i = 0; i < outputs->size(); ++i) {
    if (!outputs->at(i)) {
      const auto& tensor_impl = JUST(EagerConsistentTensorImpl::New(
          output_tensor_metas.at(i), tensor_device, parallel_id, false, false));
      outputs->at(i).reset(new ConsistentTensor(tensor_impl));
    }
  }
  // step 3
  for (int i = 0; i < inputs.size(); ++i) {
    const auto& local_tensor = JUST(inputs.at(i)->cur_rank_phy_tensor());
    input_eager_blob_objects->at(i) = JUST(local_tensor->eager_blob_object());
  }
  for (int i = 0; i < outputs->size(); ++i) {
    const auto& local_tensor = JUST(outputs->at(i)->cur_rank_phy_tensor());
    output_eager_blob_objects->at(i) = JUST(local_tensor->eager_blob_object());
  }
  // step 4
  JUST(PhysicalRun([&](InstructionsBuilder* builder) -> Maybe<void> {
    return builder->LocalCallOpKernel(kernel, input_eager_blob_objects, output_eager_blob_objects,
                                      result, ctx, result->op_device());
  }));
  return Maybe<void>::Ok();
}

This is the main line of work done by EagerConsistentInterpreter before entering the virtual machine.

3.3 Lazy mode

I am not yet familiar with this part; I will summarize it separately once I am.

This article mainly sorted out the main responsibilities and implementation of OpExprInterpreter. The main references are the official OneFlow code and some of the earlier related articles.


You are welcome to download and try the latest version, OneFlow v0.7.0:

GitHub - Oneflow-Inc/oneflow: OneFlow is a performance-centered and open-source deep learning framework.
https://github.com/Oneflow-Inc/oneflow/
