OneFlow learning notes: from Functor to OpExprInterpreter
2022-04-23 08:41:00 | OneFlow deep learning framework
Written by | Moon Step
Updated by | Zhao Luyang
The earlier post "OneFlow Learning Notes: Analysis of the Call Path from Python to C++" traced the Python code down into the Functor layer. This article continues the chase downward from Functor; the stop after that is the OpExprInterpreter.
1. Functor review
As part of OneFlow's infrastructure, the Functor layer gives both the Python side and the C++ side a unified entry point for op operations. It was analyzed in detail in "Analysis of the Call Path from Python to C++" using Relu as the example, chosen to minimize the cost of understanding. This article keeps following the code with the same Relu example. The ReluFunctor code was already listed there; for continuity, here it is again in brief:
class ReluFunctor {
public:
ReluFunctor() { op_ = CHECK_JUST(one::OpBuilder("relu").Input("x", 1).Output("y", 1).Build()); }
Maybe<Tensor> operator()(const std::shared_ptr<Tensor>& x, bool inplace) const {
...
return OpInterpUtil::Dispatch<Tensor>(*op_, {x});
}
private:
std::shared_ptr<OpExpr> op_;
};
The code is short and falls into three parts:
- Data structure: the class member op_, of type OpExpr. This is the main subject of Section 2 below.
- Constructor: uses the helper class OpBuilder to initialize op_. Note that when Build() is finally called, it internally creates the object through the static New function of UserOpExpr described in Section 2.
- Function-call operator overload: dispatches the concrete computation through a Dispatch function, which ultimately executes on a specific device. There are many details here; Section 3 covers the first part of that chain, and the complete chain will be summarized in a later post. (A sketch of how such a functor is registered for calling follows this list.)
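For orientation, here is a hedged sketch of how a functor of this shape is typically exposed to the functional API. The pattern follows what the files under oneflow/core/functional/impl look like, but treat the details as an assumption rather than verbatim OneFlow code:
namespace oneflow {
namespace one {
namespace functional {

// Hedged sketch (assumed registration pattern, not verbatim source):
// expose ReluFunctor under the name "Relu" so the generated functional
// API -- and, through it, the Python side -- can reach it.
ONEFLOW_FUNCTION_LIBRARY(m) {
  m.add_functor<impl::ReluFunctor>("Relu");  // invoked as functional::Relu(x, inplace)
}

}  // namespace functional
}  // namespace one
}  // namespace oneflow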
2. OpExpr
In the OneFlow framework, an operator is represented by an OpExpr. Besides operators, OpExpr can represent some other operations as well. First, a look at the OpExpr inheritance hierarchy:
Figure 1: the OpExpr inheritance hierarchy
The OpExpr corresponding to an operator is usually UserOpExpr, at the bottom of the orange inheritance chain in Figure 1; its code definition lives in oneflow/core/framework/op_expr.h. I'm not yet familiar with the other OpExpr subtypes and will summarize them once I've studied them. Along the orange chain, the main data members of each class are as follows:
1. OpExpr is an abstract base class with no data members.
2. BuiltinOpExpr is a relatively high-level and important base class; it mainly maintains the op_name, input arg, and output arg information:
class BuiltinOpExpr : public OpExpr {
std::string op_name_;
std::shared_ptr<const ArgTuple> input_arg_tuple_;
std::shared_ptr<const ArgTuple> output_arg_tuple_;
};
3. BuiltinOpExprImpl mainly maintains the op proto and grad func information. Subclasses use this class through the CRTP technique introduced in the earlier post "C/C++ Miscellanea: CRTP", mainly to reuse the interface. The template parameter here is a type generated from a .proto file, hence the name ProtoType. Taking the orange inheritance chain of Figure 1 as the example, the instantiation uses UserOpConf, a data structure generated automatically from oneflow/core/framework/user_op_conf.proto (a sketch of reading this generated message appears at the end of this section). The main contents of BuiltinOpExprImpl and user_op_conf.proto are shown below:
template<typename ProtoType>
class BuiltinOpExprImpl : public BuiltinOpExpr {
ProtoType op_proto_;
mutable std::shared_ptr<OpExprGradFunctionIf> op_grad_func_;
};
// oneflow/core/framework/user_op_conf.proto
message UserOpConf {
message ListString { repeated string s = 1; }
required string op_type_name = 1;
map<string, ListString> input = 2;
map<string, ListString> output = 3;
map<string, AttrValue> attr = 4;
repeated string input_order = 5;
repeated string output_order = 6;
}
4. Finally, UserOpExpr maintains the op's attrs, the shape infer function, the dtype infer function, and so on:
class UserOpExpr final : public BuiltinOpExprImpl<UserOpConf> {
AttrMap base_attrs_;
user_op::TensorDescInferFn shape_infer_fn_;
user_op::DataTypeInferFn dtype_infer_fn_;
user_op::DeviceInferFn device_infer_fn_;
mutable HashMap<Symbol<Device>, std::shared_ptr<StatefulLocalOpKernel>> device2kernel_;
std::shared_ptr<ConsistentTensorInferCache> consistent_tensor_infer_cache_;
public:
static Maybe<UserOpExpr> New(const std::string& op_name, ...);
};
The interfaces of these classes largely mirror their data members, so you can fill them in mentally. The only one listed above is UserOpExpr's static New interface, which creates a UserOpExpr object; the one::OpBuilder("relu") call seen earlier eventually reaches this function to create the OpExpr object.
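As promised above, since UserOpConf is ordinary protobuf-generated code, it can be inspected with the standard accessors. A hedged sketch (the generated header path and the concrete relu values in the comments are assumptions, not verified output):
// Hedged sketch: reading the generated UserOpConf with standard
// protobuf accessors.
#include <iostream>
#include "oneflow/core/framework/user_op_conf.pb.h"  // assumed generated header path

void PrintUserOpConf(const oneflow::UserOpConf& conf) {
  std::cout << conf.op_type_name() << std::endl;  // would be "relu" in our running example
  for (const auto& pair : conf.input()) {
    // key would be "x"; the value lists the logical blob names bound to
    // that input, one entry per index declared via Input("x", 1)
    if (pair.second.s_size() > 0) {
      std::cout << pair.first << " -> " << pair.second.s(0) << std::endl;
    }
  }
}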
3. OpExprInterpreter
Simply put, the OpExprInterpreter dispatches on the concrete type of an OpExpr, i.e., it connects the op to the processing flow that follows. In OneFlow these flows are called execution modes. Currently OneFlow supports the eager and lazy execution modes, and eager further subdivides into mirrored and consistent (note: as of OneFlow v0.7.0 the latter is collectively called "global"), as the figure below shows:
Figure 2: the OpExprInterpreter hierarchy and execution modes
As the figure makes obvious, OpExprInterpreter derives the three interpreters just mentioned: mirrored, consistent, and lazy. In addition, Figure 2 contains a class marked in orange, AutogradInterpreter. It has a has-a relationship with OpExprInterpreter and provides an Apply interface for selecting among the three execution modes. Simplified code:
class AutogradInterpreter {
std::shared_ptr<OpExprInterpreter> internal_;
public:
Maybe<void> Apply(const OpExpr& op_expr, ...) const { ... }
};
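To make the has-a relationship concrete, here is a hedged sketch of how an AutogradInterpreter could be assembled around a concrete interpreter. The BuildEagerInterpreter helper that appears in the GetInterpreter snippet below presumably does something close to this, but the constructor signature here is an assumption:
// Hedged sketch (assumed constructor shape): wrap a concrete
// OpExprInterpreter inside an AutogradInterpreter.
std::shared_ptr<AutogradInterpreter> BuildEagerInterpreter(bool is_mirrored) {
  std::shared_ptr<OpExprInterpreter> internal;
  if (is_mirrored) {
    internal = std::make_shared<EagerMirroredInterpreter>();
  } else {
    internal = std::make_shared<EagerConsistentInterpreter>();
  }
  return std::make_shared<AutogradInterpreter>(internal);
}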
Now pick up the trail from OpInterpUtil::Dispatch in the ReluFunctor code at the top of the article. The Dispatch called there is defined in oneflow/core/framework/op_interpreter/op_interpreter_util.h as a family of overloads; think of them as a pile of helper functions. Whichever overload gets called, the dispatch eventually funnels into the following overload of Dispatch, located at oneflow/core/framework/op_interpreter/op_interpreter_util.cpp+142:
Maybe<void> OpInterpUtil::Dispatch(
const OpExpr& op_expr,
const TensorTuple& inputs,
TensorTuple* outputs,
const OpExprInterpContext& ctx) {
return JUST(GetInterpreter(inputs, ctx, op_expr))->Apply(op_expr, inputs, outputs, ctx);
}
First, the parameters. op_expr is the UserOpExpr-typed object created earlier. TensorTuple can be thought of simply as a vector<Tensor>, and inputs/outputs are the corresponding input and output Tensors (for details on OneFlow's Tensor, see Section 3 of the earlier "Global View: Related Concepts and Implementation"). The last parameter is of type OpExprInterpContext; it mainly carries the op's attribute information and is defined at oneflow/core/framework/op_interpreter.h+36. Its main data members:
struct OpExprInterpContext {
...
AttrMap attrs;
Optional<Symbol<Device>> device;
Optional<Symbol<ParallelDesc>> parallel_desc;
Optional<Symbol<cfg::NdSbp>> nd_sbp;
std::shared_ptr<user_op::OpKernelState> state;
};
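The attrs field is how functors with parameters hand them down to the interpreter. Here is a hedged sketch in the style of OneFlow's attribute-carrying functors (a LeakyRelu-shaped example; the attribute name, the MutableAttrMap type, and the exact builder calls are assumptions):
// Hedged sketch (assumed attribute-passing pattern): set an attribute,
// then dispatch with it; it ends up in OpExprInterpContext::attrs.
class LeakyReluFunctor {
 public:
  LeakyReluFunctor() {
    op_ = CHECK_JUST(one::OpBuilder("leaky_relu").Input("x").Output("y").Build());
  }
  Maybe<Tensor> operator()(const std::shared_ptr<Tensor>& x, float alpha) const {
    MutableAttrMap attrs;
    JUST(attrs.SetAttr<float>("alpha", alpha));
    return OpInterpUtil::Dispatch<Tensor>(*op_, {x}, attrs);
  }

 private:
  std::shared_ptr<OpExpr> op_;
};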
Moving on to the GetInterpreter() call inside OpInterpUtil::Dispatch: based on the context information provided, it returns an AutogradInterpreter object as shown in Figure 2:
Maybe<AutogradInterpreter> GetInterpreter(const TensorTuple& inputs, const OpExprInterpContext& ctx,
const OpExpr& op_expr) {
static const auto& g_lazy_interpreter = BuildLazyInterpreter();
static const auto& g_eager_consistent_interpreter = BuildEagerInterpreter(/*is_mirrored=*/false);
static const auto& g_eager_mirrored_interpreter = BuildEagerInterpreter(/*is_mirrored=*/true);
if (!LazyMode::is_enabled()) {
if (inputs.empty()) {
if (ctx.parallel_desc.has_value()) {
JUST(ctx.nd_sbp);
CHECK_OR_RETURN(!ctx.device.has_value());
return g_eager_consistent_interpreter;
} else {
CHECK_OR_RETURN(!ctx.nd_sbp.has_value());
return g_eager_mirrored_interpreter;
}
}
...
The Apply interface of that AutogradInterpreter object is then called to select among the three execution modes. Its implementation is at oneflow/core/framework/op_interpreter/op_interpreter.cpp+86:
Maybe<void> AutogradInterpreter::Apply(const OpExpr& op_expr, const TensorTuple& inputs,
TensorTuple* outputs, const OpExprInterpContext& ctx) const {
bool requires_grad = false;
if (autograd::GradMode::is_enabled() && !JUST(op_expr.IsGradDisabled())) {
requires_grad =
std::any_of(inputs.begin(), inputs.end(),
[](const std::shared_ptr<Tensor>& tensor) { return tensor->requires_grad(); });
}
{
autograd::AutoGradMode mode(false);
JUST(internal_->Apply(op_expr, inputs, outputs, ctx));
}
// Lazy mode will construct backward compute graph in passes, so disable autograd if lazy mode.
std::shared_ptr<OpExprGradClosure> grad_closure(nullptr);
if (requires_grad && !LazyMode::is_enabled()) {
grad_closure = JUST(op_expr.GetOrCreateOpGradClosure());
auto backward_fn =
std::make_shared<std::function<Maybe<void>(const TensorTuple&, TensorTuple*, bool)>>(
[=](const TensorTuple& out_grads, TensorTuple* in_grads,
bool create_graph) -> Maybe<void> {
autograd::AutoGradMode mode(create_graph);
JUST(grad_closure->Apply(out_grads, in_grads));
return Maybe<void>::Ok();
});
JUST(GetThreadLocalAutogradEngine()->AddBackwardFuncPtr(op_expr.op_type_name() + "_backward",
backward_fn, inputs, outputs));
}
// Update outputs autograd meta
// Note: if requires_grad is True, we will create a new autograd meta for each output
// in `AddBackwardFuncPtr` to support inplace operation, so the update should after
// `AddBackwardFuncPtr`
for (auto& output : *outputs) {
output->set_is_leaf(inputs.size() == 0 || !requires_grad);
if (!output->requires_grad()) {
JUST(output->set_requires_grad(
requires_grad && IsSupportRequireGradDataType(output->dtype()->data_type())));
}
}
if (requires_grad && !LazyMode::is_enabled()) {
// Capture inputs and outputs after `AddBackwardFuncPtr` because of that grad function
// node has been attached to them.
JUST(grad_closure->Capture(inputs, *outputs, ctx));
}
return Maybe<void>::Ok();
}
The key line here is JUST(internal_->Apply(op_expr, inputs, outputs, ctx)); (most of the remaining listed code is backward-related, and this article stays on the forward main line). It calls the Apply function of the OpExprInterpreter that the AutogradInterpreter holds. Following Figure 2, the next subsections walk through the flows of the three interpreters.
3.1 Mirrored mode
If the mirrored execution mode is selected, internal_->Apply actually calls Apply in EagerInterpreter, the base class of EagerMirroredInterpreter, located at oneflow/core/framework/op_interpreter/op_interpreter.cpp+51:
Maybe<void> EagerInterpreter::Apply(const OpExpr& op_expr, ...) const {
#define APPLY_IF(op_type) \
if (const auto* op = dynamic_cast<const op_type##Expr*>(&op_expr)) { \
return ApplyImpl(*op, inputs, outputs, ctx); \
}
APPLY_IF(UserOp);
APPLY_IF(VariableOp);
APPLY_IF(CastToMirroredOp);
...
}
So the dispatch here uses dynamic_cast to branch dynamically on the actual type of the OpExpr. The figure below helps visualize it, and a stripped-down sketch of the idiom follows the figure:
Figure 3: EagerInterpreter dispatching on the concrete OpExpr type
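As a self-contained illustration of the idiom (simplified class names, not OneFlow code):
// Minimal sketch of the APPLY_IF idiom: try a dynamic_cast to each
// concrete OpExpr subtype in turn and dispatch to the matching overload.
#include <iostream>

struct OpExpr { virtual ~OpExpr() = default; };
struct UserOpExpr : OpExpr {};
struct VariableOpExpr : OpExpr {};

void ApplyImpl(const UserOpExpr&) { std::cout << "user op\n"; }
void ApplyImpl(const VariableOpExpr&) { std::cout << "variable op\n"; }

void Apply(const OpExpr& op_expr) {
#define APPLY_IF(op_type)                                              \
  if (const auto* op = dynamic_cast<const op_type##Expr*>(&op_expr)) { \
    return ApplyImpl(*op);                                             \
  }
  APPLY_IF(UserOp);
  APPLY_IF(VariableOp);
#undef APPLY_IF
}

int main() {
  UserOpExpr user_op;
  Apply(user_op);  // prints "user op"
}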
We came here from ReluFunctor, and what it created is a UserOpExpr, so the call lands in the following ApplyImpl of EagerMirroredInterpreter, located at oneflow/core/framework/op_interpreter/eager_mirrored_op_interpreter.cpp+191:
Maybe<void> EagerMirroredInterpreter::ApplyImpl(const UserOpExpr& op_expr,
const TensorTuple& inputs, TensorTuple* outputs,
const OpExprInterpContext& ctx) const {
return NaiveInterpret(op_expr, inputs, outputs, ctx);
}
This in turn calls the NaiveInterpret function in the same file. The function is very long, and it mostly performs the preparation needed before entering the OneFlow virtual machine. One of the most important steps is creating the virtual machine's vm::EagerBlobObject objects from the input and output Tensor objects. EagerBlobObject is defined at oneflow/core/eager/eager_blob_object.h+83; its main data members are:
class EagerBlobObject final : public BlobObject {
std::unique_ptr<Blob> blob_;
std::unique_ptr<char[]> header_buffer_;
std::shared_ptr<TensorStorage> tensor_storage_;
std::atomic<bool> is_shape_synced_;
int64_t storage_offset_;
intrusive::shared_ptr<LocalDepObject> compute_local_dep_object_;
};
Among EagerBlobObject's data members, Blob and TensorStorage maintain the actual data storage. In addition, as the code above suggests, EagerBlobObject sits in an inheritance hierarchy of its own, summarized below:
Figure 4: the EagerBlobObject inheritance hierarchy
NaiveInterpret contains quite a bit more, mostly preparation for entering the virtual machine. Its final block of code, shown below, is the entry into the OneFlow virtual machine:
Maybe<void> NaiveInterpret(const UserOpExpr& user_op_expr, ...) {
...
JUST(PhysicalRun([&](InstructionsBuilder* builder) -> Maybe<void> {
return builder->LocalCallOpKernel(
kernel,
input_eager_blob_objects,
output_eager_blob_objects,
ctx,
op_device);
}));
return Maybe<void>::Ok();
}
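The PhysicalRun call is a builder-callback pattern: the runtime hands the lambda an InstructionsBuilder, the lambda records instructions on it, and the accumulated instructions are then submitted to the virtual machine. A heavily simplified, hypothetical sketch of that shape (not the real API):
// Hypothetical sketch of the builder-callback shape behind PhysicalRun;
// the real InstructionsBuilder API is far richer.
#include <functional>
#include <string>
#include <vector>

struct Instruction { std::string name; };

struct InstructionsBuilder {
  std::vector<Instruction> instructions;
  void LocalCallOpKernel(/* kernel, in/out blob objects, ctx, device */) {
    instructions.push_back({"LocalCallOpKernel"});  // record, don't execute
  }
};

void PhysicalRun(const std::function<void(InstructionsBuilder*)>& build) {
  InstructionsBuilder builder;
  build(&builder);  // the interpreter fills in instructions here
  // ... submit builder.instructions to the virtual machine scheduler ...
}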
The virtual machine itself is beyond the scope of this article; I'll continue studying it later.
3.2 Global mode
The Global concept was analyzed in detail in the earlier "Global View: Related Concepts and Implementation", so the term is used directly here. If the Global execution mode is selected, internal_->Apply, just as in mirrored mode, also calls Apply in EagerInterpreter, located at oneflow/core/framework/op_interpreter/op_interpreter.cpp+51:
Maybe<void> EagerInterpreter::Apply(const OpExpr& op_expr, ...) const {
#define APPLY_IF(op_type) \
if (const auto* op = dynamic_cast<const op_type##Expr*>(&op_expr)) { \
return ApplyImpl(*op, inputs, outputs, ctx); \
}
APPLY_IF(UserOp);
APPLY_IF(VariableOp);
APPLY_IF(CastToMirroredOp);
...
}
Here dynamic_cast again branches on the actual OpExpr type (a UserOpExpr in this article's example) and dispatches into EagerConsistentInterpreter's ApplyImpl, defined at oneflow/core/framework/op_interpreter/eager_consistent_op_interpreter.cpp+194:
Maybe<void> EagerConsistentInterpreter::ApplyImpl(const UserOpExpr& op_expr,
const TensorTuple& inputs, TensorTuple* outputs,
const OpExprInterpContext& ctx) const {
return InterpretThenInitConsistentId(op_expr, inputs, outputs, ctx);
}
Here InterpretThenInitConsistentId is a function pointer: it points at the Interpret function wrapped with NonRecursiveInitConsistentId as a decorator. A brief look at the decorator code, starting with the DECORATE macro at oneflow/core/common/decorator.h+39:
template<template<typename...> class Decorator>
struct WithDecorator final {
template<typename T, typename = void>
struct Decorate;
template<typename T, typename... Args>
struct Decorate<T (*)(Args...)> final {
template<T (*func)(Args...)>
static T Call(Args... args) {
return Decorator<T, Args...>::template Call<func>(args...);
}
};
};
#define DECORATE(fn_ptr, decorator) \
(&WithDecorator<decorator>::Decorate<decltype(fn_ptr)>::Call<fn_ptr>)
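Before unpacking the pieces, here is a minimal, self-contained sketch of how this exact macro is used, with an invented LogCall decorator standing in for NonRecursiveInitConsistentId (everything other than WithDecorator/DECORATE is illustrative):
// Self-contained toy: the DECORATE machinery from above applied to a
// plain Add function. Compiles as C++17.
#include <cstdio>

template<template<typename...> class Decorator>
struct WithDecorator final {
  template<typename T, typename = void>
  struct Decorate;
  template<typename T, typename... Args>
  struct Decorate<T (*)(Args...)> final {
    template<T (*func)(Args...)>
    static T Call(Args... args) {
      return Decorator<T, Args...>::template Call<func>(args...);
    }
  };
};
#define DECORATE(fn_ptr, decorator) \
  (&WithDecorator<decorator>::Decorate<decltype(fn_ptr)>::Call<fn_ptr>)

// Toy decorator: log entry/exit around the wrapped call.
template<typename... Args>
struct LogCall;

template<typename T, typename... Args>
struct LogCall<T, Args...> {
  template<T (*func)(Args...)>
  static T Call(Args... args) {
    std::printf("enter\n");
    T ret = func(args...);
    std::printf("leave\n");
    return ret;
  }
};

int Add(int a, int b) { return a + b; }

int main() {
  // Same shape as InterpretThenInitConsistentId = DECORATE(&Interpret, ...)
  auto* decorated = DECORATE(&Add, LogCall);
  return decorated(1, 2) == 3 ? 0 : 1;
}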
To unpack: WithDecorator is the decorator wrapper, its template template parameter Decorator is the actual decorator, and the wrapper ends up invoking the decorator's Call function. In our case WithDecorator is instantiated with NonRecursiveInitConsistentId as the Decorator, defined at oneflow/core/framework/tensor_consistent_id.h+35:
template<typename Arg0, typename Arg1, typename... Args>
struct NonRecursiveInitConsistentId<Maybe<void>, Arg0, Arg1, TensorTuple*, Args...> {
template<Maybe<void> (*func)(Arg0, Arg1, TensorTuple*, Args...)>
static Maybe<void> Call(Arg0 arg0, Arg1 arg1, TensorTuple* outputs, Args... args) {
auto* recursive_depth = MutThreadLocalConsistentIdDepth();
++*recursive_depth;
Maybe<void> ret = func(arg0, arg1, outputs, args...);
--*recursive_depth;
if (*recursive_depth == 0 && ret.IsOk()) { JUST(InitConsistentId(outputs)); }
return ret;
}
};
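The depth-counting trick in isolation (a simplified sketch without Maybe<>, to show when the finalizer fires):
// Minimal sketch of the depth-counting idea: the finalizer fires only
// when the outermost call returns.
#include <cstdio>

static thread_local int recursive_depth = 0;

void InitConsistentIdOnce() { std::printf("init consistent id\n"); }

void Wrapped(int n) {
  ++recursive_depth;
  if (n > 0) { Wrapped(n - 1); }  // nested interpreter re-entry
  --recursive_depth;
  if (recursive_depth == 0) { InitConsistentIdOnce(); }  // outermost only
}

int main() { Wrapped(3); }  // prints exactly one line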
As the code and sketch above show, the job of the NonRecursiveInitConsistentId decorator is to ensure InitConsistentId executes exactly once, at the outermost call. Back on the main line of the eager (global) path: the function the decorator wraps is Interpret, located at oneflow/core/framework/op_interpreter/eager_consistent_op_interpreter.cpp+112. It packs in quite a bit; in summary it does the following:
- Create the ConsistentTensorMeta information described in Section 3 of the earlier "Global View: Related Concepts and Implementation", stored in the ConsistentTensorInferResult data structure
- Create the corresponding EagerConsistentTensorImpl and ConsistentTensor for each output
- From the input and output Tensors, create the vm::EagerBlobObject objects shown in Figure 4 earlier; these are used inside the OneFlow virtual machine. Boxing operations may happen along the way; I'm not familiar with that part yet and will summarize it separately later
- Enter the virtual machine to schedule and execute the current op
The simplified code is shown below:
Maybe<void> Interpret(const UserOpExpr& user_op_expr, const TensorTuple& inputs,
TensorTuple* outputs, const OpExprInterpContext& ctx) {
// step 1
const auto& infer_args = JUST(ConsistentTensorMetaInferArgs::New(ctx.attrs, inputs));
std::shared_ptr<const ConsistentTensorInferResult> result =
JUST(user_op_expr.mut_consistent_tensor_infer_cache()->GetOrInfer(*infer_args));
const auto& output_tensor_metas = result->output_tensor_metas();
// step 2
for (int i = 0; i < outputs->size(); ++i) {
if (!outputs->at(i)) {
const auto& tensor_impl = JUST(EagerConsistentTensorImpl::New(
output_tensor_metas.at(i), tensor_device, parallel_id, false, false));
outputs->at(i).reset(new ConsistentTensor(tensor_impl));
}
}
// step 3
for (int i = 0; i < inputs.size(); ++i) {
    const auto& local_tensor = JUST(inputs.at(i)->cur_rank_phy_tensor());
input_eager_blob_objects->at(i) = JUST(local_tensor->eager_blob_object());
}
for (int i = 0; i < outputs->size(); ++i) {
const auto& local_tensor = JUST(outputs->at(i)->cur_rank_phy_tensor());
output_eager_blob_objects->at(i) = JUST(local_tensor->eager_blob_object());
}
// step 4
JUST(PhysicalRun([&](InstructionsBuilder* builder) -> Maybe<void> {
return builder->LocalCallOpKernel(kernel, input_eager_blob_objects, output_eager_blob_objects,
result, ctx, result->op_device());
}));
return Maybe<void>::Ok();
}
This is the main line of what EagerConsistentInterpreter does before entering the virtual machine.
3.3 Lazy mode
I'm not familiar with this part yet; a separate summary will follow once I am.
This article mainly sorted out the main responsibilities and implementation of the OpExprInterpreter, drawing primarily on the official OneFlow code and the earlier related posts. Related link:
- https://github.com/Oneflow-Inc/oneflow