当前位置：网站首页>Minutes of OpenMLDB Meetup No.2

Minutes of OpenMLDB Meetup No.2

2022-08-11 06:34:00 【Fourth Paradigm Developer Community】

1. Meeting Contents

The OpenMLDB community held the second meetup on April 16, 2022. The related videos and materials of the meeting are as follows:
● Co-founder of StreamNative Zhai Jia - facing the upstream data ecology of OpenMLDB, in-depth analysis of cloud native newsStreaming platform Apache Pulsar.
https://www.zhihu.com/zvideo/1499462777765244928
https://pan.baidu.com/s/1VNtnVvGhEpWNzecGklbqNQ

● Lu Mian, head of OpenMLDB R&D - for real-time feature computing scenarios, introduces the feature development process based on OpenMLDB and the architecture of the machine learning feature computing platform.
https://www.zhihu.com/zvideo/1499468557670461440 https://pan.baidu.com/s/1kBPJWnak254i_VdlupqRpA

● Huang Wei, OpenMLDB R&D Architect - Practical drill of OpenMLDB Pulsar Connector, taking you to efficiently connect real-time data to feature engineering.
https://www.zhihu.com/zvideo/1499478121748914176 https://pan.baidu.com/s/18-EhQMqhYNrP2IxSglGGdw
(Baidu network disk data collection password is open)

2. Discussion

During the meeting, several guests discussed with the community. Here we show some Q&A as follows:

Q1: In addition to computational logic, does OpenMLDB have a mechanism to ensure data consistency offline?
A: The offline and online data of OpenMLDB are separate storage engines.In most cases, the data used in offline development and the data used in online computing are not the same. Online data will continue to introduce new data only for real-time reasoning over time.So from this point of view, it is not necessary to maintain the consistency of offline and online data.

Q2: Is OpenMLDB suitable for IoT analytics?
A: Many data in the Internet of Things are time series data with time stamps.For this kind of time series data, it is theoretically very suitable to use OpenMLDB for analysis, including feature calculation.If you have related needs, you are welcome to interact and discuss with us in the community.

Q3: Which languages does OpenMLDB provide SDKs?
A: Currently OpenMLDB SDK can support Python, Java, and REST APIs.

Q4: If Flink is used for real-time inference, how can it be consistent with batch training Spark?
A: If Flink is used for real-time reasoning, it is currently difficult to achieve computational consistency with our Spark distribution.The two main ones do not use OpenMLDB's consistency engine to generate a completely consistent execution plan for computational logic.Therefore, it is recommended that you directly use the complete process of OpenMLDB to ensure the consistency of online and offline.

Q5: Can the algorithm of feature engineering be extended by UDF in SQL?
A: UDF will be supported in version 0.5.0 which will be released this month.Currently, C/C++ UDFs will be supported first, and Python UDFs will be supported in a later version.

Q6: What are the essential differences between OpenMLDB and Mysql?
A: MySQL is an OLTP database, and its positioning is very different from OpenMLDB.MySQL may also be able to complete some online feature computing tasks, but it does not have an online and offline consistent design architecture. In addition, for some important feature computing operations (such as OpenMLDB optimized cross-window aggregation, etc.)Sexual optimization.

Q7: What are the advantages of OpenMLDB compared with similar products or open source tools on the market?
A: At present, there are Feature Store products on the market, which are similar in positioning to OpenMLDB. They all provide feature platforms for machine learning.However, most Feature Store products, such as the most famous open source project Feast, do not provide real-time feature computing capabilities, and do not ensure the consistency of online and offline at the computing layer.They are more about opening up the features of offline computing and the ability to share online.The commercial version of Tecton provides similar real-time computing capabilities, but it is still pushed to Spark as described, so it is expected that the performance of real-time computing has not been optimized.

Q8: Feature ops refers to the feature itself, or the algorithm used to extract the feature, such as pca, fm, etc.What is the difference between feature ops and model ops
A: The feature ops here refers to the computational logic for extracting the feature itself, rather than some feature processing algorithms you mentioned.You can basically think of it as a data processing logic similar to database SQL.

3. OpenMLDB Community

Thank you for your strong support for this meetup. If you want to learn more about OpenMLDB or participate in community technical exchanges, you can get relevant information and interaction through the following channels.

● Github: https://link.zhihu.com/?target=https%3A//github.com/4paradigm/OpenMLDB
● Official website: https://openmldb.ai
● Email:mailto:[email protected]
● https://www.zhihu.com/people/xdai-xia-ke
● https://link.zhihu.com/?target=https%3A//memark.io/wp-content/uploads/2021/12/OpenMLDB-group.png

原网站

版权声明
本文为[Fourth Paradigm Developer Community]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/223/202208110515240656.html