[Data Architecture] Distributed Data Grid as a Solution for the Centralized Data Monolith
2022-08-10 08:59:00 【51CTO】
Enterprise data architects should stop building large, centralized data platforms and instead create distributed data grids (Dehghani's own term is "data mesh"). This change in approach requires a paradigm shift, said Zhamak Dehghani, chief technical advisor at ThoughtWorks, in a presentation at QCon San Francisco and a related article. As data becomes more pervasive, traditional data warehouse and data lake architectures become overwhelmed and cannot scale effectively. Dehghani believes that a distributed data grid can overcome these inherent inefficiencies by adopting domain-oriented data ownership.
"I propose that the next enterprise data platform architecture is a fusion of distributed domain-driven architecture, self-service platform design, and data product thinking."
Her presentation included some real-world examples, but it focused on new guiding principles, accompanied by new vocabulary to support this mindset: for example, "serving over ingesting" and "discovering and using over extracting and loading."
Dehghani sees three failure modes in traditional data platform architectures. First, they are centralized and monolithic. Bringing all types of data together may work for small organizations, but it ultimately fails for enterprises with many data sources and diverse data consumers.
Second, there is what Dehghani describes as "coupled pipeline decomposition." Generations of architects have decomposed the data platform into a "pipeline of data processing steps." These steps are orthogonal to the axis of change, so new functionality typically requires updating every step.
The final failure mode is siloed, hyper-specialized ownership. A centralized architecture naturally splits people into data source teams that provide data and consumer teams that use the processed data. In the middle sits a central team of data and machine learning engineers. While the two outer groups are domain-oriented, the central team has to remain domain-agnostic.
![Distributed data grid as a solution for the centralized data monolith](/img/99/b22268e3e47db9e9caad58722e8709.png)
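To make the second failure mode concrete, here is a minimal Python sketch of a centrally owned pipeline. The stage names (`ingest`, `cleanse`, `aggregate`) and the order records are hypothetical and do not come from Dehghani's material; the only point is that every stage encodes the same schema, so exposing one new source field means editing every stage.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical schema known to the central platform team. Every stage below
# depends on it, which is what couples the stages to one another.
KNOWN_FIELDS = ("order_id", "amount")


def ingest(raw_rows: List[Row]) -> List[Row]:
    # Project each source row onto the known schema; unknown fields are dropped.
    return [{f: row[f] for f in KNOWN_FIELDS} for row in raw_rows]


def cleanse(rows: List[Row]) -> List[Row]:
    # Discard rows with invalid (negative) amounts.
    return [r for r in rows if r["amount"] >= 0]


def aggregate(rows: List[Row]) -> Row:
    # Compute the metrics that downstream consumers were promised.
    return {"order_count": len(rows), "revenue": sum(r["amount"] for r in rows)}


def run_pipeline(raw_rows: List[Row]) -> Row:
    return aggregate(cleanse(ingest(raw_rows)))


source = [
    {"order_id": 1, "amount": 10, "discount": 2},  # "discount" is a new source field
    {"order_id": 2, "amount": -1, "discount": 0},
]
print(run_pipeline(source))  # {'order_count': 1, 'revenue': 10}
```

The new `discount` field never reaches consumers. Exposing it would mean changing `KNOWN_FIELDS`, the cleansing rules, and the aggregate, which is exactly the "every step changes" problem of a pipeline decomposed orthogonally to the axis of change.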
Dehghani compares these challenges to those of an n-tier monolith, where a new customer requirement forces changes to every tier. Microservices align better with the way systems change, but they require a different design approach. A similarly dramatic shift in thinking is required to implement a data grid architecture successfully.
"In order to decentralize the monolithic data platform, we need to reverse how we think about data, its locality and ownership. Instead of flowing the data from domains into a centrally owned data lake or platform, domains need to host and serve their domain datasets in an easily consumable way."
The envisioned architecture treats domain data products as first-class components, each owned by a team that understands the domain. A single, rigid data pipeline is no longer the primary design concern, and data is no longer strictly divided into sources and consumers. Distributed teams can use the data they need and feed their output back into the grid for other teams to use.
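The shift from a central pipeline to domain-owned data products can be sketched as a small in-memory registry. Everything here (`DataProduct`, `Mesh`, and the `orders.orders` and `marketing.customer_revenue` names) is hypothetical and purely illustrative; a real system would sit on a self-serve data platform rather than Python callables. The sketch shows a consumer team reading another domain's product and publishing its own output back into the grid.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class DataProduct:
    """A dataset hosted and served by the domain team that owns it (hypothetical)."""
    name: str                       # stable, domain-qualified name
    owner: str                      # accountable domain team
    serve: Callable[[], List[Row]]  # how consumers read the data


@dataclass
class Mesh:
    """Hypothetical registry: products are published and read, not piped centrally."""
    products: Dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        self.products[product.name] = product

    def read(self, name: str) -> List[Row]:
        return self.products[name].serve()


mesh = Mesh()

# The orders domain hosts and serves its own dataset.
mesh.publish(DataProduct(
    name="orders.orders",
    owner="orders-team",
    serve=lambda: [
        {"customer_id": 1, "amount": 30},
        {"customer_id": 1, "amount": 20},
        {"customer_id": 2, "amount": 5},
    ],
))


def customer_revenue() -> List[Row]:
    # The marketing team consumes the orders product directly...
    totals: Dict[int, int] = {}
    for row in mesh.read("orders.orders"):
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0) + row["amount"]
    return [{"customer_id": c, "revenue": r} for c, r in sorted(totals.items())]


# ...and feeds its result back into the grid as a new product for other teams.
mesh.publish(DataProduct("marketing.customer_revenue", "marketing-team", customer_revenue))
print(mesh.read("marketing.customer_revenue"))
```

Note that no central team sits between the two domains: ownership of each dataset stays with the team that produces it, and the derived product is just another node that others can discover and read.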
For such an architecture to succeed, data products must be discoverable, addressable, trustworthy, self-describing, interoperable, and secure, and they must be governed by global access control. These characteristics are the responsibility of each data product's owner, supported by shared governance and a platform that provides the data infrastructure.
![Distributed data grid as a solution for the centralized data monolith](/img/3d/846fec2760ec1c6f3ff63ad34e842a.png)
- Image Credit: Zhamak Dehghani
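One way to read the list of data product characteristics is as metadata that each product publishes about itself. The Python sketch below is a hypothetical descriptor: the field names, the `mesh://` address scheme, and the role-based access check are illustrative assumptions, not part of Dehghani's proposal. It maps discoverable, addressable, trustworthy, self-describing, interoperable, and secure onto simple checks.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class DataProductDescriptor:
    address: str                   # addressable: unique, stable identifier (hypothetical scheme)
    description: str               # discoverable: searchable text in a catalog
    schema: Dict[str, str]         # self-describing: column name -> type
    max_staleness_hours: float     # trustworthy: published freshness objective
    allowed_roles: FrozenSet[str]  # secure: input to a global access-control policy


def discover(catalog: List[DataProductDescriptor], keyword: str) -> List[str]:
    """Return the addresses of products whose description mentions the keyword."""
    kw = keyword.lower()
    return [d.address for d in catalog if kw in d.description.lower()]


def meets_freshness(d: DataProductDescriptor, hours_since_update: float) -> bool:
    """Trustworthy: is the product within its advertised staleness bound?"""
    return hours_since_update <= d.max_staleness_hours


def interoperable(a: DataProductDescriptor, b: DataProductDescriptor, key: str) -> bool:
    """Interoperable: can the two products be joined on a shared key of the same type?"""
    return key in a.schema and a.schema.get(key) == b.schema.get(key)


def can_read(d: DataProductDescriptor, role: str) -> bool:
    """Secure: one globally applied rule instead of ad hoc per-pipeline checks."""
    return role in d.allowed_roles


orders = DataProductDescriptor(
    address="mesh://orders/orders",
    description="All customer orders, updated hourly",
    schema={"order_id": "int", "customer_id": "int", "amount": "decimal"},
    max_staleness_hours=1,
    allowed_roles=frozenset({"analyst", "finance"}),
)
customers = DataProductDescriptor(
    address="mesh://customers/profiles",
    description="Customer profiles",
    schema={"customer_id": "int", "segment": "str"},
    max_staleness_hours=24,
    allowed_roles=frozenset({"analyst"}),
)
catalog = [orders, customers]

print(discover(catalog, "order"))  # ['mesh://orders/orders']
print(interoperable(orders, customers, "customer_id"), can_read(orders, "intern"))  # True False
```

Each owner fills in its own descriptor, while the checks themselves live in one shared place; that split loosely mirrors the idea of product owners supported by shared governance and a common platform.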
Data warehouses and data lakes can still exist in this architecture, but they become just another node in the grid rather than a centralized monolith. If a team still needs the capabilities of a data warehouse or lake, it is free to use one. This parallels the adoption of microservices, which went hand in hand with polyglot solutions.
Dehghani's QCon presentation, "Data Mesh Paradigm Shift in Data Platform Architecture," will be released in the coming weeks. Her article, "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh," is available now. She will also be a guest on the InfoQ Podcast.