[Data Architecture] Distributed Data Mesh as a Solution for Centralized Data Monoliths
2022-08-10 08:59:00 [51CTO]
Enterprise data architects should stop building large, centralized data platforms and instead create distributed data meshes. This change in approach requires a paradigm shift, said Zhamak Dehghani, principal technology consultant at ThoughtWorks, in a presentation at QCon San Francisco and a related article. As data becomes more ubiquitous, traditional data warehouse and data lake architectures become overwhelmed and cannot scale effectively. Dehghani believes a distributed data mesh can overcome these inherent inefficiencies by adopting domain-oriented data ownership.
"I propose that the next enterprise data platform architecture is a fusion of distributed domain-driven architecture, self-service platform design, and data product thinking."
Her presentation included some real-world examples, but focused on new governing principles, accompanied by new language to support this mindset: for example, serving over ingesting, and discovering and using over extracting and loading.
Dehghani sees three failure modes in traditional data platform architectures. First, they are centralized and monolithic. Bringing all types of data together may work for small organizations, but it ultimately fails for enterprises with many data sources and diverse data consumers.
Second, there is what Dehghani describes as "coupled pipeline decomposition." Generations of architects have decomposed the data platform architecture into a pipeline of data processing stages. Those stages are orthogonal to the axis of change, so a new feature, such as exposing a new data source, requires updates to every stage.
The final failure mode is siloed, hyper-specialized ownership. A centralized architecture naturally splits people into data source teams that provide data and consumer teams that use the processed data. In the middle sit the data and machine learning specialists. While the two outer groups are domain-oriented, the central team has to remain domain-agnostic.
Dehghani compares these challenges to those of an n-tier monolith, where a new customer requirement means modifying every tier. Microservices align better with the axis of change, but they require a different design approach. A similarly dramatic shift in thinking is needed to implement a data mesh architecture successfully.
"In order to decentralize the monolithic data platform, we need to reverse how we think about data, its locality and ownership. Instead of flowing the data from domains into a centrally owned data lake or platform, domains need to host and serve their domain datasets in an easily consumable way."
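A minimal Python sketch of "serving over ingesting", assuming a hypothetical orders domain: `Order`, `OrdersDataProduct`, `read()`, and `large_order_revenue()` are illustrative names, not part of Dehghani's proposal or any library. The owning domain keeps its dataset and exposes a read interface; a consumer in another domain pulls what it needs instead of the domain pushing copies into a central lake.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Order:
    """One record in a hypothetical 'orders' domain dataset."""
    order_id: str
    customer_id: str
    amount_cents: int


@dataclass
class OrdersDataProduct:
    """Hosted and served by the (hypothetical) orders domain team.

    The domain keeps ownership of its data; consumers call read() instead
    of the domain copying records into a centrally owned lake.
    """
    orders: list[Order] = field(default_factory=list)

    def add(self, order: Order) -> None:
        # Writes stay inside the owning domain.
        self.orders.append(order)

    def read(self, min_amount_cents: int = 0) -> list[Order]:
        # The serving side: returns a new list, so consumers cannot
        # mutate the domain's own storage.
        return [o for o in self.orders if o.amount_cents >= min_amount_cents]


def large_order_revenue(product: OrdersDataProduct) -> int:
    """A consumer in another domain pulls only the slice it needs."""
    return sum(o.amount_cents for o in product.read(min_amount_cents=1000))


if __name__ == "__main__":
    orders = OrdersDataProduct()
    orders.add(Order("o-1", "c-1", 2500))
    orders.add(Order("o-2", "c-2", 500))
    orders.add(Order("o-3", "c-1", 1200))
    print(large_order_revenue(orders))  # 3700
```

In a real platform the read interface would more likely be a table, an event stream, or an API endpoint; the in-memory list only stands in for the domain's own storage.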
The envisioned architecture treats domain data products as first-class components, each owned by a team that understands the domain. A single, rigid data pipeline is no longer the primary design concern, and data is no longer cleanly split into sources and consumers. Distributed teams can use the data they need and feed their output back into the mesh for other teams to use.
For this architecture to succeed, data products must be discoverable, addressable, trustworthy, self-describing, interoperable, and secure, and must be governed by global access control. Meeting these requirements is the responsibility of each data product owner, supported by federated governance and a platform that provides the data infrastructure.
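The data product properties listed above can be sketched as self-published metadata plus a shared catalog. This is a minimal, hypothetical Python sketch: `DataProductDescriptor`, `ProductCatalog`, the `dp://` address convention, and the role-based check are illustrative assumptions, not Dehghani's design or any product's API. Each owner publishes a descriptor, the catalog enforces a shared naming convention, supports discovery by column, and applies one access rule to every product.

```python
from dataclasses import dataclass

# Hypothetical global naming standard every product must follow (interoperability).
ADDRESS_PREFIX = "dp://"


@dataclass(frozen=True)
class DataProductDescriptor:
    """Metadata a data product publishes about itself (illustrative fields)."""
    address: str                   # addressable: unique, stable identifier
    owner: str                     # domain team accountable for the product
    schema: dict[str, str]         # self-describing: column name -> type
    freshness_hours: int           # trustworthy: published freshness objective
    allowed_roles: frozenset[str]  # secure: roles permitted to read it


class ProductCatalog:
    """Toy discovery service; a real platform would persist and enforce this."""

    def __init__(self) -> None:
        self._products: dict[str, DataProductDescriptor] = {}

    def register(self, descriptor: DataProductDescriptor) -> None:
        # Enforce the shared conventions before a product becomes discoverable.
        if not descriptor.address.startswith(ADDRESS_PREFIX):
            raise ValueError(f"address must start with {ADDRESS_PREFIX!r}")
        if descriptor.address in self._products:
            raise ValueError(f"address already registered: {descriptor.address}")
        self._products[descriptor.address] = descriptor

    def discover(self, column: str) -> list[str]:
        """Return addresses of products whose schema exposes `column`."""
        return sorted(
            address
            for address, d in self._products.items()
            if column in d.schema
        )

    def authorize(self, address: str, role: str) -> bool:
        """Apply one access-control rule uniformly to every registered product."""
        product = self._products.get(address)
        return product is not None and role in product.allowed_roles


if __name__ == "__main__":
    catalog = ProductCatalog()
    catalog.register(DataProductDescriptor(
        address="dp://orders/daily-orders",
        owner="orders-team",
        schema={"order_id": "string", "customer_id": "string", "amount_cents": "int"},
        freshness_hours=24,
        allowed_roles=frozenset({"analyst", "finance"}),
    ))
    # A consuming team feeds its derived output back as a new product.
    catalog.register(DataProductDescriptor(
        address="dp://finance/customer-revenue",
        owner="finance-team",
        schema={"customer_id": "string", "revenue_cents": "int"},
        freshness_hours=24,
        allowed_roles=frozenset({"finance"}),
    ))
    print(catalog.discover("customer_id"))
    # ['dp://finance/customer-revenue', 'dp://orders/daily-orders']
    print(catalog.authorize("dp://finance/customer-revenue", "analyst"))  # False
```

The ownership split is the point of the design: each domain fills in its own descriptor, while the catalog only holds the rules that must be the same everywhere.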
Image credit: Zhamak Dehghani
Data warehouses and data lakes can still exist in this architecture, but they become just another node in the mesh rather than a centralized monolith. If a team still needs the functionality a data warehouse or data lake provides, it is free to use one. This mirrors how microservice architectures accommodate polyglot solutions.
Dehghani's QCon presentation, "Data Mesh Paradigm Shift in Data Platform Architecture," will be released in the coming weeks. Her article "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh" is available now. She will also be a guest on the InfoQ Podcast.