[Data Architecture] Distributed Data Grid as a Solution for Centralized Data Monolith
2022-08-10 08:59:00 【51CTO】
Enterprise data architects should not build large centralized data platforms; they should create distributed data grids (more widely known as a data mesh) instead. This change in approach requires a paradigm shift, said Zhamak Dehghani, chief technical advisor at ThoughtWorks, in a presentation at QCon San Francisco and a related article. As data becomes more pervasive, traditional data warehouse and data lake architectures become overwhelmed and cannot scale effectively. Dehghani argues that a distributed data grid approach can overcome these inherent inefficiencies by adopting domain-oriented data ownership.
"I propose that the next enterprise data platform architecture is a fusion of distributed domain-driven architecture, self-service platform design, and data product thinking."
Her presentation included some real-world examples, but focused on new governing principles, accompanied by new language to support this mindset. For example: serving over ingesting, and discovering and using over extracting and loading.
Dehghani sees three failure modes in traditional data platform architectures. First, they are centralized and monolithic; bringing all types of data together may work for small organizations, but it ultimately fails for businesses with a large number of data sources and diverse data consumers.
Second, there is what Dehghani describes as "coupled pipeline decomposition". Generations of architects have decomposed data platform architecture into "pipelines of data processing steps". These pipeline stages are orthogonal to the axis of change, so adding new functionality requires updating every stage.
Siloed, hyper-specialized ownership is the third and final failure mode. A centralized architecture naturally separates source teams, which provide data, from consumer teams, which retrieve processed data. In the middle sit the data and machine learning specialists. While the two outer groups are domain-oriented, the central team must be domain-agnostic.
Dehghani compares these challenges to those of an N-tier monolith, where each new customer requirement forces changes to every tier. Microservices align better with the axes of change, but they require a different design approach. A similarly dramatic shift in thinking is required to implement a data grid architecture successfully.
"In order to decentralize the monolithic data platform, we need to reverse how we think about data, its location, and its ownership. Instead of flowing data from domains into a centrally owned data lake or platform, domains need to host and serve their domain datasets in an easily consumable way."
The envisioned architecture treats domain data products as first-class components, each owned by a team that understands the domain. A single, rigid data pipeline is no longer the primary design concern, and data is no longer strictly divided into sources and consumers. Distributed teams can use the data they need and feed their output back into the grid for other teams to use.
For such an architecture to succeed, data products must be discoverable, addressable, trustworthy, self-describing, interoperable, secure, and governed by global access control. These characteristics are the responsibility of the individual data product owners, aided by federated governance and a platform that provides the data infrastructure.
- Image Credit: Zhamak Dehghani
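The data product characteristics listed above can be made concrete with a small sketch. The Python below is purely illustrative and is not taken from Dehghani's talk or article: the `DataProduct` and `Catalog` classes, the `dataproduct://` address scheme, and the role-based check are all hypothetical names chosen to show how a product might be addressable, self-describing, discoverable, and subject to a shared access rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for one domain-owned data product."""

    domain: str                 # owning domain team
    name: str                   # product name, unique within its domain
    description: str            # self-describing: human-readable summary
    schema: dict[str, str]      # self-describing: column name -> type name
    allowed_roles: frozenset[str] = field(default_factory=frozenset)

    @property
    def address(self) -> str:
        # Addressable: a stable identifier (the URI scheme is an assumption).
        return f"dataproduct://{self.domain}/{self.name}"


class Catalog:
    """Hypothetical shared catalog provided by the platform layer."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Each domain team registers its own products.
        if product.address in self._products:
            raise ValueError(f"already registered: {product.address}")
        self._products[product.address] = product

    def search(self, keyword: str) -> list[DataProduct]:
        # Discoverable: match the keyword against name and description.
        kw = keyword.lower()
        return [
            p for p in self._products.values()
            if kw in p.name.lower() or kw in p.description.lower()
        ]

    def resolve(self, address: str, role: str) -> DataProduct:
        # Global access control: one rule enforced for every product.
        product = self._products[address]
        if role not in product.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {address}")
        return product
```

In this sketch the domain team owns the descriptor and its contents, while the catalog stands in for the shared platform and governance layer. A real implementation would also need to cover trustworthiness (for example, published quality guarantees) and interoperability (shared standards across products), which are omitted here.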
Data warehouses and data lakes can still exist in this architecture, but each becomes just another node in the grid rather than a centralized monolith. If teams still need the functionality that data warehouses and lakes offer, they should be free to adopt them. Here again there is a parallel with the adoption of microservices and polyglot solutions.
Dehghani's QCon presentation "The Data Grid Paradigm Shift in Data Platform Architecture" will be released in the coming weeks. Her article "How to Migrate from a Single Data Lake to a Distributed Data Grid" is now available. She will also be a guest on the InfoQ Podcast.