
Distributed Database Issues (3): Data Consistency

2022-08-10 00:57:00 【Penglai Taoist】

1. What Is Data Consistency

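Consistency has long been an important concept in two fields, "distributed systems" and "databases", but it means different things in each. In distributed systems, consistency asks what results read and write operations produce when one logical piece of data has multiple physical replicas in the system; this matches how the CAP theorem defines consistency. In databases, "consistency" is closely tied to transactions and is further refined into the four ACID properties. Among them, the I, isolation (Isolation), is the core of "consistency": it studies how to coordinate conflicts between transactions. So when we talk about the consistency of a distributed database, we are really talking about two things: data consistency and transaction consistency. Google Spanner's discussion of external consistency (External Consistency) corroborates this.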

Distributed databases, like distributed storage systems, usually keep multiple replicas of data to guard against unreliable hardware and networks. Storing one logical piece of data as several physical copies naturally raises the problem of data consistency. Discussing data consistency also presupposes read and write operations; without them the question is meaningless. Combining the two, a set of strategies for reading and writing multiple replicas is called a "consistency model" (Consistency Model). There are so many consistency models that they are hard to tell apart. To make them easier to understand, I'll build a simple analytical framework. Here I borrow two concepts from the paper "The many faces of consistency": state consistency (State Consistency) and operation consistency (Operation Consistency). Don't panic: these are not new consistency models, just two perspectives from which to observe data consistency.

State consistency: the consistency reflected in the objective, actual state of the data;
Operation consistency: the consistency of the data that external users can read through agreed-upon operations.

2. State Consistency

2.1 Strong Consistency: MySQL Fully Synchronous Replication

Consider a MySQL cluster of three nodes: one primary and two replicas. Under fully synchronous replication (Fully Synchronous Replication), the user interacts with MySQL as follows.

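In this mode, when the primary ships the binlog to the replicas, it may report a successful commit to the client only after receiving success responses from both replicas. Clearly, by the time the user receives the response, the data copies on the primary and the replicas already agree, so subsequent reads are guaranteed to be correct. But this mode has significant side effects, in two respects.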
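To make the commit rule concrete, here is a minimal Python sketch of fully synchronous replication. `Replica` and `Primary` are hypothetical stand-ins, not MySQL APIs; network delay is simulated as a fixed per-replica ack delay, and rollback of a failed commit is not modeled. It shows the two properties discussed next: the commit latency equals the slowest ack, and a single unreachable replica blocks the commit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """Hypothetical replica; its ack arrives after a simulated delay."""
    name: str
    ack_delay_ms: float
    up: bool = True
    binlog: List[str] = field(default_factory=list)

    def apply(self, event: str) -> float:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.binlog.append(event)
        return self.ack_delay_ms  # when this replica's ack reaches the primary


@dataclass
class Primary:
    replicas: List[Replica]
    binlog: List[str] = field(default_factory=list)

    def commit(self, event: str) -> float:
        """Fully synchronous commit: success only after EVERY replica acks.

        Returns the client-visible latency (the slowest ack). Raises
        ConnectionError if any replica cannot ack; rollback is not modeled.
        """
        self.binlog.append(event)
        ack_delays = [r.apply(event) for r in self.replicas]
        return max(ack_delays)
```

For example, `Primary([Replica("replica1", 3.0), Replica("replica2", 8.0)]).commit("UPDATE t SET v = 1")` reports a latency of 8.0 ms, set entirely by the slower replica.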

First, poor performance. The primary can report success to the user only after both replicas have returned success. In the example, network congestion makes "Replica 2" respond slightly later than "Replica 1", which increases the overall latency of the database; next time, the straggler may be "Replica 1". In short, the primary's response time is determined by whichever of the two replicas is slower.
Second, availability. As discussed in Lecture 1 of this series, any device can fail, and commodity x86 hardware in particular has a higher failure rate. Under fully synchronous replication, however, the three nodes of the cluster are tied together: if single-node availability is 95%, the availability of the cluster as a whole is only about 85.7% (95% × 95% × 95% ≈ 85.7%), which is actually lower than that of a single machine.

The larger the cluster, the worse the problem, so fully synchronous replication is rarely used in production systems. More broadly, in engineering practice strong state consistency costs too much to achieve, and it inevitably conflicts with availability, so many products settle for weak state consistency.
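The availability arithmetic above can be checked in a few lines of Python. The helper `cluster_availability` is illustrative, and it assumes node failures are independent, which the 95% × 95% × 95% calculation also implicitly assumes.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Availability when every node must be up for a commit to succeed."""
    return node_availability ** nodes


for n in (1, 3, 5):
    # prints 95.0%, 85.7%, 77.4%: more nodes, lower availability
    print(n, f"{cluster_availability(0.95, n):.1%}")
```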

2.2 Weak Consistency: NoSQL Eventual Consistency

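NoSQL products are the typical example of weak consistency, but there is still a floor on how weak is acceptable: eventual consistency (Eventual Consistency), the E in BASE theory. Hardly any product offers something weaker than eventual consistency. You can think of eventual consistency this way: when the primary replica performs a write and reports success, the other replicas are not required to match it yet, but after some time they will catch up with the primary and the data state becomes consistent again.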
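A minimal sketch of eventual consistency under asynchronous replication, assuming a single primary that appends every write to an ordered replication log and replicas that pull from that log at their own pace. `AsyncPrimary`, `replicate`, and `read_replica` are hypothetical names for illustration only.

```python
from typing import Any, Dict, List, Optional, Tuple


class AsyncPrimary:
    """Primary that acknowledges writes immediately and replicates them later."""

    def __init__(self, replica_names: List[str]):
        self.data: Dict[str, Any] = {}
        self.log: List[Tuple[str, Any]] = []  # ordered replication log
        self.applied: Dict[str, int] = {n: 0 for n in replica_names}  # log position per replica
        self.replica_data: Dict[str, Dict[str, Any]] = {n: {} for n in replica_names}

    def write(self, key: str, value: Any) -> str:
        self.data[key] = value
        self.log.append((key, value))
        return "ok"  # acknowledged before any replica has applied the write

    def replicate(self, name: str, max_events: Optional[int] = None) -> None:
        """Let one replica pull up to max_events pending entries (all if None)."""
        start = self.applied[name]
        end = len(self.log) if max_events is None else min(len(self.log), start + max_events)
        for key, value in self.log[start:end]:
            self.replica_data[name][key] = value
        self.applied[name] = end

    def read_replica(self, name: str, key: str) -> Optional[Any]:
        return self.replica_data[name].get(key)
```

Right after `write("x", 1)` a replica still returns `None` for `x`; once `replicate` has run for every replica, all of them return the primary's value, which is the "eventually" in eventual consistency.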

3. Operation Consistency

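Eventual consistency carries a great deal of semantic uncertainty, so it is often not used as is; instead, constraints are added to it, which gives rise to several derived consistency models. Because these models constrain what users observe through operations while the replicas are still inconsistent, rather than the underlying data state itself, they belong to the operation perspective.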

3.1 Read-After-Write Consistency

Many applications let users submit some data and then view what they submitted. It might be a customer record in a database, a comment on a discussion thread, or something similar. When new data is submitted, it must go to the primary, but when the user views it, it can be read from a replica. This works very well when data is viewed often but written only occasionally. With asynchronous replication, however, there is a problem, as shown in the figure: if the user views the data immediately after writing it, the new data may not have reached the replica yet. To the user, it looks as if the data they just submitted was lost, so their frustration is understandable.

In this situation we need "read-after-write consistency" (Read-after-Write Consistency), also called "read-write consistency" or "read-my-writes consistency" (Read My Writes Consistency). The last name may sound a bit odd, but it is the one that most precisely describes the effect this consistency model has for the user.

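Any data I have successfully written can definitely be read the next moment, and its content is guaranteed to be exactly what I last wrote; that is where the name "read-my-writes consistency" comes from. From an observer's perspective, it can equally be called "read-your-writes consistency" (Read Your Writes Consistency), and some papers do use that name. If the system uses some policy so that the primary replica that accepted the write also serves the subsequent reads, read-after-write consistency is achieved.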

How can read-after-write consistency be implemented in a leader-based replication system? There are various possible techniques; here are a few:

When reading something the user may have modified, always read it from the primary. This requires some way of knowing whether the user might have modified something without actually querying it. For example, a user's profile on a social network is normally edited only by its owner, never by anyone else. So a simple rule is: always read the user's own profile from the primary, and read other users' profiles from replicas.
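The rule above fits in one function. This is a sketch only: `choose_node` and the labels `"primary"`/`"replica"` are hypothetical, and it assumes, as in the social-network example, that only the owner can edit a profile.

```python
def choose_node(viewer_id: str, profile_owner_id: str) -> str:
    """Route a profile read: the owner may have just edited it, so use the primary."""
    if viewer_id == profile_owner_id:
        # Only the owner edits this profile, so only the owner can miss a fresh write.
        return "primary"
    # Other users' profiles can tolerate replication lag.
    return "replica"
```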

If most content in the application can be edited by the user, this approach does not help, because most content would then have to be read from the primary (cancelling the benefit of scaling reads). In that case, other criteria can decide whether to read from the primary. For example, track the time of the last update and read from the primary for one minute after it. You can also monitor the replication lag of each replica and avoid querying any replica that lags the primary by more than one minute.
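Both criteria from the paragraph above can be combined in a small router. This is an illustrative sketch: `Router`, `record_write`, and the way replica lag is reported (a dict from replica name to lag in seconds) are assumptions, while the one-minute thresholds come from the text. The clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Dict

RECENT_WRITE_WINDOW_S = 60.0  # read from the primary for one minute after a user's write
MAX_REPLICA_LAG_S = 60.0      # skip replicas lagging the primary by more than one minute


class Router:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.last_write_at: Dict[str, float] = {}  # user_id -> time of that user's last write

    def record_write(self, user_id: str) -> None:
        self.last_write_at[user_id] = self.clock()

    def choose(self, user_id: str, replica_lag_s: Dict[str, float]) -> str:
        last = self.last_write_at.get(user_id)
        if last is not None and self.clock() - last < RECENT_WRITE_WINDOW_S:
            return "primary"
        fresh = {name: lag for name, lag in replica_lag_s.items() if lag <= MAX_REPLICA_LAG_S}
        # Prefer the least-lagged acceptable replica; fall back to the primary if none qualify.
        return min(fresh, key=fresh.get) if fresh else "primary"
```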

The client can remember the timestamp of its most recent write, and the system ensures that any replica serving that user's reads reflects updates at least up to that timestamp. If the current replica is not sufficiently up to date, the read can be served by another replica, or it can wait until the replica catches up. The timestamp can be a logical timestamp (something that indicates the order of writes, such as a log sequence number) or the actual system clock (in which case clock synchronization becomes critical).
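A sketch of the timestamp approach using a logical timestamp, assuming every acknowledged write returns its log sequence number (LSN) and each replica can report the highest LSN it has applied. `Session` and `ReplicaState` are hypothetical; returning `None` leaves the choice between waiting, trying another replica later, or going to the primary to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaState:
    name: str
    applied_lsn: int  # highest log sequence number this replica has applied


class Session:
    """Client-side session that remembers the LSN of its latest write."""

    def __init__(self) -> None:
        self.last_write_lsn = 0

    def on_write_ack(self, lsn: int) -> None:
        self.last_write_lsn = max(self.last_write_lsn, lsn)

    def pick_replica(self, replicas: List[ReplicaState]) -> Optional[str]:
        """Return a replica that already has our last write, or None if none does."""
        for r in replicas:
            if r.applied_lsn >= self.last_write_lsn:
                return r.name
        return None
```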

If replicas are distributed across multiple data centers (to be geographically close to users, or for availability), there is additional complexity: any request that must be served by the primary has to be routed to the data center that contains the primary.

3.2 Monotonic Read Consistency

If a user reads several times from different replicas, an anomaly can occur. For example, the figure below shows user 2345 making the same query twice:

The first query goes to a replica with little lag, the second to a replica with more lag (quite likely if each page refresh routes the request to a random server). The first query returns a comment recently added by user 1234, but the second query returns nothing, because the lagging replica has not yet pulled in that write. In effect, the second query observes the system at an earlier point in time than the first. If the first query had returned nothing, it would not matter much, since user 2345 probably would not know that user 1234 had just added a comment. But if user 2345 first sees user 1234's comment and then sees it disappear, that is very confusing.
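The anomaly can be reproduced with two in-memory "replicas" and a checker for the property that monotonic reads guarantee. This sketch assumes comments are append-only, so a read "goes back in time" exactly when it is missing something an earlier read already returned; the variable and function names are illustrative.

```python
from typing import List, Sequence


def violates_monotonic_reads(observations: Sequence[Sequence[str]]) -> bool:
    """True if some read is missing a comment that an earlier read already returned."""
    seen = set()
    for result in observations:
        if not seen.issubset(result):
            return True
        seen.update(result)
    return False


# Replica contents at the moment of the two refreshes in the example.
low_lag_replica = ["1234: new comment"]  # has already applied user 1234's write
high_lag_replica: List[str] = []          # has not pulled the write yet

first_read = list(low_lag_replica)    # first refresh routed to the low-lag replica
second_read = list(high_lag_replica)  # second refresh routed to the lagging replica
print(violates_monotonic_reads([first_read, second_read]))  # True
```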

Monotonic reads (monotonic reads) guarantee that this anomaly does not happen. It is a weaker guarantee than strong consistency (strong consistency) but a stronger one than eventual consistency (eventual consistency). When reading data you may still see an old value; monotonic reads only mean that if a user performs several reads in sequence, they will not see time go backward. In other words, once a user has read newer data, subsequent reads will never return older data.

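One way to implement monotonic read consistency is to give each user a fixed mapping to a replica, for example by hashing the user ID to a fixed replica. This avoids switching among replicas, so the anomaly above cannot occur.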
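A sketch of the fixed user-to-replica mapping, assuming replicas are identified by name. It uses a SHA-256 digest rather than Python's built-in `hash()`, because `hash()` of a string is randomized per process and would break the fixed mapping across restarts. `replica_for_user` is a hypothetical helper.

```python
import hashlib
from typing import List


def replica_for_user(user_id: str, replicas: List[str]) -> str:
    """Map a user to the same replica every time, as long as the replica list is unchanged."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]
```

If a replica fails or the replica list changes, the modulo mapping moves many users at once, and those users may briefly lose the monotonic guarantee; a production design would need to handle that case, for example with consistent hashing.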

3.3 Consistent Prefix

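One day Xiao Ming went to watch the CBA finals. Right after tip-off, he took a photo of the arena and posted it to his WeChat Moments to show off. Xiao Hong also loves basketball but had something come up and could not go, so she asked Xiao Ming in the comments: "What's the score now?" Xiao Ming replied: "4:2." Xiao Ming's classmate Xiao Gang, far away in Canada, saw something strange: Xiao Ming's reply "4:2." appeared in the comments first, and only later did Xiao Hong's comment "What's the score now?" show up. Can Xiao Ming predict the future? What is going on? Let's look at the figure: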

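Xiao Ming's and Xiao Hong's comments were written to nodes N1 and N2 respectively, but when they were synchronized to N3, network transmission problems caused N3 to receive the data in an order different from the order in which it was written, so Xiao Gang saw the answer before the question. Clearly there is a causal relationship between the question and the answer, but it was ignored during replication, which produced the anomaly. Preserving such causal relationships is called consistent prefix reads, or consistent prefix (Consistent Prefix). It means: if a sequence of writes happens in a certain order, anyone reading those writes will see them appear in that same order.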

One solution is to make sure that causally related writes all go to the same partition, but in some applications this cannot be done efficiently. Another option is to add explicit causal relationships to the comments' metadata, so that the system can use them to control the order in which other processes read the comments.
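The second solution, explicit causal metadata, can be sketched as a buffer on the receiving node (N3 in the example): each comment carries the id of the comment it depends on and stays hidden until that dependency is visible. `Comment`, `CausalFeed`, and the single-dependency field are simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Comment:
    id: str
    text: str
    depends_on: Optional[str] = None  # id of the comment this one causally follows


class CausalFeed:
    """Replica-side buffer: a comment becomes visible only after its dependency."""

    def __init__(self) -> None:
        self.visible: List[Comment] = []
        self.visible_ids: Set[str] = set()
        self.pending: Dict[str, List[Comment]] = {}  # missing dependency id -> waiting comments

    def receive(self, c: Comment) -> None:
        if c.depends_on is not None and c.depends_on not in self.visible_ids:
            self.pending.setdefault(c.depends_on, []).append(c)  # hold until the cause arrives
            return
        self._deliver(c)

    def _deliver(self, c: Comment) -> None:
        self.visible.append(c)
        self.visible_ids.add(c.id)
        # Release any comments that were waiting for this one.
        for waiting in self.pending.pop(c.id, []):
            self._deliver(waiting)
```

If N3 receives the answer `Comment("a1", "4:2.", depends_on="q1")` before the question `Comment("q1", "What's the score now?")`, the answer is buffered, and once the question arrives both become visible in question-then-answer order.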

3.4 Linearizability

In the consistent prefix example, the relationship between the question and the answer is declared explicitly, but in reality most causal relationships are more complex, and it is impossible to require every one of them to be declared. A distributed database, for example, cannot require the application to attach a statement to every change saying which data it read that led to the change. So when explicit declarations are not feasible, how can causal relationships be found?

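Have you ever heard the saying, "Everything you have experienced made you who you are now"? It sounds a bit philosophical, doesn't it? Causes are subjective; everything that came before could be a cause. So a more reliable approach is to turn causality in the natural-language sense into the order in which events happen. Linearizability (Linearizability) is built on the order of events. Under linearizability, the whole system behaves as if there were only one replica: all operations are recorded on a single timeline and are atomic, so any two events can be compared to see which came first. Together these events form what mathematics calls a "total order", and a total order is also called a "linear order". I suppose that is where linearizability gets its name.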

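However, the nodes in a cluster cannot achieve true clock synchronization, so each node has its own timeline. How, then, can operations be recorded on a single timeline? This requires an absolute time, that is, a global clock. At the product level, most mainstream distributed databases aim for linearizability and have introduced a global clock either at design time or as they evolved, for example Spanner, TiDB, OceanBase, GoldenDB, and SequoiaDB.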

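In engineering terms, most products use a single-point timestamp service (TSO, Timestamp Oracle): they obtain time from a single time server, backed by a high-reliability design. Spanner, by contrast, targets global deployment, and because a TSO is limited in how widely it can be deployed, Spanner implements its global clock with GPS and atomic clocks. This is TrueTime, which guarantees that any node anywhere in the world can obtain the same absolute time, with an error of under 7 ms. Linearizability is, however, controversial in academia. The argument comes from an important conclusion of Einstein's theory of relativity: "time is relative". Without absolute time there is no total order of events, and different observers may not agree on which event happened first. Linearizability therefore has its limits. Of course, from an engineering standpoint, our application scenarios fall within the realm of classical physics, so linearizability is applicable.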
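A minimal single-point timestamp service in the spirit described above. The layout (physical milliseconds shifted left by 18 bits plus a logical counter) is an assumption modeled on how TiDB's PD composes timestamps; treat it as illustrative. The sketch covers only the core guarantee, strictly increasing timestamps even if the local clock stalls or steps backward; the high-reliability design the text mentions (persisting a high-water mark, failover) is not modeled.

```python
import threading
import time
from typing import Callable

LOGICAL_BITS = 18  # assumption: low bits hold a per-millisecond logical counter


class TimestampOracle:
    """Single-point timestamp service: every caller gets a strictly increasing timestamp."""

    def __init__(self, clock_ms: Callable[[], int] = lambda: int(time.time() * 1000)):
        self._clock_ms = clock_ms
        self._lock = threading.Lock()
        self._physical = 0
        self._logical = 0

    def get_ts(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now > self._physical:
                self._physical, self._logical = now, 0
            else:
                # Clock did not advance (or went backward): bump the logical part instead.
                self._logical += 1
                if self._logical >= (1 << LOGICAL_BITS):
                    # Logical space exhausted for this millisecond: move to the next one.
                    self._physical += 1
                    self._logical = 0
            return (self._physical << LOGICAL_BITS) | self._logical
```

Because all timestamps come from one place under one lock, any two of them can be compared, which is exactly the single timeline linearizability needs; the price is that every transaction depends on this one service.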

3.5 Causal Consistency

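Since linearizability is not perfect, is there an approach that does not rely on absolute time? There is: causal consistency (Causal Consistency). Causal consistency is based on a partial order, meaning that only some pairs of events can be ordered. At the very least, events within a single node can be ordered using that node's local clock; and when nodes communicate, the two events involved can also be ordered, because the receiver's event must come after the sender's. Building on this partial order, Leslie Lamport introduced the concept of logical clocks in the paper "Time, Clocks, and the Ordering of Events in a Distributed System". Logical clocks can still be used to build a total order, but that total order is not precise: if two events are unrelated, the ordering the logical clock gives them is meaningless.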
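A sketch of Lamport's logical clock rules: increment on every local event (including sends), and on receipt move past the sender's timestamp. Breaking ties with a node id extends the partial order to a total order, and the usage below shows why that order says nothing about unrelated events. Class and function names are illustrative.

```python
from typing import Tuple


class LamportClock:
    """Lamport logical clock for one node."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.time = 0

    def local_event(self) -> int:
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is a local event; its timestamp travels with the message.
        return self.local_event()

    def receive(self, msg_time: int) -> int:
        # The receive event must be ordered after the send event.
        self.time = max(self.time, msg_time) + 1
        return self.time


def total_order_key(timestamp: int, node_id: int) -> Tuple[int, int]:
    """Break timestamp ties by node id to obtain a total order."""
    return (timestamp, node_id)
```

For instance, if node 1 sends a message with timestamp 1, node 2 receives it at timestamp 2, so the causal order is preserved. An unrelated local event on node 3 also gets timestamp 1 and is ordered after node 1's send only because of the tie-break, which is the imprecision the text describes.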

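The prevailing view is that causal consistency is weaker than linearizability but has an advantage in concurrency performance and is sufficient to handle most anomalies, so it has also been adopted in industry. In distributed databases specifically, both CockroachDB and YugabyteDB use hybrid logical clocks (Hybrid Logical Clocks) in their designs; this scheme derives from Lamport's logical clock and has worked well. As a result, neither product implements linearizability; both are closer to causal consistency, and CockroachDB calls its own consistency model "No Stale Reads".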
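A sketch of the hybrid logical clock update rules as published by Kulkarni et al. Timestamps are `(l, c)` pairs, where `l` tracks the largest physical time seen so far and `c` is a logical counter for events sharing the same `l`. This is not CockroachDB's or YugabyteDB's code, which add further machinery (for example, bounding clock offset); the injectable `physical_ms` function is an assumption that keeps the example testable.

```python
import time
from typing import Callable, Tuple


class HybridLogicalClock:
    """HLC update rules; timestamps compare as (l, c) tuples."""

    def __init__(self, physical_ms: Callable[[], int] = lambda: int(time.time() * 1000)):
        self.pt = physical_ms
        self.l = 0  # largest physical time seen so far
        self.c = 0  # logical counter to order events with the same l

    def now(self) -> Tuple[int, int]:
        """Local or send event."""
        prev_l = self.l
        self.l = max(prev_l, self.pt())
        self.c = self.c + 1 if self.l == prev_l else 0
        return (self.l, self.c)

    def update(self, msg: Tuple[int, int]) -> Tuple[int, int]:
        """Receive event carrying the sender's (l, c)."""
        m_l, m_c = msg
        prev_l = self.l
        self.l = max(prev_l, m_l, self.pt())
        if self.l == prev_l == m_l:
            self.c = max(self.c, m_c) + 1
        elif self.l == prev_l:
            self.c += 1
        elif self.l == m_l:
            self.c = m_c + 1
        else:
            self.c = 0
        return (self.l, self.c)
```

With node A's physical clock at 100 ms and node B's lagging at 90 ms, A's send is stamped `(100, 0)` and B's receive `(100, 1)`: the receive is ordered after the send even though B's own clock is behind, and `l` stays close to physical time.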

3.6 Ranking Consistency Models by Strength

This article has introduced several consistency models. Measured by strength: linearizability is stronger than causal consistency, and read-after-write consistency, monotonic read consistency, and consistent prefix are all weaker than those two, but they cannot be ranked against one another. In summary, we order the strength of these consistency models as follows:

Linearizability > Causal consistency > { Read-after-write consistency, Monotonic read consistency, Consistent prefix }


Copyright notice
This article was written by [Penglai Taoist]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/222/202208092240132399.html