当前位置:网站首页>How cursors work in Pulsar
How cursors work in Pulsar
2022-08-10 05:00:00 【SparkSql】
Computer failure on the client or server side with strong fault tolerance's messaging system ensures that no messages are lost or recovered.To provide this capability, a messaging system needs an accurate tracking mechanism for message consumption and acknowledgment.Apache Pulsar uses cursors for this purpose.
How cursors work

- The agent sends a message to the consumer.
- On receipt of this message, the consumer will acknowledge the message and send an acknowledgement back to the broker.
- The agent receives the confirmation and moves the cursor, updating the cursor ledger stored in BookKeeper.
Since proxies are stateless and cursors are stateful, Pulsar stores consumer location information in BookKeeper.When the broker receives an acknowledgment from the consumer, it updates the cursor ledger of the subscription the consumer is bound to.In the event of a consumer failure, message consumption and acknowledgment information remains safe, and no messages are recomputed when consumption is resumed.This is because cursor ledger data is securely stored on bookies according to the configured replication policy (i.e. the values of Ensemble Size, Write Quorum and Ack Quorum).
Note: A small subset of cursor metadata is stored in ZooKeeper, not in BookKeeper, such as cursorLedger index informationp>
Data partition
The data written to the topic may be a few megabytes or a few terabytes.So, the throughput of the topic is low in some cases and high in other cases, completely depends on the number of consumers.So what to do when some topics have high throughput and some are low?To solve this problem, Pulsar distributes the data of a topic to multiple machines, which is the so-called Partitions.
Partitioning is a very common method to ensure high throughput when dealing with massive data.By default, Pulsar topics are not partitioned, but can be easily Create Partitioned Topics, and specify the number of partitions.
After creating a partition topic, Pulsar can automatically partition data without affecting producers and consumers.That is, an application writes data to a topic, and after the topic is partitioned, the application code does not need to be modified.Partitioning is just an operation and maintenance operation, and the application does not need to care how the partitioning is performed.
Topic partitioning is handled by a process called a broker, and each node in a Pulsar cluster runs its own broker.
Cursor position
To know the exact position of the cursor in the topic, we can check the markDeletePosition property of the cursor, which marks the position of the acknowledged message before the oldest unacknowledged message.Since this message and all of its predecessors have been acknowledged, it can be deleted.
Note: You can check the details of a topic with the command pulsar-admin topics stats, the output of which contains the sameCursor-related information, such as markDeletePosition, cursorLedger, and individuallyDeletedMessages.
In summary, whether the agent moves the cursor is related to the attribute markDeletePosition Please note that multiple subscription cursors may be located on differentLocation.
How TTL affects cursors
By default, Pulsar permanently stores all unacknowledged messages.You might wonder if the cursor remains static if the message goes unacknowledged for a long time for some reason.In fact, we should avoid situations where there are a lot of unacknowledged messages, as this can mean a huge pressure on disk space.In this case, whether the cursor moves or not depends on the time-to-live (TTL) in the pulsar.
By setting a TTL policy, we can define the retention period for unacknowledged messages.After the configured time frame, Pulsar will automatically acknowledge these messages, forcing the cursor to move.It also means that these messages are ready to be deleted.
边栏推荐
猜你喜欢
随机推荐
西门子Step7和TIA软件“交叉引用”的使用
LeetCode·124.二叉树中的最大路径和·递归
JS gets the year, month, day, time, etc. of the current time
Ueditor编辑器任意文件上传漏洞
链表的定义和使用
SQLSERVER 2008 解析 Json 格式数据
2022G3锅炉水处理考试模拟100题及模拟考试
用 PySpark ML 构建机器学习模型
大佬们,运行cdc后oracle归档日志20分钟增长3G是正常现象吗
JavsSE => 多态
【bug】尝试重新启动事Deadlock found when trying to get lock; try restarting transaction
线程(中):线程安全
最新开源的面试笔记,天花板级别!
X书6.89版本shield-unidbg调用方式
元宇宙 | 你能通过图灵测试吗?
华为交换机配置日志推送
PHPCMS仿站从入门到精通,小白看这一套课程就够了
2022年T电梯修理考试题及模拟考试
栈与队列 | 用栈实现队列 | 用队列实现栈 | 基础理论与代码原理
leetcode每天5题-Day13









