当前位置：网站首页>How cursors work in Pulsar

How cursors work in Pulsar

2022-08-10 05:00:00 【SparkSql】

Computer failure on the client or server side with strong fault tolerance's messaging system ensures that no messages are lost or recovered.To provide this capability, a messaging system needs an accurate tracking mechanism for message consumption and acknowledgment.Apache Pulsar uses cursors for this purpose.

How cursors work

The agent sends a message to the consumer.
On receipt of this message, the consumer will acknowledge the message and send an acknowledgement back to the broker.
The agent receives the confirmation and moves the cursor, updating the cursor ledger stored in BookKeeper.

Since proxies are stateless and cursors are stateful, Pulsar stores consumer location information in BookKeeper.When the broker receives an acknowledgment from the consumer, it updates the cursor ledger of the subscription the consumer is bound to.In the event of a consumer failure, message consumption and acknowledgment information remains safe, and no messages are recomputed when consumption is resumed.This is because cursor ledger data is securely stored on bookies according to the configured replication policy (i.e. the values of Ensemble Size, Write Quorum and Ack Quorum).

Note: A small subset of cursor metadata is stored in ZooKeeper, not in BookKeeper, such as cursorLedger index informationp>

Data partition

The data written to the topic may be a few megabytes or a few terabytes.So, the throughput of the topic is low in some cases and high in other cases, completely depends on the number of consumers.So what to do when some topics have high throughput and some are low?To solve this problem, Pulsar distributes the data of a topic to multiple machines, which is the so-called Partitions.
Partitioning is a very common method to ensure high throughput when dealing with massive data.By default, Pulsar topics are not partitioned, but can be easily Create Partitioned Topics, and specify the number of partitions.
After creating a partition topic, Pulsar can automatically partition data without affecting producers and consumers.That is, an application writes data to a topic, and after the topic is partitioned, the application code does not need to be modified.Partitioning is just an operation and maintenance operation, and the application does not need to care how the partitioning is performed.
Topic partitioning is handled by a process called a broker, and each node in a Pulsar cluster runs its own broker.

Cursor position

To know the exact position of the cursor in the topic, we can check the markDeletePosition property of the cursor, which marks the position of the acknowledged message before the oldest unacknowledged message.Since this message and all of its predecessors have been acknowledged, it can be deleted.

Note: You can check the details of a topic with the command pulsar-admin topics stats, the output of which contains the sameCursor-related information, such as markDeletePosition, cursorLedger, and individuallyDeletedMessages.

In summary, whether the agent moves the cursor is related to the attribute markDeletePosition Please note that multiple subscription cursors may be located on differentLocation.

How TTL affects cursors

By default, Pulsar permanently stores all unacknowledged messages.You might wonder if the cursor remains static if the message goes unacknowledged for a long time for some reason.In fact, we should avoid situations where there are a lot of unacknowledged messages, as this can mean a huge pressure on disk space.In this case, whether the cursor moves or not depends on the time-to-live (TTL) in the pulsar.

By setting a TTL policy, we can define the retention period for unacknowledged messages.After the configured time frame, Pulsar will automatically acknowledge these messages, forcing the cursor to move.It also means that these messages are ready to be deleted.

原网站

版权声明
本文为[SparkSql]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/222/202208100452449434.html