What is Apache Pulsar?
Pulsar was originally developed at Yahoo and is now open source under the Apache License. Apache Pulsar is a distributed messaging system based on the publish-subscribe (pub-sub) model, in which producers are decoupled from consumers. Pulsar acts as the middleware: it accepts data from producers, and consumers then read that data from Pulsar.
Why Apache Pulsar?
- Low publish latency – Pulsar is reported to have a publish latency of less than 5 ms, compared with 5 ms or more for the well-known Apache Kafka. Pulsar also keeps its latency low as throughput increases.
- Scales Horizontally – A Pulsar deployment can grow to meet demand. It can scale to hundreds of nodes, so the number of topics, the message volume, and the storage capacity can all increase without any hassle.
- Cloud Native – Because Pulsar is cloud native, it offers additional benefits to organizations that keep the majority of their infrastructure in the cloud.
Apache Pulsar Architecture
Apache Pulsar has a two-layer structure: a serving layer and a storage layer. The serving layer, made up of Pulsar brokers, handles the interaction between producers and consumers. Apache BookKeeper acts as the persistent storage layer; its individual nodes are called bookies, and topic data is replicated across bookies. The benefit of separating the serving and storage layers is that each can be scaled independently.
Producers send data to Pulsar topics, and consumers read it from the same topics. Multiple consumers can read from the same topic. A topic with multiple partitions is spread across many Pulsar brokers, but a single broker serves all reads and writes for a given partition: all writes from one or more producers, and all reads from one or more consumers, go through that specific broker.
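The rule that each partition is served by exactly one broker can be illustrated with a small sketch. This is a toy model, not Pulsar's actual load balancing: the broker names, the topic name, and the round-robin assignment are assumptions made for illustration, and Pulsar decides ownership through its own load manager. The `-partition-N` suffix follows the naming Pulsar uses for the internal topics of a partitioned topic.

```python
from collections import defaultdict
from typing import Dict, List


def partition_names(topic: str, num_partitions: int) -> List[str]:
    """Return the internal topic name for each partition of a partitioned topic."""
    return [f"{topic}-partition-{i}" for i in range(num_partitions)]


def assign_owners(partitions: List[str], brokers: List[str]) -> Dict[str, str]:
    """Give every partition exactly one owning broker (round robin, illustrative only)."""
    return {p: brokers[i % len(brokers)] for i, p in enumerate(partitions)}


# Hypothetical cluster: three brokers and a topic with four partitions.
brokers = ["broker-1", "broker-2", "broker-3"]
partitions = partition_names("orders", 4)
owners = assign_owners(partitions, brokers)

# Group partitions by broker to see how the topic is spread across brokers.
by_broker = defaultdict(list)
for partition, broker in owners.items():
    by_broker[broker].append(partition)

for broker in brokers:
    print(broker, "serves", by_broker[broker])
```

Every partition appears under exactly one broker, while the topic as a whole is spread over all three, which is what lets reads and writes for different partitions proceed in parallel.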
Topic, Namespace, and Tenants
Along with topics, Apache Pulsar has two other data organization layers: namespaces and tenants.
- Topic – A Pulsar topic is a named category or channel to which messages are published. A topic does not need to be created explicitly: when a producer writes to a topic that does not exist yet, Pulsar can create it automatically.
- Namespace – A namespace groups related topics. There is no limit to the number of topics that can exist in a namespace.
- Tenant – A tenant is the administrative unit. It enforces authentication and authorization schemes and also allocates capacity.
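The three levels come together in a topic's full name, which Pulsar writes as `persistent://tenant/namespace/topic` (or with a `non-persistent://` prefix). Below is a minimal sketch of splitting such a name into its parts, assuming this full form. The `TopicName` class, the `parse_topic_name` helper, and the example names are hypothetical and not part of any Pulsar client.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(full_name: str) -> TopicName:
    """Split a fully qualified topic name into domain, tenant, namespace, topic."""
    domain, sep, rest = full_name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"expected persistent:// or non-persistent:// prefix: {full_name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic: {full_name!r}")
    tenant, namespace, topic = parts
    return TopicName(domain, tenant, namespace, topic)


name = parse_topic_name("persistent://my-tenant/my-namespace/my-topic")
print(name.tenant, name.namespace, name.topic)
```

Reading the name from left to right follows the hierarchy: the tenant contains the namespace, and the namespace contains the topic.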
Apache Pulsar is a pub-sub system in which producers publish entries to a topic and consumers subscribe to that topic, so there has to be a mechanism through which consumers subscribe. Apache Pulsar provides four subscription types, each suited to different needs such as scalability, ordering, and multiple consumers.
- Exclusive Subscription – Only a single consumer is allowed to attach to the subscription. Because of this, ordering is guaranteed, but the subscription is not scalable.
- Failover Subscription – Multiple consumers can attach to the subscription, but only one of them, the master consumer, receives messages. When the master consumer disconnects, the next consumer in line takes over. Like an exclusive subscription, it is not very scalable.
- Shared or Round Robin Subscription – Multiple consumers can attach to the same subscription and read messages at the same time. Messages are distributed among the consumers in round-robin fashion, which makes this type very scalable, but ordering is not guaranteed.
- Key Shared Subscription – Multiple consumers can attach to the same subscription. Each message specifies an ordering key, and messages with the same key are delivered to the same consumer, so ordering is guaranteed per key while the subscription stays scalable.
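The difference between the four types is easiest to see by following a handful of messages. The sketch below is a toy model that runs without a Pulsar cluster, not the broker's real dispatcher: the `dispatch` function, the consumer names, and the use of `zlib.crc32` modulo the consumer count as a stand-in for Pulsar's key hashing are all assumptions made for illustration.

```python
import zlib
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (ordering key, payload)


def dispatch(messages: List[Message], consumers: List[str], mode: str) -> Dict[str, List[str]]:
    """Toy model of how one subscription hands messages to its consumers."""
    received: Dict[str, List[str]] = {c: [] for c in consumers}
    for i, (key, payload) in enumerate(messages):
        if mode in ("exclusive", "failover"):
            # Only one consumer is active at a time; here it is the first one.
            # (A real exclusive subscription would reject the second consumer,
            # while in failover it would wait as a standby.)
            target = consumers[0]
        elif mode == "shared":
            # Round robin across all consumers, ignoring the key.
            target = consumers[i % len(consumers)]
        elif mode == "key_shared":
            # A stable hash of the key picks the consumer, so one key always
            # lands on the same consumer (assumed stand-in for Pulsar's hashing).
            target = consumers[zlib.crc32(key.encode()) % len(consumers)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        received[target].append(payload)
    return received


messages = [("user-a", "a1"), ("user-a", "a2"), ("user-b", "b1"), ("user-a", "a3")]
consumers = ["consumer-1", "consumer-2"]

for mode in ("exclusive", "failover", "shared", "key_shared"):
    print(mode, dispatch(messages, consumers, mode))
```

In the shared run the `user-a` messages end up split between the two consumers, so their relative order is no longer controlled by a single reader. In the key-shared run all `user-a` messages stay on one consumer in their original order, while other keys are free to go elsewhere.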
Apache Pulsar is a complete system because of its two-layer architecture. It differs from other messaging systems in that it runs the serving layer and the storage layer separately. BookKeeper is horizontally scalable: both capacity and throughput increase as more bookies are added to a cluster. To support more producers and consumers, users can add more Pulsar brokers. Kubernetes can be used to automate the scaling of brokers.
You can visit the official Apache Pulsar documentation for more information here: https://pulsar.apache.org/docs/en/standalone/
You can also read more about other Apache technologies here: https://blog.knoldus.com/?s=Apache